POC: postgres_fdw insert batching

Started by Tomas Vondra over 5 years ago, 115 messages
#1Tomas Vondra
tomas.vondra@2ndquadrant.com
3 attachment(s)

Hi,

One of the issues I'm fairly regularly reminded of by users/customers is
that inserting into tables sharded using FDWs is rather slow. We do
even get it reported on pgsql-bugs from time to time [1].

Some of the slowness / overhead is expected, due to the latency between
machines in the sharded setup. Even just 1ms latency will make it way
more expensive than a single instance.

But let's do a simple experiment, comparing a hash-partitioned table with
regular partitions and one with FDW partitions in the same instance.
Scripts to run this are attached. The duration of inserting 1M rows into
this table (average of 10 runs on my laptop) looks like this:

regular: 2872 ms
FDW: 64454 ms

Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
which is probably where most single-DC clusters are :-(
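A back-of-envelope model makes the latency point concrete. Assuming every remote statement costs exactly one network round trip and ignoring execution time entirely, 1M rows inserted row by row pay 1M round trips (about 100 s at 0.1ms, 1000 s at 1ms), while batches of 100 rows need only 10,000 round trips (1 s and 10 s respectively). The function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Back-of-envelope model (an assumption, not a measurement): each remote
 * statement costs one network round trip, and execution time is ignored.
 */
static long
round_trips(long nrows, long batch_size)
{
	assert(nrows >= 0 && batch_size > 0);
	/* one statement per full batch, plus one for a trailing partial batch */
	return (nrows + batch_size - 1) / batch_size;
}

/* Pure network latency in milliseconds for loading nrows rows. */
static double
latency_ms(long nrows, long batch_size, double rtt_ms)
{
	return (double) round_trips(nrows, batch_size) * rtt_ms;
}

/* Row-by-row (batch of 1) versus batches of 100, for a few latencies. */
static void
print_latency_table(long nrows)
{
	const double rtts[] = {0.05, 0.1, 1.0};

	for (int i = 0; i < 3; i++)
		printf("rtt %.2f ms: row-by-row %.0f ms, batches of 100 %.0f ms\n",
			   rtts[i], latency_ms(nrows, 1, rtts[i]),
			   latency_ms(nrows, 100, rtts[i]));
}
```

Calling print_latency_table(1000000) prints both strategies for the latencies mentioned above. Real numbers will be worse, since the model leaves out parsing, planning and executing each statement on the remote side.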

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

For some use cases this may be mitigated by having many independent
connections from different users, so the per-user latency is higher but
acceptable. But if you need to import larger amounts of data (say, a CSV
file for analytics, ...) this may not work.

Some time ago I wrote an ugly PoC adding batching, just to see how far
it would get us, and it seems quite promising - results for the same
INSERT benchmarks look like this:

FDW batching: 4584 ms

So, rather nice improvement, I'd say ...
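The buffering the PoC adds to execute_foreign_modify() can be modelled in a few lines. This is a toy sketch with the remote server reduced to two counters (BatchState and the helper names are hypothetical): full batches go out as one multi-row INSERT, and whatever is left at the end is sent one row at a time, which is what flush_foreign_modify() in the attached patch does with the single-row prepared statement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the PoC's buffering, with the remote side reduced to two
 * counters.  Hypothetical names; the real code keeps the text parameters in
 * PgFdwModifyState->values and sends them with PQsendQueryPrepared().
 */
typedef struct BatchState
{
	int			batch_size;		/* rows per multi-row INSERT (BATCH_SIZE) */
	int			nbuffered;		/* rows stashed locally, not yet sent */
	long		remote_rows;	/* rows the remote server has received */
	long		statements;		/* remote statements executed */
} BatchState;

static void
send_statement(BatchState *st, int nrows)
{
	st->statements++;
	st->remote_rows += nrows;
}

/* Called once per inserted row, like execute_foreign_modify(). */
static void
batch_insert_row(BatchState *st)
{
	st->nbuffered++;
	assert(st->nbuffered <= st->batch_size);

	/* a full batch goes out as one INSERT ... VALUES (...), (...) */
	if (st->nbuffered == st->batch_size)
	{
		send_statement(st, st->nbuffered);
		st->nbuffered = 0;
	}
}

/* Called at the end, like flush_foreign_modify(): leftovers go row by row. */
static void
batch_flush(BatchState *st)
{
	while (st->nbuffered > 0)
	{
		send_statement(st, 1);
		st->nbuffered--;
	}
}
```

Inserting 250 rows with batch_size = 100 leaves remote_rows at 200 with 50 rows still buffered; batch_flush() then sends 50 single-row statements (52 in total). The window before the flush is the visibility question raised under 4).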

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.

1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).
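One way to picture the "batched version of the method" option: a routine table in which the batch callback is optional, so a NULL pointer is how the caller learns that the FDW cannot batch. This is a hypothetical sketch, not PostgreSQL's FdwRoutine; Row, InsertRoutine and the counting stub callbacks exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row type: two integer columns. */
typedef struct Row
{
	int			a;
	int			b;
} Row;

/*
 * Hypothetical routine table (not PostgreSQL's FdwRoutine).  insert_one is
 * mandatory; insert_batch may be NULL, meaning "no batching support".
 */
typedef struct InsertRoutine
{
	int			(*insert_one) (const Row *row);
	int			(*insert_batch) (const Row *rows, int nrows);
} InsertRoutine;

/* Caller-side dispatch: batch when possible, else fall back to per-row. */
static int
insert_rows(const InsertRoutine *routine, const Row *rows, int nrows,
			int batch_size)
{
	int			inserted = 0;

	assert(routine->insert_one != NULL);

	if (routine->insert_batch == NULL || batch_size <= 1)
	{
		for (int i = 0; i < nrows; i++)
			inserted += routine->insert_one(&rows[i]);
		return inserted;
	}

	for (int i = 0; i < nrows; i += batch_size)
	{
		int			n = (nrows - i < batch_size) ? nrows - i : batch_size;

		inserted += routine->insert_batch(&rows[i], n);
	}
	return inserted;
}

/* Stub callbacks that only count how often they are called. */
static int	calls_one = 0;
static int	calls_batch = 0;

static int
count_one(const Row *row)
{
	(void) row;
	calls_one++;
	return 1;
}

static int
count_batch(const Row *rows, int nrows)
{
	(void) rows;
	calls_batch++;
	return nrows;
}
```

The flip side mentioned above is visible here: every caller has to carry this dispatch and the per-row fallback.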

2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(
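For comparison, a COPY-based path would not bind $n parameters at all but stream rows in COPY's text format: tab-separated columns, \N for NULL, backslash escapes for backslash, tab, newline and carriage return, one line per row. A minimal encoder for a single row, assuming the default delimiter and NULL string (copy_text_encode_row is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encode one row of text values in COPY's default text format.  Returns the
 * number of bytes written (excluding the terminating NUL), or -1 if buf is
 * too small.  values[i] == NULL means SQL NULL.
 */
static int
copy_text_encode_row(const char *const *values, int ncols,
					 char *buf, size_t bufsize)
{
	size_t		len = 0;

	assert(values != NULL || ncols == 0);

/* append one byte, always keeping room for the terminating NUL */
#define PUT(c) do { if (len + 1 >= bufsize) return -1; buf[len++] = (c); } while (0)

	for (int col = 0; col < ncols; col++)
	{
		const char *v = values[col];

		if (col > 0)
			PUT('\t');			/* column delimiter */

		if (v == NULL)
		{
			PUT('\\');			/* default NULL marker is \N */
			PUT('N');
			continue;
		}

		for (; *v; v++)
		{
			switch (*v)
			{
				case '\\': PUT('\\'); PUT('\\'); break;
				case '\t': PUT('\\'); PUT('t'); break;
				case '\n': PUT('\\'); PUT('n'); break;
				case '\r': PUT('\\'); PUT('r'); break;
				default: PUT(*v); break;
			}
		}
	}
	PUT('\n');					/* row terminator */
#undef PUT

	buf[len] = '\0';
	return (int) len;
}
```

Lines built this way could be sent with libpq's PQputCopyData() after issuing COPY ... FROM STDIN on the remote connection. As noted above, this covers only the INSERT case.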

4) Expected consistency?

I'm not entirely sure what the consistency expectations for FDWs are.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).
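Whatever the knob ends up being, the batch size has a hard ceiling: the Bind message carries the parameter count as a 16-bit integer and libpq rejects more than 65535 parameters, so a multi-row INSERT with ncols parameters per row can carry at most 65535 / ncols rows. A sketch of clamping a requested size (effective_batch_size is hypothetical; the PoC simply hardcodes 100):

```c
#include <assert.h>

/* libpq refuses more than this many parameters in one statement */
#define MAX_BIND_PARAMS 65535

/*
 * Hypothetical helper: clamp a requested batch size so that
 * rows * ncols parameters still fit into a single Bind message.
 */
static int
effective_batch_size(int requested, int ncols)
{
	int			limit;

	assert(requested >= 1);

	/* INSERT ... DEFAULT VALUES inserts exactly one row per statement */
	if (ncols <= 0)
		return 1;

	limit = MAX_BIND_PARAMS / ncols;
	return (requested < limit) ? requested : limit;
}
```

With 3 columns a requested size of 100000 becomes 21845. The ncols == 0 case matters because the patch's deparseBatchInsertSql() emits DEFAULT VALUES when there are no target columns, which inserts a single row however large the batch.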

regards

[1]: /messages/by-id/CACnz+Q1q0+2KoJam9LyNMk8JmdC6qYHXWB895Wu2xcpoip18xQ@mail.gmail.com

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

fdw.sql (application/sql)
local.sql (application/sql)
0001-fdw-insert-batching-v1.patch (text/plain; charset=us-ascii)
From df2cf502909886fbfc86f93f36b2daba03f785e4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tv@fuzzy.cz>
Date: Sun, 28 Jun 2020 14:31:18 +0200
Subject: [PATCH] patch

---
 contrib/postgres_fdw/deparse.c      |  73 ++++++++
 contrib/postgres_fdw/postgres_fdw.c | 261 ++++++++++++++++++++++++----
 contrib/postgres_fdw/postgres_fdw.h |   5 +
 3 files changed, 305 insertions(+), 34 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..374d2f5dbb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1758,6 +1758,79 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * deparse remote batch INSERT statement
+ *
+ * The statement text is appended to buf, and we also create an integer List
+ * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
+ * which is returned to *retrieved_attrs.
+ */
+void
+deparseBatchInsertSql(StringInfo buf, RangeTblEntry *rte,
+					  Index rtindex, Relation rel,
+					  List *targetAttrs, bool doNothing,
+					  List *withCheckOptionList, List *returningList,
+					  List **retrieved_attrs, int batchSize)
+{
+	AttrNumber	pindex;
+	bool		first;
+	ListCell   *lc;
+	int			i;
+
+	appendStringInfoString(buf, "INSERT INTO ");
+	deparseRelation(buf, rel);
+
+	if (targetAttrs)
+	{
+		appendStringInfoChar(buf, '(');
+
+		first = true;
+		foreach(lc, targetAttrs)
+		{
+			int			attnum = lfirst_int(lc);
+
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			deparseColumnRef(buf, rtindex, attnum, rte, false);
+		}
+
+		appendStringInfoString(buf, ") VALUES ");
+
+		pindex = 1;
+		for (i = 0; i < batchSize; i++)
+		{
+			if (i > 0)
+				appendStringInfoString(buf, ", ");
+
+			appendStringInfoString(buf, "(");
+
+			first = true;
+			foreach(lc, targetAttrs)
+			{
+				if (!first)
+					appendStringInfoString(buf, ", ");
+				first = false;
+
+				appendStringInfo(buf, "$%d", pindex);
+				pindex++;
+			}
+
+			appendStringInfoChar(buf, ')');
+		}
+	}
+	else
+		appendStringInfoString(buf, " DEFAULT VALUES");
+
+	if (doNothing)
+		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
+
+	deparseReturningList(buf, rte, rtindex, rel,
+						 rel->trigdesc && rel->trigdesc->trig_insert_after_row,
+						 withCheckOptionList, returningList, retrieved_attrs);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..17421f6b65 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -56,6 +56,8 @@ PG_MODULE_MAGIC;
 /* If no remote estimates, assume a sort costs 20% extra */
 #define DEFAULT_FDW_SORT_MULTIPLIER 1.2
 
+#define BATCH_SIZE					100
+
 /*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
@@ -93,6 +95,8 @@ enum FdwModifyPrivateIndex
 {
 	/* SQL statement to execute remotely (as a String node) */
 	FdwModifyPrivateUpdateSql,
+	/* batch INSERT statement to execute remotely (as a String node) */
+	FdwModifyPrivateUpdateBatchSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
 	/* has-returning flag (as an integer Value node) */
@@ -172,9 +176,11 @@ typedef struct PgFdwModifyState
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
 	char	   *p_name;			/* name of prepared statement, if created */
+	char	   *p_name_batch;	/* name of prepared batch statement, if created */
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *batch_query;	/* text of batch INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
@@ -187,6 +193,12 @@ typedef struct PgFdwModifyState
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
+	/* batching of values */
+	MemoryContext batch_cxt;
+	int			maxbatched;
+	int			nbatched;
+	const char  **values;
+
 	/* for update row movement if subplan result rel */
 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
 											 * created */
@@ -427,6 +439,7 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   CmdType operation,
 											   Plan *subplan,
 											   char *query,
+											   char *batch_query,
 											   List *target_attrs,
 											   bool has_returning,
 											   List *retrieved_attrs);
@@ -435,6 +448,11 @@ static TupleTableSlot *execute_foreign_modify(EState *estate,
 											  CmdType operation,
 											  TupleTableSlot *slot,
 											  TupleTableSlot *planSlot);
+static TupleTableSlot *flush_foreign_modify(EState *estate,
+											  ResultRelInfo *resultRelInfo,
+											  CmdType operation,
+											  TupleTableSlot *slot,
+											  TupleTableSlot *planSlot);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
@@ -1659,13 +1677,16 @@ postgresPlanForeignModify(PlannerInfo *root,
 	RangeTblEntry *rte = planner_rt_fetch(resultRelation, root);
 	Relation	rel;
 	StringInfoData sql;
+	StringInfoData batch_sql;
 	List	   *targetAttrs = NIL;
 	List	   *withCheckOptionList = NIL;
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	List	   *priv = NIL;
 
 	initStringInfo(&sql);
+	initStringInfo(&batch_sql);
 
 	/*
 	 * Core code already has some lock on each rel being planned, so we can
@@ -1752,6 +1773,11 @@ postgresPlanForeignModify(PlannerInfo *root,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
 							 &retrieved_attrs);
+
+			deparseBatchInsertSql(&batch_sql, rte, resultRelation, rel,
+								  targetAttrs, doNothing,
+								  withCheckOptionList, returningList,
+								  &retrieved_attrs, BATCH_SIZE);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,10 +1801,13 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
-					  targetAttrs,
-					  makeInteger((retrieved_attrs != NIL)),
-					  retrieved_attrs);
+	priv = lappend(priv, makeString(sql.data));
+	priv = lappend(priv, makeString(batch_sql.data));
+	priv = lappend(priv, targetAttrs);
+	priv = lappend(priv, makeInteger((retrieved_attrs != NIL)));
+	priv = lappend(priv, retrieved_attrs);
+
+	return priv;
 }
 
 /*
@@ -1794,6 +1823,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 {
 	PgFdwModifyState *fmstate;
 	char	   *query;
+	char	   *batch_query;
 	List	   *target_attrs;
 	bool		has_returning;
 	List	   *retrieved_attrs;
@@ -1809,6 +1839,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	/* Deconstruct fdw_private data. */
 	query = strVal(list_nth(fdw_private,
 							FdwModifyPrivateUpdateSql));
+	batch_query = strVal(list_nth(fdw_private,
+							FdwModifyPrivateUpdateBatchSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
 	has_returning = intVal(list_nth(fdw_private,
@@ -1827,6 +1859,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->operation,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
+									batch_query,
 									target_attrs,
 									has_returning,
 									retrieved_attrs);
@@ -1925,6 +1958,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
 	StringInfoData sql;
+	StringInfoData batch_sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
@@ -1946,6 +1980,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 						RelationGetRelationName(rel))));
 
 	initStringInfo(&sql);
+	initStringInfo(&batch_sql);
 
 	/* We transmit all columns that are defined in the foreign table. */
 	for (attnum = 1; attnum <= tupdesc->natts; attnum++)
@@ -2002,6 +2037,12 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 					 resultRelInfo->ri_returningList,
 					 &retrieved_attrs);
 
+	/* Construct the batch SQL command string. */
+	deparseBatchInsertSql(&batch_sql, rte, resultRelation, rel, targetAttrs, doNothing,
+					 resultRelInfo->ri_WithCheckOptions,
+					 resultRelInfo->ri_returningList,
+					 &retrieved_attrs, BATCH_SIZE);
+
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
 									rte,
@@ -2009,6 +2050,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									CMD_INSERT,
 									NULL,
 									sql.data,
+									batch_sql.data,
 									targetAttrs,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
@@ -2040,6 +2082,9 @@ postgresEndForeignInsert(EState *estate,
 
 	Assert(fmstate != NULL);
 
+	/* Send over remaining data to insert. */
+	flush_foreign_modify(estate, resultRelInfo, CMD_INSERT, NULL, NULL);
+
 	/*
 	 * If the fmstate has aux_fmstate set, get the aux_fmstate (see
 	 * postgresBeginForeignInsert())
@@ -3536,6 +3581,7 @@ create_foreign_modify(EState *estate,
 					  CmdType operation,
 					  Plan *subplan,
 					  char *query,
+					  char *batch_query,
 					  List *target_attrs,
 					  bool has_returning,
 					  List *retrieved_attrs)
@@ -3571,15 +3617,24 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	fmstate->batch_query = batch_query;
 	fmstate->target_attrs = target_attrs;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
+	fmstate->nbatched = 0;
+	fmstate->maxbatched = BATCH_SIZE * list_length(target_attrs);
+	fmstate->values = palloc(fmstate->maxbatched * sizeof(char *));
+
 	/* Create context for per-tuple temp workspace. */
 	fmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_SIZES);
 
+	fmstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+											   "postgres_fdw batch data",
+											   ALLOCSET_DEFAULT_SIZES);
+
 	/* Prepare for input conversion of RETURNING results. */
 	if (fmstate->has_returning)
 		fmstate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
@@ -3646,7 +3701,9 @@ execute_foreign_modify(EState *estate,
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
-	int			n_rows;
+	int			n_rows = 0;
+	MemoryContext	oldctx;
+	int			i;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
@@ -3677,48 +3734,163 @@ execute_foreign_modify(EState *estate,
 	/* Convert parameters needed by prepared statement to text form */
 	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
 
-	/*
-	 * Execute the prepared statement.
-	 */
-	if (!PQsendQueryPrepared(fmstate->conn,
-							 fmstate->p_name,
-							 fmstate->p_nums,
-							 p_values,
-							 NULL,
-							 NULL,
-							 0))
-		pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+	/* copy the parameters to the batch */
+	oldctx = MemoryContextSwitchTo(fmstate->batch_cxt);
+
+	for (i = 0; i < fmstate->p_nums; i++)
+		if (p_values[i] == NULL)
+			fmstate->values[fmstate->nbatched++] = NULL;
+		else
+			fmstate->values[fmstate->nbatched++] = pstrdup(p_values[i]);
+
+	MemoryContextSwitchTo(oldctx);
+
+	Assert(fmstate->nbatched <= fmstate->maxbatched);
+
+	/* if the batch is "full" we need to flush it */
+	if (fmstate->nbatched == fmstate->maxbatched || fmstate->has_returning)
+	{
+		/*
+		 * Execute the prepared statement.
+		 */
+		if (fmstate->has_returning)
+		{
+			if (!PQsendQueryPrepared(fmstate->conn,
+								 fmstate->p_name,
+								 fmstate->nbatched,
+								 fmstate->values,
+								 NULL,
+								 NULL,
+								 0))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+		}
+		else if (!PQsendQueryPrepared(fmstate->conn,
+								 fmstate->p_name_batch,
+								 fmstate->nbatched,
+								 fmstate->values,
+								 NULL,
+								 NULL,
+								 0))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->batch_query);
+
+		/*
+		 * Get the result, and check for success.
+		 *
+		 * We don't use a PG_TRY block here, so be careful not to throw error
+		 * without releasing the PGresult.
+		 */
+		res = pgfdw_get_result(fmstate->conn, fmstate->query);
+		if (PQresultStatus(res) !=
+			(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		/* Check number of rows affected, and fetch RETURNING tuple if any */
+		if (fmstate->has_returning)
+		{
+			n_rows = PQntuples(res);
+			if (n_rows > 0)
+				store_returning_result(fmstate, slot, res);
+		}
+		else
+			n_rows = atoi(PQcmdTuples(res));
+
+		/* And clean up */
+		PQclear(res);
+
+		MemoryContextReset(fmstate->batch_cxt);
+
+		fmstate->nbatched = 0;
+	}
+	else
+		/* XXX we don't know if the future insert succeeds */
+		n_rows = 1;
+
+	MemoryContextReset(fmstate->temp_cxt);
 
 	/*
-	 * Get the result, and check for success.
-	 *
-	 * We don't use a PG_TRY block here, so be careful not to throw error
-	 * without releasing the PGresult.
+	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	res = pgfdw_get_result(fmstate->conn, fmstate->query);
-	if (PQresultStatus(res) !=
-		(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
-		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+	return (n_rows > 0) ? slot : NULL;
+}
 
-	/* Check number of rows affected, and fetch RETURNING tuple if any */
-	if (fmstate->has_returning)
+/*
+ * flush_foreign_modify
+ *		Send any rows still buffered in the batch to the remote server, one
+ *		row at a time using the single-row prepared statement.  (Called from
+ *		postgresEndForeignInsert to flush the last, partial batch.)
+ */
+static TupleTableSlot *
+flush_foreign_modify(EState *estate,
+					   ResultRelInfo *resultRelInfo,
+					   CmdType operation,
+					   TupleTableSlot *slot,
+					   TupleTableSlot *planSlot)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	PGresult   *res;
+	int			n_rows = 0;
+	int			i;
+
+	/* The operation should be INSERT, UPDATE, or DELETE */
+	Assert(operation == CMD_INSERT ||
+		   operation == CMD_UPDATE ||
+		   operation == CMD_DELETE);
+
+	/* Set up the prepared statement on the remote server, if we didn't yet */
+	if (!fmstate->p_name)
+		prepare_foreign_modify(fmstate);
+
+	/* send the remaining (partial) batch, one row per statement */
+	i = 0;
+	while (i < fmstate->nbatched)
 	{
-		n_rows = PQntuples(res);
-		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+		/*
+		 * Execute the prepared statement.
+		 */
+		if (!PQsendQueryPrepared(fmstate->conn,
+								 fmstate->p_name,
+								 fmstate->p_nums,
+								 &fmstate->values[i],
+								 NULL,
+								 NULL,
+								 0))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+		/*
+		 * Get the result, and check for success.
+		 *
+		 * We don't use a PG_TRY block here, so be careful not to throw error
+		 * without releasing the PGresult.
+		 */
+		res = pgfdw_get_result(fmstate->conn, fmstate->query);
+		if (PQresultStatus(res) !=
+			(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+		/* Check number of rows affected, and fetch RETURNING tuple if any */
+		if (fmstate->has_returning)
+		{
+			n_rows = PQntuples(res);
+			if (n_rows > 0)
+				store_returning_result(fmstate, slot, res);
+		}
+		else
+			n_rows = atoi(PQcmdTuples(res));
+
+		/* And clean up */
+		PQclear(res);
+
+		i += fmstate->p_nums;
 	}
-	else
-		n_rows = atoi(PQcmdTuples(res));
 
-	/* And clean up */
-	PQclear(res);
+	Assert(i == fmstate->nbatched);
 
 	MemoryContextReset(fmstate->temp_cxt);
 
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return NULL;
 }
 
 /*
@@ -3729,7 +3901,9 @@ static void
 prepare_foreign_modify(PgFdwModifyState *fmstate)
 {
 	char		prep_name[NAMEDATALEN];
+	char		prep_name_batch[NAMEDATALEN];
 	char	   *p_name;
+	char	   *p_name_batch;
 	PGresult   *res;
 
 	/* Construct name we'll use for the prepared statement. */
@@ -3737,6 +3911,11 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 			 GetPrepStmtNumber(fmstate->conn));
 	p_name = pstrdup(prep_name);
 
+	/* Construct name we'll use for the batch prepared statement. */
+	snprintf(prep_name_batch, sizeof(prep_name_batch), "pgsql_fdw_prep_%u",
+			 GetPrepStmtNumber(fmstate->conn));
+	p_name_batch = pstrdup(prep_name_batch);
+
 	/*
 	 * We intentionally do not specify parameter types here, but leave the
 	 * remote server to derive them by default.  This avoids possible problems
@@ -3762,8 +3941,22 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
 	PQclear(res);
 
+	if (fmstate->batch_query &&
+		!PQsendPrepare(fmstate->conn,
+					   p_name_batch,
+					   fmstate->batch_query,
+					   0,
+					   NULL))
+		pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->batch_query);
+
+	res = pgfdw_get_result(fmstate->conn, fmstate->batch_query);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->batch_query);
+	PQclear(res);
+
 	/* This action shows that the prepare has been done. */
 	fmstate->p_name = p_name;
+	fmstate->p_name_batch = p_name_batch;
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..7e4342cab6 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -162,6 +162,11 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
 							 List **retrieved_attrs);
+extern void deparseBatchInsertSql(StringInfo buf, RangeTblEntry *rte,
+							 Index rtindex, Relation rel,
+							 List *targetAttrs, bool doNothing,
+							 List *withCheckOptionList, List *returningList,
+							 List **retrieved_attrs, int batchSize);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
-- 
2.25.4
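For readers skimming the patch, the statement deparseBatchInsertSql() produces can be reproduced with a standalone sketch. It uses snprintf instead of StringInfo and assumes the table and column names are already quoted (build_batch_insert is a hypothetical helper). The key detail is that parameter numbers keep increasing across rows, so nrows rows of ncols columns use $1 .. $(nrows * ncols).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "INSERT INTO tbl(c1, c2) VALUES ($1, $2), ($3, $4), ..." for nrows
 * rows.  Returns 0 on success, -1 if the buffer is too small.
 */
static int
build_batch_insert(char *buf, size_t bufsize, const char *table,
				   const char *const *cols, int ncols, int nrows)
{
	size_t		len = 0;
	int			pindex = 1;		/* next parameter number, as in the patch */
	int			n;

/* append formatted text, failing instead of truncating */
#define APPEND(...) \
	do { \
		n = snprintf(buf + len, bufsize - len, __VA_ARGS__); \
		if (n < 0 || (size_t) n >= bufsize - len) \
			return -1; \
		len += (size_t) n; \
	} while (0)

	assert(bufsize > 0 && ncols > 0 && nrows > 0);

	APPEND("INSERT INTO %s(", table);
	for (int c = 0; c < ncols; c++)
		APPEND("%s%s", c > 0 ? ", " : "", cols[c]);
	APPEND(") VALUES ");

	/* one "($n, ...)" group per row, numbering continues across rows */
	for (int r = 0; r < nrows; r++)
	{
		APPEND("%s(", r > 0 ? ", " : "");
		for (int c = 0; c < ncols; c++)
			APPEND("%s$%d", c > 0 ? ", " : "", pindex++);
		APPEND(")");
	}
#undef APPEND
	return 0;
}
```

For table public.t with columns a, b and three rows this yields INSERT INTO public.t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6).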

#2Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#1)
Re: POC: postgres_fdw insert batching

Hi Tomas,

On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Some time ago I wrote an ugly PoC adding batching, just to see how far
it would get us, and it seems quite promising - results for the same
INSERT benchmarks look like this:

FDW batching: 4584 ms

So, rather nice improvement, I'd say ...

Very nice indeed.

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.

I think you may want to take a look at this recent proposal by Andrey Lepikhov:

[POC] Fast COPY FROM command for the table with foreign partitions
/messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Tomas Vondra (#1)
Re: POC: postgres_fdw insert batching

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

FDW batching: 4584 ms

So, rather nice improvement, I'd say ...

Very nice.

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.

1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).

Amit Langote has pointed out a related patch being discussed on hackers at [1].

That patch introduces a new API. But if we can do it without
introducing a new API, that would be better. FDWs which can support
batching can just modify their code and don't have to implement and
manage a new API. We already have a handful of those APIs.

2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.
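A sketch of adding the counts up across batches, assuming each flush yields the count string libpq's PQcmdTuples() returns and converting it with atoi() as the patch does (InsertCounter and count_flush are hypothetical). The totals come out right at the end, but the per-row result that ExecForeignInsert returns is still decided before a buffered row is actually sent.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-table totals, accumulated at each flush. */
typedef struct InsertCounter
{
	long		rows_sent;		/* rows handed to the remote server */
	long		rows_reported;	/* rows the remote server says it inserted */
} InsertCounter;

/*
 * Record one flushed batch.  cmd_tuples is the string PQcmdTuples() would
 * return for the batch's INSERT (empty if the command has no count).
 */
static void
count_flush(InsertCounter *cnt, int rows_in_batch, const char *cmd_tuples)
{
	assert(rows_in_batch >= 0);

	cnt->rows_sent += rows_in_batch;
	if (cmd_tuples != NULL && cmd_tuples[0] != '\0')
		cnt->rows_reported += atoi(cmd_tuples);
}
```

With ON CONFLICT DO NOTHING, for instance, a batch of 100 rows may report "97", so rows_reported can end up lower than rows_sent.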

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.

It will be a rare case when a bulk load also has a RETURNING clause.
So, we can live with this restriction. We should try to choose a
design which allows that restriction to be lifted in the future. But I
doubt that restriction will be a serious one.

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk update/delete are rare, although not
that rare. So if an approach just supports bulk insert and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) with as many value
lists as the batch size. That won't work as is for UPDATE/DELETE since
we can't pass multiple pairs of ctids and columns to be updated for
each ctid in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq, similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameter sets, each set corresponding to one foreign DML
statement.

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

I think this won't support RETURNING either. But if we could somehow
use the COPY protocol to send the data to the foreign server and yet
treat it as INSERT, that might work. I think we have to find out which
performs better, COPY or batch INSERT.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(

Agreed, if we want to support bulk UPDATE/DELETE as well.

4) Expected consistency?

I'm not entirely sure what are the consistency expectations for FDWs.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I think that area is grey. Depending upon where the cursor is
positioned when a DML node executes a query, the data fetched from
cursor may or may not see the effect of DML. The cursor position is
based on the batch size so we already have problems in this area I
think. Assuming that the DML and SELECT are independent this will
work. So, the consistency problems exists, it will just be modulated
by batching DML. I doubt that's related to this feature exclusively
and should be solved independent of this feature.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).

Similar to fetch_size, it should be a foreign server / table level setting, IMO.
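To make the per-server / per-table idea concrete, here is a minimal C sketch. It assumes a hypothetical text-valued `batch_size` option (FDW options are stored as strings, like fetch_size), where a table-level value overrides a server-level one, which overrides the PoC's hardcoded default of 100. `parse_batch_size` and `effective_batch_size` are illustrative names, not postgres_fdw code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_BATCH_SIZE 100  /* the value hardcoded in the PoC */

/*
 * Parse a hypothetical "batch_size" option value. FDW options are stored
 * as text, so the value arrives as a string. Returns the batch size, or
 * -1 if the string is not a positive integer.
 */
static int
parse_batch_size(const char *value)
{
    char   *end;
    long    n;

    assert(value != NULL);
    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;              /* empty, trailing junk, or out of range */
    if (n <= 0 || n > INT_MAX)
        return -1;              /* a batch must hold at least one row */
    return (int) n;
}

/*
 * Resolve the effective batch size: table-level setting first, then the
 * server-level one, then the default. NULL means "not set at this level";
 * invalid values are simply ignored here.
 */
static int
effective_batch_size(const char *server_opt, const char *table_opt)
{
    int     n;

    if (table_opt != NULL && (n = parse_batch_size(table_opt)) > 0)
        return n;
    if (server_opt != NULL && (n = parse_batch_size(server_opt)) > 0)
        return n;
    return DEFAULT_BATCH_SIZE;
}
```

A real option validator would reject bad values when the option is set (ALTER SERVER / ALTER FOREIGN TABLE) instead of silently falling back as this sketch does.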

[1]: /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

--
Best Wishes,
Ashutosh Bapat

#4 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: POC: postgres_fdw insert batching

On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk UPDATE/DELETE are rare, although not
that rare. So if an approach just supports bulk INSERT and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify. So, I'm not sure we
really need the bulk UPDATE/DELETE.

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

I think we have to find out which performs
better, COPY or batch INSERT.

Maybe I'm missing something, but I think the COPY patch [1] seems more
promising to me, because 1) it would not get the remote side's planner
and executor involved, and 2) the data would be loaded more
efficiently by multi-insert on the remote side. (Yeah, COPY doesn't
support RETURNING, but it's rare that RETURNING is needed in a bulk
load, as you mentioned.)

[1] /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

Best regards,
Etsuro Fujita

#5 Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Etsuro Fujita (#4)
Re: POC: postgres_fdw insert batching

On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk UPDATE/DELETE are rare, although not
that rare. So if an approach just supports bulk INSERT and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify. So, I'm not sure we
really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign
tables. Esp. given the work that Amit Langote is doing [1]. It really
depends on the ability of postgres_fdw to detect that the DML modifying
each of the partitions can be pushed down. That may not come easily.

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

I think we have to find out which performs
better, COPY or batch INSERT.

Maybe I'm missing something, but I think the COPY patch [1] seems more
promising to me, because 1) it would not get the remote side's planner
and executor involved, and 2) the data would be loaded more
efficiently by multi-insert on the remote side. (Yeah, COPY doesn't
support RETURNING, but it's rare that RETURNING is needed in a bulk
load, as you mentioned.)

[1] /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

Best regards,
Etsuro Fujita

[1]: /messages/by-id/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
--
Best Wishes,
Ashutosh

#6 Amit Langote
amitlangote09@gmail.com
In reply to: Ashutosh Bapat (#5)
Re: POC: postgres_fdw insert batching

On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:

On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk UPDATE/DELETE are rare, although not
that rare. So if an approach just supports bulk INSERT and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify. So, I'm not sure we
really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.

While it's true that how to accommodate the DirectModify API in the
new inherited update/delete planning approach is an open question on
that thread, I would eventually like to find an answer to that. That
is, that work shouldn't result in losing the foreign partition's
ability to use DirectModify API to optimize updates/deletes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#7 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#6)
Re: POC: postgres_fdw insert batching

On Tue, Jun 30, 2020 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:

On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk UPDATE/DELETE are rare, although not
that rare. So if an approach just supports bulk INSERT and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify. So, I'm not sure we
really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.

While it's true that how to accommodate the DirectModify API in the
new inherited update/delete planning approach is an open question on
that thread, I would eventually like to find an answer to that. That
is, that work shouldn't result in losing the foreign partition's
ability to use DirectModify API to optimize updates/deletes.

That would be great!

Best regards,
Etsuro Fujita

#8 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ashutosh Bapat (#3)
Re: POC: postgres_fdw insert batching

On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:

On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

FDW batching: 4584 ms

So, rather nice improvement, I'd say ...

Very nice.

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.

1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).

Amit Langote has pointed out a related patch being discussed on hackers at [1].

That patch introduces a new API. But if we can do it without
introducing a new API that will be good. FDWs which can support
batching can just modify their code and don't have to implement and
manage a new API. We already have a handful of those APIs.

I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.).

2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether the ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return a number of inserted tuples,
as returned by the remote server.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.

It will be a rare case when a bulk load also has a RETURNING clause.
So, we can live with this restriction. We should try to choose a
design which allows that restriction to be lifted in the future. But I
doubt that restriction will be a serious one.

3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
loads in, say, OLAP systems. Bulk UPDATE/DELETE are rare, although not
that rare. So if an approach just supports bulk INSERT and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together, that would be good as well.

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) as many values as
the batch size. That won't work as is for UPDATE/DELETE since we can't
pass multiple pairs of ctids and columns to be updated for each ctid
in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameter sets, each set corresponding to one foreign DML
statement.
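For reference, the multi-row VALUES form described above can be generated along these lines. This is a minimal C sketch, not the patch's deparse code; `build_batch_insert` and `append` are hypothetical helpers, and the table and column names are assumed to be already quoted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf[*len]; returns false if buf would overflow. */
static bool
append(char *buf, size_t buflen, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, buflen - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= buflen - *len)
        return false;
    *len += (size_t) n;
    return true;
}

/*
 * Build "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ..." for nrows rows
 * of ncols columns. Returns the length written (excluding the NUL), or -1
 * if buf is too small.
 */
static int
build_batch_insert(char *buf, size_t buflen, const char *table,
                   const char *const *cols, int ncols, int nrows)
{
    size_t  len = 0;
    int     param = 1;

    assert(buf != NULL && buflen > 0 && ncols > 0 && nrows > 0);

    if (!append(buf, buflen, &len, "INSERT INTO %s(", table))
        return -1;
    for (int c = 0; c < ncols; c++)
        if (!append(buf, buflen, &len, "%s%s", c > 0 ? ", " : "", cols[c]))
            return -1;
    if (!append(buf, buflen, &len, ") VALUES "))
        return -1;

    /* One parenthesized group per row, parameters numbered consecutively. */
    for (int r = 0; r < nrows; r++)
    {
        if (!append(buf, buflen, &len, "%s(", r > 0 ? ", " : ""))
            return -1;
        for (int c = 0; c < ncols; c++)
            if (!append(buf, buflen, &len, "%s$%d", c > 0 ? ", " : "", param++))
                return -1;
        if (!append(buf, buflen, &len, ")"))
            return -1;
    }
    return (int) len;
}
```

Because parameters are numbered consecutively across rows, a batch needs nrows * ncols parameters; the PostgreSQL wire protocol allows at most 65535 parameters per statement, so the batch size would have to be capped for wide tables.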

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.
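A minimal sketch of what such an array/unnest statement might look like, generated in C. It assumes (as postgres_fdw already does for single-row UPDATE) that remote rows are identified by ctid, and that the new values for each column are passed as one array parameter. `build_batch_update` and `append` are hypothetical helpers, and the SQL is an illustration rather than a tested design.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf[*len]; returns false if buf would overflow. */
static bool
append(char *buf, size_t buflen, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, buflen - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= buflen - *len)
        return false;
    *len += (size_t) n;
    return true;
}

/*
 * Build one UPDATE that applies a whole batch, e.g. for columns a, b:
 *   UPDATE t AS r SET a = u.a, b = u.b
 *     FROM unnest($1::tid[], $2::int4[], $3::text[]) AS u(ctid, a, b)
 *    WHERE r.ctid = u.ctid
 * Returns the length written, or -1 if buf is too small.
 */
static int
build_batch_update(char *buf, size_t buflen, const char *table,
                   const char *const *cols, const char *const *types, int ncols)
{
    size_t  len = 0;

    assert(buf != NULL && buflen > 0 && ncols > 0);

    if (!append(buf, buflen, &len, "UPDATE %s AS r SET ", table))
        return -1;
    for (int c = 0; c < ncols; c++)
        if (!append(buf, buflen, &len, "%s%s = u.%s",
                    c > 0 ? ", " : "", cols[c], cols[c]))
            return -1;

    /* $1 carries the ctids; $2..$(ncols+1) carry one array per column. */
    if (!append(buf, buflen, &len, " FROM unnest($1::tid[]"))
        return -1;
    for (int c = 0; c < ncols; c++)
        if (!append(buf, buflen, &len, ", $%d::%s[]", c + 2, types[c]))
            return -1;

    if (!append(buf, buflen, &len, ") AS u(ctid"))
        return -1;
    for (int c = 0; c < ncols; c++)
        if (!append(buf, buflen, &len, ", %s", cols[c]))
            return -1;
    if (!append(buf, buflen, &len, ") WHERE r.ctid = u.ctid"))
        return -1;
    return (int) len;
}
```

Unlike the multi-row VALUES form, the statement text here does not depend on the batch size, so the same prepared statement could be reused for every batch.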

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

I think this won't support RETURNING either. But if we could somehow
use the COPY protocol to send the data to the foreign server and yet treat
it as INSERT, that might work. I think we have to find out which performs
better, COPY or batch INSERT.

I don't see why we shouldn't support both; the use cases are somewhat
different, I think.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(

Agreed, if we want to support bulk UPDATE/DELETE as well.

4) Expected consistency?

I'm not entirely sure what the consistency expectations for FDWs are.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I think that area is grey. Depending upon where the cursor is
positioned when a DML node executes a query, the data fetched from
cursor may or may not see the effect of DML. The cursor position is
based on the batch size so we already have problems in this area I
think. Assuming that the DML and SELECT are independent this will
work. So, the consistency problems exist; they will just be modulated
by batching DML. I doubt they are related to this feature exclusively,
and they should be solved independently of this feature.

OK, thanks for the feedback.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).

Similar to fetch_size, it should be a foreign server / table level setting, IMO.

[1] /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

Yeah, I agree we should have a GUC to define the batch size. What I had
in mind was something that would allow us to enable/disable batching to
increase the consistency guarantees, or something like that. I think
simple GUCs are a poor solution for that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9 Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Tomas Vondra (#8)
Re: POC: postgres_fdw insert batching

On Tue, 30 Jun 2020 at 22:23, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether the ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return a number of inserted tuples,
as returned by the remote server.

Hmm yes, now I remember that bit. So for every row buffered, we return a
valid slot without knowing whether that row was inserted on the remote
server or not. I think we have that problem even now where a single INSERT
might result in multiple INSERTs on the remote server (rare but not
completely impossible).

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) as many values as
the batch size. That won't work as is for UPDATE/DELETE since we can't
pass multiple pairs of ctids and columns to be updated for each ctid
in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameter sets, each set corresponding to one foreign DML
statement.

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.

That sounds great. The solution will be limited to postgres_fdw only.

I don't see why we shouldn't support both; the use cases are somewhat
different, I think.

+1, if we can do both.

--
Best Wishes,
Ashutosh

#10 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#1)
Re: POC: postgres_fdw insert batching

Hi,

On 2020-06-28 17:10:02 +0200, Tomas Vondra wrote:

3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(

I personally think - and I realize that that might be annoying to
somebody looking to make an incremental improvement - that the
nodeModifyTable.c and copy.c code dealing with DML has become too
complicated to add features like this without a larger
refactoring. Leading to choices like this, whether to add a feature in
one place but not the other.

I think before we add more complexity, we ought to centralize and clean
up the DML handling code so most is shared between copy.c and
nodeModifyTable.c. Then we can much more easily add batching to FDWs, to
CTAS, to INSERT INTO SELECT etc, for which there are patches already.

4) Expected consistency?

I'm not entirely sure what the consistency expectations for FDWs are.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

Given that for local queries that's not the case (since the snapshot
won't have those changes visible), I think we shouldn't be too concerned
about that. If anything we should be concerned about the opposite.

If we are concerned, perhaps we could add functionality to flush all
pending changes before executing further statements?

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).

Hm. If libpq allowed us to utilize pipelining, ISTM the answer here would be
to not batch by building a single statement with all rows as a VALUES,
but issue the single INSERTs in a pipelined manner. That'd probably
remove all behavioural differences. I really wish somebody would pick
up that libpq patch again.

Greetings,

Andres Freund

#11 Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tomas Vondra (#1)
Re: POC: postgres_fdw insert batching

On 6/28/20 8:10 PM, Tomas Vondra wrote:

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

You added new fields to the PgFdwModifyState struct. Why didn't you
reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
CopyMultiInsertBuffer machinery as storage for incoming tuples?

--
regards,
Andrey Lepikhov
Postgres Professional

#12 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andrey V. Lepikhov (#11)
Re: POC: postgres_fdw insert batching

On Fri, Jul 10, 2020 at 09:28:44AM +0500, Andrey V. Lepikhov wrote:

On 6/28/20 8:10 PM, Tomas Vondra wrote:

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

You added new fields to the PgFdwModifyState struct. Why didn't you
reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
CopyMultiInsertBuffer machinery as storage for incoming tuples?

Because I was focused on speeding up inserts, and those don't use
CopyMultiInsertBuffer, I think. I agree the way the tuples are stored
may be improved, of course.

regards

--
Tomas Vondra http://www.2ndQuadrant.com

#13 Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#12)
Re: POC: postgres_fdw insert batching

On Sun, Jul 12, 2020 at 02:11:01AM +0200, Tomas Vondra wrote:

Because I was focused on speeding up inserts, and those don't use
CopyMultiInsertBuffer, I think. I agree the way the tuples are stored
may be improved, of course.

The CF bot reports that the postgres_fdw regression tests are
crashing. Could you look at that?
--
Michael

#14 tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#12)
RE: POC: postgres_fdw insert batching

Hello Tomas san,

Thank you for picking this up. I'm interested in this topic, too. (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.)

As others mentioned, Andrey-san's fast COPY to foreign partitions is also promising. But I think your bulk INSERT is a separate feature and offers what COPY cannot do: data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT.

Is there anything that worries you and is blocking development? Could I try implementing this? (I'm not sure I can, sorry. I'm worried about whether we can change the executor's call chain easily.)

1) Extend the FDW API?

Yes, I think so, because FDWs for other DBMSs will benefit from this. (But it's questionable whether we want users to transfer data in a Postgres database to other DBMSs...)

MySQL and SQL Server have the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ... Oracle doesn't have this syntax, but it can use a CTE as follows:

INSERT INTO table
WITH t AS (
SELECT record1 FROM DUAL UNION ALL
SELECT record2 FROM DUAL UNION ALL
...
)
SELECT * FROM t;
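An FDW for Oracle would need to deparse this form instead of a multi-row VALUES list. Below is a minimal C sketch of generating it, assuming Oracle-style `:n` bind placeholders and column aliases taken from the target column names; `build_oracle_batch_insert` and `append` are hypothetical helpers, and the generated SQL has not been checked against an Oracle server.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf[*len]; returns false if buf would overflow. */
static bool
append(char *buf, size_t buflen, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, buflen - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= buflen - *len)
        return false;
    *len += (size_t) n;
    return true;
}

/*
 * Build the CTE form for nrows rows, e.g. for columns a, b and two rows:
 *   INSERT INTO t WITH v AS (SELECT :1 AS a, :2 AS b FROM DUAL
 *                  UNION ALL SELECT :3 AS a, :4 AS b FROM DUAL)
 *   SELECT * FROM v
 * Returns the length written, or -1 if buf is too small.
 */
static int
build_oracle_batch_insert(char *buf, size_t buflen, const char *table,
                          const char *const *cols, int ncols, int nrows)
{
    size_t  len = 0;
    int     param = 1;

    assert(buf != NULL && buflen > 0 && ncols > 0 && nrows > 0);

    if (!append(buf, buflen, &len, "INSERT INTO %s WITH v AS (", table))
        return -1;
    for (int r = 0; r < nrows; r++)
    {
        /* Each row becomes one SELECT ... FROM DUAL branch. */
        if (!append(buf, buflen, &len, "%sSELECT ", r > 0 ? " UNION ALL " : ""))
            return -1;
        for (int c = 0; c < ncols; c++)
            if (!append(buf, buflen, &len, "%s:%d AS %s",
                        c > 0 ? ", " : "", param++, cols[c]))
                return -1;
        if (!append(buf, buflen, &len, " FROM DUAL"))
            return -1;
    }
    if (!append(buf, buflen, &len, ") SELECT * FROM v"))
        return -1;
    return (int) len;
}
```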

And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...

The API would simply be:

TupleTableSlot **
ExecForeignMultiInsert(EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot **slot,
TupleTableSlot **planSlot,
int numSlots);
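One way to read this signature is that the returned array has one entry per input slot, so an FDW able to report per-row outcomes could leave NULL for rows it did not insert, and the caller could count rows from that. The C sketch below models only that idea, using simplified stand-in types (`Slot`, `MultiInsertFn`, `count_inserted`, all hypothetical) rather than the real executor structures; the per-row NULL convention is an assumption, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for TupleTableSlot; NOT the executor's definition. */
typedef struct Slot
{
    int     a;
    int     b;
} Slot;

/*
 * Hypothetical callback modeled loosely on the proposed
 * ExecForeignMultiInsert (EState/ResultRelInfo/planSlot omitted): it gets
 * numSlots rows and returns an array with one entry per input row, where
 * NULL marks a row the foreign side did not insert.
 */
typedef Slot **(*MultiInsertFn) (void *fdw_state, Slot **slots, int numSlots);

/* Caller side: count the rows reported as inserted. */
static int
count_inserted(MultiInsertFn fn, void *fdw_state, Slot **slots, int numSlots)
{
    Slot  **result = fn(fdw_state, slots, numSlots);
    int     inserted = 0;

    if (result == NULL)
        return 0;               /* nothing inserted */
    for (int i = 0; i < numSlots; i++)
        if (result[i] != NULL)
            inserted++;
    return inserted;
}

/* Demo FDW: pretends a remote trigger skips rows with a negative "a". */
#define DEMO_MAX_BATCH 100
static Slot *demo_result[DEMO_MAX_BATCH];

static Slot **
demo_multi_insert(void *fdw_state, Slot **slots, int numSlots)
{
    (void) fdw_state;
    assert(numSlots <= DEMO_MAX_BATCH);
    for (int i = 0; i < numSlots; i++)
        demo_result[i] = (slots[i]->a >= 0) ? slots[i] : NULL;
    return demo_result;
}
```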

2) What about the insert results?

I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely. Other FDWs may be able to do it, so the API can be like above.

For the same reason, support for RETURNING clause will vary from DBMS to DBMS.

3) What about the other DML operations (DELETE/UPDATE)?

I don't think they are necessary for the time being. If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said.

3) Should we do batching for COPY instead?

I'm thinking of issuing INSERT with multiple records as your patch does, because:

* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.

* Unlike INSERT, COPY doesn't invoke rules. (I don't think rules are a feature users care much about, though.) Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.

[1]: Fast COPY FROM command for the table with foreign partitions
/messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

Regards
Takayuki Tsunakawa

#15 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: tsunakawa.takay@fujitsu.com (#14)
Re: POC: postgres_fdw insert batching

On Thu, Oct 08, 2020 at 02:40:10AM +0000, tsunakawa.takay@fujitsu.com wrote:

Hello Tomas san,

Thank you for picking this up. I'm interested in this topic, too. (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.)

As others mentioned, Andrey-san's fast COPY to foreign partitions is also promising. But I think your bulk INSERT is a separate feature and offers what COPY cannot do: data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT.

Is there anything that worries you and is blocking development? Could I try implementing this? (I'm not sure I can, sorry. I'm worried about whether we can change the executor's call chain easily.)

It's primarily a matter of having too much other stuff on my plate, thus
not having time to work on this feature. I was not too worried about any
particular issue, but I wanted some feedback before spending more time
on extending the API.

I'm not sure when I'll have time to work on this again, so if you are
interested and willing to work on it, please go ahead. I'll gladly do
reviews and help you with it.

1) Extend the FDW API?

Yes, I think so, because FDWs for other DBMSs will benefit from this. (But it's questionable whether we want users to transfer data in a Postgres database to other DBMSs...)

I think transferring data to other databases is fine - interoperability
is a big advantage for users, I don't see it as something threatening
the PostgreSQL project. I doubt this would make it more likely for users
to migrate from PostgreSQL - there are many ways to do that already.

MySQL and SQL Server have the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ... Oracle doesn't have this syntax, but it can use a CTE as follows:

INSERT INTO table
WITH t AS (
SELECT record1 FROM DUAL UNION ALL
SELECT record2 FROM DUAL UNION ALL
...
)
SELECT * FROM t;

And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...

True. In some cases INSERT may be replaced by COPY, but it has various
other features too.

The API would simply be:

TupleTableSlot **
ExecForeignMultiInsert(EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot **slot,
TupleTableSlot **planSlot,
int numSlots);

+1, seems quite reasonable

2) What about the insert results?

I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely. Other FDWs may be able to do it, so the API can be like above.

Yeah. I think handling complete failure should not be very difficult,
but there are cases that worry me more. For example, what if there's a
before trigger (on the remote db) that "skips" inserting some of the
rows by returning NULL?
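For postgres_fdw specifically, the remote server does report how many rows a multi-row INSERT actually inserted (a BEFORE ROW trigger returning NULL reduces that count), and libpq exposes it as a string via PQcmdTuples(). A minimal sketch of turning that into a "skipped rows" count, with a hypothetical helper name and no libpq dependency:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Compare the number of rows sent in one batched INSERT with the
 * affected-row count reported by the remote server (passed as text, the
 * form PQcmdTuples() returns). Returns how many rows the remote side
 * skipped, or -1 if the count is unparsable or larger than the batch.
 */
static int
rows_skipped(int nsent, const char *cmd_tuples)
{
    char   *end;
    long    n;

    assert(nsent >= 0 && cmd_tuples != NULL);
    errno = 0;
    n = strtol(cmd_tuples, &end, 10);
    if (errno != 0 || end == cmd_tuples || *end != '\0')
        return -1;              /* not a plain number (e.g. empty string) */
    if (n < 0 || n > nsent)
        return -1;              /* inconsistent with what was sent */
    return nsent - (int) n;
}
```

The count says how many rows were skipped, but not which ones, so the per-row slots already returned to the caller when the rows were buffered still cannot be corrected.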

For the same reason, support for RETURNING clause will vary from DBMS to DBMS.

Yeah. I wonder if the FDW needs to indicate which features are supported
by the ExecForeignMultiInsert, e.g. by adding a function that decides
whether batch insert is supported (it might also do that internally by
calling ExecForeignInsert, of course).
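The capability check could look roughly like this on the caller side. This is a minimal C sketch with a hypothetical, simplified callback table (`BatchRoutine`, not the real FdwRoutine): batching is used only when the FDW provides a batch callback and explicitly agrees to batch this statement (for example, not when there is a RETURNING clause); otherwise it falls back to one call per row, as ExecForeignInsert works today.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Row
{
    int     a;
    int     b;
} Row;

/*
 * Hypothetical callback table. insert_one is mandatory; insert_many and
 * batch_ok are optional (NULL if the FDW does not support batching).
 */
typedef struct BatchRoutine
{
    int     (*insert_one) (void *state, const Row *row);          /* 1 if inserted */
    int     (*insert_many) (void *state, const Row *rows, int n); /* rows inserted */
    int     (*batch_ok) (void *state, int has_returning);         /* may we batch? */
} BatchRoutine;

/* Batch only if the FDW offers it and agrees; otherwise go row by row. */
static int
insert_rows(const BatchRoutine *r, void *state,
            const Row *rows, int n, int has_returning)
{
    int     inserted = 0;

    assert(r != NULL && r->insert_one != NULL);
    if (r->insert_many != NULL && r->batch_ok != NULL &&
        r->batch_ok(state, has_returning))
        return r->insert_many(state, rows, n);

    for (int i = 0; i < n; i++)
        inserted += r->insert_one(state, &rows[i]);
    return inserted;
}

/* Demo FDW: counts calls; refuses to batch when RETURNING is present. */
typedef struct Counter
{
    int     single_calls;
    int     batch_calls;
} Counter;

static int
demo_one(void *state, const Row *row)
{
    (void) row;
    ((Counter *) state)->single_calls++;
    return 1;
}

static int
demo_many(void *state, const Row *rows, int n)
{
    (void) rows;
    ((Counter *) state)->batch_calls++;
    return n;
}

static int
demo_batch_ok(void *state, int has_returning)
{
    (void) state;
    return !has_returning;
}
```

Requiring an explicit yes from `batch_ok` (rather than treating a missing check as permission) keeps FDWs that only add a batch callback from being batched in cases they have not considered.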

3) What about the other DML operations (DELETE/UPDATE)?

I don't think they are necessary for the time being. If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said.

I agree.

3) Should we do batching for COPY instead?

I'm thinking of issuing INSERT with multiple records as your patch does, because:

* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.

* Unlike INSERT, COPY doesn't invoke rules. (I don't think rules are a feature users care much about, though.) Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.

I agree.

regards

--
Tomas Vondra http://www.2ndQuadrant.com

#16 tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#15)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@2ndquadrant.com>

I'm not sure when I'll have time to work on this again, so if you are
interested and willing to work on it, please go ahead. I'll gladly do
reviews and help you with it.

Thank you very much.

I think transferring data to other databases is fine - interoperability
is a big advantage for users, I don't see it as something threatening
the PostgreSQL project. I doubt this would make it more likely for users
to migrate from PostgreSQL - there are many ways to do that already.

Definitely true. Users may want to use INSERT SELECT to do some data transformation in their OLTP database and load it into a non-Postgres data warehouse.

Yeah. I think handling complete failure should not be very difficult,
but there are cases that worry me more. For example, what if there's a
before trigger (on the remote db) that "skips" inserting some of the
rows by returning NULL?

Yeah. I wonder if the FDW needs to indicate which features are supported
by the ExecForeignMultiInsert, e.g. by adding a function that decides
whether batch insert is supported (it might also do that internally by
calling ExecForeignInsert, of course).

Thanks for your advice. I'll try to address them.

Regards
Takayuki Tsunakawa

#17 tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: tsunakawa.takay@fujitsu.com (#16)
4 attachment(s)
RE: POC: postgres_fdw insert batching

Hello,

The attached patch implements the new bulk insert routine for postgres_fdw and the executor utilizing it. It passes make check-world.

I measured performance in a basic non-partitioned case by modifying Tomas-san's scripts. They perform an INSERT SELECT statement that copies one million records. The table consists of two integer columns, with a primary key on one of them. Run the attached prepare.sql once to set up. local.sql inserts into the table directly, while fdw.sql inserts through a foreign table.

The performance results (average time of 5 runs) were as follows, on a Linux host where the average round-trip time of "ping localhost" was 34 µs:

master, local: 6.1 seconds
master, fdw: 125.3 seconds
patched, fdw: 11.1 seconds (11x improvement)
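Dividing the totals above by the one million copied rows gives per-row costs (simple arithmetic on the reported numbers, no additional measurements):

```latex
\begin{aligned}
t_{\text{row}}^{\text{master, local}} &= \frac{6.1\,\text{s}}{10^{6}} = 6.1\,\mu\text{s} \\
t_{\text{row}}^{\text{master, fdw}}   &= \frac{125.3\,\text{s}}{10^{6}} = 125.3\,\mu\text{s} \\
t_{\text{row}}^{\text{patched, fdw}}  &= \frac{11.1\,\text{s}}{10^{6}} = 11.1\,\mu\text{s} \\
\text{speedup} &= \frac{125.3}{11.1} \approx 11.3
\end{aligned}
```

So the patched FDW path costs about 5 µs per row more than the local insert, compared with about 119 µs per row on master.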

The patch accumulates at most 100 records in ModifyTableState before inserting them in bulk. Also, when an input record targets a different relation (= partition) than the already accumulated records, the patch inserts the accumulated records and keeps the new record for a later insert.
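A minimal C sketch of the accumulation policy just described: buffer up to 100 rows, and flush early when the next row targets a different relation. `BatchBuffer`, `add_row` and `flush_batch` are hypothetical names; the relation is reduced to an integer id, and flushing only records statistics instead of sending a remote INSERT.

```c
#include <assert.h>

#define BATCH_MAX 100           /* the patch's limit on buffered records */

typedef struct Row
{
    int     a;
    int     b;
} Row;

typedef struct BatchBuffer
{
    unsigned int relid;         /* target relation of the buffered rows */
    int     nrows;              /* rows currently buffered */
    Row     rows[BATCH_MAX];
    int     nflushes;           /* demo statistics: remote INSERTs sent */
    int     nflushed_rows;      /* demo statistics: rows sent */
} BatchBuffer;

/* Stand-in for sending one multi-row INSERT; here it only counts. */
static void
flush_batch(BatchBuffer *buf)
{
    if (buf->nrows == 0)
        return;
    buf->nflushes++;
    buf->nflushed_rows += buf->nrows;
    buf->nrows = 0;
}

/* Buffer one row for relation relid, flushing when needed. */
static void
add_row(BatchBuffer *buf, unsigned int relid, Row row)
{
    /* Row for a different partition: send what we have first. */
    if (buf->nrows > 0 && buf->relid != relid)
        flush_batch(buf);

    assert(buf->nrows < BATCH_MAX);
    buf->relid = relid;
    buf->rows[buf->nrows++] = row;

    /* Buffer full: send it now rather than waiting for more rows. */
    if (buf->nrows == BATCH_MAX)
        flush_batch(buf);
}
```

At the end of the statement the caller would call `flush_batch` once more. With this policy, input that keeps alternating between partitions degrades to batches of a single row.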

[Issues]

1. Do we want a GUC parameter, say, max_bulk_insert_records = (integer), to control the number of records inserted at once?
The range of allowed values would be between 1 and 1,000. 1 disables bulk insert.
The main reason for such a parameter would be to limit the memory used by accumulated records, which could become prohibitively large if each record is big. I don't think it's a must, but we could have it.
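If such a setting existed, one option is to bound the buffer by both row count and bytes, so that a few very wide rows cannot exhaust memory. The sketch below assumes two hypothetical limits: `max_rows`, matching the proposed 1..1000 range where 1 disables batching, and `max_bytes`. Neither is an existing GUC, and the names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits; not actual PostgreSQL settings. */
typedef struct ToyBatchLimits
{
	int			max_rows;		/* 1..1000; 1 disables batching */
	size_t		max_bytes;		/* cap on memory held by buffered rows */
} ToyBatchLimits;

/* Must the buffer be flushed before adding a row of row_bytes? */
static bool
toy_must_flush(const ToyBatchLimits *lim, int buffered_rows,
			   size_t buffered_bytes, size_t row_bytes)
{
	assert(lim->max_rows >= 1 && lim->max_rows <= 1000);
	if (buffered_rows == 0)
		return false;			/* nothing to flush yet */
	if (buffered_rows >= lim->max_rows)
		return true;
	return buffered_bytes + row_bytes > lim->max_bytes;
}

/* Count the batches needed for a stream of equally sized rows. */
static int
toy_count_flushes(const ToyBatchLimits *lim, int nrows, size_t row_bytes)
{
	int			flushes = 0;
	int			rows = 0;
	size_t		bytes = 0;

	for (int i = 0; i < nrows; i++)
	{
		if (toy_must_flush(lim, rows, bytes, row_bytes))
		{
			flushes++;
			rows = 0;
			bytes = 0;
		}
		rows++;
		bytes += row_bytes;
	}
	if (rows > 0)
		flushes++;				/* final partial batch */
	return flushes;
}
```

A single row larger than `max_bytes` is still sent, just on its own, because the check only ever flushes a non-empty buffer.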

2. Should we accumulate records per relation in ResultRelInfo instead?
That is, when inserting into a partitioned table that has foreign partitions, delay insertion until a certain number of input records accumulate, and then insert the accumulated records per relation (e.g., 50 records to relation A, 30 to relation B, and 20 to relation C). If we do that:

* The order of insertion differs from the order of input records. Is it OK?

* Should the maximum count of accumulated records be applied per relation or per query?
When a partitioned table has many foreign partitions, a per-relation limit may use a lot of memory in total, while a per-query limit may leave only a few records per relation, which reduces the benefit of bulk insert.
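The reordering question can be illustrated with per-relation buffers and a query-wide limit: rows are grouped by target relation, and each group is sent as one batch when the limit is hit. This is a sketch with hypothetical names (`ToyPerRelBuffers`, `toy_route_row()`), fixed toy capacities, and relations numbered 0..2.

```c
#include <assert.h>

#define TOY_NRELS 3				/* hypothetical number of foreign partitions */
#define TOY_MAX_ROWS 16			/* toy capacity of each buffer */

typedef struct ToyPerRelBuffers
{
	int			limit;			/* query-wide limit on buffered rows */
	int			total;			/* rows buffered across all relations */
	int			nrows[TOY_NRELS];
	int			rows[TOY_NRELS][TOY_MAX_ROWS];	/* input sequence numbers */
	int			nsent;
	int			sent[TOY_MAX_ROWS];		/* order rows reach the remote side */
} ToyPerRelBuffers;

/* Send one batch per relation, in relation order, and empty all buffers. */
static void
toy_flush_all(ToyPerRelBuffers *b)
{
	for (int r = 0; r < TOY_NRELS; r++)
	{
		for (int i = 0; i < b->nrows[r]; i++)
		{
			assert(b->nsent < TOY_MAX_ROWS);
			b->sent[b->nsent++] = b->rows[r][i];
		}
		b->nrows[r] = 0;
	}
	b->total = 0;
}

/* Buffer input row number "seq" for relation "rel"; flush at the limit. */
static void
toy_route_row(ToyPerRelBuffers *b, int rel, int seq)
{
	assert(rel >= 0 && rel < TOY_NRELS);
	assert(b->nrows[rel] < TOY_MAX_ROWS);
	b->rows[rel][b->nrows[rel]++] = seq;
	if (++b->total == b->limit)
		toy_flush_all(b);
}
```

With input rows 0..4 routed to relations 0, 1, 0, 2, 1 and a limit of 5, the remote side receives them as 0, 2, 1, 4, 3. Within each relation the input order is preserved; only the interleaving across relations changes.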

Regards
Takayuki Tsunakawa

Attachments:

fdw.sql
local.sql
prepare.sql
v1-0001-Add-bulk-insert-for-foreign-tables.patch
From 29ff4fbd97386eb0f2ab8a9bc80420c74fc4fd82 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH v1] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c         |   3 +-
 contrib/postgres_fdw/postgres_fdw.c    | 233 ++++++++++++++++++++++++++-------
 contrib/postgres_fdw/postgres_fdw.h    |   2 +-
 doc/src/sgml/fdwhandler.sgml           |  64 ++++++++-
 src/backend/executor/nodeModifyTable.c | 135 +++++++++++++++++++
 src/include/foreign/fdwapi.h           |   7 +
 src/include/nodes/execnodes.h          |   6 +
 7 files changed, 395 insertions(+), 55 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df1..5aa81db 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaac..f7be4be 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -95,6 +97,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -175,7 +179,9 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			len;			/* length of some part of query */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -184,6 +190,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* bulk operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -342,6 +351,11 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBulkInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -428,20 +442,23 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
-static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void finish_foreign_modify(PgFdwModifyState *fmstate, bool release_conn);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -529,6 +546,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBulkInsert = postgresExecForeignBulkInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1663,7 +1681,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *withCheckOptionList = NIL;
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
+	List	   *retvalList;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1751,7 +1771,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,10 +1795,12 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	retvalList = list_make4(makeString(sql.data),
 					  targetAttrs,
-					  makeInteger((retrieved_attrs != NIL)),
-					  retrieved_attrs);
+					  makeInteger(values_end_len),
+					  makeInteger((retrieved_attrs != NIL)));
+	retvalList = lappend(retvalList, retrieved_attrs);
+	return retvalList;
 }
 
 /*
@@ -1796,6 +1818,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1811,6 +1834,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1828,6 +1853,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1845,7 +1871,37 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBulkInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBulkInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1854,7 +1910,7 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1872,8 +1928,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1886,8 +1947,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1905,7 +1971,7 @@ postgresEndForeignModify(EState *estate,
 		return;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -1924,6 +1990,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2000,7 +2067,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2010,6 +2077,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2048,7 +2116,7 @@ postgresEndForeignInsert(EState *estate,
 		fmstate = fmstate->aux_fmstate;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -3538,6 +3606,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int len,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3572,7 +3641,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3624,6 +3696,8 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3634,26 +3708,75 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBulkInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	int			i, j;
+	int			pindex;
+	bool		first;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			finish_foreign_modify(fmstate, false);
+
+		/*
+		 * Recreate INSERT command string with numSlots records in its
+		 * VALUES clause
+		 */
+
+		/* Copy up to the end of the first record from the original query */
+		initStringInfo(&sql);
+		appendBinaryStringInfo(&sql, fmstate->orig_query, fmstate->len);
+
+		/* Add records to VALUES clause */
+		pindex = fmstate->p_nums + 1;
+		for (i = 0; i < *numSlots - 1; i++)
+		{
+			appendStringInfoString(&sql, ", (");
+
+			first = true;
+			for (j = 0; j < fmstate->p_nums; j++)
+			{
+				if (!first)
+					appendStringInfoString(&sql, ", ");
+				first = false;
+
+				appendStringInfo(&sql, "$%d", pindex);
+				pindex++;
+			}
+
+			appendStringInfoChar(&sql, ')');
+		}
+
+		/* Copy stuff after VALUES clause from the original query */
+		appendStringInfoString(&sql, fmstate->orig_query + fmstate->len);
+
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3666,7 +3789,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3676,14 +3799,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3704,9 +3827,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3716,10 +3840,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3779,19 +3905,23 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
@@ -3799,32 +3929,37 @@ convert_prep_stmt_params(PgFdwModifyState *fmstate,
 	}
 
 	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3873,7 +4008,8 @@ store_returning_result(PgFdwModifyState *fmstate,
  *		Release resources for a foreign insert/update/delete operation
  */
 static void
-finish_foreign_modify(PgFdwModifyState *fmstate)
+finish_foreign_modify(PgFdwModifyState *fmstate,
+	bool release_conn)
 {
 	Assert(fmstate != NULL);
 
@@ -3897,8 +4033,11 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	}
 
 	/* Release remote connection */
-	ReleaseConnection(fmstate->conn);
-	fmstate->conn = NULL;
+	if (release_conn)
+	{
+		ReleaseConnection(fmstate->conn);
+		fmstate->conn = NULL;
+	}
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410d..459a9ca 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..cdf2959 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert/ExecForeignBulkInsert</function>,
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,56 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBulkInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slot is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBulkInsert</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has the
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +792,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBulkInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +825,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert/ExecForeignBulkInsert</function>
+     subsequently called must throw error as needed.
     </para>
 
     <para>
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..6c4b33e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,15 @@
 #include "utils/rel.h"
 
 
+#define BULK_INSERT_ROWS 100
+
+static void ExecBulkInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +398,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -442,6 +452,55 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports bulk insert, accumulate tuples and insert them
+		 * in bulk
+		 */
+		if (resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert &&
+			resultRelInfo->ri_projectReturning == NULL)
+		{
+			/*
+			 * If a certain number of tuples have already been accumulated,
+			 * or	a tuple has come for a different relation than that for
+			 * the accumulated tuples, perform the bulk insert
+			 */
+			if (mtstate->mt_nslots == BULK_INSERT_ROWS ||
+				(mtstate->mt_nslots > 0 &&
+				 mtstate->bulk_rri != resultRelInfo))
+			{
+				ExecBulkInsert(mtstate, mtstate->bulk_rri,
+							   mtstate->mt_slots, mtstate->mt_planslots,
+							   mtstate->mt_nslots,
+							   estate, canSetTag);
+				mtstate->mt_nslots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (mtstate->mt_slots == NULL)
+			{
+				mtstate->mt_slots = palloc(sizeof(TupleTableSlot *) *
+										   BULK_INSERT_ROWS);
+				mtstate->mt_planslots = palloc(sizeof(TupleTableSlot *) *
+										   BULK_INSERT_ROWS);
+			}
+
+			mtstate->mt_slots[mtstate->mt_nslots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(mtstate->mt_slots[mtstate->mt_nslots], slot);
+			mtstate->mt_planslots[mtstate->mt_nslots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(mtstate->mt_planslots[mtstate->mt_nslots], planSlot);
+
+			mtstate->mt_nslots++;
+			mtstate->bulk_rri = resultRelInfo;
+
+			MemoryContextSwitchTo(oldContext);
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -702,6 +761,73 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBulkInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBulkInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+	{
+		estate->es_processed += numInserted;
+		setLastTid(&slot->tts_tid);
+	}
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -2156,6 +2282,15 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for bulk insert.
+	 */
+	if (node->mt_nslots > 0)
+		ExecBulkInsert(node, node->bulk_rri,
+					   node->mt_slots, node->mt_planslots,
+					   node->mt_nslots,
+					   estate, node->canSetTag);
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..c7eeff2 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,12 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBulkInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +215,7 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBulkInsert_function ExecForeignBulkInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..e16d7a1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1191,6 +1191,12 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
+
+	/* bulk insert stuff */
+	int			mt_nslots;		/* number of slots in the array */
+	TupleTableSlot **mt_slots;	/* input tuples for bulk insert */
+	TupleTableSlot **mt_planslots;
+	ResultRelInfo *bulk_rri;	/* target relation for bulk insert */
 } ModifyTableState;
 
 /* ----------------
-- 
2.10.1

#18Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#17)
Re: POC: postgres_fdw insert batching

Hi,

Thanks for working on this!

On 11/10/20 1:45 AM, tsunakawa.takay@fujitsu.com wrote:

Hello,

The attached patch implements the new bulk insert routine for
postgres_fdw and the executor utilizing it. It passes make
check-world.

I haven't done any testing yet, just a quick review.

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if you were concerned about
overhead in case of 1-row inserts, trying to not plan the bulk query
until necessary, but I'm not sure this actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.
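The two-statement plan described above (one prepared statement for a full batch, the existing 1-row statement for the leftover rows) can be counted out in a few lines. This is a hypothetical model, not code from either patch: `InsertPlanStats` and `plan_two_statement_insert` are illustrative names, and rows are reduced to a count.

```c
#include <assert.h>

/*
 * Hypothetical model of the "two prepared statements" plan: one statement
 * for a full batch of batch_size rows, plus the existing 1-row INSERT used
 * for whatever is left over at the end of the input.
 */
typedef struct InsertPlanStats
{
	long		batch_execs;	/* executions of the full-batch statement */
	long		single_execs;	/* executions of the 1-row statement */
	int			prepares;		/* distinct statements that must be prepared */
} InsertPlanStats;

static InsertPlanStats
plan_two_statement_insert(long nrows, int batch_size)
{
	InsertPlanStats s;

	assert(nrows >= 0 && batch_size >= 1);

	s.batch_execs = nrows / batch_size;
	s.single_execs = nrows % batch_size;
	/* each statement is prepared once, and only if it is actually used */
	s.prepares = (s.batch_execs > 0) + (s.single_execs > 0);
	return s;
}
```

For 10,030 rows and a batch of 100 this gives 100 batch executions, 30 single-row executions and exactly two prepared statements, however uneven the tail is.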

I measured performance in a basic non-partitioned case by modifying
Tomas-san's scripts. They perform an INSERT SELECT statement that
copies one million records. The table consists of two integer
columns, with a primary key on one of them. You can run the
attached prepare.sql to set up once. local.sql inserts to the table
directly, while fdw.sql inserts through a foreign table.

The performance results, the average time of 5 runs, were as follows
on a Linux host where the average round-trip time of "ping localhost"
was 34 us:

master, local: 6.1 seconds
master, fdw: 125.3 seconds
patched, fdw: 11.1 seconds (11x improvement)

Nice. I don't think we can get much closer to local master, so 6.1
vs. 11.1 seconds looks quite acceptable.

The patch accumulates at most 100 records in ModifyTableState before
inserting them in bulk. Also, when an input record targets a different
relation (= partition) than the already accumulated records, the patch
inserts the accumulated records and stores the new record for a later
insert.
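The flush rule just described (a cap of 100 rows, plus an early flush whenever the target partition changes) can be modeled as follows. This is a simplified sketch, not the patch's code: rows are reduced to integer relation ids, and `BATCH_LIMIT` and `simulate_single_buffer` are hypothetical names.

```c
#include <assert.h>

#define BATCH_LIMIT 100			/* the 100-row cap described above */

/*
 * Hypothetical model of a single shared buffer: each row is represented
 * only by the id of its target relation. Returns the number of bulk-insert
 * calls issued and stores the size of each call in flush_sizes.
 */
static int
simulate_single_buffer(const int *target_rel, int nrows,
					   int *flush_sizes, int max_flushes)
{
	int			nflushes = 0;
	int			buffered = 0;
	int			buffered_rel = -1;

	for (int i = 0; i < nrows; i++)
	{
		/* flush when the buffer is full or the row targets another relation */
		if (buffered > 0 &&
			(buffered == BATCH_LIMIT || target_rel[i] != buffered_rel))
		{
			assert(nflushes < max_flushes);
			flush_sizes[nflushes++] = buffered;
			buffered = 0;
		}
		buffered_rel = target_rel[i];
		buffered++;
	}

	/* end of input: insert whatever is still buffered */
	if (buffered > 0)
	{
		assert(nflushes < max_flushes);
		flush_sizes[nflushes++] = buffered;
	}
	return nflushes;
}
```

The model makes one consequence visible: input that alternates between partitions degenerates into 1-row batches, which is what the per-ResultRelInfo question in point 2 is about.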

[Issues]

1. Do we want a GUC parameter, say, max_bulk_insert_records =
(integer), to control the number of records inserted at once? The
range of allowed values would be between 1 and 1,000; 1 disables
bulk insert. One reason for such a parameter would be to limit the
amount of memory used for accumulated records, which could be
prohibitively large if each record is big. I don't think this is a
must, but I think we can have it.

I think it'd be good to have such GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to have the GUC.

2. Should we accumulate records per relation in ResultRelInfo
instead? That is, when inserting into a partitioned table that has
foreign partitions, delay insertion until a certain number of input
records accumulate, and then insert accumulated records per relation
(e.g., 50 records to relation A, 30 records to relation B, and 20
records to relation C.) If we do that,

I think there's a chunk of text missing here? If we do that, then what?

Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems fairly specific to
FDWs to me, so keeping it in the FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?

* The order of insertion differs from the order of input records. Is
it OK?

I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.

* Should the maximum count of accumulated records be applied per
relation or per query? When many foreign partitions belong to a
partitioned table, if the former is chosen, it may use a lot of memory
in total. If the latter is chosen, the records per relation could be
few and thus the benefit of bulk insert could be small.

I think it needs to be applied per relation, because that's the level at
which we can do it easily and consistently. The whole point is to send
data in sufficiently large chunks to minimize the communication overhead
(latency etc.), but if you enforce it "per query" that seems hard.

Imagine you're inserting data into a table with many partitions - how do
you pick the number of rows to accumulate? The table may have 10 or 1000
partitions, we may be inserting into all partitions or just a small
subset, not all partitions may be foreign, etc. It seems pretty
difficult to pick and enforce a reliable limit at the query level. But
maybe I'm missing something and it's easier than I think?

Of course, you're entirely correct that enforcing this at the partition
level may require a lot of memory. Sadly, I don't see a way around that,
except for (a) disabling batching or (b) ordering the data so that it is
inserted into one partition at a time.
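The memory concern with a per-relation limit is simple arithmetic: in the worst case every foreign partition holds a full batch at the same moment. The sketch below assumes a caller-supplied average row size. The patch counts rows, not bytes, so `avg_row_bytes` and `worst_case_buffer_bytes` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on memory held in per-relation batch buffers, assuming every
 * foreign partition has a full batch pending at once. avg_row_bytes is an
 * assumption supplied by the caller.
 */
static uint64_t
worst_case_buffer_bytes(uint64_t npartitions, uint64_t batch_rows,
						uint64_t avg_row_bytes)
{
	assert(batch_rows >= 1);
	return npartitions * batch_rows * avg_row_bytes;
}
```

With 10 partitions of 100-byte rows this is about 100 kB. With 1000 partitions of 1 kB rows it is already 102,400,000 bytes (roughly 98 MiB) for batches of 100 rows.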

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#18)
Re: POC: postgres_fdw insert batching

On 11/10/20 4:05 PM, Tomas Vondra wrote:


Two more comments regarding this:

1) If we want to be more strict about the memory consumption, we should
probably set the limit in terms of memory, not number of rows. Currently,
100 rows may be 10kB or 10MB; there's no way to know. Of course,
this is not the only place with this issue.
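A memory-based limit as suggested in 1) could be combined with the row cap by checking both before buffering the next row. This is a hypothetical sketch: `BatchLimits`, `BatchState` and `batch_needs_flush` do not exist in the patch, and row sizes are assumed to be known approximately when a row is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits: a row cap plus an approximate byte budget */
typedef struct BatchLimits
{
	int			max_rows;
	size_t		max_bytes;
} BatchLimits;

/* Hypothetical running totals for the batch being accumulated */
typedef struct BatchState
{
	int			nrows;
	size_t		nbytes;
} BatchState;

/* Should the pending batch be sent before adding a row of row_bytes? */
static bool
batch_needs_flush(const BatchState *b, const BatchLimits *lim, size_t row_bytes)
{
	assert(lim->max_rows >= 1);
	if (b->nrows == 0)
		return false;			/* always accept at least one row */
	return b->nrows >= lim->max_rows ||
		b->nbytes + row_bytes > lim->max_bytes;
}
```

Accepting the first row unconditionally keeps a single oversized row from blocking progress. It then travels alone in its own batch.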

2) I wonder what the COPY FROM patch [1] does in this regard. I don't
have time to check right now, but I suggest we try to do the same thing,
if only to be consistent.

[1]: /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

regards

--
Tomas Vondra

#20tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#18)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if you were concerned about
overhead in case of 1-row inserts, trying to not plan the bulk query
until necessary, but I'm not sure this actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.

Don't worry; the processing for 1-row inserts is unchanged: the INSERT query string is built in PlanForeignModify(), and the remote statement is prepared in execute_foreign_modify() during the first call to ExecForeignInsert() and reused for subsequent ExecForeignInsert() calls.
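Extending the 1-row INSERT text to N rows amounts to repeating the VALUES tuple with renumbered parameters. The helper below is a hypothetical sketch of that idea. postgres_fdw's real deparseInsertSql() uses StringInfo and quotes identifiers, whereas this version takes pre-quoted names. One limit to keep in mind: the wire protocol caps a single statement at 65535 bind parameters, so nrows * ncols has to stay within that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render "INSERT INTO <rel> (<cols>) VALUES (...)"
 * for nrows rows, numbering parameters $1..$(nrows*ncols) row by row.
 * Returns 0 on success, -1 if buf was too small (output truncated).
 */
static int
build_multirow_insert(char *buf, size_t bufsize, const char *rel,
					  const char *cols, int ncols, int nrows)
{
	size_t		len;
	int			param = 1;

	assert(ncols >= 1 && nrows >= 1);
	len = (size_t) snprintf(buf, bufsize, "INSERT INTO %s (%s) VALUES ",
							rel, cols);

	for (int r = 0; r < nrows && len < bufsize; r++)
	{
		/* open the row tuple, separated from the previous one by ", " */
		len += (size_t) snprintf(buf + len, bufsize - len, "%s(",
								 r > 0 ? ", " : "");
		for (int c = 0; c < ncols && len < bufsize; c++)
			len += (size_t) snprintf(buf + len, bufsize - len, "%s$%d",
									 c > 0 ? ", " : "", param++);
		if (len < bufsize)
			len += (size_t) snprintf(buf + len, bufsize - len, ")");
	}
	return len < bufsize ? 0 : -1;
}
```

For example, `build_multirow_insert(q, sizeof q, "t", "a, b", 2, 2)` yields `INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4)`.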

The INSERT query string is re-created, and the statement re-prepared, when the number of tuples to be inserted differs from the previous call to ExecForeignInsert()/ExecForeignBulkInsert(). That's because we don't know how many tuples will be inserted, either during planning (PlanForeignModify) or during execution (until the scan for SELECT ends). For example, if we insert 10,030 rows with a bulk size of 100, the flow is:

PlanForeignModify():
    build the INSERT query string for 1 row
ExecForeignBulkInsert(100):
    drop the INSERT query string and prepared statement for 1 row
    build the query string and prepare the statement for a 100-row INSERT
    execute it
ExecForeignBulkInsert(100):
    reuse the prepared statement for the 100-row INSERT and execute it
...
ExecForeignBulkInsert(30):
    drop the INSERT query string and prepared statement for 100 rows
    build the query string and prepare the statement for a 30-row INSERT
    execute it
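The flow above re-prepares only when the row count of a call differs from the previous one, so the number of PREPAREs stays small. Here is a hypothetical model of that caching rule. `count_prepares` is an illustrative name, and the tail is assumed to be sent as one short batch, as in the flow.

```c
#include <assert.h>

/*
 * Hypothetical model of the flow above: the remote statement is
 * (re)prepared only when the number of rows in a call differs from the
 * previous call. Returns the number of PREPAREs for nrows rows sent in
 * chunks of batch_size, with the tail sent as one short batch.
 */
static int
count_prepares(long nrows, int batch_size)
{
	int			prepared_for = 0;	/* 0: nothing prepared yet */
	int			prepares = 0;
	long		remaining = nrows;

	assert(batch_size >= 1);
	while (remaining > 0)
	{
		int			n = remaining >= batch_size ? batch_size : (int) remaining;

		if (n != prepared_for)
		{
			/* drop the old statement (if any) and prepare one for n rows */
			prepared_for = n;
			prepares++;
		}
		remaining -= n;
	}
	return prepares;
}
```

For the 10,030-row example this counts two PREPAREs (one for 100 rows, one for 30), matching the flow.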

I think it'd be good to have such GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to have the GUC.

OK, I'll add it. The name would be max_bulk_insert_tuples, because a) it might cover bulk insert for local relations in the future, and b) "tuple" is used in cpu_(index_)tuple_cost and parallel_tuple_cost, while "row" and "record" are not used in GUC names (except for row_security).

The valid range would be between 1 and 1,000 (I said 10,000 previously, but I think that was an overreaction, and I'm a bit worried about unforeseen trouble that too many tuples might cause.) 1 disables bulk processing and uses the traditional ExecForeignInsert(). The default value is 100 (would 1 be a sensible default, to avoid surprising users with increased memory usage?)
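The proposed bounds, and the rule that a value of 1 means "use the traditional path", can be summarized as follows. This is a hypothetical sketch. The macro and function names are illustrative, real GUC range checks are declared in guc.c, and falling back when an FDW lacks the bulk callback is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed bounds for max_bulk_insert_tuples (values from the proposal) */
#define BULK_INSERT_TUPLES_MIN		1
#define BULK_INSERT_TUPLES_MAX		1000
#define BULK_INSERT_TUPLES_DEFAULT	100

/* Would a candidate setting be accepted? */
static bool
bulk_insert_tuples_valid(int value)
{
	return value >= BULK_INSERT_TUPLES_MIN && value <= BULK_INSERT_TUPLES_MAX;
}

/*
 * Hypothetical dispatch: a setting of 1, or an FDW that does not provide
 * the bulk-insert callback, falls back to row-at-a-time ExecForeignInsert().
 */
static bool
use_bulk_path(int max_bulk_insert_tuples, bool fdw_has_bulk_callback)
{
	assert(bulk_insert_tuples_valid(max_bulk_insert_tuples));
	return max_bulk_insert_tuples > 1 && fdw_has_bulk_callback;
}
```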

2. Should we accumulate records per relation in ResultRelInfo
instead? That is, when inserting into a partitioned table that has
foreign partitions, delay insertion until a certain number of input
records accumulate, and then insert accumulated records per relation
(e.g., 50 records to relation A, 30 records to relation B, and 20
records to relation C.) If we do that,

I think there's a chunk of text missing here? If we do that, then what?

Sorry, the two bullets below there are what follows. Perhaps I should have written ":" instead of ",".

Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems fairly specific to
FDWs to me, so keeping it in the FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?

I thought of distributing input records to their corresponding partitions' ResultRelInfos. For example, when an input record for partition 1 comes, store it in the ResultRelInfo for partition 1; when an input record for partition 2 comes, store it in the ResultRelInfo for partition 2. When a ResultRelInfo accumulates some number of rows, insert the accumulated rows into that partition. When the input ends, perform bulk inserts for the ResultRelInfos that still have accumulated rows.
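Per-partition accumulation can be modeled by giving each partition its own counter in place of its ResultRelInfo. This is a hypothetical sketch. `MAX_PARTS` and `simulate_per_partition` are illustrative, and only the number of bulk-insert calls is counted.

```c
#include <assert.h>
#include <string.h>

#define MAX_PARTS	8			/* hypothetical partition count */

/*
 * Each partition (standing in for its ResultRelInfo) buffers its own rows;
 * a partition is flushed when its buffer reaches batch_size, and every
 * non-empty buffer is flushed at the end of input. Returns the total
 * number of bulk-insert calls.
 */
static int
simulate_per_partition(const int *part, int nrows, int batch_size)
{
	int			pending[MAX_PARTS];
	int			calls = 0;

	assert(batch_size >= 1);
	memset(pending, 0, sizeof(pending));

	for (int i = 0; i < nrows; i++)
	{
		assert(part[i] >= 0 && part[i] < MAX_PARTS);
		if (++pending[part[i]] == batch_size)
		{
			calls++;			/* bulk insert into this partition */
			pending[part[i]] = 0;
		}
	}

	for (int p = 0; p < MAX_PARTS; p++)
		if (pending[p] > 0)
			calls++;			/* final partial batch for partition p */
	return calls;
}
```

With 1000 rows spread round-robin over 8 partitions and a batch of 100, this issues 16 calls (a full batch plus a 25-row tail per partition). A single shared buffer that flushes on every partition change would issue 1000.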

I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.

Agreed.

* Should the maximum count of accumulated records be applied per
relation or per query? When many foreign partitions belong to a
partitioned table, if the former is chosen, it may use a lot of memory
in total. If the latter is chosen, the records per relation could be
few and thus the benefit of bulk insert could be small.

I think it needs to be applied per relation, because that's the level at
which we can do it easily and consistently. The whole point is to send
data in sufficiently large chunks to minimize the communication overhead
(latency etc.), but if you enforce it "per query" that seems hard.

Imagine you're inserting data into a table with many partitions - how do
you pick the number of rows to accumulate? The table may have 10 or 1000
partitions, we may be inserting into all partitions or just a small
subset, not all partitions may be foreign, etc. It seems pretty
difficult to pick and enforce a reliable limit at the query level. But
maybe I'm missing something and it's easier than I think?

Of course, you're entirely correct that enforcing this at the partition
level may require a lot of memory. Sadly, I don't see a way around that,
except for (a) disabling batching or (b) ordering the data so that it is
inserted into one partition at a time.

OK, I'll try doing it that way after waiting a few days for other opinions.

Two more comments regarding this:

1) If we want to be more strict about the memory consumption, we should
probably set the limit in terms of memory, not number of rows. Currently,
100 rows may be 10kB or 10MB; there's no way to know. Of course,
this is not the only place with this issue.

2) I wonder what the COPY FROM patch [1] does in this regard. I don't
have time to check right now, but I suggest we try to do the same thing,
if only to be consistent.

[1]: /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

That COPY FROM patch uses the tuple accumulation mechanism for local tables as-is. That is, it accumulates at most 1,000 tuples per partition.

/*
* No more than this many tuples per CopyMultiInsertBuffer
*
* Caution: Don't make this too big, as we could end up with this many
* CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
* multiInsertBuffers list. Increasing this can cause quadratic growth in
* memory requirements during copies into partitioned tables with a large
* number of partitions.
*/
#define MAX_BUFFERED_TUPLES 1000

Regards
Takayuki Tsunakawa

#21Noname
Tim.Colles@ed.ac.uk
In reply to: tsunakawa.takay@fujitsu.com (#20)
RE: POC: postgres_fdw insert batching

On Wed, 11 Nov 2020, tsunakawa.takay@fujitsu.com wrote:


Does this patch affect trigger semantics on the base table?

At the moment when I insert 1000 rows into a postgres_fdw table using a
single insert statement (e.g. INSERT INTO fdw_foo SELECT ... FROM bar) I
naively expect a "statement level" trigger on the base table to trigger
once. But this is not the case. The postgres_fdw implements this
operation as 1000 separate insert statements on the base table, so the
trigger happens 1000 times instead of once. Hence there is no
distinction between using a statement level and a row level trigger on
the base table in this context.

So would this patch change the behaviour so that only 10 separate insert
statements (each of 100 rows) would be made against the base table?
If so, that's useful, as it means improving performance using statement
level triggers becomes possible. But it would also result in more
obscure semantics and might break user processes dependent on the
existing behaviour after the patch is applied.

BTW, is this subtlety documented? I haven't found anything, but I'm
happy to be proved wrong.

Tim


#22tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Noname (#21)
RE: POC: postgres_fdw insert batching

From: timc@corona.is.ed.ac.uk <timc@corona.is.ed.ac.uk> On Behalf Of

Does this patch affect trigger semantics on the base table?

At the moment when I insert 1000 rows into a postgres_fdw table using a
single insert statement (e.g. INSERT INTO fdw_foo SELECT ... FROM bar) I
naively expect a "statement level" trigger on the base table to trigger
once. But this is not the case. The postgres_fdw implements this
operation as 1000 separate insert statements on the base table, so the
trigger happens 1000 times instead of once. Hence there is no
distinction between using a statement level and a row level trigger on
the base table in this context.

So would this patch change the behaviour so that only 10 separate insert
statements (each of 100 rows) would be made against the base table?
If so, that's useful, as it means improving performance using statement
level triggers becomes possible. But it would also result in more
obscure semantics and might break user processes dependent on the
existing behaviour after the patch is applied.

Yes, the number of times the statement trigger defined on the base (remote) table fires will be reduced, as you said.
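A statement-level trigger on the remote base table fires once per remote INSERT statement, so its firing count follows directly from the batch size. The sketch below is a rough model for a single target relation. `remote_insert_statements` is a hypothetical name, and it assumes the final partial batch is sent as one short statement, as in the v2 flow.

```c
#include <assert.h>

/*
 * Rough model for one target relation: row-at-a-time sends one remote
 * INSERT per row (batch_size == 1); batching sends ceil(nrows / batch_size)
 * statements. A statement-level trigger on the remote base table fires
 * once per remote statement.
 */
static long
remote_insert_statements(long nrows, int batch_size)
{
	assert(nrows >= 0 && batch_size >= 1);
	return (nrows + batch_size - 1) / batch_size;
}
```

For Tim's example of 1000 rows, this gives 1000 firings today and 10 with batches of 100.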

BTW, is this subtlety documented? I haven't found anything, but I'm
happy to be proved wrong.

Unfortunately, there doesn't seem to be any documentation about triggers on base tables. For example, if the local foreign table has an AFTER ROW trigger and its remote base table has a BEFORE ROW trigger that modifies the input record, it seems that the AFTER ROW trigger doesn't see the modified record.

Regards
Takayuki Tsunakawa

#23tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: tsunakawa.takay@fujitsu.com (#20)
6 attachment(s)
RE: POC: postgres_fdw insert batching

Hello,

I modified the patch as discussed with Tomas-san. The performance results of loading one million records into a hash-partitioned table with 8 partitions are as follows:

unpatched, local: 8.6 seconds
unpatched, fdw: 113.7 seconds
patched, fdw: 12.5 seconds (9x improvement)

The test scripts are also attached. Run prepare.sql once to set up the tables and source data. Run local_part.sql and fdw_part.sql to load the source data into a partitioned table with local partitions and a partitioned table with foreign partitions, respectively.

Regards
Takayuki Tsunakawa

Attachments:

fdw.sql (application/octet-stream)
fdw_part.sql (application/octet-stream)
local.sql (application/octet-stream)
local_part.sql (application/octet-stream)
prepare.sql (application/octet-stream)
v2-0001-Add-bulk-insert-for-foreign-tables.patch (application/octet-stream)
From 9d386f450f5771325b971a338c99153d84351aad Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH v2] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                |   3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 233 ++++++++++++++++++++------
 contrib/postgres_fdw/postgres_fdw.h           |   2 +-
 doc/src/sgml/config.sgml                      |  21 +++
 doc/src/sgml/fdwhandler.sgml                  |  64 ++++++-
 src/backend/executor/nodeModifyTable.c        | 151 +++++++++++++++++
 src/backend/utils/misc/guc.c                  |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/executor/nodeModifyTable.h        |   2 +
 src/include/foreign/fdwapi.h                  |   7 +
 src/include/nodes/execnodes.h                 |   5 +
 11 files changed, 446 insertions(+), 55 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df1..5aa81db 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaac..f7be4be 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -95,6 +97,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -175,7 +179,9 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			len;			/* length of query up to end of VALUES */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -184,6 +190,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* bulk operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -342,6 +351,11 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBulkInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -428,20 +442,23 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
-static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void finish_foreign_modify(PgFdwModifyState *fmstate, bool release_conn);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -529,6 +546,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBulkInsert = postgresExecForeignBulkInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1663,7 +1681,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *withCheckOptionList = NIL;
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
+	List	   *retvalList;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1751,7 +1771,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,10 +1795,12 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	retvalList = list_make4(makeString(sql.data),
 					  targetAttrs,
-					  makeInteger((retrieved_attrs != NIL)),
-					  retrieved_attrs);
+					  makeInteger(values_end_len),
+					  makeInteger((retrieved_attrs != NIL)));
+	retvalList = lappend(retvalList, retrieved_attrs);
+	return retvalList;
 }
 
 /*
@@ -1796,6 +1818,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1811,6 +1834,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1828,6 +1853,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1845,7 +1871,37 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBulkInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBulkInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1854,7 +1910,7 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1872,8 +1928,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1886,8 +1947,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1905,7 +1971,7 @@ postgresEndForeignModify(EState *estate,
 		return;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -1924,6 +1990,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2000,7 +2067,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2010,6 +2077,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2048,7 +2116,7 @@ postgresEndForeignInsert(EState *estate,
 		fmstate = fmstate->aux_fmstate;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -3538,6 +3606,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int len,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3572,7 +3641,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3624,6 +3696,8 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3634,26 +3708,75 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBulkInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	int			i, j;
+	int			pindex;
+	bool		first;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			finish_foreign_modify(fmstate, false);
+
+		/*
+		 * Recreate INSERT command string with numSlots records in its
+		 * VALUES clause
+		 */
+
+		/* Copy up to the end of the first record from the original query */
+		initStringInfo(&sql);
+		appendBinaryStringInfo(&sql, fmstate->orig_query, fmstate->len);
+
+		/* Add records to VALUES clause */
+		pindex = fmstate->p_nums + 1;
+		for (i = 0; i < *numSlots - 1; i++)
+		{
+			appendStringInfoString(&sql, ", (");
+
+			first = true;
+			for (j = 0; j < fmstate->p_nums; j++)
+			{
+				if (!first)
+					appendStringInfoString(&sql, ", ");
+				first = false;
+
+				appendStringInfo(&sql, "$%d", pindex);
+				pindex++;
+			}
+
+			appendStringInfoChar(&sql, ')');
+		}
+
+		/* Copy stuff after VALUES clause from the original query */
+		appendStringInfoString(&sql, fmstate->orig_query + fmstate->len);
+
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3666,7 +3789,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3676,14 +3799,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3704,9 +3827,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3716,10 +3840,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3779,19 +3905,23 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
@@ -3799,32 +3929,37 @@ convert_prep_stmt_params(PgFdwModifyState *fmstate,
 	}
 
 	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3873,7 +4008,8 @@ store_returning_result(PgFdwModifyState *fmstate,
  *		Release resources for a foreign insert/update/delete operation
  */
 static void
-finish_foreign_modify(PgFdwModifyState *fmstate)
+finish_foreign_modify(PgFdwModifyState *fmstate,
+	bool release_conn)
 {
 	Assert(fmstate != NULL);
 
@@ -3897,8 +4033,11 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	}
 
 	/* Release remote connection */
-	ReleaseConnection(fmstate->conn);
-	fmstate->conn = NULL;
+	if (release_conn)
+	{
+		ReleaseConnection(fmstate->conn);
+		fmstate->conn = NULL;
+	}
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410d..459a9ca 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f043433..e14c2b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8530,6 +8530,27 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-bulk-insert-tuples" xreflabel="max_bulk_insert_tuples">
+      <term><varname>max_bulk_insert_tuples</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_bulk_insert_tuples</varname></primary>
+       <secondary>configuration parameter</secondary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of tuples to accumulate and insert in bulk
+        into a foreign table. This applies to each partition when the insert
+        target is a partitioned table.
+        The valid range is <literal>1</literal>, which disables bulk insert,
+        to <literal>1000</literal>.
+        This takes effect only if the foreign data wrapper supports
+        bulk insert.
+        The default is <literal>100</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
      <sect2 id="runtime-config-client-format">
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..cdf2959 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> (or <function>ExecForeignBulkInsert</function>),
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for the tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,56 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBulkInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBulkInsert</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +792,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBulkInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +825,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert</function>/<function>ExecForeignBulkInsert</function>
+     subsequently called must throw error as needed.
     </para>
 
     <para>
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..9c46036 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,15 @@
 #include "utils/rel.h"
 
 
+int max_bulk_insert_tuples;
+
+static void ExecBulkInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +398,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -442,6 +452,56 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports bulk insert, accumulate tuples and insert them
+		 * in bulk
+		 */
+		if (max_bulk_insert_tuples > 1 &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert &&
+			resultRelInfo->ri_projectReturning == NULL)
+		{
+			/*
+			 * If the maximum number of tuples has already been accumulated
+			 * for this result relation, insert them in bulk before
+			 * accumulating the current tuple
+			 */
+			if (resultRelInfo->ri_NumSlots == max_bulk_insert_tuples)
+			{
+				ExecBulkInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -702,6 +762,73 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBulkInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBulkInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+	{
+		estate->es_processed += numInserted;
+		setLastTid(&slot->tts_tid);
+	}
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -1940,6 +2067,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2156,6 +2286,27 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for bulk insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBulkInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64e..7168295 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -47,6 +47,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "common/string.h"
+#include "executor/nodeModifyTable.h"
 #include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
@@ -3378,6 +3379,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"max_bulk_insert_tuples", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Sets the maximum number of tuples to insert in bulk into a foreign table."),
+			NULL,
+			0
+		},
+		&max_bulk_insert_tuples,
+		100, 1, 1000,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"tcp_user_timeout", PGC_USERSET, CLIENT_CONN_OTHER,
 			gettext_noop("TCP user timeout."),
 			gettext_noop("A value of 0 uses the system default."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f..9016f1f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -676,6 +676,7 @@
 #xmloption = 'content'
 #gin_fuzzy_search_limit = 0
 #gin_pending_list_limit = 4MB
+#max_bulk_insert_tuples = 100
 
 # - Locale and Formatting -
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 46a2dc9..f082792 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,6 +15,8 @@
 
 #include "nodes/execnodes.h"
 
+extern int max_bulk_insert_tuples;
+
 extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 									   EState *estate, TupleTableSlot *slot,
 									   CmdType cmdtype);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..c7eeff2 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,12 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBulkInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +215,7 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBulkInsert_function ExecForeignBulkInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..3d67ded 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,11 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* bulk insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	TupleTableSlot **ri_Slots;		/* input tuples for bulk insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
-- 
2.10.1

#24Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#23)
3 attachment(s)
Re: POC: postgres_fdw insert batching

On 11/17/20 10:11 AM, tsunakawa.takay@fujitsu.com wrote:

Hello,

Modified the patch as I talked with Tomas-san. The performance
results of loading one million records into a hash-partitioned table
with 8 partitions are as follows:

unpatched, local: 8.6 seconds
unpatched, fdw:   113.7 seconds
patched, fdw:     12.5 seconds (9x improvement)
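For reference, here is a quick sketch of the arithmetic behind those numbers. The helper names are made up for illustration. At 1M rows, 113.7 s works out to roughly 114 µs per row through the unpatched FDW path and 12.5 µs with batching, which is about a 9.1x speedup. The patched FDW path is still roughly 1.45x slower than local partitions.

```c
#include <assert.h>

/* Hypothetical helper: average cost per row, in microseconds. */
static double
per_row_usec(double seconds, long rows)
{
    assert(rows > 0);
    return seconds * 1e6 / (double) rows;
}

/* Hypothetical helper: how many times faster after_sec is than before_sec. */
static double
speedup(double before_sec, double after_sec)
{
    assert(after_sec > 0.0);
    return before_sec / after_sec;
}
```

With the quoted timings, speedup(113.7, 12.5) is about 9.1 and per_row_usec(113.7, 1000000) is about 113.7.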

The test scripts are also attached. Run prepare.sql once to set up
tables and source data. Run local_part.sql and fdw_part.sql to load
source data into a partitioned table with local partitions and a
partitioned table with foreign tables respectively.

Unfortunately, this does not compile for me, because nodeModifyTable
calls ExecGetTouchedPartitions, which is not defined anywhere. Not sure
what that's about, so I simply commented it out. That probably breaks
the partitioned cases, but it allowed me to do some review and testing.

As for the patch, I have a couple of comments:

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and should be in a deparse.c function.

2) I think the GUC should be replaced with a server/table option,
similar to fetch_size.
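The following sketch illustrates the precedence meant by "similar to fetch_size". postgres_fdw applies the server's options first and then the foreign table's options, so a table-level value overrides the server-level one. This is a standalone sketch built on that assumption. FdwOption, get_batch_size and the default value are hypothetical stand-ins; real code would walk the DefElem lists of ForeignServer/ForeignTable. They are not postgres_fdw's actual API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a DefElem option list. */
typedef struct
{
    const char *name;
    const char *value;
} FdwOption;

/* Parse a positive int; returns -1 if the text is not one. */
static int
parse_positive_int(const char *s)
{
    char       *end;
    long        v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > INT_MAX)
        return -1;
    return (int) v;
}

/* Apply option `name` from one list to *result; -1 on an invalid value. */
static int
apply_int_option(const FdwOption *opts, int nopts, const char *name,
                 int *result)
{
    for (int i = 0; i < nopts; i++)
    {
        int         v;

        if (strcmp(opts[i].name, name) != 0)
            continue;
        v = parse_positive_int(opts[i].value);
        if (v < 0)
            return -1;
        *result = v;
    }
    return 0;
}

/*
 * Resolve batch_size: start from the default, let the server option
 * override it, then let the foreign-table option override that.
 */
static int
get_batch_size(const FdwOption *server_opts, int nserver,
               const FdwOption *table_opts, int ntable, int default_size)
{
    int         batch_size = default_size;

    assert(default_size >= 1);
    if (apply_int_option(server_opts, nserver, "batch_size", &batch_size) < 0 ||
        apply_int_option(table_opts, ntable, "batch_size", &batch_size) < 0)
        return -1;
    return batch_size;
}
```

For example, a server with batch_size '200' and a table with batch_size '50' resolves to 50. With neither option set, the default applies. A default of 100 would simply mirror the max_bulk_insert_tuples GUC in v3.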

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function that builds a
query for the "full" batch, and then uses the two queries: when we get
a full batch we use the bulk query, otherwise we use the single-row
query in a loop. IMO this is cleaner than deparsing queries ad hoc in
execute_foreign_modify.
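The following is a minimal sketch of the expansion itself. It mirrors what the v3 patch currently does ad hoc in execute_foreign_modify in three steps:

1. Copy the single-row statement up to the end of its first VALUES row (the values_end_len offset).
2. Append rows 2..n with consecutively numbered parameters.
3. Copy whatever followed, e.g. ON CONFLICT DO NOTHING.

The sketch uses plain snprintf instead of StringInfo so it compiles on its own. build_batch_insert is a hypothetical name, not the proposed deparseBulkInsertSql.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: expand a single-row INSERT into an nrows-row INSERT.
 *
 * query          single-row statement, e.g.
 *                "INSERT INTO t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING"
 * values_end_len offset just past the first VALUES row (<= strlen(query))
 * nparams        parameters per row; must be > 0 (DEFAULT VALUES can't batch)
 *
 * Returns 0 on success, -1 on bad input or if buf is too small.
 */
static int
build_batch_insert(char *buf, size_t bufsize, const char *query,
                   size_t values_end_len, int nparams, int nrows)
{
    size_t      len;
    int         pindex = nparams + 1;   /* first parameter of row 2 */
    int         n;

    assert(buf != NULL && query != NULL);
    if (nparams <= 0 || nrows < 1 || values_end_len >= bufsize)
        return -1;

    /* copy "INSERT ... VALUES ($1, ..., $nparams)" */
    memcpy(buf, query, values_end_len);
    len = values_end_len;
    buf[len] = '\0';

    /* append ", ($k, ..., $k+nparams-1)" for rows 2..nrows */
    for (int row = 1; row < nrows; row++)
    {
        for (int col = 0; col < nparams; col++)
        {
            n = snprintf(buf + len, bufsize - len, "%s$%d",
                         col == 0 ? ", (" : ", ", pindex++);
            if (n < 0 || (size_t) n >= bufsize - len)
                return -1;
            len += (size_t) n;
        }
        n = snprintf(buf + len, bufsize - len, ")");
        if (n < 0 || (size_t) n >= bufsize - len)
            return -1;
        len += (size_t) n;
    }

    /* copy the tail, e.g. " ON CONFLICT DO NOTHING" */
    n = snprintf(buf + len, bufsize - len, "%s", query + values_end_len);
    if (n < 0 || (size_t) n >= bufsize - len)
        return -1;
    return 0;
}
```

With nparams = 2 and nrows = 3, "INSERT INTO t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING" becomes "... VALUES ($1, $2), ($3, $4), ($5, $6) ON CONFLICT DO NOTHING". The statement then takes nparams * nrows parameters, matching the p_nums * numSlots passed to PQsendQueryPrepared in the patch. Keep in mind that the protocol caps bind parameters at 65535.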

Of course, this might be worse when we don't have a full batch, e.g. for
a query that inserts only 50 rows with batch_size=100. If this case is
common, one option would be lowering the batch_size accordingly. If we
really want to improve this case too, I suggest we pass more info than
just the position of the VALUES clause - that seems a bit too hackish.
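Counting remote executions makes the trade-off concrete. The hypothetical helpers below compare two schemes:

- Full batches plus a single-row loop for the remainder.
- A second statement sized for the remainder. This is roughly what v3 does when it re-deparses the query whenever the number of slots changes.

Both ignore the cost of preparing statements.

```c
#include <assert.h>

/*
 * Hypothetical: executions when full batches use the bulk statement and the
 * remainder is sent one row at a time with the single-row statement.
 */
static long
executions_full_plus_single(long nrows, long batch_size)
{
    assert(nrows >= 0 && batch_size >= 1);
    return nrows / batch_size + nrows % batch_size;
}

/*
 * Hypothetical: executions when the remainder gets its own statement sized
 * to the leftover rows (ignores the extra PREPARE round trip).
 */
static long
executions_resized_tail(long nrows, long batch_size)
{
    assert(nrows >= 0 && batch_size >= 1);
    return (nrows + batch_size - 1) / batch_size;
}
```

For 50 rows with batch_size=100, the first scheme needs 50 executions versus 1. For 1,050 rows it is 60 versus 11. For 1M rows both need 10,000. The difference therefore mostly matters for small or awkwardly sized loads.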

Secondly, it adds the batch_size option to server/foreign table, and
uses that. This is not complete, though. postgresPlanForeignModify
currently passes a hard-coded value; it needs to look up the correct
value for the server/table from RelOptInfo or something. And I suppose
the ModifyTable infrastructure will need to determine the value in
order to pass the correct number of slots to the FDW API.
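On the ModifyTable side, v3 accumulates slots per result relation (ri_Slots/ri_NumSlots) and hands them to ExecForeignBulkInsert once the limit is reached. Below is a simplified standalone model of that accumulate-and-flush loop. Row, BatchBuffer and the callback are hypothetical stand-ins for TupleTableSlot and the FDW API. Whatever partial batch is left must be flushed explicitly when the statement finishes.

```c
#include <assert.h>

/* Hypothetical stand-ins for TupleTableSlot and ExecForeignBulkInsert. */
typedef struct { int a; int b; } Row;
typedef int (*BulkInsertFn) (const Row *rows, int nrows, void *arg);

typedef struct
{
    Row        *rows;           /* caller-provided array of batch_size rows */
    int         batch_size;     /* resolved per foreign table */
    int         nrows;          /* rows currently buffered */
    BulkInsertFn flush_fn;      /* sends one batch, returns rows inserted */
    void       *arg;            /* passed through to flush_fn */
    long        inserted;       /* running total reported by flush_fn */
} BatchBuffer;

/* Send whatever is buffered; must also be called once at end of statement. */
static void
batch_flush(BatchBuffer *bb)
{
    if (bb->nrows > 0)
    {
        bb->inserted += bb->flush_fn(bb->rows, bb->nrows, bb->arg);
        bb->nrows = 0;
    }
}

/* Buffer one row, first sending the batch if it is already full. */
static void
batch_add(BatchBuffer *bb, Row row)
{
    assert(bb->batch_size >= 1);
    if (bb->nrows == bb->batch_size)
        batch_flush(bb);
    bb->rows[bb->nrows++] = row;
}

/* Example flush callback: counts round trips, pretends all rows went in. */
static int
count_calls(const Row *rows, int nrows, void *arg)
{
    (void) rows;
    (*(int *) arg)++;
    return nrows;
}
```

Adding 1,050 rows with batch_size 100 triggers 10 flushes and leaves 50 rows buffered. The final batch_flush sends them as an 11th batch.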

There are a couple of other smaller changes. E.g. it undoes the changes
to finish_foreign_modify, and instead calls separate functions to
prepare the bulk statement. It also adds list_make5/list_make6 macros,
so as not to have to do strange stuff with the parameter lists.

And finally, this should probably add a bunch of regression tests.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v3-0001-Add-bulk-insert-for-foreign-tables.patch (text/x-patch; charset=UTF-8)
From 6a7031c800dff8fff9e1e64e0278494f3acd686f Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH 1/3] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                |   3 +-
 contrib/postgres_fdw/postgres_fdw.c           | 233 ++++++++++++++----
 contrib/postgres_fdw/postgres_fdw.h           |   2 +-
 doc/src/sgml/config.sgml                      |  21 ++
 doc/src/sgml/fdwhandler.sgml                  |  64 ++++-
 src/backend/executor/nodeModifyTable.c        | 151 ++++++++++++
 src/backend/utils/misc/guc.c                  |  12 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/executor/nodeModifyTable.h        |   2 +
 src/include/foreign/fdwapi.h                  |   7 +
 src/include/nodes/execnodes.h                 |   5 +
 11 files changed, 446 insertions(+), 55 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..5aa81db08e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..f7be4bec17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -95,6 +97,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -175,7 +179,9 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			len;			/* length of some part of query */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -184,6 +190,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* bulk operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -342,6 +351,11 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBulkInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -428,20 +442,23 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
-static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void finish_foreign_modify(PgFdwModifyState *fmstate, bool release_conn);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -529,6 +546,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBulkInsert = postgresExecForeignBulkInsert;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1663,7 +1681,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *withCheckOptionList = NIL;
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
+	List	   *retvalList;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1751,7 +1771,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,10 +1795,12 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	retvalList = list_make4(makeString(sql.data),
 					  targetAttrs,
-					  makeInteger((retrieved_attrs != NIL)),
-					  retrieved_attrs);
+					  makeInteger(values_end_len),
+					  makeInteger((retrieved_attrs != NIL)));
+	retvalList = lappend(retvalList, retrieved_attrs);
+	return retvalList;
 }
 
 /*
@@ -1796,6 +1818,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1811,6 +1834,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1828,6 +1853,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1845,7 +1871,37 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBulkInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBulkInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1854,7 +1910,7 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1872,8 +1928,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1886,8 +1947,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1905,7 +1971,7 @@ postgresEndForeignModify(EState *estate,
 		return;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -1924,6 +1990,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2000,7 +2067,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2010,6 +2077,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2048,7 +2116,7 @@ postgresEndForeignInsert(EState *estate,
 		fmstate = fmstate->aux_fmstate;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate);
+	finish_foreign_modify(fmstate, true);
 }
 
 /*
@@ -3538,6 +3606,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int len,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3572,7 +3641,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3624,6 +3696,8 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3634,26 +3708,75 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBulkInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	int			i, j;
+	int			pindex;
+	bool		first;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			finish_foreign_modify(fmstate, false);
+
+		/*
+		 * Recreate INSERT command string with numSlots records in its
+		 * VALUES clause
+		 */
+
+		/* Copy up to the end of the first record from the original query */
+		initStringInfo(&sql);
+		appendBinaryStringInfo(&sql, fmstate->orig_query, fmstate->len);
+
+		/* Add records to VALUES clause */
+		pindex = fmstate->p_nums + 1;
+		for (i = 0; i < *numSlots - 1; i++)
+		{
+			appendStringInfoString(&sql, ", (");
+
+			first = true;
+			for (j = 0; j < fmstate->p_nums; j++)
+			{
+				if (!first)
+					appendStringInfoString(&sql, ", ");
+				first = false;
+
+				appendStringInfo(&sql, "$%d", pindex);
+				pindex++;
+			}
+
+			appendStringInfoChar(&sql, ')');
+		}
+
+		/* Copy stuff after VALUES clause from the original query */
+		appendStringInfoString(&sql, fmstate->orig_query + fmstate->len);
+
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3666,7 +3789,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3676,14 +3799,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3704,9 +3827,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3716,10 +3840,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3779,19 +3905,23 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
@@ -3799,32 +3929,37 @@ convert_prep_stmt_params(PgFdwModifyState *fmstate,
 	}
 
 	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3873,7 +4008,8 @@ store_returning_result(PgFdwModifyState *fmstate,
  *		Release resources for a foreign insert/update/delete operation
  */
 static void
-finish_foreign_modify(PgFdwModifyState *fmstate)
+finish_foreign_modify(PgFdwModifyState *fmstate,
+	bool release_conn)
 {
 	Assert(fmstate != NULL);
 
@@ -3897,8 +4033,11 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	}
 
 	/* Release remote connection */
-	ReleaseConnection(fmstate->conn);
-	fmstate->conn = NULL;
+	if (release_conn)
+	{
+		ReleaseConnection(fmstate->conn);
+		fmstate->conn = NULL;
+	}
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..459a9ca6ab 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,7 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a632cf98ba..51bfe445b0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8533,6 +8533,27 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-max-bulk-insert-tuples" xreflabel="max_bulk_insert_tuples">
+      <term><varname>max_bulk_insert_tuples</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>max_bulk_insert_tuples</varname></primary>
+       <secondary>configuration parameter</secondary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum number of tuples to accumulate and insert in bulk
+        into a foreign table. This applies to each partition when the insert
+        target is a partitioned table.
+        The valid range is <literal>1</literal>, which disables bulk insert,
+        to <literal>1000</literal>.
+        This takes effect only if the foreign data wrapper supports
+        bulk insert.
+        The default is <literal>100</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
      <sect2 id="runtime-config-client-format">
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..cdf29595d7 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert/ExecForeignBulkInsert</function>,
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,56 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBulkInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBulkInsert</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has the
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +792,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBulkInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +825,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert/ExecForeignBulkInsert</function>
+     subsequently called must throw error as needed.
     </para>
 
     <para>
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7228..9c46036127 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,15 @@
 #include "utils/rel.h"
 
 
+int max_bulk_insert_tuples;
+
+static void ExecBulkInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +398,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +451,56 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * If the FDW supports bulk insert, accumulate tuples and insert them
+		 * in bulk
+		 */
+		if (max_bulk_insert_tuples > 1 &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert &&
+			resultRelInfo->ri_projectReturning == NULL)
+		{
+			/*
+			 * If a certain number of tuples have already been accumulated,
+			 * or	a tuple has come for a different relation than that for
+			 * the accumulated tuples, perform the bulk insert
+			 */
+			if (resultRelInfo->ri_NumSlots == max_bulk_insert_tuples)
+			{
+				ExecBulkInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -701,6 +761,73 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBulkInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBulkInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+	{
+		estate->es_processed += numInserted;
+		setLastTid(&slot->tts_tid);
+	}
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1940,6 +2067,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2155,6 +2285,27 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for bulk insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBulkInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..fdffbf97c0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -47,6 +47,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "common/string.h"
+#include "executor/nodeModifyTable.h"
 #include "funcapi.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
@@ -3377,6 +3378,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"max_bulk_insert_tuples", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Sets the maximum number of tuples to insert in bulk into a foreign table."),
+			NULL,
+			0
+		},
+		&max_bulk_insert_tuples,
+		100, 1, 1000,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"tcp_user_timeout", PGC_USERSET, CLIENT_CONN_OTHER,
 			gettext_noop("TCP user timeout."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9cb571f7cc..9016f1f7bd 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -676,6 +676,7 @@
 #xmloption = 'content'
 #gin_fuzzy_search_limit = 0
 #gin_pending_list_limit = 4MB
+#max_bulk_insert_tuples = 100
 
 # - Locale and Formatting -
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 46a2dc9511..f082792a47 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,6 +15,8 @@
 
 #include "nodes/execnodes.h"
 
+extern int max_bulk_insert_tuples;
+
 extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 									   EState *estate, TupleTableSlot *slot,
 									   CmdType cmdtype);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..c7eeff2257 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,12 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBulkInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +215,7 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBulkInsert_function ExecForeignBulkInsert;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d68d6..3d67ded2ca 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,11 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* bulk insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	TupleTableSlot **ri_Slots;		/* input tuples for bulk insert */
+	TupleTableSlot **ri_PlanSlots;	/* plan output tuples for bulk insert */
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
-- 
2.26.2
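The executor change in the first patch amounts to a per-relation buffer: ExecInsert copies each tuple into ri_Slots and returns, ExecBulkInsert is called once the buffer is already full, and ExecModifyTable flushes whatever is left at the end. A minimal standalone model of that control flow, assuming plain integers in place of TupleTableSlots (RelBatch, batch_init, batch_add and batch_flush are hypothetical names, not part of the patch):

```c
#include <assert.h>

#define MAX_BATCH 1000			/* same upper bound as the GUC (1..1000) */

/* Hypothetical stand-in for the per-ResultRelInfo bulk insert fields. */
typedef struct RelBatch
{
	int			capacity;		/* like max_bulk_insert_tuples */
	int			nslots;			/* like ri_NumSlots */
	int			rows[MAX_BATCH];	/* like ri_Slots, with ints for tuples */
	long		rows_sent;		/* rows handed to the "FDW" so far */
	long		round_trips;	/* number of bulk calls, i.e. flushes */
} RelBatch;

static void
batch_init(RelBatch *b, int capacity)
{
	assert(capacity >= 1 && capacity <= MAX_BATCH);
	b->capacity = capacity;
	b->nslots = 0;
	b->rows_sent = 0;
	b->round_trips = 0;
}

/* Models ExecBulkInsert: send everything buffered, then empty the buffer. */
static void
batch_flush(RelBatch *b)
{
	if (b->nslots == 0)
		return;
	b->rows_sent += b->nslots;
	b->round_trips++;
	b->nslots = 0;
}

/* Models ExecInsert: flush a full buffer first, then buffer the new row. */
static void
batch_add(RelBatch *b, int row)
{
	if (b->nslots == b->capacity)
		batch_flush(b);
	b->rows[b->nslots++] = row;
}
```

Buffering 250 rows with a capacity of 100 gives two flushes during the run and a third one for the remaining 50 rows at the end, so the number of FDW calls falls from one per row to roughly one per batch.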

v3-0002-make-it-compile.patch (text/x-patch; charset=UTF-8)
From 845844a60936f750d77b02fce696fbb7179a1984 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Wed, 18 Nov 2020 01:15:49 +0100
Subject: [PATCH 2/3] make it compile

---
 src/backend/executor/nodeModifyTable.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c46036127..e20c613fed 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2289,7 +2289,9 @@ ExecModifyTable(PlanState *pstate)
 	 * Insert remaining tuples for bulk insert.
 	 */
 	if (proute)
-		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	{
+		// resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	}
 	else
 	{
 		resultRelInfos = &resultRelInfo;
-- 
2.26.2
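In the reworked execute_foreign_modify in the next patch, only a call that carries exactly batch_size rows uses the bulk prepared statement; any other call falls back to the single-row statement, one round trip per row. A rough cost model of that rule, assuming the executor still hands over chunks of max_bulk_insert_tuples rows with a smaller final chunk (remote_executions is a hypothetical helper; it counts statement executions only, not the PREPARE round trips):

```c
#include <assert.h>

/*
 * Hypothetical cost model: the executor delivers chunks of executor_batch
 * rows (the last one may be shorter).  A chunk of exactly fdw_batch rows
 * costs one bulk execution; any other chunk costs one execution per row.
 */
static long
remote_executions(long nrows, int executor_batch, int fdw_batch)
{
	long		total = 0;

	assert(executor_batch >= 1 && fdw_batch >= 1);

	while (nrows > 0)
	{
		long		chunk = (nrows < executor_batch) ? nrows : executor_batch;

		/* full batch -> bulk statement, otherwise row-by-row fallback */
		total += (chunk == fdw_batch) ? 1 : chunk;
		nrows -= chunk;
	}
	return total;
}
```

With both settings at 100 (batch_size is hard-coded to 100 for now, see the FIXMEs), 1,000,050 rows cost 10,050 executions: 10,000 bulk ones plus 50 single-row ones for the tail. If the GUC were set to 50 while batch_size stays at 100, every call would be a partial batch and each row would again need its own round trip, so the two values have to agree for batching to pay off.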

v3-0003-reworks.patch (text/x-patch; charset=UTF-8)
From 306c05fdc7ab7181a63b583095b42f1ddc9e6b05 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Wed, 18 Nov 2020 01:08:44 +0100
Subject: [PATCH 3/3] reworks

---
 contrib/postgres_fdw/deparse.c         |  78 ++++-
 contrib/postgres_fdw/option.c          |  14 +
 contrib/postgres_fdw/postgres_fdw.c    | 389 ++++++++++++++++---------
 contrib/postgres_fdw/postgres_fdw.h    |   8 +-
 doc/src/sgml/config.sgml               |  21 --
 src/backend/executor/nodeModifyTable.c |   5 +-
 src/backend/nodes/list.c               |  32 ++
 src/include/nodes/pg_list.h            |  30 ++
 8 files changed, 415 insertions(+), 162 deletions(-)
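deparseBulkInsertSql numbers the parameters consecutively across all rows of the VALUES list, so row r (counting from 0) uses $(r*ncols + 1) through $((r + 1)*ncols). The same layout as a standalone sketch; build_values_placeholders is a hypothetical helper that only produces the parenthesized groups, assuming the "INSERT INTO ... (columns) VALUES" prefix is built separately:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write nrows groups of ncols placeholders into buf, numbered across rows,
 * e.g. ncols = 2, nrows = 2 gives "($1, $2), ($3, $4)".
 * Returns the number of parameters, or -1 if buf is too small.
 */
static int
build_values_placeholders(char *buf, size_t buflen, int ncols, int nrows)
{
	size_t		used = 0;
	int			pindex = 1;

	assert(ncols > 0 && nrows > 0 && buflen > 0);
	buf[0] = '\0';

	for (int row = 0; row < nrows; row++)
	{
		for (int col = 0; col < ncols; col++)
		{
			const char *prefix;
			int			n;

			/* "(" opens a row, ", (" separates rows, ", " separates columns */
			if (col == 0)
				prefix = (row == 0) ? "(" : ", (";
			else
				prefix = ", ";

			n = snprintf(buf + used, buflen - used, "%s$%d", prefix, pindex++);
			if (n < 0 || (size_t) n >= buflen - used)
				return -1;		/* truncated */
			used += (size_t) n;
		}
		/* room for ')' plus the terminating NUL */
		if (used + 1 >= buflen)
			return -1;
		buf[used++] = ')';
		buf[used] = '\0';
	}
	return pindex - 1;
}
```

Since the statement carries ncols * batch_size parameters, wide tables can exceed the protocol limit of 65535 bind parameters per statement (a 700-column table at batch_size 100 already needs 70000); nothing in the hunks shown here guards against that.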

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 5aa81db08e..3d08aae987 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs, int *values_end_len)
+				 List **retrieved_attrs)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,7 +1749,81 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
-	*values_end_len = buf->len;
+
+	if (doNothing)
+		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
+
+	deparseReturningList(buf, rte, rtindex, rel,
+						 rel->trigdesc && rel->trigdesc->trig_insert_after_row,
+						 withCheckOptionList, returningList, retrieved_attrs);
+}
+
+/*
+ * deparse remote bulk INSERT statement with batchSize rows in VALUES
+ *
+ * The statement text is appended to buf, and we also create an integer List
+ * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
+ * which is returned to *retrieved_attrs.
+ */
+void
+deparseBulkInsertSql(StringInfo buf, RangeTblEntry *rte,
+					 Index rtindex, Relation rel,
+					 List *targetAttrs, bool doNothing,
+					 List *withCheckOptionList, List *returningList,
+					 List **retrieved_attrs, int batchSize)
+{
+	AttrNumber	pindex;
+	bool		first;
+	ListCell   *lc;
+	int			i;
+
+	appendStringInfoString(buf, "INSERT INTO ");
+	deparseRelation(buf, rel);
+
+
+	if (targetAttrs)
+	{
+		appendStringInfoChar(buf, '(');
+
+		first = true;
+		foreach(lc, targetAttrs)
+		{
+			int			attnum = lfirst_int(lc);
+
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			deparseColumnRef(buf, rtindex, attnum, rte, false);
+		}
+
+		appendStringInfoString(buf, ") VALUES");
+
+		pindex = 1;
+
+		for (i = 0; i < batchSize; i++)
+		{
+			if (i > 0)
+				appendStringInfoString(buf, ", ");
+
+			appendStringInfoString(buf, "(");
+
+			first = true;
+			foreach(lc, targetAttrs)
+			{
+				if (!first)
+					appendStringInfoString(buf, ", ");
+				first = false;
+
+				appendStringInfo(buf, "$%d", pindex);
+				pindex++;
+			}
+
+			appendStringInfoChar(buf, ')');
+		}
+	}
+	else
+		appendStringInfoString(buf, " DEFAULT VALUES");
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1a03e02263..32bd4194eb 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f7be4bec17..4a78b21634 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -84,10 +84,9 @@ enum FdwScanPrivateIndex
  * a ModifyTable node referencing a postgres_fdw foreign table.  We store:
  *
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
- * 2) Integer list of target attribute numbers for INSERT/UPDATE
+ * 2) bulk INSERT statement text (empty string for UPDATE/DELETE)
+ * 3) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Length till the end of VALUES clause for INSERT
- *	  (-1 for a DELETE/UPDATE)
  * 4) Boolean flag showing if the remote query has a RETURNING clause
  * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
@@ -95,14 +94,16 @@ enum FdwModifyPrivateIndex
 {
 	/* SQL statement to execute remotely (as a String node) */
 	FdwModifyPrivateUpdateSql,
+	/* bulk SQL statement to execute remotely (as a String node) */
+	FdwModifyPrivateBulkUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
-	/* Length till the end of VALUES clause (as an integer Value node) */
-	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
-	FdwModifyPrivateRetrievedAttrs
+	FdwModifyPrivateRetrievedAttrs,
+	/* INSERT batch size (number of tuples sent at once) */
+	FdwModifyPrivateBatchSize
 };
 
 /*
@@ -176,12 +177,12 @@ typedef struct PgFdwModifyState
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
 	char	   *p_name;			/* name of prepared statement, if created */
+	char	   *p_name_bulk;	/* name of bulk prepared statement, if created */
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
-	char	   *orig_query;		/* original text of INSERT command */
+	char	   *query_bulk;		/* text of bulk INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
-	int			len;			/* length of some part of query */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -191,7 +192,7 @@ typedef struct PgFdwModifyState
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
 	/* bulk operation stuff */
-	int			num_slots;		/* number of slots to insert */
+	int			batch_size;		/* number of rows per bulk INSERT */
 
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
@@ -441,10 +442,11 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   CmdType operation,
 											   Plan *subplan,
 											   char *query,
+											   char *query_bulk,
 											   List *target_attrs,
-											   int len,
 											   bool has_returning,
-											   List *retrieved_attrs);
+											   List *retrieved_attrs,
+											   int batch_size);
 static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
@@ -452,13 +454,15 @@ static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  TupleTableSlot **planSlots,
 											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
+static void prepare_foreign_modify_bulk(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
 											 TupleTableSlot **slots,
 											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
-static void finish_foreign_modify(PgFdwModifyState *fmstate, bool release_conn);
+static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -619,6 +623,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
 	fpinfo->fetch_size = 100;
+	fpinfo->batch_size = 100;
 
 	apply_server_options(fpinfo);
 	apply_table_options(fpinfo);
@@ -1677,15 +1682,15 @@ postgresPlanForeignModify(PlannerInfo *root,
 	RangeTblEntry *rte = planner_rt_fetch(resultRelation, root);
 	Relation	rel;
 	StringInfoData sql;
+	StringInfoData bulksql;
 	List	   *targetAttrs = NIL;
 	List	   *withCheckOptionList = NIL;
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
-	List	   *retvalList;
 	bool		doNothing = false;
-	int			values_end_len = -1;
 
 	initStringInfo(&sql);
+	initStringInfo(&bulksql);
 
 	/*
 	 * Core code already has some lock on each rel being planned, so we can
@@ -1771,7 +1776,11 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs, &values_end_len);
+							 &retrieved_attrs);
+			deparseBulkInsertSql(&bulksql, rte, resultRelation, rel,
+								 targetAttrs, doNothing,
+								 withCheckOptionList, returningList,
+								 &retrieved_attrs, 100); /* FIXME pass the batch size */
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1795,12 +1804,12 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	retvalList = list_make4(makeString(sql.data),
+	return list_make6(makeString(sql.data),
+					  makeString(bulksql.data),
 					  targetAttrs,
-					  makeInteger(values_end_len),
-					  makeInteger((retrieved_attrs != NIL)));
-	retvalList = lappend(retvalList, retrieved_attrs);
-	return retvalList;
+					  makeInteger((retrieved_attrs != NIL)),
+					  retrieved_attrs,
+					  makeInteger(100));
 }
 
 /*
@@ -1816,10 +1825,11 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 {
 	PgFdwModifyState *fmstate;
 	char	   *query;
+	char	   *query_bulk;
 	List	   *target_attrs;
 	bool		has_returning;
-	int			values_end_len;
 	List	   *retrieved_attrs;
+	int			batch_size;
 	RangeTblEntry *rte;
 
 	/*
@@ -1832,14 +1842,16 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	/* Deconstruct fdw_private data. */
 	query = strVal(list_nth(fdw_private,
 							FdwModifyPrivateUpdateSql));
+	query_bulk = strVal(list_nth(fdw_private,
+								 FdwModifyPrivateBulkUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
-	values_end_len = intVal(list_nth(fdw_private,
-									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
 										FdwModifyPrivateRetrievedAttrs);
+	batch_size = intVal(list_nth(fdw_private,
+								 FdwModifyPrivateBatchSize));
 
 	/* Find RTE. */
 	rte = exec_rt_fetch(resultRelInfo->ri_RangeTableIndex,
@@ -1852,10 +1864,11 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->operation,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
+									query_bulk,
 									target_attrs,
-									values_end_len,
 									has_returning,
-									retrieved_attrs);
+									retrieved_attrs,
+									batch_size);
 
 	resultRelInfo->ri_FdwState = fmstate;
 }
@@ -1971,7 +1984,7 @@ postgresEndForeignModify(EState *estate,
 		return;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate, true);
+	finish_foreign_modify(fmstate);
 }
 
 /*
@@ -1990,8 +2003,8 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
-	int			values_end_len;
 	StringInfoData sql;
+	StringInfoData bulksql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
@@ -2013,6 +2026,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 						RelationGetRelationName(rel))));
 
 	initStringInfo(&sql);
+	initStringInfo(&bulksql);
 
 	/* We transmit all columns that are defined in the foreign table. */
 	for (attnum = 1; attnum <= tupdesc->natts; attnum++)
@@ -2067,7 +2081,13 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs, &values_end_len);
+					 &retrieved_attrs);
+
+	/* FIXME only do this when batch size != 1 */
+	deparseBulkInsertSql(&bulksql, rte, resultRelation, rel, targetAttrs, doNothing,
+						 resultRelInfo->ri_WithCheckOptions,
+						 resultRelInfo->ri_returningList,
+						 &retrieved_attrs, 100); /* FIXME set the batch size */
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2076,10 +2096,11 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									CMD_INSERT,
 									NULL,
 									sql.data,
+									bulksql.data,
 									targetAttrs,
-									values_end_len,
 									retrieved_attrs != NIL,
-									retrieved_attrs);
+									retrieved_attrs,
+									100);	/* FIXME pass the correct batch_size */
 
 	/*
 	 * If the given resultRelInfo already has PgFdwModifyState set, it means
@@ -2116,7 +2137,7 @@ postgresEndForeignInsert(EState *estate,
 		fmstate = fmstate->aux_fmstate;
 
 	/* Destroy the execution state */
-	finish_foreign_modify(fmstate, true);
+	finish_foreign_modify(fmstate);
 }
 
 /*
@@ -3605,10 +3626,11 @@ create_foreign_modify(EState *estate,
 					  CmdType operation,
 					  Plan *subplan,
 					  char *query,
+					  char *query_bulk,
 					  List *target_attrs,
-					  int len,
 					  bool has_returning,
-					  List *retrieved_attrs)
+					  List *retrieved_attrs,
+					  int batch_size)
 {
 	PgFdwModifyState *fmstate;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
@@ -3641,10 +3663,8 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
-	if (operation == CMD_INSERT)
-		fmstate->orig_query = pstrdup(fmstate->query);
+	fmstate->query_bulk = query_bulk;
 	fmstate->target_attrs = target_attrs;
-	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3696,7 +3716,7 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
-	fmstate->num_slots = 1;
+	fmstate->batch_size = batch_size;
 
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
@@ -3724,57 +3744,19 @@ execute_foreign_modify(EState *estate,
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
-	int			i, j;
-	int			pindex;
-	bool		first;
-	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
-	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	/* if we have a full insert batch, allocate the bulk prepared statement
+	 * if needed */
+	if (operation == CMD_INSERT)
 	{
-		/* Destroy the prepared statement created previously */
-		if (fmstate->p_name)
-			finish_foreign_modify(fmstate, false);
-
-		/*
-		 * Recreate INSERT command string with numSlots records in its
-		 * VALUES clause
-		 */
-
-		/* Copy up to the end of the first record from the original query */
-		initStringInfo(&sql);
-		appendBinaryStringInfo(&sql, fmstate->orig_query, fmstate->len);
-
-		/* Add records to VALUES clause */
-		pindex = fmstate->p_nums + 1;
-		for (i = 0; i < *numSlots - 1; i++)
-		{
-			appendStringInfoString(&sql, ", (");
-
-			first = true;
-			for (j = 0; j < fmstate->p_nums; j++)
-			{
-				if (!first)
-					appendStringInfoString(&sql, ", ");
-				first = false;
-
-				appendStringInfo(&sql, "$%d", pindex);
-				pindex++;
-			}
-
-			appendStringInfoChar(&sql, ')');
-		}
-
-		/* Copy stuff after VALUES clause from the original query */
-		appendStringInfoString(&sql, fmstate->orig_query + fmstate->len);
-
-		pfree(fmstate->query);
-		fmstate->query = sql.data;
-		fmstate->num_slots = *numSlots;
+		if ((fmstate->batch_size == *numSlots) &&
+			(!fmstate->p_name_bulk))
+			prepare_foreign_modify_bulk(fmstate);
 	}
 
 	/* Set up the prepared statement on the remote server, if we didn't yet */
@@ -3798,45 +3780,92 @@ execute_foreign_modify(EState *estate,
 		ctid = (ItemPointer) DatumGetPointer(datum);
 	}
 
-	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
-
 	/*
 	 * Execute the prepared statement.
 	 */
-	if (!PQsendQueryPrepared(fmstate->conn,
-							 fmstate->p_name,
-							 fmstate->p_nums * (*numSlots),
-							 p_values,
-							 NULL,
-							 NULL,
-							 0))
-		pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+	if (fmstate->batch_size == *numSlots)
+	{
+		/* Convert parameters needed by prepared statement to text form */
+		p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
+
+		if (!PQsendQueryPrepared(fmstate->conn,
+								 fmstate->p_name_bulk,
+								 fmstate->p_nums * fmstate->batch_size,
+								 p_values,
+								 NULL,
+								 NULL,
+								 0))
+			pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query_bulk);
 
-	/*
-	 * Get the result, and check for success.
-	 *
-	 * We don't use a PG_TRY block here, so be careful not to throw error
-	 * without releasing the PGresult.
-	 */
-	res = pgfdw_get_result(fmstate->conn, fmstate->query);
-	if (PQresultStatus(res) !=
-		(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
-		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+		/*
+		 * Get the result, and check for success.
+		 *
+		 * We don't use a PG_TRY block here, so be careful not to throw error
+		 * without releasing the PGresult.
+		 */
+		res = pgfdw_get_result(fmstate->conn, fmstate->query_bulk);
+		if (PQresultStatus(res) !=
+			(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
+			pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query_bulk);
 
-	/* Check number of rows affected, and fetch RETURNING tuple if any */
-	if (fmstate->has_returning)
-	{
-		Assert(*numSlots == 1);
-		n_rows = PQntuples(res);
-		if (n_rows > 0)
-			store_returning_result(fmstate, slots[0], res);
+		/* Check number of rows affected, and fetch RETURNING tuple if any */
+		if (fmstate->has_returning)
+		{
+			Assert(*numSlots == 1);
+			n_rows = PQntuples(res);
+			if (n_rows > 0)
+				store_returning_result(fmstate, slots[0], res);
+		}
+		else
+			n_rows = atoi(PQcmdTuples(res));
+
+		/* And clean up */
+		PQclear(res);
 	}
 	else
-		n_rows = atoi(PQcmdTuples(res));
+	{
+		int i;
 
-	/* And clean up */
-	PQclear(res);
+		for (i = 0, n_rows = 0; i < *numSlots; i++)
+		{
+			/* Convert parameters needed by prepared statement to text form */
+			p_values = convert_prep_stmt_params(fmstate, ctid, &slots[i], 1);
+
+			if (!PQsendQueryPrepared(fmstate->conn,
+									 fmstate->p_name,
+									 fmstate->p_nums,
+									 p_values,
+									 NULL,
+									 NULL,
+									 0))
+				pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
+
+			/*
+			 * Get the result, and check for success.
+			 *
+			 * We don't use a PG_TRY block here, so be careful not to throw error
+			 * without releasing the PGresult.
+			 */
+			res = pgfdw_get_result(fmstate->conn, fmstate->query);
+			if (PQresultStatus(res) !=
+				(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
+				pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
+
+			/* Check number of rows affected, and fetch RETURNING tuple if any */
+			if (fmstate->has_returning)
+			{
+				Assert(*numSlots == 1);
+				n_rows += PQntuples(res);
+				if (PQntuples(res) > 0)
+					store_returning_result(fmstate, slots[i], res);
+			}
+			else
+				n_rows += atoi(PQcmdTuples(res));
+
+			/* And clean up */
+			PQclear(res);
+		}
+	}
 
 	MemoryContextReset(fmstate->temp_cxt);
 
@@ -3848,6 +3877,51 @@ execute_foreign_modify(EState *estate,
 	return (n_rows > 0) ? slots : NULL;
 }
 
+/*
+ * prepare_foreign_modify_bulk
+ *		Establish a prepared statement for execution of a bulk INSERT
+ */
+static void
+prepare_foreign_modify_bulk(PgFdwModifyState *fmstate)
+{
+	char		prep_name[NAMEDATALEN];
+	char	   *p_name;
+	PGresult   *res;
+
+	/* Construct name we'll use for the prepared statement. */
+	snprintf(prep_name, sizeof(prep_name), "pgsql_fdw_prep_%u",
+			 GetPrepStmtNumber(fmstate->conn));
+	p_name = pstrdup(prep_name);
+
+	/*
+	 * We intentionally do not specify parameter types here, but leave the
+	 * remote server to derive them by default.  This avoids possible problems
+	 * with the remote server using different type OIDs than we do.  All of
+	 * the prepared statements we use in this module are simple enough that
+	 * the remote server will make the right choices.
+	 */
+	if (!PQsendPrepare(fmstate->conn,
+					   p_name,
+					   fmstate->query_bulk,
+					   0,
+					   NULL))
+		pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query_bulk);
+
+	/*
+	 * Get the result, and check for success.
+	 *
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_get_result(fmstate->conn, fmstate->query_bulk);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query_bulk);
+	PQclear(res);
+
+	/* This action shows that the prepare has been done. */
+	fmstate->p_name_bulk = p_name;
+}
+
 /*
  * prepare_foreign_modify
  *		Establish a prepared statement for execution of INSERT/UPDATE/DELETE
@@ -4003,41 +4077,71 @@ store_returning_result(PgFdwModifyState *fmstate,
 	PG_END_TRY();
 }
 
+/* Release the single-row prepared statement, if one was created. */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
+/* Release the bulk INSERT prepared statement, if one was created. */
+static void
+deallocate_query_bulk(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name_bulk)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name_bulk);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name_bulk = NULL;
+}
+
+
 /*
  * finish_foreign_modify
  *		Release resources for a foreign insert/update/delete operation
  */
 static void
-finish_foreign_modify(PgFdwModifyState *fmstate,
-	bool release_conn)
+finish_foreign_modify(PgFdwModifyState *fmstate)
 {
 	Assert(fmstate != NULL);
 
-	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	/* If we created prepared statements, destroy them */
+	deallocate_query(fmstate);
+	deallocate_query_bulk(fmstate);
 
 	/* Release remote connection */
-	if (release_conn)
-	{
-		ReleaseConnection(fmstate->conn);
-		fmstate->conn = NULL;
-	}
+	ReleaseConnection(fmstate->conn);
+	fmstate->conn = NULL;
 }
 
 /*
@@ -5483,6 +5587,8 @@ apply_server_options(PgFdwRelationInfo *fpinfo)
 				ExtractExtensionList(defGetString(def), false);
 		else if (strcmp(def->defname, "fetch_size") == 0)
 			fpinfo->fetch_size = strtol(defGetString(def), NULL, 10);
+		else if (strcmp(def->defname, "batch_size") == 0)
+			fpinfo->batch_size = strtol(defGetString(def), NULL, 10);
 	}
 }
 
@@ -5504,6 +5610,8 @@ apply_table_options(PgFdwRelationInfo *fpinfo)
 			fpinfo->use_remote_estimate = defGetBoolean(def);
 		else if (strcmp(def->defname, "fetch_size") == 0)
 			fpinfo->fetch_size = strtol(defGetString(def), NULL, 10);
+		else if (strcmp(def->defname, "batch_size") == 0)
+			fpinfo->batch_size = strtol(defGetString(def), NULL, 10);
 	}
 }
 
@@ -5538,6 +5646,7 @@ merge_fdw_options(PgFdwRelationInfo *fpinfo,
 	fpinfo->shippable_extensions = fpinfo_o->shippable_extensions;
 	fpinfo->use_remote_estimate = fpinfo_o->use_remote_estimate;
 	fpinfo->fetch_size = fpinfo_o->fetch_size;
+	fpinfo->batch_size = fpinfo_o->batch_size;
 
 	/* Merge the table level options from either side of the join. */
 	if (fpinfo_i)
@@ -5559,6 +5668,12 @@ merge_fdw_options(PgFdwRelationInfo *fpinfo,
 		 * relation sizes.
 		 */
 		fpinfo->fetch_size = Max(fpinfo_o->fetch_size, fpinfo_i->fetch_size);
+
+		/*
+		 * XXX Not sure it's particularly useful to merge batch_size for
+		 * join relations, but it can't hurt.
+		 */
+		fpinfo->batch_size = Max(fpinfo_o->batch_size, fpinfo_i->batch_size);
 	}
 }
 
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 459a9ca6ab..ea2fdfb9c1 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -85,6 +85,7 @@ typedef struct PgFdwRelationInfo
 	UserMapping *user;			/* only set in use_remote_estimate mode */
 
 	int			fetch_size;		/* fetch size for this remote table */
+	int			batch_size;		/* insert batch size for this remote table */
 
 	/*
 	 * Name of the relation, for use while EXPLAINing ForeignScan.  It is used
@@ -161,7 +162,12 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs, int *values_end_len);
+							 List **retrieved_attrs);
+extern void deparseBulkInsertSql(StringInfo buf, RangeTblEntry *rte,
+								 Index rtindex, Relation rel,
+								 List *targetAttrs, bool doNothing,
+								 List *withCheckOptionList, List *returningList,
+								 List **retrieved_attrs, int batchSize);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 51bfe445b0..a632cf98ba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8533,27 +8533,6 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-max-bulk-insert-tuples" xreflabel="max_bulk_insert_tuples">
-      <term><varname>max_bulk_insert_tuples</varname> (<type>integer</type>)
-      <indexterm>
-       <primary><varname>max_bulk_insert_tuples</varname></primary>
-       <secondary>configuration parameter</secondary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Sets the maximum number of tuples to accumulate and insert in bulk
-        into a foreign table. This applies to each partition when the insert
-        target is a partitioned table.
-        The valid range is <literal>1</literal>, which disables bulk insert,
-        to <literal>1000</literal>.
-        This takes effect only if the foreign data wrapper supports
-        bulk insert.
-        The default is <literal>100</literal>.
-       </para>
-      </listitem>
-     </varlistentry>
-
      </variablelist>
     </sect2>
      <sect2 id="runtime-config-client-format">
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e20c613fed..4932014515 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -57,7 +57,10 @@
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
-
+/*
+ * FIXME this should be removed / replaced with the foreign server/table
+ * batch_size option.
+ */
 int max_bulk_insert_tuples;
 
 static void ExecBulkInsert(ModifyTableState *mtstate,
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index efa44342c4..56c63ba2a3 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,38 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
+List *
+list_make6_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5,
+				ListCell datum6)
+{
+	List	   *list = new_list(t, 6);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	list->elements[5] = datum6;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index cda77a841e..195a8c1818 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,14 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
+#define list_make6(x1,x2,x3,x4,x5,x6) \
+	list_make6_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5), list_make_ptr_cell(x6))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +232,14 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
+#define list_make6_int(x1,x2,x3,x4,x5,x6) \
+	list_make6_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5), list_make_int_cell(x6))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +251,14 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
+#define list_make6_oid(x1,x2,x3,x4,x5,x6) \
+	list_make6_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5), list_make_oid_cell(x6))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +544,12 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
+extern List *list_make6_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5, ListCell datum6);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.26.2

#25tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#24)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Unfortunately, this does not compile for me, because nodeModifyTable calls
ExecGetTouchedPartitions, which is not defined anywhere. Not sure what that's
about, so I simply commented it out. That probably makes the partitioned
cases fail, but it allowed me to do some review and testing.

Ouch, sorry. I'm ashamed to have forgotten including execPartition.c.

There are a couple of other smaller changes. E.g. it undoes changes to
finish_foreign_modify, and instead calls separate functions to prepare the bulk
statement. It also adds list_make5/list_make6 macros, so as to not have to do
strange stuff with the parameter lists.

Thanks, I'll gladly take them! I wonder why I didn't think of separating deallocate_query() from finish_foreign_modify() ... perhaps my brain was dying. As for list_make5/6(), I saw that your first patch avoided adding them, so I thought you found them ugly (and I felt so, too). But thinking about it now, there's no reason not to add them.

And finally, this should probably add a bunch of regression tests.

Sure.

1) As I mentioned before, I really don't think we should be doing deparsing in
execute_foreign_modify - that's something that should happen earlier, and
should be in a deparse.c function.

...

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function, that builds a query for the
"full" batch, and then uses those two queries - when we get a full batch we use
the bulk query, otherwise we use the single-row query in a loop. IMO this is
cleaner than deparsing queries ad hoc in the execute_foreign_modify.

...

Of course, this might be worse when we don't have a full batch, e.g. for a query
that inserts only 50 rows with batch_size=100. If this case is common, one
option would be lowering the batch_size accordingly. If we really want to
improve this case too, I suggest we pass more info than just a position of the
VALUES clause - that seems a bit too hackish.
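A minimal sketch of the row split this scheme implies, assuming a positive batch_size (the v4 validator further down rejects values <= 0): full batches go through the batch_size-row prepared statement, and any remainder goes through the single-row statement, one round trip each. `InsertSplit`, `split_inserts` and `round_trips` are hypothetical names for illustration, not postgres_fdw code.

```c
#include <assert.h>

/* Hypothetical: how many executions of each prepared statement an INSERT needs. */
typedef struct
{
	int			bulk_execs;		/* executions of the batch_size-row statement */
	int			single_execs;	/* executions of the single-row statement */
} InsertSplit;

/*
 * Full batches use the bulk statement; leftover rows are sent one at a
 * time with the single-row statement.
 */
static InsertSplit
split_inserts(int total_rows, int batch_size)
{
	InsertSplit s;

	assert(total_rows >= 0 && batch_size > 0);
	s.bulk_execs = total_rows / batch_size;
	s.single_execs = total_rows % batch_size;
	return s;
}

/* Total remote round trips, assuming one per statement execution. */
static int
round_trips(InsertSplit s)
{
	return s.bulk_execs + s.single_execs;
}
```

With batch_size=100, 1M rows need 10000 round trips and a 250-row INSERT needs 2 + 50 = 52, while a 50-row INSERT gets no benefit (50 round trips); lowering batch_size to 50 brings that last case down to 1.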

...

Secondly, it adds the batch_size option to server/foreign table, and uses that.
This is not complete, though. postgresPlanForeignModify currently passes a
hard-coded value; it needs to look up the correct value for the
server/table from RelOptInfo or something. And I suppose the ModifyTable
infrastructure will need to determine the value in order to pass the correct
number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct the query and prepare it in execute_foreign_modify() because:

* Some of our customers use bulk insert in ECPG (INSERT ... VALUES (record1), (record2), ...) to insert a variable number of records per query. (Oracle's Pro*C has such a feature.) So, I want to be prepared to enable such a thing with FDW.

* The number of records to insert is not known during planning (in general), so it feels natural (or at least not unnatural) to prepare the statement during the execution phase.

* I wanted to avoid the overhead of building the full query string for a 100-record INSERT statement during query planning, because it may be a bit costly for the usual 1-record inserts. (The overhead may be hidden behind the high communication cost of postgres_fdw, though.)

So, in terms of code cleanness, how about moving my code for rebuilding query string from execute_foreign_modify() to some new function in deparse.c?
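To make the execution-time approach concrete, here is a minimal standalone sketch of building a multi-row VALUES list with consecutively numbered `$n` placeholders for a row count known only at run time. It writes into a fixed caller-supplied buffer instead of a StringInfo so it compiles outside the server, and `build_multirow_insert` with its prefix/suffix split is a hypothetical interface; the v4 patch's rebuildInsertSql instead copies the original single-row query up to values_end_len and appends the additional rows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s at buf + *len; return -1 (leaving *len unchanged) if it does not fit. */
static int
append_str(char *buf, size_t bufsize, size_t *len, const char *s)
{
	size_t		n = strlen(s);

	if (*len + n >= bufsize)
		return -1;
	memcpy(buf + *len, s, n + 1);	/* includes the terminating NUL */
	*len += n;
	return 0;
}

/* Append the placeholder "$<pindex>"; return -1 if it does not fit. */
static int
append_param(char *buf, size_t bufsize, size_t *len, int pindex)
{
	int			n;

	if (*len >= bufsize)
		return -1;
	n = snprintf(buf + *len, bufsize - *len, "$%d", pindex);
	if (n < 0 || (size_t) n >= bufsize - *len)
		return -1;
	*len += (size_t) n;
	return 0;
}

/*
 * Hypothetical helper: build prefix + "($1, ..., $k), ..., (..., $m)" + suffix
 * for num_rows rows of num_cols parameters each.  Returns the length of the
 * resulting string, or -1 if buf is too small.
 */
static int
build_multirow_insert(char *buf, size_t bufsize, const char *prefix,
					  const char *suffix, int num_cols, int num_rows)
{
	size_t		len = 0;
	int			pindex = 1;

	assert(num_cols > 0 && num_rows > 0);

	if (append_str(buf, bufsize, &len, prefix) < 0)
		return -1;
	for (int row = 0; row < num_rows; row++)
	{
		if (append_str(buf, bufsize, &len, row > 0 ? ", (" : "(") < 0)
			return -1;
		for (int col = 0; col < num_cols; col++)
		{
			if (col > 0 && append_str(buf, bufsize, &len, ", ") < 0)
				return -1;
			if (append_param(buf, bufsize, &len, pindex++) < 0)
				return -1;
		}
		if (append_str(buf, bufsize, &len, ")") < 0)
			return -1;
	}
	if (append_str(buf, bufsize, &len, suffix) < 0)
		return -1;
	return (int) len;
}
```

For example, the prefix "INSERT INTO t(a, b) VALUES ", an empty suffix, 2 columns and 2 rows produce `INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4)`; a trailing clause such as " ON CONFLICT DO NOTHING" would go in the suffix.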

2) I think the GUC should be replaced with a server/table option, similar to
fetch_size.

Hmm, batch_size differs from fetch_size. fetch_size is a postgres_fdw-specific feature with no relevant FDW routine, while batch_size is a configuration parameter for all FDWs that implement ExecForeignBulkInsert(). The ideas I can think of are:

1. Follow JDBC/ODBC and add standard FDW properties. For example, the JDBC standard defines standard connection pool properties such as maxPoolSize and minPoolSize. JDBC drivers have to provide them with those defined names. Likewise, the FDW interface would require FDW implementors to handle the foreign server option name "max_bulk_insert_tuples" if they want to provide the bulk insert feature and implement ExecForeignBulkInsert(). The core executor gets that setting from the FDW by calling a new FDW routine like GetMaxBulkInsertTuples(). Sigh...

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN TABLE. The executor gets the value from the Relation and uses it. (But is this a table-specific configuration? I don't think so, sigh...)

3. Adopt the current USERSET GUC max_bulk_insert_tuples. I think this is enough because the user can change the setting per session, application, and database.

Regards
Takayuki Tsunakawa

#26Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#25)
Re: POC: postgres_fdw insert batching

On 11/19/20 3:43 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Unfortunately, this does not compile for me, because
nodeModifyTable calls ExecGetTouchedPartitions, which is not
defined anywhere. Not sure what that's about, so I simply
commented it out. That probably makes the partitioned cases fail, but
it allowed me to do some review and testing.

Ouch, sorry. I'm ashamed to have forgotten including
execPartition.c.

No reason to feel ashamed. Mistakes do happen from time to time.

There are a couple of other smaller changes. E.g. it undoes changes to
finish_foreign_modify, and instead calls separate functions to
prepare the bulk statement. It also adds list_make5/list_make6
macros, so as to not have to do strange stuff with the parameter
lists.

Thanks, I'll gladly take them! I wonder why I didn't think of
separating deallocate_query() from finish_foreign_modify() ...
perhaps my brain was dying. As for list_make5/6(), I saw that your first
patch avoided adding them, so I thought you found them ugly (and I felt
so, too). But thinking about it now, there's no reason not to add them.

I think it's often easier to look at changes like deallocate_query with a
bit of distance, not while hacking on the patch and just trying to make
it work somehow.

For the list_make# stuff, I think I decided to do the simplest thing
possible in an extension, without having to recompile the server. But I
think for a proper patch it's better to keep it more readable.

...

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and should be in a deparse.c function.

...

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function, that builds a
query for the "full" batch, and then uses those two queries - when
we get a full batch we use the bulk query, otherwise we use the
single-row query in a loop. IMO this is cleaner than deparsing
queries ad hoc in the execute_foreign_modify.

...

Of course, this might be worse when we don't have a full batch,
e.g. for a query that inserts only 50 rows with batch_size=100. If
this case is common, one option would be lowering the batch_size
accordingly. If we really want to improve this case too, I suggest
we pass more info than just a position of the VALUES clause - that
seems a bit too hackish.

...

Secondly, it adds the batch_size option to server/foreign table,
and uses that. This is not complete, though.
postgresPlanForeignModify currently passes a hard-coded value; it
needs to look up the correct value for the server/table from
RelOptInfo or something. And I suppose the ModifyTable
infrastructure will need to determine the value in order to pass
the correct number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct
the query and prepare it in execute_foreign_modify() because:

* Some of our customers use bulk insert in ECPG (INSERT ...
VALUES (record1), (record2), ...) to insert a variable number of records
per query. (Oracle's Pro*C has such a feature.) So, I want to be
prepared to enable such a thing with FDW.

* The number of records to insert is not known during planning (in
general), so it feels natural (or at least not unnatural) to prepare
the statement during the execution phase.

I think we should differentiate between "deparsing" and "preparing".

* I wanted to avoid the overhead of building the full query string
for a 100-record INSERT statement during query planning, because it may
be a bit costly for the usual 1-record inserts. (The overhead may be
hidden behind the high communication cost of postgres_fdw, though.)

Hmm, ok. I haven't measured how expensive that would be, but my assumption
was that it's much cheaper than the latency we save. But maybe I'm wrong.

So, in terms of code cleanness, how about moving my code for
rebuilding query string from execute_foreign_modify() to some new
function in deparse.c?

That might work, yeah. I suggest we do this:

1) try to use the same approach for both single-row inserts and larger
batches, to not have a lot of different branches

2) modify deparseInsertSql to produce not the "final" query but some
intermediate representation useful to generate queries inserting
an arbitrary number of rows

3) in execute_foreign_modify remember the last number of rows, and only
rebuild/replan the query when it changes
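Point 3 amounts to a one-entry cache keyed by the row count. A minimal sketch, assuming a hypothetical `BatchStmtCache` loosely modeled on PgFdwModifyState; the remote DEALLOCATE/deparse/PREPARE work is only indicated by a comment, so this models just the decision of when a rebuild is needed.

```c
#include <assert.h>

/* Hypothetical per-target state, loosely modeled on PgFdwModifyState. */
typedef struct
{
	int			prepared_rows;	/* row count the prepared statement was built for; 0 = none */
	int			rebuilds;		/* number of deparse + PREPARE cycles so far */
} BatchStmtCache;

/*
 * Ensure a statement for num_rows rows is prepared, rebuilding it only when
 * the row count differs from the one used last time.
 */
static void
ensure_prepared(BatchStmtCache *cache, int num_rows)
{
	assert(num_rows > 0);

	if (cache->prepared_rows == num_rows)
		return;					/* reuse the existing prepared statement */

	/*
	 * A real implementation would DEALLOCATE the old statement here, then
	 * generate the query text for num_rows rows from the intermediate
	 * representation and PREPARE it on the remote server.
	 */
	cache->prepared_rows = num_rows;
	cache->rebuilds++;
}
```

Inserting 334 rows with batch_size=100 as batches of 100, 100, 100 and 34 rows costs two rebuilds. A pattern that alternates between two sizes would re-prepare on every call, so keeping one statement per distinct row count is a possible refinement, at the cost of more prepared statements on the remote side.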

2) I think the GUC should be replaced with a server/table option,
similar to fetch_size.

Hmm, batch_size differs from fetch_size. fetch_size is a
postgres_fdw-specific feature with no relevant FDW routine, while
batch_size is a configuration parameter for all FDWs that implement
ExecForeignBulkInsert(). The ideas I can think of are:

1. Follow JDBC/ODBC and add standard FDW properties. For example,
the JDBC standard defines standard connection pool properties such as
maxPoolSize and minPoolSize. JDBC drivers have to provide them with
those defined names. Likewise, the FDW interface would require FDW
implementors to handle the foreign server option name
"max_bulk_insert_tuples" if they want to provide the bulk insert
feature and implement ExecForeignBulkInsert(). The core executor
gets that setting from the FDW by calling a new FDW routine like
GetMaxBulkInsertTuples(). Sigh...

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN
TABLE. The executor gets the value from the Relation and uses it. (But is
this a table-specific configuration? I don't think so, sigh...)

I do agree there's a difference between fetch_size and batch_size. For
fetch_size, it's internal to postgres_fdw - no external code needs to
know about it. For batch_size that's not the case, the ModifyTable core
code needs to be aware of that.

That means the "batch_size" is becoming part of the API, and IMO the way
to do that is by exposing it as an explicit API method. So +1 to add
something like GetMaxBulkInsertTuples.

It still needs to be configurable at the server/table level, though. The
new API method should only inform ModifyTable about the final max batch
size the FDW decided to use.
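A minimal sketch of that split of responsibilities, under stated assumptions: the FDW resolves batch_size itself, with the foreign table option overriding the server option (the same order postgres_fdw uses for fetch_size via apply_server_options followed by apply_table_options), and the executor only sees the final number through an optional callback. All type names here are hypothetical, the callback's argument is a simplification (the v4 patch's postgresGetMaxBulkInsertTuples takes a ResultRelInfo *), and the default of 1 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parsed options; 0 means the option was not specified. */
typedef struct
{
	int			server_batch_size;	/* CREATE/ALTER SERVER ... OPTIONS (batch_size '...') */
	int			table_batch_size;	/* CREATE/ALTER FOREIGN TABLE ... OPTIONS (batch_size '...') */
} HypotheticalBatchOptions;

/* FDW side: the table-level value overrides the server-level one. */
static int
resolve_batch_size(const HypotheticalBatchOptions *opts)
{
	assert(opts != NULL);
	if (opts->table_batch_size > 0)
		return opts->table_batch_size;
	if (opts->server_batch_size > 0)
		return opts->server_batch_size;
	return 1;					/* assumed default: no batching */
}

/* Hypothetical, simplified routine table; the callback is optional. */
typedef struct
{
	int			(*GetMaxBulkInsertTuples) (const HypotheticalBatchOptions *opts);
} HypotheticalFdwRoutine;

/* Executor side: only the final value crosses the API boundary. */
static int
executor_batch_size(const HypotheticalFdwRoutine *routine,
					const HypotheticalBatchOptions *opts)
{
	if (routine->GetMaxBulkInsertTuples == NULL)
		return 1;				/* FDW does not support bulk insert */
	return routine->GetMaxBulkInsertTuples(opts);
}
```

This keeps per-server and per-table tuning inside the FDW, while ModifyTable only needs one integer to size its slot array.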

3. Adopt the current USERSET GUC max_bulk_insert_tuples. I think
this is enough because the user can change the setting per session,
application, and database.

I don't think this is usable in practice, because a single session may
be using multiple FDW servers, with different implementations, latency
to the data nodes, etc. It's unlikely a single GUC value will be
suitable for all of them.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#26)
1 attachment(s)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I don't think this is usable in practice, because a single session may
be using multiple FDW servers, with different implementations, latency
to the data nodes, etc. It's unlikely a single GUC value will be
suitable for all of them.

That makes sense. The row size varies from table to table, so the user may want to tune this option to reduce memory consumption.

I think the attached patch reflects all your comments. I hope this will pass.

Regards
Takayuki Tsunakawa

Attachments:

v4-0001-Add-bulk-insert-for-foreign-tables.patch (application/octet-stream)
From 1b78dabf3564461baaba99dcfe5f39b4fc576efc Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH v4] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                 |  43 +++-
 contrib/postgres_fdw/expected/postgres_fdw.out | 116 +++++++++-
 contrib/postgres_fdw/option.c                  |  14 ++
 contrib/postgres_fdw/postgres_fdw.c            | 280 ++++++++++++++++++++-----
 contrib/postgres_fdw/postgres_fdw.h            |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  92 ++++++++
 doc/src/sgml/fdwhandler.sgml                   |  89 +++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  13 ++
 src/backend/executor/execPartition.c           |  11 +
 src/backend/executor/nodeModifyTable.c         | 154 ++++++++++++++
 src/backend/nodes/list.c                       |  15 ++
 src/include/executor/execPartition.h           |   1 +
 src/include/foreign/fdwapi.h                   |  10 +
 src/include/nodes/execnodes.h                  |   5 +
 src/include/nodes/pg_list.h                    |  15 ++
 15 files changed, 798 insertions(+), 65 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df1..eac2645 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1759,6 +1760,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * The statement text is appended to buf, and we also create an integer List
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06..4a8900a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9035,3 +9035,117 @@ ERROR:  08006
 COMMIT;
 -- Clean up
 DROP PROCEDURE terminate_backend_and_wait(text);
+-- ===================================================================
+-- bulk insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable bulk insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1a03e02..32bd419 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaac..7581e83 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -95,6 +97,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -175,7 +179,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			len;			/* length of query up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -184,6 +191,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch insert state */
+	int			num_slots;		/* rows per INSERT in current query text */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -342,6 +352,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBulkInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetMaxBulkInsertTuples(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -428,20 +444,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -529,6 +549,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBulkInsert = postgresExecForeignBulkInsert;
+	routine->GetMaxBulkInsertTuples = postgresGetMaxBulkInsertTuples;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1664,6 +1686,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1751,7 +1774,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,8 +1798,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1796,6 +1820,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1811,6 +1836,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1828,6 +1855,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1845,7 +1873,37 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBulkInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBulkInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1854,7 +1912,7 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,16 @@ postgresExecForeignInsert(EState *estate,
 }
 
 /*
+ * postgresGetMaxBulkInsertTuples
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetMaxBulkInsertTuples(ResultRelInfo *resultRelInfo)
+{
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
+/*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
  */
@@ -1872,8 +1940,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1886,8 +1959,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1924,6 +2002,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2000,7 +2079,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2010,6 +2089,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -3538,6 +3618,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int len,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3546,6 +3627,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3572,7 +3654,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3624,6 +3709,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3634,26 +3757,47 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBulkInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Recreate INSERT command string with numSlots records in its
+		 * VALUES clause
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->len,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3666,7 +3810,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3676,14 +3820,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3704,9 +3848,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3716,10 +3861,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3779,19 +3926,23 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
@@ -3799,32 +3950,37 @@ convert_prep_stmt_params(PgFdwModifyState *fmstate,
 	}
 
 	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3878,23 +4034,7 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
@@ -3902,6 +4042,34 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 }
 
 /*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if no statement has been prepared */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
+/*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
  *		UPDATE/DELETE .. RETURNING on a join directly
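The reworked convert_prep_stmt_params() above builds one flat array of fmstate->p_nums * numSlots text values, row after row, so that parameter $k of the multi-row statement maps to index k - 1. The output-function index j restarts for each row while pindex keeps counting. A minimal standalone sketch of that layout, with plain strings standing in for converted Datums (flatten_params is a hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: copy num_rows rows of num_cols parameter values into
 * one row-major array, the layout the patched convert_prep_stmt_params()
 * hands to PQsendQueryPrepared().  A NULL value stays NULL (SQL NULL).
 * The caller frees the returned array.
 */
static const char **
flatten_params(const char **rows[], int num_rows, int num_cols)
{
	const char **p_values = malloc(sizeof(char *) * num_cols * num_rows);
	int			pindex = 0;

	if (p_values == NULL)
		return NULL;

	for (int i = 0; i < num_rows; i++)
	{
		/* j plays the role of the per-column p_flinfo index in the patch */
		for (int j = 0; j < num_cols; j++)
			p_values[pindex++] = rows[i][j];
	}

	/* same invariant as Assert(pindex == fmstate->p_nums * numSlots) */
	assert(pindex == num_cols * num_rows);
	return p_values;
}
```

For three rows of two columns the array holds six entries, and $4 (index 3) is the second column of the second row.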
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410d..27e84ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
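rebuildInsertSql() is declared here but defined in deparse.c, outside this excerpt. Judging from its call in execute_foreign_modify(), it receives the original single-row INSERT text, the offset where the VALUES row ends, the number of parameters per row, and the number of additional rows (*numSlots - 1). The sketch below shows one plausible way to produce the multi-row text; rebuild_insert_sql_sketch is hypothetical, writes into a caller-supplied buffer instead of a StringInfo, and assumes any text after the VALUES row is copied unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of turning a single-row INSERT into a multi-row one.
 *   orig_query:     e.g. "INSERT INTO t(a, b) VALUES ($1, $2)"
 *   values_end_len: offset just past the first VALUES row in orig_query
 *   num_cols:       parameters per row
 *   num_rows:       number of additional rows to append
 * Any text after values_end_len (assumed, e.g. ON CONFLICT DO NOTHING) is
 * copied after the new rows.  buf must be large enough (asserted).
 */
static void
rebuild_insert_sql_sketch(char *buf, size_t bufsize, const char *orig_query,
						  int values_end_len, int num_cols, int num_rows)
{
	size_t		len;
	int			n;
	int			pindex = num_cols + 1;	/* row 1 already uses $1..$num_cols */

	assert(num_cols > 0);
	assert(values_end_len >= 0 && (size_t) values_end_len < bufsize);

	/* copy the original text up to the end of the first VALUES row */
	memcpy(buf, orig_query, (size_t) values_end_len);
	len = (size_t) values_end_len;

	/* append ", ($k, $k+1, ...)" once per additional row */
	for (int i = 0; i < num_rows; i++)
	{
		for (int j = 0; j < num_cols; j++)
		{
			n = snprintf(buf + len, bufsize - len, "%s$%d%s",
						 j == 0 ? ", (" : "", pindex++,
						 j == num_cols - 1 ? ")" : ", ");
			assert(n >= 0 && (size_t) n < bufsize - len);
			len += (size_t) n;
		}
	}

	/* copy whatever followed the VALUES row in the original text */
	n = snprintf(buf + len, bufsize - len, "%s", orig_query + values_end_len);
	assert(n >= 0 && (size_t) n < bufsize - len);
}
```

With num_cols = 2 and num_rows = 2, "INSERT INTO t(a, b) VALUES ($1, $2)" becomes "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6)".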
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c54..d1e78b9 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2697,3 +2697,95 @@ COMMIT;
 
 -- Clean up
 DROP PROCEDURE terminate_backend_and_wait(text);
+
+-- ===================================================================
+-- bulk insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable bulk insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
+
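The regression test above inserts 1..10 and then 11..31 into a foreign table with batch_size '10'. Under the flush rules in this patch (ExecInsert() sends a full buffer before adding the next row, and ExecModifyTable() sends the remainder when the statement ends), the second INSERT goes out as batches of 10, 10 and 1 rows. Because execute_foreign_modify() rebuilds the query and re-prepares the remote statement whenever the batch size differs from the previous call, the trailing partial batch costs an extra PREPARE. A standalone simulation of that per-statement schedule; plan_batches and count_prepares are hypothetical helpers, not part of the patch:

```c
#include <assert.h>

/*
 * Hypothetical simulation of how one INSERT of num_rows rows is split into
 * batches when batch_size > 1: full batches of batch_size, then the
 * leftover rows flushed at the end of the statement.
 * Stores the batch sizes in sizes[] and returns the number of batches.
 */
static int
plan_batches(int num_rows, int batch_size, int sizes[], int max_batches)
{
	int			nbatches = 0;

	assert(batch_size > 1);
	while (num_rows > 0)
	{
		int			n = num_rows < batch_size ? num_rows : batch_size;

		assert(nbatches < max_batches);
		sizes[nbatches++] = n;
		num_rows -= n;
	}
	return nbatches;
}

/*
 * Count remote PREPAREs for one statement: the first batch always prepares
 * (no statement exists yet), and a later batch re-prepares only if its size
 * differs from the previous one, mirroring the num_slots != *numSlots check.
 */
static int
count_prepares(const int sizes[], int nbatches)
{
	int			prepares = 0;
	int			prev = -1;

	for (int i = 0; i < nbatches; i++)
	{
		if (sizes[i] != prev)
			prepares++;
		prev = sizes[i];
	}
	return prepares;
}
```

So the 21-row INSERT takes three remote executions and two PREPAREs, where the unbatched path would execute a single prepared statement 21 times.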
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..5123d3f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function>, <function>ExecForeignBulkInsert</function>,
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for each tuple or batch of tuples to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBulkInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions).
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBulkInsert</function> or
+     <function>GetMaxBulkInsertTuples</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetMaxBulkInsertTuples(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBulkInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBulkInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to let the user set this value through a foreign
+     server and/or foreign table option, or else to use a hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBulkInsert</function> or
+     <function>GetMaxBulkInsertTuples</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBulkInsert</function> will be called for
+     each tuple or batch of tuples to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert</function>/<function>ExecForeignBulkInsert</function>
+     subsequently called must throw error as needed.
     </para>
 
     <para>
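The callback rules documented above come down to a single check in ExecInsert(): tuples are buffered only when both callbacks are set, GetMaxBulkInsertTuples returns more than 1, and the INSERT has no RETURNING projection. A standalone sketch of that decision, using a hypothetical FdwRoutineSketch struct whose callback signatures are simplified and do not match the real FdwRoutine:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the two FdwRoutine callbacks. */
typedef struct FdwRoutineSketch
{
	int			(*GetMaxBulkInsertTuples) (void *rinfo);
	void	   *(*ExecForeignBulkInsert) (void *rinfo);	/* simplified */
} FdwRoutineSketch;

/*
 * Mirrors the check added to ExecInsert(): batch only if both callbacks are
 * provided, the FDW accepts more than one tuple per call, and the statement
 * has no RETURNING list.  Otherwise ExecForeignInsert() is used per row.
 */
static bool
use_bulk_insert(const FdwRoutineSketch *routine, void *rinfo,
				bool has_returning)
{
	int			max_tuples = 1;

	assert(routine != NULL);
	if (routine->GetMaxBulkInsertTuples != NULL)
		max_tuples = routine->GetMaxBulkInsertTuples(rinfo);

	return max_tuples > 1 &&
		routine->ExecForeignBulkInsert != NULL &&
		!has_returning;
}

/* Hypothetical stub callbacks for trying the check out. */
static int	max_ten(void *rinfo) { (void) rinfo; return 10; }
static int	max_one(void *rinfo) { (void) rinfo; return 1; }
static void *bulk_stub(void *rinfo) { return rinfo; }
```

A limit of 1, as in the "Disable bulk insert" regression test with batch_size '1', keeps the per-row ExecForeignInsert path.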
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd214..97eeb64 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
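The lookup order described for batch_size (table option, then server option, then 100) is implemented in create_foreign_modify() by scanning the DefElem option lists with strtol(). A standalone sketch of the same precedence, with a hypothetical OptionSketch struct in place of DefElem:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem option: name and string value. */
typedef struct OptionSketch
{
	const char *name;
	const char *value;
} OptionSketch;

/* Return the parsed batch_size in opts, or 0 if the option is absent. */
static int
find_batch_size(const OptionSketch *opts, int nopts)
{
	assert(opts != NULL || nopts == 0);
	for (int i = 0; i < nopts; i++)
	{
		if (strcmp(opts[i].name, "batch_size") == 0)
			return (int) strtol(opts[i].value, NULL, 10);
	}
	return 0;
}

/*
 * Table option wins, then the server option, then the default of 100,
 * following create_foreign_modify() in this patch, which also treats a
 * parsed value of 0 as "not set".
 */
static int
resolve_batch_size(const OptionSketch *table_opts, int ntable,
				   const OptionSketch *server_opts, int nserver)
{
	int			batch_size = find_batch_size(table_opts, ntable);

	if (batch_size == 0)
		batch_size = find_batch_size(server_opts, nserver);
	if (batch_size == 0)
		batch_size = 100;
	return batch_size;
}
```

As in the patch, a value that parses to 0 falls through to the next level rather than being rejected.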
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd..257e725 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2193,3 +2193,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
 		}
 	}
 }
+
+/*
+ * ExecGetTouchedPartitions -- Return the partitions initialized so far
+ * by this tuple routing, setting *count to their number
+ */
+ResultRelInfo **
+ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count)
+{
+	*count = proute->num_partitions;
+	return proute->partitions;
+}
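Because every foreign partition keeps its own buffer (ri_Slots, ri_PlanSlots, ri_NumSlots), rows can sit in several buffers at once, and ExecModifyTable() uses ExecGetTouchedPartitions() to find and flush all of them once the input is exhausted. A standalone simulation of that accumulate-and-flush cycle, assuming simple modulo routing in place of hash partitioning and the same batch size for every partition (PartBuffer, route_row and flush_all are hypothetical):

```c
#include <assert.h>

#define NPARTS 3				/* hypothetical number of leaf partitions */
#define BATCH 10				/* hypothetical batch_size of every partition */

/* Hypothetical per-partition buffer standing in for ri_Slots / ri_NumSlots. */
typedef struct PartBuffer
{
	int			rows[BATCH];	/* buffered rows (copied slots in the patch) */
	int			nrows;			/* rows currently buffered */
	int			sent;			/* rows sent to the remote side so far */
	int			ncalls;			/* number of bulk-insert calls made */
} PartBuffer;

/* Send the buffered rows; in the patch this is ExecBulkInsert(). */
static void
flush_partition(PartBuffer *pb)
{
	assert(pb->nrows > 0 && pb->nrows <= BATCH);
	pb->sent += pb->nrows;
	pb->ncalls++;
	pb->nrows = 0;
}

/* Route one row, flushing the target buffer first if it is already full. */
static void
route_row(PartBuffer parts[], int value)
{
	PartBuffer *pb = &parts[value % NPARTS];	/* modulo instead of hashing */

	if (pb->nrows == BATCH)
		flush_partition(pb);
	pb->rows[pb->nrows++] = value;
}

/* End of statement: flush every partition that still holds rows. */
static void
flush_all(PartBuffer parts[])
{
	for (int i = 0; i < NPARTS; i++)
	{
		if (parts[i].nrows > 0)
			flush_partition(&parts[i]);
	}
}
```

With rows 1..66 spread evenly over three partitions, each partition sends two full batches of 10 and a final batch of 2; the last two rows of every partition are only sent by the end-of-statement flush.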
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..8acbfe3 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBulkInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -433,6 +441,8 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 	else if (resultRelInfo->ri_FdwRoutine)
 	{
+		int			max_bulk_insert_tuples = 1;
+
 		/*
 		 * Compute stored generated columns
 		 */
@@ -442,6 +452,59 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports bulk insert, accumulate tuples and insert them
+		 * in bulk
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetMaxBulkInsertTuples)
+			max_bulk_insert_tuples =
+				resultRelInfo->ri_FdwRoutine->GetMaxBulkInsertTuples(resultRelInfo);
+		if (max_bulk_insert_tuples > 1 &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert &&
+			resultRelInfo->ri_projectReturning == NULL)
+		{
+			/*
+			 * If the buffer for this relation already holds as many tuples
+			 * as the FDW accepts per call, send them with a bulk insert
+			 * before buffering the new tuple.
+			 */
+			if (resultRelInfo->ri_NumSlots == max_bulk_insert_tuples)
+			{
+				ExecBulkInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   max_bulk_insert_tuples);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -702,6 +765,73 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBulkInsert
+ *
+ *		Insert a batch of buffered tuples with a single FDW call.
+ *		Currently, this handles inserting into a foreign table without
+ *		a RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBulkInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBulkInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+	{
+		estate->es_processed += numInserted;
+		setLastTid(&slot->tts_tid);
+	}
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -1940,6 +2070,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2156,6 +2289,27 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for bulk insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBulkInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index efa4434..83944ff 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 473c4cd..a2b8181 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,6 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 												  int nsubplans);
+extern ResultRelInfo **ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count);
 
 #endif							/* EXECPARTITION_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..88e1f26 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBulkInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetMaxBulkInsertTuples_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBulkInsert_function ExecForeignBulkInsert;
+	GetMaxBulkInsertTuples_function GetMaxBulkInsertTuples;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..3d67ded 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,11 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* bulk insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	TupleTableSlot **ri_Slots;		/* input tuples for bulk insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index ec23101..62a0e1f 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern List *lappend(List *list, void *datum);
 extern List *lappend_int(List *list, int datum);
-- 
2.10.1

#28Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#27)
Re: POC: postgres_fdw insert batching

On 11/23/20 3:17 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I don't think this is usable in practice, because a single session
may be using multiple FDW servers, with different implementations,
latency to the data nodes, etc. It's unlikely a single GUC value
will be suitable for all of them.

That makes sense. The row size varies from table to table, so the
user may want to tune this option to reduce memory consumption.

I think the attached patch reflects all your comments. I hope
this will pass.

Thanks - I didn't have time for a thorough review at the moment, so I
only skimmed through the diff and did a couple of very simple tests. And I
think overall it looks quite nice.

A couple minor comments/questions:

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know if that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).
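The "as large as possible, with the last one smaller" behaviour follows from an accumulate-and-flush loop: stash rows until the buffer holds batch_size of them, send them, and flush whatever remains at the end of the statement. Below is a standalone sketch of that pattern, loosely modeled on the ri_Slots/ri_NumSlots handling in the patch. The type and function names are hypothetical, rows are plain ints, and send_batch() only records batch sizes instead of calling ExecForeignBulkInsert. The batch size is read once, when the buffer is created.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-relation buffer, loosely modeled on ri_Slots/ri_NumSlots. */
typedef struct BatchBuffer
{
	int		   *rows;			/* stashed rows */
	int			num_rows;		/* rows currently stashed */
	int			batch_size;		/* read once, when the buffer is created */
	int			sent_sizes[64];	/* sizes of the batches "sent" so far */
	int			num_sent;
} BatchBuffer;

static void
batch_init(BatchBuffer *buf, int batch_size)
{
	assert(batch_size > 0);
	buf->rows = malloc(sizeof(int) * batch_size);
	assert(buf->rows != NULL);
	buf->num_rows = 0;
	buf->batch_size = batch_size;
	buf->num_sent = 0;
}

/* Hypothetical stand-in for the remote call: only records the batch size. */
static void
send_batch(BatchBuffer *buf)
{
	if (buf->num_rows == 0)
		return;
	assert(buf->num_sent < 64);
	buf->sent_sizes[buf->num_sent++] = buf->num_rows;
	buf->num_rows = 0;
}

/* Per-row path: stash the row and send as soon as the buffer is full. */
static void
batch_insert(BatchBuffer *buf, int row)
{
	buf->rows[buf->num_rows++] = row;
	if (buf->num_rows == buf->batch_size)
		send_batch(buf);
}

/* End of statement: flush the remainder (the smaller last batch). */
static void
batch_finish(BatchBuffer *buf)
{
	send_batch(buf);
	free(buf->rows);
	buf->rows = NULL;
}
```

For example, inserting rows 11..31 (21 rows, as in the second INSERT of the regression test in the v5 patch) with batch_size 10 produces batches of 10, 10 and 1. With batch_size 1 every row is sent on its own, which is the "disable batch insert" case.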

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#28)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better, because Oracle uses "bulk insert" and "bulk fetch" (as in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO), while JDBC uses "batch", as in "batch updates" and its API method names (addBatch, executeBatch).

But "batch" seems to be the better and more common term, judging by its etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better, because this API deals with multiple records in a single INSERT statement. GetModifyBatchSize could then be reserved for statement batching, once libpq supports batching/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates. What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements
--------------------------------------------------
Statement stmt = con.createStatement();
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");

// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
--------------------------------------------------

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning. Unfortunately, it's also called from BeginForeignInsert(), but other stuff passed to create_foreign_modify() including the query string is constructed there.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size doesn't belong in EXPLAIN, because unlike shared buffers and parallel workers, its value doesn't change dynamically based on planning or system state. OTOH, I sometimes want to see what configuration parameter values the user set, such as work_mem, enable_*, and shared_buffers, together with the query plan (EXPLAIN and auto_explain). For example, it'd be nice if EXPLAIN (parameters on) could do that. Some relevant FDW-related parameters could be included in that output.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know if that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).

Don't worry about this either. GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in create_foreign_modify().

Regards
Takayuki Tsunakawa

#30Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#29)
Re: POC: postgres_fdw insert batching

On 11/24/20 9:45 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better, because Oracle uses "bulk insert" and "bulk fetch" (as in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO), while JDBC uses "batch", as in "batch updates" and its API method names (addBatch, executeBatch).

But "batch" seems to be the better and more common term, judging by its etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better, because this API deals with multiple records in a single INSERT statement. GetModifyBatchSize could then be reserved for statement batching, once libpq supports batching/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates. What do you think?

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements
--------------------------------------------------
Statement stmt = con.createStatement();
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");

// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
--------------------------------------------------

Sure. We already have a patch to support something like this at the
libpq level, IIRC. But I'm not sure how well that matches the FDW API
approach in general.

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning. Unfortunately, it's also called from BeginForeignInsert(), but other stuff passed to create_foreign_modify() including the query string is constructed there.

Hmm, ok.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size doesn't belong in EXPLAIN, because unlike shared buffers and parallel workers, its value doesn't change dynamically based on planning or system state. OTOH, I sometimes want to see what configuration parameter values the user set, such as work_mem, enable_*, and shared_buffers, together with the query plan (EXPLAIN and auto_explain). For example, it'd be nice if EXPLAIN (parameters on) could do that. Some relevant FDW-related parameters could be included in that output.

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.
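For reference, the statement actually sent is derived from that single-row query. In the v5 patch (rebuildInsertSql in deparse.c), the query text is copied up to the end of the first VALUES tuple (values_end_len), one "($n, ...)" group is appended per additional row with continuing parameter numbers, and the remainder (e.g. ON CONFLICT DO NOTHING or RETURNING) is copied after it. Below is a standalone sketch of that rewrite. It uses a caller-supplied char buffer instead of StringInfo, and its name, signature and error handling are illustrative assumptions, not the patch's code. Unlike the patch's num_rows argument, num_rows here counts all rows, including the first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a multi-row INSERT from a single-row one.  orig_query must contain
 * exactly one VALUES tuple using $1..$num_cols and ending at offset
 * values_end_len.  num_rows counts all rows, including the first.
 * Returns 0 on success, -1 if out is too small.
 */
static int
rebuild_insert_sql(char *out, size_t outsize, const char *orig_query,
				   size_t values_end_len, int num_cols, int num_rows)
{
	size_t		len;
	int			pindex = num_cols + 1;	/* first parameter of the 2nd row */

	assert(num_cols > 0);		/* DEFAULT VALUES has no tuple to repeat */
	if (values_end_len + 1 > outsize)
		return -1;
	memcpy(out, orig_query, values_end_len);
	len = values_end_len;
	out[len] = '\0';

	for (int i = 1; i < num_rows; i++)
	{
		for (int j = 0; j < num_cols; j++)
		{
			/* ", ($n" opens the tuple, ", $n" continues it */
			int			n = snprintf(out + len, outsize - len,
									 j == 0 ? ", ($%d" : ", $%d", pindex++);

			if (n < 0 || (size_t) n >= outsize - len)
				return -1;
			len += n;
		}
		if (len + 2 > outsize)
			return -1;
		out[len++] = ')';
		out[len] = '\0';
	}

	/* Copy the tail after the VALUES clause (ON CONFLICT, RETURNING, ...). */
	if (len + strlen(orig_query + values_end_len) + 1 > outsize)
		return -1;
	strcpy(out + len, orig_query + values_end_len);
	return 0;
}
```

With orig_query = "INSERT INTO public.t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING", values_end_len pointing just after "($1, $2)", num_cols = 2 and num_rows = 3, the result is "INSERT INTO public.t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) ON CONFLICT DO NOTHING". That text differs from the single-row query EXPLAIN prints.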

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know if that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).

Don't worry about this either. GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in create_foreign_modify().

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 10 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's a bug in the FDW extension,
but it's also due to the API design.
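The hazard, and a simple guard against it, can be shown with a standalone sketch. It rests on several assumptions: the names are hypothetical, rows are ints, and "sending" a batch just resets the counter. The batch-size callback is asked once, when the per-relation state is set up. The allocation and the flush threshold both use that cached value, so a callback that answers 10 and later 100 cannot push more rows into the buffer than it was allocated for.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GetModifyBatchSize-style FDW callback. */
typedef int (*GetBatchSizeFn) (void *fdw_state);

/* Per-relation state, loosely modeled on ri_Slots/ri_NumSlots. */
typedef struct RelBatchState
{
	int			batch_size;		/* asked once, when the relation is set up */
	int			num_rows;		/* rows currently stashed */
	int		   *rows;			/* allocated with room for batch_size rows */
} RelBatchState;

static void
rel_batch_begin(RelBatchState *rs, GetBatchSizeFn get_size, void *fdw_state)
{
	int			size = get_size(fdw_state);

	rs->batch_size = size > 0 ? size : 1;
	rs->num_rows = 0;
	rs->rows = malloc(sizeof(int) * rs->batch_size);
	assert(rs->rows != NULL);
}

/*
 * Stash one row; returns the number of rows flushed (0 or batch_size).
 * The callback is deliberately not consulted here, so the flush threshold
 * always matches the allocation.
 */
static int
rel_batch_add(RelBatchState *rs, int row)
{
	assert(rs->num_rows < rs->batch_size);
	rs->rows[rs->num_rows++] = row;
	if (rs->num_rows < rs->batch_size)
		return 0;
	rs->num_rows = 0;			/* pretend the full batch was sent */
	return rs->batch_size;
}

/* A deliberately unstable callback: 10 on the first call, 100 afterwards. */
static int
unstable_batch_size(void *fdw_state)
{
	int		   *calls = fdw_state;

	return (*calls)++ == 0 ? 10 : 100;
}
```

Feeding 25 rows through a state set up with unstable_batch_size flushes two batches of 10 and leaves 5 rows stashed. The callback is called exactly once, so its later answer of 100 never matters.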

regards


#31Craig Ringer
craig.ringer@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#14)
Re: POC: postgres_fdw insert batching

On Thu, Oct 8, 2020 at 10:40 AM tsunakawa.takay@fujitsu.com <
tsunakawa.takay@fujitsu.com> wrote:

Thank you for picking this up. I'm interested in this topic, too. (As an
aside, we'd like to submit a bulk insert patch for ECPG in the near future.)

As others mentioned, Andrey-san's fast COPY to foreign partitions is also
promising. But I think your bulk INSERT is a separate feature and offers what
COPY cannot do: data transformation during loading with INSERT SELECT and
CREATE TABLE AS SELECT.

Is there anything that makes you worry and stops development? Could I
give implementing this a try? (I'm not sure I can, sorry. I'm worried about
whether we can change the executor's call chain easily.)

I suggest that when developing this, you keep in mind the ongoing work on
the libpq pipelining/batching enhancements, and also the way many
interfaces to foreign data sources support asynchronous, concurrent
operations.

Best results with postgres_fdw insert batching would be achieved if it can
also send its batches as asynchronous queries and only block when it's
required to report on the results of the work. This will also be true of
any other FDW where the backing remote interface can support asynchronous
concurrent or pipelined operation.

I'd argue it's pretty much vital for decent performance when talking to a
cloud database from an on-prem server for example, or any other time that
round-trip-time reduction is important.

The most important characteristic of an FDW API to permit this would be
decoupling of request and response into separate non-blocking calls that
don't have to occur in ordered pairs. Instead of "insert_foo(foo) ->
insert_result", have "queue_insert_foo(foo) -> future_result",
"get_result_if_available(future_result) -> maybe result" and
"get_result_blocking(future_result) -> result". Permit multiple
queue_insert_foo(...)s without a/b interleaving with result fetches being
required.
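The shape of that proposal can be made concrete with a single-threaded C sketch of the three calls described above: queue_insert, get_result_if_available and get_result_blocking. Everything here is hypothetical. No such FDW API exists in the patch, the "remote" is simulated in memory, and each request's result is simply its row count. The point is only that several requests can be queued before any result is collected, and that a single blocking wait can deliver all outstanding results.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INFLIGHT 16

/* Handle returned by queue_insert(); resolved later. */
typedef struct FutureResult
{
	int			id;
} FutureResult;

/* Simulated remote connection. */
typedef struct Conn
{
	int			rows[MAX_INFLIGHT];	/* rows carried by each queued request */
	int			num_queued;
	int			num_completed;	/* requests whose result has arrived */
	int			round_trips;	/* times we had to block on the network */
} Conn;

/* Non-blocking: queue the request and return immediately. */
static FutureResult
queue_insert(Conn *c, int nrows)
{
	FutureResult f;

	assert(c->num_queued < MAX_INFLIGHT);
	c->rows[c->num_queued] = nrows;
	f.id = c->num_queued++;
	return f;
}

/* Non-blocking poll: returns true and sets *result only if it has arrived. */
static bool
get_result_if_available(const Conn *c, FutureResult f, int *result)
{
	if (f.id >= c->num_completed)
		return false;
	*result = c->rows[f.id];	/* the request's "rowcount" */
	return true;
}

/*
 * Blocking: if the result is not in yet, wait once; in this simulation one
 * wait delivers the results of everything queued so far.
 */
static int
get_result_blocking(Conn *c, FutureResult f)
{
	int			result = 0;

	assert(f.id < c->num_queued);
	if (!get_result_if_available(c, f, &result))
	{
		c->round_trips++;
		c->num_completed = c->num_queued;
		(void) get_result_if_available(c, f, &result);
	}
	return result;
}
```

Queuing two 10-row requests and then blocking on the second one costs a single simulated round trip, after which the first result can be polled without waiting.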

Ideally it'd be able to accumulate small batches of inserts locally and
send a batch to the remote end once it's accumulated enough. But instead of
blocking waiting for the result, return control to the executor after
sending, without forcing a socket flush (which might block) and without
waiting to learn what the outcome was. Allow new batches to be accumulated
and sent before the results of the first batch are received, so long as
it's within the same executor node so we don't make any unfortunate
mistakes with mixing things up in compound statements or functions etc.
Only report outcomes like rowcounts lazily when results are received, or
when required to do so.

If now we have

REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT

and batching would give us

{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }
~~ round trip delay ~~
{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }

consider if room can be left in the batching API to permit:

{ REQUEST, REQUEST} -> [nonblocking send...]
{ REQUEST, REQUEST} -> [nonblocking send...]
~~ round trip delay ~~
[....] -> RESULT, RESULT
[....] -> RESULT, RESULT

... where we only actually block at the point where the result is required
as input into the next node.
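Counting only the blocking waits in the three timelines above gives a rough model. It ignores bandwidth, remote execution time and socket buffering, all of which matter in practice. With n rows and batches of b rows there are n waits unbatched and ceil(n/b) waits with synchronous batches. If up to w batches may be in flight before the sender must block, pipelining needs ceil(ceil(n/b)/w) waits. Below is a C sketch of that model. The function names and the w parameter are illustrative assumptions and not part of any proposed API.

```c
#include <assert.h>

/* One request per row, each awaited before the next is sent. */
static long
waits_unbatched(long n)
{
	return n;
}

/* Batches of b rows, each awaited before the next batch is sent. */
static long
waits_batched(long n, long b)
{
	assert(b > 0);
	return (n + b - 1) / b;		/* ceil(n / b) */
}

/*
 * Batches of b rows with up to w batches in flight before blocking; one wait
 * collects every in-flight result.  Assumes sends themselves never block.
 */
static long
waits_pipelined(long n, long b, long w)
{
	assert(w > 0);
	return (waits_batched(n, b) + w - 1) / w;
}

/* Time spent waiting, in microseconds, for a given round-trip time. */
static long
wait_time_us(long waits, long rtt_us)
{
	return waits * rtt_us;
}
```

The diagrams above correspond to n = 4, b = 2: 4 waits, 2 waits, and 1 wait with w = 2. With illustrative numbers (1M rows, batches of 100, 1 ms round trip), the model gives 1,000,000 waits (about 1000 s of waiting) unbatched, 10,000 (about 10 s) batched, and 1,250 with 8 batches in flight.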

Honestly I don't know the executor structure well enough to say if this is
even remotely feasible right now. Andres may be able to comment. But
please keep it in mind if you're thinking of making FDW API changes.

#32tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#30)
1 attachment(s)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

On 11/24/20 9:45 AM, tsunakawa.takay@fujitsu.com wrote:

OTOH, as for the name GetModifyBatchSize() you suggest, I think
GetInsertBatchSize may be better, because this API deals with multiple
records in a single INSERT statement. GetModifyBatchSize could then be
reserved for statement batching, once libpq supports batching/pipelining to
execute multiple INSERT/UPDATE/DELETE statements, as in the following
JDBC batch updates. What do you think?

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.

Right, I can't yet picture how the client, the server core and FDWs would interact for statement batching, so I'll take your suggested name.

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

Added to postgres_fdw's EXPLAIN VERBOSE output, like "Remote SQL".

Don't worry about this either. GetMaxBulkInsertTuples() just returns a value
that was already saved in a struct in create_foreign_modify().

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 10 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's a bug in the FDW extension,
but it's also due to the API design.

You worried about FDWs other than postgres_fdw. That's reasonable. I have complained in other threads that PG developers consider only postgres_fdw, not other FDWs, when designing the FDW interface, but I made the same mistake myself. I made changes so that the executor calls GetModifyBatchSize() once per relation per statement.
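Since the batching callbacks are new, optional members of the routine table, an FDW written before them leaves those pointers NULL and must keep working through the single-row path. Below is a standalone sketch of that dispatch. It uses a cut-down, hypothetical routine table rather than the real FdwRoutine from fdwapi.h, and toy callbacks instead of remote calls. The batch size is decided once per relation per statement, and anything short of a fully implemented batching API means "one row at a time".

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-in for FdwRoutine. */
typedef struct MiniFdwRoutine
{
	int			(*insert_one) (int row);	/* always provided */
	int			(*insert_batch) (const int *rows, int n);	/* optional */
	int			(*get_batch_size) (void);	/* optional */
} MiniFdwRoutine;

/* Decide the batch size once per relation per statement. */
static int
effective_batch_size(const MiniFdwRoutine *r)
{
	int			size;

	if (r->insert_batch == NULL || r->get_batch_size == NULL)
		return 1;
	size = r->get_batch_size();
	return size > 1 ? size : 1;
}

/* Insert n rows; returns how many remote calls were made. */
static int
insert_rows(const MiniFdwRoutine *r, const int *rows, int n)
{
	int			batch_size = effective_batch_size(r);
	int			calls = 0;

	assert(r->insert_one != NULL);
	for (int i = 0; i < n; i += batch_size)
	{
		int			chunk = n - i < batch_size ? n - i : batch_size;

		if (batch_size == 1)
			r->insert_one(rows[i]);
		else
			r->insert_batch(rows + i, chunk);
		calls++;
	}
	return calls;
}

/* Toy callbacks: one old-style FDW, one that batches by 10. */
static int
toy_insert_one(int row)
{
	(void) row;
	return 1;
}

static int
toy_insert_batch(const int *rows, int n)
{
	(void) rows;
	return n;
}

static int
toy_batch_size(void)
{
	return 10;
}

static const MiniFdwRoutine old_fdw = {toy_insert_one, NULL, NULL};
static const MiniFdwRoutine new_fdw = {toy_insert_one, toy_insert_batch,
									   toy_batch_size};
```

Inserting 25 rows makes 25 calls through old_fdw and 3 calls (10, 10 and 5 rows) through new_fdw, without old_fdw having to know that batching exists.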

Regards
Takayuki Tsunakawa

Attachments:

v5-0001-Add-bulk-insert-for-foreign-tables.patch (application/octet-stream)
From d6f2eb12d06b9f630f0bb749a33e0bbb5b0c78cb Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH v5] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                 |  43 +++-
 contrib/postgres_fdw/expected/postgres_fdw.out | 116 +++++++++-
 contrib/postgres_fdw/option.c                  |  14 ++
 contrib/postgres_fdw/postgres_fdw.c            | 282 ++++++++++++++++++++-----
 contrib/postgres_fdw/postgres_fdw.h            |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  92 ++++++++
 doc/src/sgml/fdwhandler.sgml                   |  89 +++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  13 ++
 src/backend/executor/execPartition.c           |  11 +
 src/backend/executor/nodeModifyTable.c         | 153 ++++++++++++++
 src/backend/nodes/list.c                       |  15 ++
 src/include/executor/execPartition.h           |   1 +
 src/include/foreign/fdwapi.h                   |  10 +
 src/include/nodes/execnodes.h                  |   6 +
 src/include/nodes/pg_list.h                    |  15 ++
 15 files changed, 800 insertions(+), 65 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df1..eac2645 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1759,6 +1760,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * The statement text is appended to buf, and we also create an integer List
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06..6487cec 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9035,3 +9035,117 @@ ERROR:  08006
 COMMIT;
 -- Clean up
 DROP PROCEDURE terminate_backend_and_wait(text);
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1a03e02..32bd419 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
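A side note on the validation hunk above: `strtol(defGetString(def), NULL, 10)` never looks at the end pointer, so a value like `'10abc'` is silently accepted as 10. A stricter check could look roughly like the sketch below. `parse_batch_size` is a hypothetical helper written only for illustration; it is not in the patch or in PostgreSQL, and inside the backend the existing `parse_int()` helper may well be the more natural choice.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): accept only a whole, positive
 * integer that fits in an int.  Rejects empty strings, trailing junk,
 * overflow and values <= 0.
 */
static bool
parse_batch_size(const char *str, int *result)
{
	char	   *end;
	long		val;

	errno = 0;
	val = strtol(str, &end, 10);

	if (end == str || *end != '\0')	/* no digits, or trailing junk */
		return false;
	if (errno == ERANGE || val <= 0 || val > INT_MAX)
		return false;

	*result = (int) val;
	return true;
}
```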
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaac..b4075a0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length of the query text up to the end of the VALUES clause, for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -95,6 +97,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length up to the end of the VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -175,7 +179,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			len;			/* length of query text up to end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -184,6 +191,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -342,6 +352,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -428,20 +444,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -529,6 +549,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetModifyBatchSize = postgresGetModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1664,6 +1686,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1751,7 +1774,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1775,8 +1798,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1796,6 +1820,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1811,6 +1836,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1828,6 +1855,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1845,7 +1873,37 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1854,7 +1912,7 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,16 @@ postgresExecForeignInsert(EState *estate,
 }
 
 /*
+ * postgresGetModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
+/*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
  */
@@ -1872,8 +1940,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1886,8 +1959,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? *rslot : NULL;
 }
 
 /*
@@ -1924,6 +2002,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2000,7 +2079,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2010,6 +2089,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2635,6 +2715,8 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3538,6 +3620,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int len,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3546,6 +3629,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3572,7 +3656,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->len = len;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3624,6 +3711,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3634,26 +3759,47 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Recreate INSERT command string with numSlots records in its
+		 * VALUES clause
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->len,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3666,7 +3812,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3676,14 +3822,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3704,9 +3850,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3716,10 +3863,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3779,19 +3928,23 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
@@ -3799,32 +3952,37 @@ convert_prep_stmt_params(PgFdwModifyState *fmstate,
 	}
 
 	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3878,23 +4036,7 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
@@ -3902,6 +4044,34 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 }
 
 /*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if no prepared statement exists */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
+/*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
  *		UPDATE/DELETE .. RETURNING on a join directly
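The least obvious part of the `postgres_fdw.c` changes is probably how `convert_prep_stmt_params` lays out a batch: row `i`, column `k` ends up at `p_values[i * p_nums + k]`, i.e. parameter `$(i * p_nums + k + 1)`, while the output-function index `j` restarts for every row. A minimal standalone model of that layout follows. `output_fn`, `int4_out`, `bool_out` and `flatten_batch` are hypothetical stand-ins for `p_flinfo[]` and the real function, and the ctid parameter is ignored because only INSERT is batched.

```c
#include <stddef.h>
#include <stdio.h>

#define PARAM_LEN 32

/* Hypothetical stand-in for a per-column output function (p_flinfo[j]). */
typedef void (*output_fn) (int value, char *out, size_t outlen);

static void
int4_out(int value, char *out, size_t outlen)
{
	snprintf(out, outlen, "%d", value);
}

static void
bool_out(int value, char *out, size_t outlen)
{
	snprintf(out, outlen, "%s", value ? "t" : "f");
}

/*
 * Flatten num_rows x num_cols values (row-major in 'rows') into 'params'.
 * Row i, column j lands at params[i * num_cols + j], i.e. parameter
 * $(i * num_cols + j + 1); the column index j restarts for every row.
 * Returns the number of parameters written.
 */
static int
flatten_batch(const int *rows, int num_rows, int num_cols,
			  const output_fn *col_out, char params[][PARAM_LEN])
{
	int			pindex = 0;

	for (int i = 0; i < num_rows; i++)
	{
		for (int j = 0; j < num_cols; j++)
		{
			col_out[j](rows[i * num_cols + j], params[pindex], PARAM_LEN);
			pindex++;
		}
	}
	return pindex;
}
```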
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410d..27e84ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
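`rebuildInsertSql()` itself lives in `deparse.c`, which is not part of this excerpt. Judging from its signature and the call site (which passes `*numSlots - 1` as the row count), it has to copy the single-row statement up to `values_end_len`, append one `($n, ...)` group per extra row, and then append whatever followed the VALUES list. Here is a self-contained sketch of that idea, assuming `values_end_len` is the offset just past the closing parenthesis of the first VALUES row; `rebuild_insert_sql` is a hypothetical name and plain `malloc` stands in for `StringInfo`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, not the patch's deparse.c code.  orig_query is a
 * single-row INSERT whose VALUES row uses $1..$num_cols; values_end_len is
 * the offset just past that row's closing parenthesis.  The result carries
 * num_rows additional rows, extra row r (1-based) using parameters
 * $(r*num_cols + 1) .. $((r+1)*num_cols), followed by whatever came after
 * the VALUES list (ON CONFLICT, RETURNING, ...).  Caller frees the result.
 */
static char *
rebuild_insert_sql(const char *orig_query, int values_end_len,
				   int num_cols, int num_rows)
{
	/* per extra row: ", (" + ")"; per column: ", $" + up to 10 digits */
	size_t		cap = strlen(orig_query) + 1 +
		(size_t) num_rows * (4 + (size_t) num_cols * 13);
	char	   *buf = malloc(cap);
	size_t		len = (size_t) values_end_len;
	int			pindex = num_cols + 1;	/* $1..$num_cols already used */

	if (buf == NULL)
		return NULL;

	memcpy(buf, orig_query, len);

	for (int r = 0; r < num_rows; r++)
	{
		len += snprintf(buf + len, cap - len, ", (");
		for (int c = 0; c < num_cols; c++)
			len += snprintf(buf + len, cap - len, "%s$%d",
							c > 0 ? ", " : "", pindex++);
		len += snprintf(buf + len, cap - len, ")");
	}

	/* copy the tail, including the terminating NUL */
	strcpy(buf + len, orig_query + values_end_len);
	return buf;
}
```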
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c54..e4b2b59 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2697,3 +2697,95 @@ COMMIT;
 
 -- Clean up
 DROP PROCEDURE terminate_backend_and_wait(text);
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
+
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..02a34b4 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> (or <function>ExecForeignBatchInsert</function>),
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for each tuple (or batch of tuples) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions).
+     The passed-in <literal>slots</literal> can be reused for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or else use a hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     each tuple (or batch of tuples) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or the subsequently called <function>ExecForeignInsert</function>
+     or <function>ExecForeignBatchInsert</function> must throw an error as needed.
     </para>
 
     <para>
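The executor-side contract described in the docs above (buffer tuples per result relation, hand at most `GetModifyBatchSize()` of them to `ExecForeignBatchInsert` at a time, and presumably flush whatever remains at the end of the statement) can be modeled in a few lines. This is a toy sketch with hypothetical names (`BatchBuffer`, `batch_add`, `batch_flush`), not the `nodeModifyTable.c` code. With `batch_size = 10`, the 21 rows of the `generate_series(11, 31)` regression test would go out as batches of 10, 10 and 1.

```c
#include <stddef.h>

#define MAX_BATCH 1000			/* toy upper bound; assumes batch_size <= this */

/* Callback standing in for ExecForeignBatchInsert; tuples are plain ints. */
typedef void (*batch_insert_fn) (const int *tuples, int ntuples, void *arg);

typedef struct BatchBuffer
{
	int			batch_size;		/* what GetModifyBatchSize() reported, >= 1 */
	int			num_slots;		/* tuples currently buffered */
	int			slots[MAX_BATCH];
	batch_insert_fn insert_fn;
	void	   *arg;
} BatchBuffer;

/* Send any buffered tuples, e.g. at the end of the statement. */
static void
batch_flush(BatchBuffer *b)
{
	if (b->num_slots > 0)
		b->insert_fn(b->slots, b->num_slots, b->arg);
	b->num_slots = 0;
}

/* Buffer one tuple, first sending the batch if it is already full. */
static void
batch_add(BatchBuffer *b, int tuple)
{
	if (b->num_slots == b->batch_size)
		batch_flush(b);
	b->slots[b->num_slots++] = tuple;
}
```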
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd214..97eeb64 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the maximum number of rows <filename>postgres_fdw</filename>
+       should insert in each remote insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
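The `batch_size` lookup order implemented in `create_foreign_modify()` (table option first, then server option, then a default of 100) can be summarized as below. `Option` is a hypothetical stand-in for the `DefElem` lists in `table->options` and `server->options`; as in the patch, a result of 0 means "not set", which is only safe because the validator rejects values <= 0.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem options list entry. */
typedef struct Option
{
	const char *name;
	const char *value;
} Option;

#define DEFAULT_BATCH_SIZE 100

/* Value of "batch_size" in opts, or 0 if the option is not present. */
static int
find_batch_size(const Option *opts, int nopts)
{
	for (int i = 0; i < nopts; i++)
	{
		if (strcmp(opts[i].name, "batch_size") == 0)
			return (int) strtol(opts[i].value, NULL, 10);
	}
	return 0;
}

/* Table option wins, then the server option, then the default. */
static int
resolve_batch_size(const Option *table_opts, int ntable,
				   const Option *server_opts, int nserver)
{
	int			batch_size = find_batch_size(table_opts, ntable);

	if (batch_size == 0)
		batch_size = find_batch_size(server_opts, nserver);
	if (batch_size == 0)
		batch_size = DEFAULT_BATCH_SIZE;
	return batch_size;
}
```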
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd..257e725 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2193,3 +2193,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
 		}
 	}
 }
+
+/*
+ * ExecGetTouchedPartitions -- Return the ResultRelInfos of the partitions
+ * that tuple routing has set up so far, storing their number in *count
+ */
+ResultRelInfo **
+ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count)
+{
+	*count = proute->num_partitions;
+	return proute->partitions;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..8df1ee9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -442,6 +450,60 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports batch insert, accumulate tuples and insert them
+		 * in bulk
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetModifyBatchSize &&
+			resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
+		if (resultRelInfo->ri_BatchSize > 1 &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+			resultRelInfo->ri_projectReturning == NULL)
+		{
+			/*
+			 * If a certain number of tuples have already been accumulated,
+			 * or	a tuple has come for a different relation than that for
+			 * the accumulated tuples, perform the batch insert
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -702,6 +764,73 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+	{
+		estate->es_processed += numInserted;
+		setLastTid(&slot->tts_tid);
+	}
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -1940,6 +2069,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2156,6 +2288,27 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index efa4434..83944ff 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 473c4cd..a2b8181 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,6 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 												  int nsubplans);
+extern ResultRelInfo **ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count);
 
 #endif							/* EXECPARTITION_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..6f45280 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetModifyBatchSize_function GetModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..b8cd373 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index ec23101..62a0e1f 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern List *lappend(List *list, void *datum);
 extern List *lappend_int(List *list, int datum);
-- 
2.10.1
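
The patch adds an accumulate-then-flush flow in two places. ExecInsert buffers slots in ri_Slots until ri_NumSlots reaches ri_BatchSize and then calls ExecBatchInsert. ExecModifyTable flushes any partially filled buffers at the end of the statement. That flow can be modeled in isolation. This is a simplified sketch, not the executor code: ints stand in for tuple slots, and a counter stands in for the ExecForeignBatchInsert call. The names BatchBuffer, batch_init, batch_add, batch_flush and insert_rows are hypothetical.

```c
#include <assert.h>

/* Upper bound for this sketch's fixed-size buffer (hypothetical). */
#define MAX_BATCH 128

/*
 * Hypothetical per-relation batch state, mirroring ri_BatchSize,
 * ri_NumSlots and ri_Slots from the patch; ints stand in for tuple slots.
 */
typedef struct BatchBuffer
{
	int			batch_size;		/* like ri_BatchSize */
	int			num_slots;		/* like ri_NumSlots */
	int			rows[MAX_BATCH];	/* like ri_Slots */
	int			flushes;		/* number of batch calls issued */
	int			rows_sent;		/* rows handed to those calls */
} BatchBuffer;

static void
batch_init(BatchBuffer *buf, int batch_size)
{
	/* The patch only batches when the batch size is > 1. */
	assert(batch_size > 1 && batch_size <= MAX_BATCH);
	buf->batch_size = batch_size;
	buf->num_slots = 0;
	buf->flushes = 0;
	buf->rows_sent = 0;
}

/* Stand-in for ExecBatchInsert: send all buffered rows in one call. */
static void
batch_flush(BatchBuffer *buf)
{
	if (buf->num_slots == 0)
		return;
	buf->flushes++;
	buf->rows_sent += buf->num_slots;
	buf->num_slots = 0;
}

/* Like the batching branch in ExecInsert: flush a full buffer, then buffer the row. */
static void
batch_add(BatchBuffer *buf, int row)
{
	if (buf->num_slots == buf->batch_size)
		batch_flush(buf);
	buf->rows[buf->num_slots++] = row;
}

/*
 * Insert nrows rows, then flush the leftovers as ExecModifyTable does at
 * the end. Returns the number of flushes since batch_init.
 */
static int
insert_rows(BatchBuffer *buf, int nrows)
{
	for (int i = 0; i < nrows; i++)
		batch_add(buf, i);
	batch_flush(buf);
	return buf->flushes;
}
```

With a batch size of 4, inserting 10 rows produces three flushes (4 + 4 + 2 rows). The last one only happens because of the final flush. That is why the patch walks the touched partitions (ExecGetTouchedPartitions) before firing AFTER STATEMENT triggers.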

#33tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Craig Ringer (#31)
RE: POC: postgres_fdw insert batching

From: Craig Ringer <craig.ringer@enterprisedb.com>

I suggest that when developing this, you keep in mind the ongoing work on the libpq pipelining/batching enhancements, and also the way many interfaces to foreign data sources support asynchronous, concurrent operations.

Yes, thank you, I'll bear it in mind. I understand it's a feature for batching multiple kinds of SQL statements, like JDBC's batch updates.

I'd argue it's pretty much vital for decent performance when talking to a cloud database from an on-prem server for example, or any other time that round-trip-time reduction is important.

Yeah, I'm thinking of data migration and integration as the prominent use cases.

Regards
Takayuki Tsunakawa

#34Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#33)
Re: POC: postgres_fdw insert batching

On 11/25/20 7:31 AM, tsunakawa.takay@fujitsu.com wrote:

From: Craig Ringer <craig.ringer@enterprisedb.com>

I suggest that when developing this, you keep in mind the ongoing
work on the libpq pipelining/batching enhancements, and also the
way many interfaces to foreign data sources support asynchronous,
concurrent operations.

Yes, thank you, I'll bear it in mind. I understand it's a feature for
batching multiple kinds of SQL statements, like JDBC's batch updates.

I haven't followed the libpq pipelining thread very closely. It does
seem related, but I'm not sure if it's a good match for this patch, or
how far it is from being committable...

I'd argue it's pretty much vital for decent performance when
talking to a cloud database from an on-prem server for example, or
any other time that round-trip-time reduction is important.

Yeah, I'm thinking of data migration and integration as the
prominent use cases.

Well, good that we all agree this is a useful feature to have (in
general). The question is whether postgres_fdw should be doing batching
on its own (per this thread) or rely on some other feature (libpq
pipelining). I haven't followed the other thread, so I don't have an
opinion on that.

Note however we're doing two things here, actually - we're implementing
custom batching for postgres_fdw, but we're also extending the FDW API
to allow other implementations to do the same thing. And most of them won't
be able to rely on the connection library providing that, I believe.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#35tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#34)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Well, good that we all agree this is a useful feature to have (in
general). The question is whether postgres_fdw should be doing batching
on its own (per this thread) or rely on some other feature (libpq
pipelining). I haven't followed the other thread, so I don't have an
opinion on that.

Well, as someone said in this thread, I think bulk insert is much more common than updates/deletes. That's why major DBMSs support multi-row INSERT ... VALUES (record1), (record2), ... and INSERT ... SELECT; Oracle additionally has direct-path INSERT. Comparing a multi-row INSERT with libpq batching (= multiple INSERTs), I think the former is more efficient because less data is transferred and the INSERT is parsed and planned once rather than once per record.
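
To make the multi-row INSERT idea concrete, here is a sketch that builds one parameterized statement covering several rows. A batch then costs a single statement, parsed and planned once on the remote side, instead of one statement per row. The helper build_multirow_insert and its signature are hypothetical and are not taken from the patch's deparse code. It also copies identifiers verbatim, whereas real code must quote them.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf + *len; returns -1 if it would not fit. */
static int
append(char *buf, size_t buflen, size_t *len, const char *fmt, ...)
{
	va_list		ap;
	int			n;

	if (*len >= buflen)
		return -1;
	va_start(ap, fmt);
	n = vsnprintf(buf + *len, buflen - *len, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t) n >= buflen - *len)
		return -1;
	*len += (size_t) n;
	return 0;
}

/*
 * Hypothetical helper: writes
 *   INSERT INTO <table> (<c1>, ...) VALUES ($1, ...), ($k+1, ...), ...
 * for nrows rows of ncols columns. Returns the string length, or -1 if
 * buf is too small. Identifiers are NOT quoted in this sketch.
 */
static int
build_multirow_insert(char *buf, size_t buflen, const char *table,
					  const char *const *cols, int ncols, int nrows)
{
	size_t		len = 0;
	int			param = 1;		/* next $n placeholder */

	assert(ncols > 0 && nrows > 0);
	if (append(buf, buflen, &len, "INSERT INTO %s (", table) < 0)
		return -1;
	for (int c = 0; c < ncols; c++)
		if (append(buf, buflen, &len, "%s%s", c > 0 ? ", " : "", cols[c]) < 0)
			return -1;
	if (append(buf, buflen, &len, ") VALUES ") < 0)
		return -1;
	for (int r = 0; r < nrows; r++)
	{
		/* One parenthesized group of placeholders per row. */
		if (append(buf, buflen, &len, "%s(", r > 0 ? ", " : "") < 0)
			return -1;
		for (int c = 0; c < ncols; c++)
			if (append(buf, buflen, &len, "%s$%d", c > 0 ? ", " : "", param++) < 0)
				return -1;
		if (append(buf, buflen, &len, ")") < 0)
			return -1;
	}
	assert(strlen(buf) == len);
	return (int) len;
}
```

For 3 rows of (a, b) into t, this yields `INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4), ($5, $6)`. The number of parameters grows as rows times columns.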

I don't deny the usefulness of libpq batching/pipelining, but I'm not sure app developers would really use it. If they want to reduce client-server round trips, won't they use traditional stored procedures? Yes, stored procedure languages are very DBMS-specific. So I'd like to know what kinds of well-known applications use a standard batching API like JDBC's batch updates. (Sorry, I think that should be discussed in the libpq batching/pipelining thread rather than polluting this one.)

Note however we're doing two things here, actually - we're implementing
custom batching for postgres_fdw, but we're also extending the FDW API
to allow other implementations to do the same thing. And most of them won't
be able to rely on the connection library providing that, I believe.

I'm afraid so, too. In that case, postgres_fdw would be the example other FDW developers look at when they implement multi-row INSERT.

Regards
Takayuki Tsunakawa

#36Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#35)
Re: POC: postgres_fdw insert batching

On 11/26/20 2:48 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Well, good that we all agree this is a useful feature to have (in
general). The question is whether postgres_fdw should be doing
batching on its own (per this thread) or rely on some other
feature (libpq pipelining). I haven't followed the other thread,
so I don't have an opinion on that.

Well, as someone said in this thread, I think bulk insert is much
more common than updates/deletes. That's why major DBMSs support
multi-row INSERT ... VALUES (record1), (record2), ... and INSERT ...
SELECT; Oracle additionally has direct-path INSERT. Comparing a
multi-row INSERT with libpq batching (= multiple INSERTs), I think
the former is more efficient because less data is transferred and
the INSERT is parsed and planned once rather than once per
record.

I don't deny the usefulness of libpq batching/pipelining, but I'm not
sure app developers would really use it. If they want to reduce
client-server round trips, won't they use traditional stored
procedures? Yes, stored procedure languages are very
DBMS-specific. So I'd like to know what kinds of well-known
applications use a standard batching API like JDBC's batch
updates. (Sorry, I think that should be discussed in the libpq
batching/pipelining thread rather than polluting this one.)

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

Note however we're doing two things here, actually - we're
implementing custom batching for postgres_fdw, but we're also
extending the FDW API to allow other implementations to do the same
thing. And most of them won't be able to rely on the connection
library providing that, I believe.

I'm afraid so, too. In that case, postgres_fdw would be the example
other FDW developers look at when they implement multi-row
INSERT.

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use
postgres_fdw as an example of how to use the API, but the internals would need to
be different, of course.

regards


#37Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Tomas Vondra (#36)
Re: POC: postgres_fdw insert batching

On Fri, Nov 27, 2020 at 3:34 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

Right. Or at least, when designing the FDW API, do so in a way that doesn't
strictly enforce Request/Response alternation without interleaving, so you
can benefit from it in the future.

It's hardly just libpq after all. A *lot* of client libraries and drivers
will be capable of non-blocking reads or writes with multiple ones in
flight at once. Any REST-like API generally can, for example. So for
performance reasons we should, if possible, avoid baking in the assumption that
a request cannot be made until the response from the previous request is
received, and instead have a wait interface to use for when a new request
requires the prior response's result before it can proceed.

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use
postgres_fdw as an example of how to use the API, but the internals would need to
be different, of course.

Sure. Or just allow room for it in the FDW API, though using the pipelining
support natively would be nice.

If the FDW interface allows Pg to

Insert A
Insert B
Insert C
Wait for outcome of insert A
...

then that'll be useful for using libpq pipelining, but also FDWs for all
sorts of other DBs, especially cloud-y ones where latency is a big concern.
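
The "Insert A, Insert B, Insert C, wait for outcome of insert A" pattern can be sketched as a queue-then-wait interface. Everything here is hypothetical: Pipeline, pipe_send and pipe_wait_next are not an existing FDW or libpq API. An in-memory FIFO stands in for the remote server, which simply echoes each payload times 10. The point is the shape of the interface. Sends do not block on earlier replies, and results are collected in send order.

```c
#include <assert.h>
#include <stdbool.h>

#define PIPE_DEPTH 8			/* max requests in flight (hypothetical) */

typedef struct Pipeline
{
	int			req[PIPE_DEPTH];	/* queued request payloads (ring buffer) */
	int			head;			/* index of the oldest outstanding request */
	int			count;			/* number of requests in flight */
	int			max_in_flight;	/* high-water mark, for illustration */
} Pipeline;

/* Queue a request without waiting for earlier ones; false if the pipeline is full. */
static bool
pipe_send(Pipeline *p, int payload)
{
	if (p->count == PIPE_DEPTH)
		return false;
	p->req[(p->head + p->count) % PIPE_DEPTH] = payload;
	p->count++;
	if (p->count > p->max_in_flight)
		p->max_in_flight = p->count;
	return true;
}

/*
 * Wait for the oldest outstanding request and return its result. The
 * stand-in "remote" computes payload * 10; replies come back in send order.
 */
static bool
pipe_wait_next(Pipeline *p, int *result)
{
	if (p->count == 0)
		return false;
	*result = p->req[p->head] * 10;
	p->head = (p->head + 1) % PIPE_DEPTH;
	p->count--;
	return true;
}
```

Sending 1, 2 and 3 before the first wait leaves three requests in flight, and the waits then return 10, 20 and 30 in that order. A purely synchronous batch interface is the special case where every send is immediately followed by a wait.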

#38tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#36)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use
postgres_fdw as an example of how to use the API, but the internals would need to
be different, of course.

Yes, I understand them. I just wondered what kinds of apps use the statement batching API of libpq or JDBC. That is, I was talking about the batching API itself, not about FDW. (That's why I said I should probably ask that question in the libpq batching thread.)

I expect postgresExecForeignBatchInsert() would be able to use the libpq batching API, because it receives an array of tuples and can generate and issue an INSERT statement for each tuple. But I'm also not sure whether libpq batching is likely to be committed in the near future. (The thread looks too long...) Anyway, this thread's batch insert can move forward (and hopefully be committed), and once libpq batching has been committed, we can try using it in postgres_fdw to see if we get a further performance boost.

Regards
Takayuki Tsunakawa

#39Craig Ringer
craig.ringer@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#38)
Re: POC: postgres_fdw insert batching

On Fri, Nov 27, 2020 at 10:47 AM tsunakawa.takay@fujitsu.com <
tsunakawa.takay@fujitsu.com> wrote:

Covering this one first:

I expect postgresExecForeignBatchInsert() would be able to use the libpq
batching API, because it receives an array of tuples and can generate and
issue an INSERT statement for each tuple.

Sure, you can generate big multi-inserts. Or even do a COPY. But you still
have to block for a full round-trip until the foreign server replies. So if
you have 6000 calls to postgresExecForeignBatchInsert() during a single
query, and a 100ms round trip time to the foreign server, you're going to
waste 6000*0.1 = 600s = 10 min blocked in postgresExecForeignBatchInsert()
waiting for results from the foreign server.

Such batches have some major downsides:

* The foreign server cannot start executing the first query in the batch
until the last query in the batch has been accumulated and the whole batch
has been sent to the foreign server;
* The FDW has to block waiting for the batch to execute on the foreign
server and for a full network round-trip before it can start another batch
or let the backend do other work.

This means RTTs get multiplied by batch counts. Still a lot better than
individual statements, but plenty slow for high-latency connections.

* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* ...

If you can instead send new inserts (or sets of inserts) to the foreign
server without having to wait for the result of the previous batch to
arrive, you can spend 100ms total waiting for results instead of 10 mins.
You can start the execution of the first query earlier, spend less time
blocked waiting on network, and let the local backend continue doing other
work while the foreign server is busy executing the statements.

The time spent preparing local rows to insert now overlaps with the RTT and
remote execution time, instead of happening serially. And there only has to
be one RTT wait, assuming the foreign server and network can keep up with
the rate we are generating requests at.
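
The two timelines can be compared with a toy latency model. Its assumptions are stated here and are not measurements. Preparing a batch takes prep_ms locally. Each direction of the network costs rtt_ms/2 (an even RTT is assumed). The foreign server executes batches one at a time, taking exec_ms each. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Synchronous batching: prepare a batch, send it, wait for the reply,
 * and only then start the next one. All times in ms.
 */
static long
serial_total_ms(long nbatches, long prep_ms, long rtt_ms, long exec_ms)
{
	return nbatches * (prep_ms + rtt_ms + exec_ms);
}

/*
 * Pipelined: the local side keeps preparing and sending batches without
 * waiting for replies; the server runs them one after another. Returns
 * the time the last reply arrives back locally.
 */
static long
pipelined_total_ms(long nbatches, long prep_ms, long rtt_ms, long exec_ms)
{
	long		one_way = rtt_ms / 2;
	long		server_free = 0;	/* when the server finishes its current batch */
	long		last_reply = 0;

	for (long i = 0; i < nbatches; i++)
	{
		long		ready = (i + 1) * prep_ms;	/* local side never blocks */
		long		arrive = ready + one_way;
		long		start = arrive > server_free ? arrive : server_free;

		server_free = start + exec_ms;
		last_reply = server_free + one_way;
	}
	return last_reply;
}
```

With Craig's figures (6000 batches, 10 ms to prepare, 100 ms RTT, 50 ms of remote execution), the serial model gives 960 s. Of that, 6000*100 ms = 600 s is pure round-trip waiting, matching the 10-minute estimate above. The pipelined model gives about 300 s. At that point the total is bound by the foreign server's execution time rather than by latency.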

I can throw together some diagrams if it'll help. But in the libpq
pipelining patch I demonstrated a 300 times (3000%) performance improvement
on a test workload...

Anyway, this thread's batch insert can move forward (and hopefully
be committed), and once libpq batching has been committed, we can try
using it in postgres_fdw to see if we get a further
performance boost.

My point is that you should seriously consider whether batching is the
appropriate interface here, or whether the FDW can expose a pipeline-like
"queue work" then "wait for results" interface. That can be used to
implement batching exactly as currently proposed; it does not have to wait
for any libpq pipelining features. But it can *also* be used to implement
concurrent async requests in other FDWs, and to implement pipelining in
postgres_fdw once the needed libpq support is available.

I don't know the FDW to postgres API well enough, and it's possible I'm
talking entirely out of my hat here.

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use
postgres_fdw as an example of how to use the API, but the internals would need to
be different, of course.

Yes, I understand them. I just wondered what kinds of apps use the
statement batching API of libpq or JDBC.

For JDBC, yes, it's used very heavily and has been for a long time, because
PgJDBC doesn't rely on libpq - it implements the protocol directly and
isn't bound by libpq's limitations. The application interface for it in
JDBC is a batch interface [1][2], not a pipelined interface, so that's what
PgJDBC users interact with [3], but batch execution is implemented using
protocol pipelining support inside PgJDBC [4]. A while ago I did some work
on deadlock prevention to work around issues with PgJDBC's implementation
[5]: https://github.com/pgjdbc/pgjdbc/issues/194
address customer needs in real world applications. The latter increased
application performance over 50x through round-trip elimination.

For libpq, no, batching and pipelining are not yet used by anybody because
application authors have to write to the libpq API and there hasn't been
any in-core support for batching. We've had async / non-blocking support
for a while, but it still enforces strict request/response ordering without
interleaving, so application authors cannot make use of the same postgres
server and protocol capabilities as PgJDBC. Most other drivers (like
psqlODBC and psycopg2) are implemented on top of libpq, so they inherit the
same limitations.

I don't expect most application authors to adopt pipelining directly,
mainly because hardly anyone writes application code against libpq anyway.
But drivers written on top of libpq will be able to adopt it to expose the
batching, pipeline, or async/callback/event driven interfaces supported by
their client database language interface specifications, or expose their
own extension interfaces to give users callback-driven or batched query
capabilities. In particular, psqlODBC will be able to implement ODBC batch
query [6] efficiently. Right now psqlODBC can't execute batches efficiently
via libpq, since it must perform one round-trip per query. It will be able
to use the libpq pipelining API to greatly reduce round trips.

But I'm not sure either if the libpq batching is likely to be committed
in the near future. (The thread looks too long...)

I think it's getting there tbh.


[1]: https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()
[2]: https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#addBatch()
[3]: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/test/jdbc2/BatchExecuteTest.java
[4]: https://github.com/pgjdbc/pgjdbc/blob/ff22a3c31bb423b08637c237cb2e5bc288008e18/pgjdbc/src/main/java/org/postgresql/core/v3/QueryExecutorImpl.java#L492
[5]: https://github.com/pgjdbc/pgjdbc/issues/194
[6]: https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/executing-batches?view=sql-server-ver15

#40tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Craig Ringer (#39)
RE: POC: postgres_fdw insert batching

From: Craig Ringer <craig.ringer@enterprisedb.com>

But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...

Wow, impressive number. I've just seen it at the beginning of the libpq pipelining thread (oh, already four years ago..!). Could you share the workload and the network latency (ping time)? Sorry if I overlooked it.

Thank you for your (always) concise explanation. I'd like to look at other DBMSs and your rich references for the FDW interface. (My first intuition is that many major DBMSs might not have client C APIs that can be used to implement an async pipelining FDW interface. Also, I'm afraid it would require major surgery on the executor. I don't want that to delay the release of a reasonably good (10x) improvement with the synchronous interface.)

(It'd be kind of you to send emails in text format. I've changed the format of this reply from HTML to text.)

Regards
Takayuki Tsunakawa

#41Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#40)
Re: POC: postgres_fdw insert batching

On 11/27/20 7:05 AM, tsunakawa.takay@fujitsu.com wrote:

From: Craig Ringer <craig.ringer@enterprisedb.com>

But in the libpq pipelining patch I demonstrated a 300 times
(3000%) performance improvement on a test workload...

Wow, impressive number. I've just seen it at the beginning of the
libpq pipelining thread (oh, already four years ago..!). Could you
share the workload and the network latency (ping time)? Sorry if
I overlooked it.

Thank you for your (always) concise explanation. I'd like to look at
other DBMSs and your rich references for the FDW interface. (My
first intuition is that many major DBMSs might not have client C APIs
that can be used to implement an async pipelining FDW interface.
Also, I'm afraid it would require major surgery on the executor. I
don't want that to delay the release of a reasonably good (10x)
improvement with the synchronous interface.)

I do agree that pipelining is nice, and can bring huge improvements.

However, the FDW interface as it's implemented today is not designed to
allow that, I believe (we pretty much just invoke the FDW callbacks as
if it was a local AM). It assumes the calls are synchronous, and
redesigning it to work in an async way is a much larger and more complex
patch than what's being discussed here.

I do think the FDW extension proposed here (adding the bulk-insert
callback) is useful in general, for two reasons: (a) even if most client
libraries support some sort of pipelining, some don't, and (b) I'd bet
it's still more efficient to send one large insert than pipelining many
individual inserts.

That being said, I'm against expanding the scope of this patch to also
require redesign of the whole FDW infrastructure - that would likely
mean no such improvement landing in PG14. If the libpq pipelining patch
seems likely to get committed, we can try using it for the bulk insert
callback (instead of the current multi-value stuff).

(It'd be kind of you to send emails in text format. I've changed the
format of this reply from HTML to text.)

Craig's client is sending messages in both text/plain and text/html. You
probably need to tell your client to prefer the text/plain part over HTML.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#42Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Tomas Vondra (#41)
Re: POC: postgres_fdw insert batching

On Sat, 28 Nov 2020, 10:10 Tomas Vondra, <tomas.vondra@enterprisedb.com>
wrote:

On 11/27/20 7:05 AM, tsunakawa.takay@fujitsu.com wrote:

However, the FDW interface as it's implemented today is not designed to
allow that, I believe (we pretty much just invoke the FDW callbacks as
if it was a local AM). It assumes the calls are synchronous, and
redesigning it to work in an async way is a much larger, more complex patch than
what's being discussed here.

I do think the FDW extension proposed here (adding the bulk-insert
callback) is useful in general, for two reasons: (a) even if most client
libraries support some sort of pipelining, some don't, and (b) I'd bet
it's still more efficient to send one large insert than pipelining many
individual inserts.

That being said, I'm against expanding the scope of this patch to also
require redesign of the whole FDW infrastructure - that would likely
mean no such improvement landing in PG14. If the libpq pipelining patch
seems likely to get committed, we can try using it for the bulk insert
callback (instead of the current multi-value stuff).

I totally agree on all points. It was not my intent to expand the scope of
this significantly and I really don't want to hold it back.

I raised the interface consideration in case it was something easy to
accommodate. It's not, so that's done, topic over.

#43Craig Ringer
craig.ringer@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#40)
Re: POC: postgres_fdw insert batching

On Fri, 27 Nov 2020, 14:06 tsunakawa.takay@fujitsu.com,
<tsunakawa.takay@fujitsu.com> wrote:

Also, I'm afraid it requires major surgery or reform of executor. I
don't want it to delay the release of reasonably good (10x)
improvement with the synchronous interface.)

Totally sensible. If it isn't feasible without significant executor
change that's all that needs to be said.

I was afraid that'd be the case given the executor's pull flow but
just didn't know enough.

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

I know Andres did some initial work on executor parallelism and
pipelining. I should take a look.

But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...

Wow, impressive number. I've just seen it in the beginning of the libpq pipelining thread (oh, already four years ago..!)

Yikes.

Could you share the workload and the network latency (ping time)? Sorry if I just overlooked it.

I thought I gave it at the time, and a demo program. IIRC it was just
doing small multi row inserts or single row inserts. Latency would've
been a couple of hundred ms probably, I think I did something like
running on my laptop (Australia, ADSL) to a server on AWS US or EU.

Thank you for your (always) concise explanation.

You joke! I am many things but despite my best efforts concise is
rarely one of them.

I'd like to check other DBMSs and your rich references for the FDW interface. (My first intuition is that many major DBMSs might not have client C APIs that can be used to implement an async pipelining FDW interface.

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non SQL ones, especially cloud oriented. For example
DynamoDB supports at least async requests in the Java client [3] and
C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

Most things with a REST-like API can do a fair bit of concurrency
though. Multiple async nonblocking HTTP connections can be serviced at
once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
streams [2]. This is relevant for any REST-like API.

(It'd be kind of you to send emails in text format. I've changed the format of this reply from HTML to text.)

I try to remember. Stupid Gmail. Sorry. On mobile it offers very
little control over format but I'll do my best when I can.

[1]: https://en.wikipedia.org/wiki/HTTP_pipelining
[2]: https://blog.restcase.com/http2-benefits-for-rest-apis/
[3]: https://aws.amazon.com/blogs/developer/asynchronous-requests-with-the-aws-sdk-for-java/
[4]: https://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_dynamo_d_b_1_1_dynamo_d_b_client.html#ab631edaccca5f3f8988af15e7e9aa4f0

#44David Fetter
david@fetter.org
In reply to: tsunakawa.takay@fujitsu.com (#32)
Re: POC: postgres_fdw insert batching

On Wed, Nov 25, 2020 at 05:04:36AM +0000, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

On 11/24/20 9:45 AM, tsunakawa.takay@fujitsu.com wrote:

OTOH, as for the name GetModifyBatchSize() you suggest, I think
GetInsertBatchSize may be better. That is, this API deals with multiple
records in a single INSERT statement. Your GetModifyBatchSize will be
reserved for statement batching when libpq has supported batch/pipelining to
execute multiple INSERT/UPDATE/DELETE statements, as in the following
JDBC batch updates. What do you think?

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.

Right, I can't yet picture how the client, the server core, and the FDWs would interact for statement batching, so I'll take your suggested name.

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

Added: postgres_fdw now shows it (as "Batch Size") next to "Remote SQL" when EXPLAIN VERBOSE is run.

Don't worry about this, either. GetMaxBulkInsertTuples() just returns a value
that was already saved in a struct in create_foreign_modify().

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 10 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's a bug in the FDW extension,
but it's also due to the API design.

You worried about FDWs other than postgres_fdw. That's reasonable. In other threads I complained that PG developers consider only postgres_fdw, not other FDWs, when designing the FDW interface, and yet I made the same mistake myself. I changed the code so that the executor calls GetModifyBatchSize() once per relation per statement.
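To make the "once per relation per statement" point concrete, here is a minimal C sketch of the buffering pattern. Everything in it (FakeRemote, fake_get_batch_size, fake_batch_insert, insert_rows_batched) is hypothetical and only stands in for the executor and an FDW; it is not the code in the patch. The batch size is requested once, the buffer is sized from that single value, full batches are flushed as they fill up, and the leftover rows are flushed at the end of the statement.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one foreign table's FDW state. */
typedef struct FakeRemote
{
    int         batch_size;         /* value the FDW would report */
    int         size_queries;       /* how often the size was requested */
    int         ncalls;             /* number of batch-insert calls */
    int         rows_per_call[64];  /* rows sent by each call */
} FakeRemote;

/* Hypothetical analogue of a GetModifyBatchSize-style callback. */
static int
fake_get_batch_size(FakeRemote *remote)
{
    remote->size_queries++;
    return remote->batch_size;
}

/* Hypothetical batch callback: a real FDW would send all nrows rows to the
 * remote server in one multi-row INSERT and wait for the result. */
static void
fake_batch_insert(FakeRemote *remote, const int *rows, int nrows)
{
    (void) rows;
    assert(remote->ncalls < 64);
    remote->rows_per_call[remote->ncalls++] = nrows;
}

/* Buffer rows and hand them to the FDW in batches. */
static void
insert_rows_batched(FakeRemote *remote, const int *input, int nrows)
{
    /* Ask for the batch size once per statement, never per row, so the
     * size of the buffer cannot change while we are filling it. */
    int         batch_size = fake_get_batch_size(remote);
    int         nbuffered = 0;
    int        *buffer;

    assert(batch_size > 0);
    buffer = malloc(sizeof(int) * (size_t) batch_size);
    assert(buffer != NULL);

    for (int i = 0; i < nrows; i++)
    {
        buffer[nbuffered++] = input[i];
        if (nbuffered == batch_size)    /* buffer full: flush it */
        {
            fake_batch_insert(remote, buffer, nbuffered);
            nbuffered = 0;
        }
    }

    /* End of statement: flush the partially filled last batch. */
    if (nbuffered > 0)
        fake_batch_insert(remote, buffer, nbuffered);

    free(buffer);
}
```

With batch_size 10 and 21 rows (like the generate_series(11, 31) insert in the regression test), this sketch queries the size once and issues three calls carrying 10, 10 and 1 rows.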

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778


#45tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Craig Ringer (#43)
RE: POC: postgres_fdw insert batching

From: Craig Ringer <craig.ringer@enterprisedb.com>

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

Thank you, I'm relieved to hear that. Last weekend I was worried the mood might be something like "We won't accept the insert speedup patch for foreign tables unless you take full advantage of pipelining and achieve the maximum conceivable speed!"

I thought I gave it at the time, and a demo program. IIRC it was just
doing small multi row inserts or single row inserts. Latency would've
been a couple of hundred ms probably, I think I did something like
running on my laptop (Australia, ADSL) to a server on AWS US or EU.

With a latency of a couple of hundred ms, the round trip would dominate each prepare-send-execute-receive cycle, possibly even for batch inserts with hundreds of rows per batch. So the synchronous batch insert in the current patch may achieve a speedup of a few hundred times over single-row inserts when the batch size is in the hundreds or more.
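A back-of-the-envelope version of that argument, as a small C sketch. The cost model is an assumption (one network round trip per synchronous statement plus a fixed per-row cost on the remote side), and total_time_us is a hypothetical helper; the numbers below are illustrative, not measurements.

```c
#include <assert.h>

/*
 * Toy model, an assumption rather than a measurement: each synchronous
 * statement costs one network round trip (rtt_us) plus a fixed remote cost
 * per inserted row (per_row_us). All times are in microseconds.
 */
static long long
total_time_us(long long nrows, long long batch_size,
              long long rtt_us, long long per_row_us)
{
    long long   nstatements;

    assert(nrows >= 0 && batch_size > 0);
    /* Rows are sent in ceil(nrows / batch_size) statements. */
    nstatements = (nrows + batch_size - 1) / batch_size;
    return nstatements * rtt_us + nrows * per_row_us;
}
```

For example, with 1000 rows, a 200 ms round trip and an assumed 100 microseconds per row, the model gives 200.1 s for single-row inserts, 2.1 s with batches of 100, and 0.3 s with a single batch of 1000, i.e. speedups of roughly 95x and 667x. Once the round trip dominates, the speedup grows almost linearly with the batch size.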

I'd like to check other DBMSs and your rich references for the FDW interface.

(My first intuition is that many major DBMSs might not have client C APIs that
can be used to implement an async pipelining FDW interface.

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non SQL ones, especially cloud oriented. For example
DynamoDB supports at least async requests in the Java client [3] and
C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

I've checked ODBC, MySQL, Microsoft Synapse Analytics, Redshift, and BigQuery, guessing that data warehouses might have an asynchronous/pipelining API that enables efficient data integration/migration. But none of them had one. (I seem to have spent too long on this and am a bit tired... but it was a bit fun as well.) They all support INSERT with multiple records in the VALUES clause, so it will be useful to provide a synchronous batch insert FDW API. I guess Oracle's OCI has an asynchronous API, but I didn't check it.

As an aside, MySQL 8.0.16 added support for asynchronous execution in its C API, but it allows only one active SQL statement per connection. Likewise, although the ODBC standard defines asynchronous execution (SQLSetStmtAttr(SQL_ASYNC_ENABLE) and SQLCompleteAsync), SQL Server and Synapse Analytics allow only one active statement per connection. psqlODBC doesn't support asynchronous execution.

Most things with a REST-like API can do a fair bit of concurrency
though. Multiple async nonblocking HTTP connections can be serviced at
once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
streams [2]. This is relevant for any REST-like API.

I'm not sure if this is related, but Google deprecated the Batch HTTP API [1].

[1]: https://cloud.google.com/bigquery/batch

Regards
Takayuki Tsunakawa

#46tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: David Fetter (#44)
RE: POC: postgres_fdw insert batching

From: David Fetter <david@fetter.org>

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Please do not hesitate. I mentioned earlier in this thread that I think INSERT is better because:

--------------------------------------------------
* When the user executes INSERT statements, it would look strange to the user if the remote SQL were displayed as COPY.

* Unlike INSERT, COPY doesn't invoke rules. (I don't think rules are a feature users care much about, though.) Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.
--------------------------------------------------

Also, COPY to foreign tables currently uses INSERTs; an improvement to use COPY instead of INSERT is in progress [1]. Keeping the "COPY uses COPY, INSERT uses INSERT" correspondence seems natural, and it makes COPY's high-speed advantage stand out.

[1]: Fast COPY FROM command for the table with foreign partitions /messages/by-id/3d0909dc-3691-a576-208a-90986e55489f@postgrespro.ru

Regards
Takayuki Tsunakawa

#47Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#32)
1 attachment(s)
Re: POC: postgres_fdw insert batching

Hi,

Attached is a v6 of this patch, rebased to current master and with some
minor improvements (mostly comments and renaming the "end" struct field
to "values_end" which I think is more descriptive).

The one thing that keeps bugging me is convert_prep_stmt_params - it
does the right thing, but the code is somewhat confusing.
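For anyone trying to follow convert_prep_stmt_params, the invariant it has to respect is the parameter numbering that rebuildInsertSql() in the patch produces: row r (0-based), column c gets $(r * num_cols + c + 1), so the flattened parameter array must be filled row by row in that order. The C sketch below reproduces only that numbering. build_multirow_insert and param_number are hypothetical helpers written for illustration, using a fixed-size char buffer instead of PostgreSQL's StringInfo; extra_rows plays the role of rebuildInsertSql()'s num_rows argument, which counts the rows appended after the first one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parameter number for row `row`, column `col` (both 0-based). */
static int
param_number(int row, int col, int num_cols)
{
    return row * num_cols + col + 1;
}

/*
 * Copy orig_query up to values_end_len (the end of the first "($1, ...)"
 * row), append extra_rows more row constructors numbered after it, then
 * copy the rest of orig_query (e.g. ON CONFLICT or RETURNING).
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int
build_multirow_insert(char *out, size_t outlen, const char *orig_query,
                      int values_end_len, int num_cols, int extra_rows)
{
    size_t      used = (size_t) values_end_len;
    int         n;

    assert(num_cols > 0 && extra_rows >= 0);
    assert(values_end_len >= 0 &&
           (size_t) values_end_len <= strlen(orig_query));
    if (used >= outlen)
        return -1;
    memcpy(out, orig_query, used);

    for (int r = 1; r <= extra_rows; r++)   /* row 0 is already in the text */
    {
        for (int c = 0; c < num_cols; c++)
        {
            n = snprintf(out + used, outlen - used, "%s$%d",
                         c == 0 ? ", (" : ", ", param_number(r, c, num_cols));
            if (n < 0 || (size_t) n >= outlen - used)
                return -1;
            used += (size_t) n;
        }
        if (used + 1 >= outlen)
            return -1;
        out[used++] = ')';
    }

    /* Append everything after the VALUES clause (also NUL-terminates). */
    n = snprintf(out + used, outlen - used, "%s", orig_query + values_end_len);
    if (n < 0 || (size_t) n >= outlen - used)
        return -1;
    return 0;
}
```

For "INSERT INTO t(a, b) VALUES ($1, $2) RETURNING a" with two extra rows, this yields "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) RETURNING a", so the values for the third row have to sit at parameter positions 5 and 6.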

AFAICS the discussions about making this use COPY and/or libpq
pipelining (neither of which is committed yet) ended with the conclusion
that those changes are somewhat independent, and that it's worth getting
this committed in the current form. Barring objections, I'll push this
within the next couple days.

regards


Attachments:

0001-Add-bulk-insert-for-foreign-tables-v6.patch (text/x-patch; charset=UTF-8)
From 1e4a99c6d4a5221dadc9e7a9922bdd9e3ebe1310 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue, 12 Jan 2021 01:36:01 +0100
Subject: [PATCH] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                |  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out    | 116 ++++++-
 contrib/postgres_fdw/option.c                 |  14 +
 contrib/postgres_fdw/postgres_fdw.c           | 291 ++++++++++++++----
 contrib/postgres_fdw/postgres_fdw.h           |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  91 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  89 +++++-
 doc/src/sgml/postgres-fdw.sgml                |  13 +
 src/backend/executor/execPartition.c          |  11 +
 src/backend/executor/nodeModifyTable.c        | 161 ++++++++++
 src/backend/nodes/list.c                      |  15 +
 src/include/executor/execPartition.h          |   1 +
 src/include/foreign/fdwapi.h                  |  10 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pg_list.h                   |  15 +
 15 files changed, 815 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3eea..64698c4da3 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a non-negative integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..e6b1403ff1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetModifyBatchSize = postgresGetModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,16 @@ postgresExecForeignInsert(EState *estate,
 	return rslot;
 }
 
+/*
+ * postgresGetModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
 /*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
@@ -1873,8 +1941,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1960,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2003,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2080,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2090,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2716,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3613,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3622,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3564,7 +3649,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3704,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3752,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3808,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3818,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3846,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3859,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3924,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,29 +4035,41 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
 	fmstate->conn = NULL;
 }
 
+/*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
 /*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..1f67b4d9fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..fd5abf2471 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2711,3 +2711,94 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..02a34b40b3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function>, <function>ExecForeignBatchInsert</function>,
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for each tuple (or batch of tuples) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions).
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has the
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     each tuple (or batch of tuples) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function, or the <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> function subsequently called, must throw an error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd2143c1..97eeb64a02 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..b0a354ad6f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2192,3 +2192,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
 		}
 	}
 }
+
+/*
+ * ExecGetTouchedPartitions -- Get the partitions touched by
+ * this routing
+ */
+ResultRelInfo **
+ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count)
+{
+	*count = proute->num_partitions;
+	return proute->partitions;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d7b8f65591..bf77d2491c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +449,70 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * Determine if the FDW supports batch insert and determine the batch
+		 * size (an FDW may support batching, but it may be disabled for the
+		 * server/table). Do this only once, at the beginning - we don't want
+		 * the batch size to change during execution.
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+			resultRelInfo->ri_projectReturning == NULL &&
+			resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
+
+		Assert(resultRelInfo->ri_BatchSize >= 0);
+
+		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If a full batch has already been accumulated (ri_NumSlots has
+			 * reached ri_BatchSize), insert those tuples before buffering the
+			 * current one.
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -698,6 +770,70 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1937,6 +2073,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2152,6 +2291,28 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b053..dbf6b30233 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..2bb5a85fb1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,6 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 												  int nsubplans);
+extern ResultRelInfo **ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count);
 
 #endif							/* EXECPARTITION_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..7946ca82f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetModifyBatchSize_function GetModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..d65099c94a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;	/* plan slots for batch insert */
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd37ef..404e03f132 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.26.2

#48tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#47)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Attached is a v6 of this patch, rebased to current master and with some minor
improvements (mostly comments and renaming the "end" struct field to
"values_end" which I think is more descriptive).

Thank you very much. In fact, my initial patches used values_end, and I changed it to len considering that it may be used for bulk UPDATE and DELETE in the future. But I think values_end makes its role easier to understand, too.
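The following standalone C sketch illustrates how rebuildInsertSql() in the patch uses values_end. It is simplified and assumes plain malloc'd strings instead of StringInfo. The function name rebuild_insert() and the sample query are hypothetical. values_end is the byte offset just past the first VALUES tuple. The prefix is copied unchanged, extra placeholder tuples are appended and numbered after the first tuple's parameters, and the tail (ON CONFLICT, RETURNING) is copied after them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patch's rebuildInsertSql(): copy
 * orig_query up to values_end, append extra_rows more "($n, ...)" tuples
 * numbered after the first tuple's num_cols parameters, then copy the
 * rest of the query (ON CONFLICT / RETURNING).  Returns a malloc'd
 * string that the caller frees, or NULL if allocation fails.
 */
static char *
rebuild_insert(const char *orig_query, size_t values_end,
               int num_cols, int extra_rows)
{
    /* each "$N" needs at most 13 chars including ", "; ", (" and ")" add 4 */
    size_t cap = strlen(orig_query) + 1 +
        (size_t) extra_rows * ((size_t) num_cols * 13 + 4);
    char *buf;
    size_t len = values_end;
    int pindex = num_cols + 1;  /* $1..$num_cols are already used */

    assert(values_end <= strlen(orig_query));

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    memcpy(buf, orig_query, values_end);

    for (int i = 0; i < extra_rows; i++)
    {
        len += (size_t) snprintf(buf + len, cap - len, ", (");
        for (int j = 0; j < num_cols; j++)
            len += (size_t) snprintf(buf + len, cap - len, "%s$%d",
                                     j > 0 ? ", " : "", pindex++);
        len += (size_t) snprintf(buf + len, cap - len, ")");
    }

    /* copy everything after the first VALUES tuple */
    snprintf(buf + len, cap - len, "%s", orig_query + values_end);
    return buf;
}
```

Take "INSERT INTO t(a, b) VALUES ($1, $2) RETURNING a", with values_end set just past "($1, $2)", num_cols = 2 and extra_rows = 2. The result is "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) RETURNING a". The patch follows the same convention: the caller passes the number of rows minus one, because the first tuple is already in the original text.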

Regards
Takayuki Tsunakawa

#49Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#47)
Re: POC: postgres_fdw insert batching

Hi Tomas, Tsunakawa-san,

Thanks for your work on this.

On Tue, Jan 12, 2021 at 11:06 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

AFAICS the discussions about making this use COPY and/or libpq
pipelining (neither of which is committed yet) ended with the conclusion
that those changes are somewhat independent, and that it's worth getting
this committed in the current form. Barring objections, I'll push this
within the next couple days.

I was trying this out today (been meaning to do so for a while) and
noticed that this fails when there are AFTER ROW triggers on the
foreign table. Here's an example:

create extension postgres_fdw ;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table p (a numeric primary key);
create foreign table fp (a int) server lb options (table_name 'p');
create function print_row () returns trigger as $$ begin raise notice
'%', new; return null; end; $$ language plpgsql;
create trigger after_insert_trig after insert on fp for each row
execute function print_row();
insert into fp select generate_series (1, 10);
<crashes>

Apparently, the new code assumes that batching won't be active when
the original query contains a RETURNING clause, but some parts fail to
account for the case where RETURNING is added to the remote query to
retrieve the tuple to pass to the AFTER ROW trigger.
Specifically, the Assert in the following block in
execute_foreign_modify() is problematic:

	/* Check number of rows affected, and fetch RETURNING tuple if any */
	if (fmstate->has_returning)
	{
		Assert(*numSlots == 1);
		n_rows = PQntuples(res);
		if (n_rows > 0)
			store_returning_result(fmstate, slots[0], res);
	}
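The failure reduces to two conditions, which the minimal sketch below models. It uses a hypothetical MockTriggerDesc in place of PostgreSQL's TriggerDesc. It also assumes, as described above, that postgres_fdw adds a RETURNING clause to the remote INSERT whenever the user's query has one or the table has AFTER ROW INSERT triggers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockTriggerDesc
{
    bool trig_insert_after_row;   /* table has AFTER ROW INSERT triggers */
} MockTriggerDesc;

/*
 * Assumed model of postgres_fdw's deparse step: the remote INSERT gets a
 * RETURNING clause if the user asked for one, or if an AFTER ROW INSERT
 * trigger needs the inserted row (trigdesc may be NULL: no triggers).
 */
static bool
remote_insert_has_returning(bool query_has_returning,
                            const MockTriggerDesc *trigdesc)
{
    if (query_has_returning)
        return true;
    return trigdesc != NULL && trigdesc->trig_insert_after_row;
}

/*
 * The invariant behind Assert(*numSlots == 1): a RETURNING result is only
 * stored into slots[0], so it is valid only for a single-row batch.
 */
static bool
returning_batch_ok(bool has_returning, int num_slots)
{
    assert(num_slots > 0);
    return !has_returning || num_slots == 1;
}
```

Apply this to the INSERT in the example: no RETURNING, one AFTER ROW trigger, and a 10-row batch. remote_insert_has_returning(false, &trig) is true and returning_batch_ok(true, 10) is false, which corresponds to the crash.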

--
Amit Langote
EDB: http://www.enterprisedb.com

#50Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#49)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/13/21 10:15 AM, Amit Langote wrote:

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

(!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.
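The sketch below shows how this decision could live inside the FDW. It is a hypothetical batch-size callback, not postgres_fdw's code. MockResultRelInfo, MockTriggerDesc and the has_returning_projection flag, which stands in for a RETURNING projection on the result relation, are all assumptions. The callback returns 1 (no batching) whenever the remote INSERT would carry RETURNING. Otherwise it returns the configured batch_size, and it never returns 0. It also checks ri_TrigDesc for NULL before dereferencing it, because ri_TrigDesc is NULL for a table without triggers. The executor-side check quoted above would need the same guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields the decision needs. */
typedef struct MockTriggerDesc
{
    bool trig_insert_after_row;
} MockTriggerDesc;

typedef struct MockResultRelInfo
{
    bool has_returning_projection;        /* query has RETURNING */
    const MockTriggerDesc *ri_TrigDesc;   /* NULL if no triggers */
    int fdw_batch_size;                   /* "batch_size" option value */
} MockResultRelInfo;

/*
 * FDW-side batch size decision.  Always returns >= 1, so the executor can
 * keep 0 to mean "not asked yet".
 */
static int
mock_get_modify_batch_size(const MockResultRelInfo *rinfo)
{
    assert(rinfo != NULL);

    /* RETURNING rows are stored into a single slot: no batching */
    if (rinfo->has_returning_projection)
        return 1;

    /* AFTER ROW triggers make the FDW add RETURNING as well */
    if (rinfo->ri_TrigDesc != NULL &&
        rinfo->ri_TrigDesc->trig_insert_after_row)
        return 1;

    return rinfo->fdw_batch_size > 1 ? rinfo->fdw_batch_size : 1;
}
```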

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

	if (resultRelInfo->ri_BatchSize == 0)
		resultRelInfo->ri_BatchSize =
			resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

	if (resultRelInfo->ri_BatchSize > 1)
	{
		/* ... do batching ... */
	}

GetModifyBatchSize would always return a value > 0, so either 1 (no
batching) or > 1 (batching).
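The proposed flow can be sketched in standalone C. MockResultRelInfo, MockFdwRoutine and example_get_batch_size() are hypothetical stand-ins for the executor's real structures. An ri_BatchSize of 0 means "not asked yet", so the callback runs once per result relation and its answer is cached.

```c
#include <assert.h>
#include <stdbool.h>

struct MockResultRelInfo;

/* hypothetical signature modelled on the proposed GetModifyBatchSize */
typedef int (*MockGetModifyBatchSize) (struct MockResultRelInfo *rinfo);

typedef struct MockFdwRoutine
{
    MockGetModifyBatchSize GetModifyBatchSize;
} MockFdwRoutine;

typedef struct MockResultRelInfo
{
    int ri_BatchSize;                     /* 0 = FDW not asked yet */
    const MockFdwRoutine *ri_FdwRoutine;
} MockResultRelInfo;

/* Example callback: counts its invocations and always allows 10 rows. */
static int example_calls = 0;

static int
example_get_batch_size(struct MockResultRelInfo *rinfo)
{
    (void) rinfo;
    example_calls++;
    return 10;
}

/* Returns true when ExecInsert should buffer rows into a batch. */
static bool
use_batching(MockResultRelInfo *rinfo)
{
    if (rinfo->ri_BatchSize == 0)
        rinfo->ri_BatchSize =
            rinfo->ri_FdwRoutine->GetModifyBatchSize(rinfo);

    /* the callback must return at least 1 (1 = no batching) */
    assert(rinfo->ri_BatchSize > 0);

    return rinfo->ri_BatchSize > 1;
}
```

Caching the answer in ri_BatchSize keeps the per-row cost to a single comparison. Because the FDW makes the decision, postgres_fdw can account for things only it knows, such as whether deparseReturningList added a RETURNING clause.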

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-bulk-insert-for-foreign-tables-v7.patch (text/x-patch)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3eea..64698c4da3 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..e6b1403ff1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetModifyBatchSize = postgresGetModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,16 @@ postgresExecForeignInsert(EState *estate,
 	return rslot;
 }
 
+/*
+ * postgresGetModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
 /*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
@@ -1873,8 +1941,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1960,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2003,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2080,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2090,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2716,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3613,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3622,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3564,7 +3649,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3704,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3752,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3808,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3818,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3846,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3859,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3924,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,29 +4035,41 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
 	fmstate->conn = NULL;
 }
 
+/*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
 /*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..1f67b4d9fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..fd5abf2471 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2711,3 +2711,94 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..02a34b40b3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> (or <function>ExecForeignBatchInsert</function>),
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has the
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert</function> (or <function>ExecForeignBatchInsert</function>)
+     subsequently called must throw error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd2143c1..97eeb64a02 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..b0a354ad6f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2192,3 +2192,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
 		}
 	}
 }
+
+/*
+ * ExecGetTouchedPartitions -- Get the partitions touched by
+ * this routing
+ */
+ResultRelInfo **
+ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count)
+{
+	*count = proute->num_partitions;
+	return proute->partitions;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d7b8f65591..7ab99ceb53 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +449,71 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * Determine if the FDW supports batch insert and determine the batch
+		 * size (an FDW may support batching, but it may be disabled for the
+		 * server/table). Do this only once, at the beginning - we don't want
+		 * the batch size to change during execution.
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+			resultRelInfo->ri_projectReturning == NULL &&
+			(resultRelInfo->ri_TrigDesc == NULL || !resultRelInfo->ri_TrigDesc->trig_insert_after_row) &&
+			resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
+
+		Assert(resultRelInfo->ri_BatchSize >= 0);
+
+		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If the batch is already full, insert the accumulated tuples
+			 * before buffering the new one (each ResultRelInfo has its own
+			 * buffer, so tuples for different relations never mix).
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -698,6 +771,70 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1937,6 +2074,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2152,6 +2292,28 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b053..dbf6b30233 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..2bb5a85fb1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,6 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 												  int nsubplans);
+extern ResultRelInfo **ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count);
 
 #endif							/* EXECPARTITION_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..7946ca82f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetModifyBatchSize_function GetModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..d65099c94a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd37ef..404e03f132 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
#51Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#50)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/13/21 3:43 PM, Tomas Vondra wrote:

...

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

(!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

    if (resultRelInfo->ri_BatchSize == 0)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

    if (resultRelInfo->ri_BatchSize > 1)
    {
        ... do batching ...
    }

GetModifyBatchSize would always return a value > 0, so either 1 (no
batching) or >1 (batching).

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback. I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-bulk-insert-for-foreign-tables-v8.patch
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3eea..64698c4da3 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..9fa540aca8 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length up to the end of the VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length up to the end of the VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetModifyBatchSize = postgresGetModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,22 @@ postgresExecForeignInsert(EState *estate,
 	return rslot;
 }
 
+/*
+ * postgresGetModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	/* Disable batching with a RETURNING clause or AFTER ROW INSERT triggers. */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc && resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
 /*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
@@ -1873,8 +1947,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1966,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2009,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2086,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2096,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2722,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3619,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3628,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3564,7 +3655,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3710,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3758,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3814,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3824,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3852,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3865,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3930,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,29 +4041,41 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
 	fmstate->conn = NULL;
 }
 
+/*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
 /*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
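To make the parameter layout in convert_prep_stmt_params() concrete, here is a small standalone sketch (not the patch code): plain ints stand in for the slot values, and two hypothetical output functions (out_int, out_hex) stand in for the p_flinfo entries. All rows go into one flat array, row after row, which is why the patch sizes p_values as p_nums * numSlots and passes that count to PQsendQueryPrepared(). The output function is picked by column position (j in the patch), not by the running parameter index, because p_flinfo is sized for a single row.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PARAM_LEN 16            /* hypothetical fixed width for text params */

/* Hypothetical per-column output function, playing the role of p_flinfo[j]. */
typedef void (*OutFn) (int value, char *out, size_t outsz);

static void
out_int(int value, char *out, size_t outsz)
{
    snprintf(out, outsz, "%d", value);
}

static void
out_hex(int value, char *out, size_t outsz)
{
    snprintf(out, outsz, "0x%x", (unsigned) value);
}

/*
 * Convert num_rows x num_cols values (row-major) into one flat parameter
 * array.  Row i, column c becomes parameter $(i * num_cols + c + 1); the
 * output function is chosen by column, not by the running parameter index.
 */
static void
flatten_params(const int *values, int num_rows, int num_cols,
               const OutFn *outfns, char params[][PARAM_LEN])
{
    int         pindex = 0;

    for (int i = 0; i < num_rows; i++)
        for (int c = 0; c < num_cols; c++)
        {
            outfns[c](values[i * num_cols + c], params[pindex], PARAM_LEN);
            pindex++;
        }

    /* same invariant as the patch: p_nums * numSlots parameters in total */
    assert(pindex == num_rows * num_cols);
}
```

With rows {1, 255} and {2, 16} and converters {out_int, out_hex}, the parameters come out as "1", "0xff", "2", "0x10", i.e. $1..$4 of a two-row VALUES list.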
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..1f67b4d9fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
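rebuildInsertSql() itself lives in deparse.c, which is not part of this excerpt. Judging only from the call site (num_rows is *numSlots - 1) and the values_end_len that deparseInsertSql() now reports, it presumably keeps the original text up to the end of the first VALUES tuple, appends the extra tuples with continuing parameter numbers, and then copies the rest of the statement. A minimal sketch of that idea, using a hypothetical rebuild_insert_sketch() with a plain char buffer instead of StringInfo:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for rebuildInsertSql().  Copies orig_query up to
 * values_end_len (i.e. through the first "($1, ..., $num_cols)" tuple),
 * appends extra_rows further tuples whose parameter numbers continue from
 * num_cols + 1, then copies the rest of orig_query (e.g. ON CONFLICT).
 * Returns the resulting length, or -1 if buf is too small.
 */
static int
rebuild_insert_sketch(char *buf, size_t bufsize, const char *orig_query,
                      int values_end_len, int num_cols, int extra_rows)
{
    size_t      used;
    int         pindex = num_cols + 1;  /* first parameter of the 2nd row */
    int         n;

    assert(bufsize > 0 && num_cols > 0 && extra_rows >= 0);
    assert(values_end_len >= 0 &&
           (size_t) values_end_len <= strlen(orig_query));

    /* Prefix: "INSERT INTO ... VALUES ($1, ..., $num_cols)" */
    n = snprintf(buf, bufsize, "%.*s", values_end_len, orig_query);
    if (n < 0 || (size_t) n >= bufsize)
        return -1;
    used = (size_t) n;

    /* One more "($k, ..., $k+num_cols-1)" tuple per extra row */
    for (int row = 0; row < extra_rows; row++)
    {
        for (int col = 0; col < num_cols; col++)
        {
            n = snprintf(buf + used, bufsize - used, "%s$%d",
                         col == 0 ? ", (" : ", ", pindex++);
            if (n < 0 || (size_t) n >= bufsize - used)
                return -1;
            used += (size_t) n;
        }
        n = snprintf(buf + used, bufsize - used, ")");
        if (n < 0 || (size_t) n >= bufsize - used)
            return -1;
        used += (size_t) n;
    }

    /* Suffix: whatever followed the VALUES clause */
    n = snprintf(buf + used, bufsize - used, "%s", orig_query + values_end_len);
    if (n < 0 || (size_t) n >= bufsize - used)
        return -1;
    used += (size_t) n;

    return (int) used;
}
```

For "INSERT INTO t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING" with num_cols = 2 and extra_rows = 2, this yields "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) ON CONFLICT DO NOTHING".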
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..fd5abf2471 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2711,3 +2711,94 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..02a34b40b3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> or <function>ExecForeignBatchInsert</function>,
+     <function>ExecForeignUpdate</function>, or
+     <function>ExecForeignDelete</function> will be called for each tuple (or batch of tuples) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                       ResultRelInfo *rinfo,
+                       TupleTableSlot **slots,
+                       TupleTableSlot **planSlots,
+                       int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to let the user set this value through a foreign
+     server and/or foreign table option, or else to use a hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or the subsequently called <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> must throw error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd2143c1..97eeb64a02 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
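The lookup order described above (table option, then server option, then 100) is what create_foreign_modify() implements in the patch. Restated as a standalone sketch, with a hypothetical Option struct in place of DefElem lists and assuming the validator has already rejected values <= 0:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_BATCH_SIZE 100  /* default used by the patch */

/* Hypothetical name/value pair standing in for a DefElem in an options list. */
typedef struct Option
{
    const char *name;
    const char *value;
} Option;

/* Return the "batch_size" value from opts, or 0 if it is not set. */
static int
find_batch_size(const Option *opts, int nopts)
{
    for (int i = 0; i < nopts; i++)
        if (strcmp(opts[i].name, "batch_size") == 0)
            return (int) strtol(opts[i].value, NULL, 10);
    return 0;
}

/* Table option wins, then the server option, then the default. */
static int
resolve_batch_size(const Option *table_opts, int ntable,
                   const Option *server_opts, int nserver)
{
    int         batch_size = find_batch_size(table_opts, ntable);

    if (batch_size == 0)
        batch_size = find_batch_size(server_opts, nserver);
    if (batch_size == 0)
        batch_size = DEFAULT_BATCH_SIZE;

    /* assumes the option validator already rejected values <= 0 */
    assert(batch_size > 0);
    return batch_size;
}
```

So a table created with batch_size '40' on a server with batch_size '20' would use 40, and a table without the option on that server would use 20.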
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..b0a354ad6f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2192,3 +2192,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
 		}
 	}
 }
+
+/*
+ * ExecGetTouchedPartitions -- Return the partitions initialized so far by
+ * this tuple routing, setting *count to their number
+ */
+ResultRelInfo **
+ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count)
+{
+	*count = proute->num_partitions;
+	return proute->partitions;
+}
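ExecGetTouchedPartitions() presumably exists so the executor can, at the end of the statement, flush whatever partial batch is left in each partition touched by tuple routing. The accumulate-then-flush pattern in the ExecInsert() changes can be sketched in isolation; BatchState, add_row() and flush_batch() are hypothetical, with ints in place of TupleTableSlots and a counter in place of the remote INSERT:

```c
#include <assert.h>

#define MAX_BATCH 64            /* hypothetical cap so the buffer is static */

typedef struct BatchState
{
    int         batch_size;     /* plays the role of ri_BatchSize */
    int         num_slots;      /* plays the role of ri_NumSlots */
    int         rows[MAX_BATCH];    /* stands in for ri_Slots */
    int         flushes;        /* number of simulated remote round trips */
    int         sent;           /* total rows handed to the "FDW" */
} BatchState;

/* Send whatever has been accumulated as one (simulated) batch insert. */
static void
flush_batch(BatchState *st)
{
    if (st->num_slots == 0)
        return;
    st->flushes++;
    st->sent += st->num_slots;
    st->num_slots = 0;
}

/* Add one row, flushing first if the batch is already full. */
static void
add_row(BatchState *st, int row)
{
    assert(st->batch_size >= 1 && st->batch_size <= MAX_BATCH);
    if (st->num_slots == st->batch_size)
        flush_batch(st);
    st->rows[st->num_slots++] = row;
}
```

With batch_size 10, the 21 rows of the second INSERT in the regression test (11..31) would go out as batches of 10, 10 and 1, the last one only after the final flush.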
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d7b8f65591..f9602c5a8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +449,71 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * Determine if the FDW supports batch insert and determine the batch
+		 * size (a FDW may support batching, but it may be disabled for the
+		 * server/table). Do this only once, at the beginning - we don't want
+		 * the batch size to change during execution.
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetModifyBatchSize &&
+			resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
+
+		if (resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
+
+		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If a certain number of tuples have already been accumulated,
+			 * or a tuple has come for a different relation than that for
+			 * the accumulated tuples, perform the batch insert
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -698,6 +771,70 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1937,6 +2074,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	ResultRelInfo **resultRelInfos;
+	int			num_partitions;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2152,6 +2292,28 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		resultRelInfos = ExecGetTouchedPartitions(proute, &num_partitions);
+	else
+	{
+		resultRelInfos = &resultRelInfo;
+		num_partitions = 1;
+	}
+
+	for (int i = 0; i < num_partitions; i++)
+	{
+		resultRelInfo = resultRelInfos[i];
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b053..dbf6b30233 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..2bb5a85fb1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,6 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
 												  int nsubplans);
+extern ResultRelInfo **ExecGetTouchedPartitions(PartitionTupleRouting *proute, int *count);
 
 #endif							/* EXECPARTITION_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..7946ca82f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetModifyBatchSize_function GetModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..d65099c94a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd37ef..404e03f132 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
#52tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#51)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

FWIW the attached v8 patch does this - most of the conditions are moved to the
GetModifyBatchSize() callback. I've removed the check for the BatchInsert
callback, though - the FDW knows whether it supports that, and it seems a bit
pointless at the moment as there are no other batch callbacks. Maybe we
should add an Assert somewhere, though?

Thank you. I'm in favor of the idea that the decision to support RETURNING and triggers is left to the FDW. I don't see the need for another Assert, as the caller already has one for the returned batch size.

Regards
Takayuki Tsunakawa

#53Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#51)
Re: POC: postgres_fdw insert batching

Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/13/21 3:43 PM, Tomas Vondra wrote:

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

(!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

   if (resultRelInfo->ri_BatchSize == 0)
     resultRelInfo->ri_BatchSize =
       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

   if (resultRelInfo->ri_BatchSize > 1)
   {
     ... do batching ...
   }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback.

Thanks. A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()? So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns? I don't quite understand why it is being called from
ExecInsert(). Can the batch size change once the execution starts?

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?
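A minimal sketch of the first suggestion, under stated assumptions: the FDW decides the effective batch size once, when its modify state is created, and the batch-size callback merely reports that value. The names below (TriggerDescSketch, ModifyStateSketch, init_modify_state, get_batch_size) are hypothetical stand-ins, not postgres_fdw code, and the two conditions that disable batching (a RETURNING list and an AFTER ROW INSERT trigger) are taken from this thread, not from a final implementation. Note the NULL test on the trigger descriptor: a relation without triggers has no trigger descriptor at all, so a bare ri_TrigDesc->trig_insert_after_row test would dereference NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for TriggerDesc and the FDW's
 * per-modify execution state; only the fields this sketch needs exist.
 */
typedef struct TriggerDescSketch
{
    bool        trig_insert_after_row;
} TriggerDescSketch;

typedef struct ModifyStateSketch
{
    int         batch_size;     /* effective batch size, decided once */
} ModifyStateSketch;

/*
 * Decide the effective batch size when the modify state is created.
 * option_batch_size is the table- or server-level batch_size option;
 * has_returning says whether the remote INSERT needs a RETURNING list.
 * trigdesc may be NULL (no triggers), so test it before dereferencing.
 */
static void
init_modify_state(ModifyStateSketch *fmstate, int option_batch_size,
                  bool has_returning, const TriggerDescSketch *trigdesc)
{
    bool        has_after_row_trigger =
        (trigdesc != NULL && trigdesc->trig_insert_after_row);

    if (has_returning || has_after_row_trigger || option_batch_size < 1)
        fmstate->batch_size = 1;    /* per-row inserts, no batching */
    else
        fmstate->batch_size = option_batch_size;
}

/* The batch-size callback only reports the value decided above. */
static int
get_batch_size(const ModifyStateSketch *fmstate)
{
    assert(fmstate->batch_size >= 1);
    return fmstate->batch_size;
}
```

With this split, the executor never has to know why batching was disabled; it only sees a value of 1.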

I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip. I don't see the newly added documentation telling FDW authors
to either define both or none.
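The guard suggested here can be sketched as follows: batching is enabled only when both callbacks are present, and the executor otherwise falls back to per-row inserts (a batch size of 1). This is an illustration under assumptions. FdwRoutineSketch and its two members are hypothetical simplifications of the real FdwRoutine, whose callback signatures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down callback table; the real FdwRoutine has many more. */
typedef struct FdwRoutineSketch
{
    int         (*get_batch_size) (void *state);
    int         (*exec_batch_insert) (void *state, int nrows);
} FdwRoutineSketch;

/*
 * Enable batching only when the FDW provides both the batch-size callback
 * and the batch-insert callback; otherwise use per-row inserts (size 1).
 */
static int
effective_batch_size(const FdwRoutineSketch *routine, void *state)
{
    int         size;

    if (routine->get_batch_size == NULL || routine->exec_batch_insert == NULL)
        return 1;

    size = routine->get_batch_size(state);
    assert(size >= 1);
    return size;
}

/* Example callbacks for a hypothetical FDW that batches 100 rows. */
static int
fixed_100(void *state)
{
    (void) state;
    return 100;
}

static int
insert_all(void *state, int nrows)
{
    (void) state;
    return nrows;
}
```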

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations. Also, perhaps
it's a good idea to put the "finishing" ExecBatchInsert() calls into a
function ExecFinishBatchInsert(). Maybe the logic to choose the
relations to perform the finishing calls on will get complicated in
the future as batching is added for updates/deletes too and it seems
better to encapsulate that in the separate function than have it out
in the open in ExecModifyTable().
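A rough sketch of what such an encapsulated finishing step might look like, assuming each result relation tracks how many rows it has buffered. PendingRel, flush_one and finish_batch_insert are hypothetical names. In the real executor, the flush would be a call to ExecBatchInsert() with the relation's buffered slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a result relation with a pending batch. */
typedef struct PendingRel
{
    int         num_pending;    /* rows buffered but not yet sent */
    int         num_flushed;    /* rows already sent to the remote side */
} PendingRel;

/* Send the buffered rows of one relation (stand-in for a batch insert). */
static void
flush_one(PendingRel *rel)
{
    rel->num_flushed += rel->num_pending;
    rel->num_pending = 0;
}

/*
 * Finish batching at the end of the statement: every relation that still
 * holds buffered rows gets one final flush. Returns how many relations
 * were flushed. The caller only passes the list of relations to visit,
 * so the rule for choosing them can change without touching the caller.
 */
static int
finish_batch_insert(PendingRel **rels, int nrels)
{
    int         flushed_rels = 0;

    for (int i = 0; i < nrels; i++)
    {
        if (rels[i]->num_pending > 0)
        {
            flush_one(rels[i]);
            flushed_rels++;
        }
    }
    return flushed_rels;
}
```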

(Sorry about being so late reviewing this.)

--
Amit Langote
EDB: http://www.enterprisedb.com

#54Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#53)
Re: POC: postgres_fdw insert batching

On 1/14/21 9:58 AM, Amit Langote wrote:

Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/13/21 3:43 PM, Tomas Vondra wrote:

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

(!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

   if (resultRelInfo->ri_BatchSize == 0)
     resultRelInfo->ri_BatchSize =
       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

   if (resultRelInfo->ri_BatchSize > 1)
   {
     ... do batching ...
   }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback.

Thanks. A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()? So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns? I don't quite understand why it is being called from
ExecInsert(). Can the batch size change once the execution starts?

But it should be called just once. The idea is that initially we have
batch_size=0, and the first call returns a value that is >= 1. So we never
call it again. But maybe it could be called from BeginForeignModify, in
which case we'd not need this logic with first setting it to 0 etc.
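The lazy initialization described here uses 0 as a "not asked yet" sentinel. The FDW callback therefore runs once per result relation, and every later row reuses the cached value. Here is a minimal sketch with hypothetical names (ResultRelSketch, fdw_batch_size_callback, get_cached_batch_size). The fixed return value of 100 only mirrors the default documented for the batch_size option in the earlier patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the ResultRelInfo field used as a cache. */
typedef struct ResultRelSketch
{
    int         batch_size;     /* 0 = not asked yet, >= 1 afterwards */
} ResultRelSketch;

static int  callback_calls = 0;

/* Stand-in for the FDW's batch-size callback; counts its invocations. */
static int
fdw_batch_size_callback(ResultRelSketch *rel)
{
    (void) rel;
    callback_calls++;
    return 100;
}

/* Called for every inserted row; asks the FDW only on the first call. */
static int
get_cached_batch_size(ResultRelSketch *rel)
{
    if (rel->batch_size == 0)
        rel->batch_size = fdw_batch_size_callback(rel);
    assert(rel->batch_size >= 1);
    return rel->batch_size;
}
```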

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?

Yeah, good point.

I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip. I don't see the newly added documentation telling FDW authors
to either define both or none.

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations. Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

it's a good idea to put the "finishing" ExecBatchInsert() calls into a
function ExecFinishBatchInsert(). Maybe the logic to choose the
relations to perform the finishing calls on will get complicated in
the future as batching is added for updates/deletes too and it seems
better to encapsulate that in the separate function than have it out
in the open in ExecModifyTable().

IMO that'd be over-engineering at this point. We don't need such a
separate function yet, so why complicate the API? If we need it in the
future, we can add it.

(Sorry about being so late reviewing this.)

thanks

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#54)
Re: POC: postgres_fdw insert batching

On Thu, Jan 14, 2021 at 21:57 Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 1/14/21 9:58 AM, Amit Langote wrote:

Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/13/21 3:43 PM, Tomas Vondra wrote:

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

(!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

   if (resultRelInfo->ri_BatchSize == 0)
     resultRelInfo->ri_BatchSize =
       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

   if (resultRelInfo->ri_BatchSize > 1)
   {
     ... do batching ...
   }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback.

Thanks. A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()? So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns? I don't quite understand why it is being called from
ExecInsert(). Can the batch size change once the execution starts?

But it should be called just once. The idea is that initially we have
batch_size=0, and the fist call returns value that is >= 1. So we never
call it again. But maybe it could be called from BeginForeignModify, in
which case we'd not need this logic with first setting it to 0 etc.

Right, although I was thinking that maybe ri_BatchSize itself is not to be
written to by the FDW. Not to say that’s doing anything wrong though.

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?

Yeah, good point.

I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip. I don't see the newly added documentation telling FDW authors
to either define both or none.

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations. Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

I should've mentioned es_opened_result_relations too, which contains the
non-routing result relations. So what I really meant was: if (proute), use
es_tuple_routing_result_relations, else es_opened_result_relations. This
should work as long as batching is only used for inserts.
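The selection rule described here can be sketched as follows. With tuple routing, the buffered rows live in the routed-to partitions (es_tuple_routing_result_relations). Without routing, they live in the opened result relations (es_opened_result_relations). The structs and helpers are hypothetical simplifications: a plain array stands in for a List, and each relation is reduced to its count of buffered rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a List of result relations. */
typedef struct RelListSketch
{
    int         nrels;
    int        *pending;        /* pending[i] = buffered rows of relation i */
} RelListSketch;

/* Hypothetical stand-in for the two EState lists discussed above. */
typedef struct EStateSketch
{
    RelListSketch tuple_routing_result_relations;   /* routed-to partitions */
    RelListSketch opened_result_relations;          /* non-routing relations */
} EStateSketch;

/*
 * Pick the relations that may hold buffered rows: the routed-to partitions
 * when tuple routing is in use, else the opened result relations. Valid
 * only while batching is limited to INSERT.
 */
static const RelListSketch *
relations_to_flush(const EStateSketch *estate, bool has_routing)
{
    return has_routing ? &estate->tuple_routing_result_relations
        : &estate->opened_result_relations;
}

/* Count the buffered rows that the final flush has to send. */
static int
pending_rows(const RelListSketch *list)
{
    int         total = 0;

    for (int i = 0; i < list->nrels; i++)
        total += list->pending[i];
    return total;
}
```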

it's a good idea to put the "finishing" ExecBatchInsert() calls into a
function ExecFinishBatchInsert(). Maybe the logic to choose the
relations to perform the finishing calls on will get complicated in
the future as batching is added for updates/deletes too and it seems
better to encapsulate that in the separate function than have it out
in the open in ExecModifyTable().

IMO that'd be over-engineering at this point. We don't need such a
separate function yet, so why complicate the API? If we need it in the
future, we can add it.

Fair enough.
--
Amit Langote
EDB: http://www.enterprisedb.com

#56Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#55)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/14/21 2:57 PM, Amit Langote wrote:

On Thu, Jan 14, 2021 at 21:57 Tomas Vondra
<tomas.vondra@enterprisedb.com>
wrote:

On 1/14/21 9:58 AM, Amit Langote wrote:

Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/13/21 3:43 PM, Tomas Vondra wrote:

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

   if (resultRelInfo->ri_BatchSize == 0)
     resultRelInfo->ri_BatchSize =
       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

   if (resultRelInfo->ri_BatchSize > 1)
   {
     ... do batching ...
   }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback.

Thanks.  A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()?  So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns?  I don't quite understand why it is being called from
ExecInsert().  Can the batch size change once the execution starts?

But it should be called just once. The idea is that initially we have
batch_size=0, and the fist call returns value that is >= 1. So we never
call it again. But maybe it could be called from BeginForeignModify, in
which case we'd not need this logic with first setting it to 0 etc.

Right, although I was thinking that maybe ri_BatchSize itself is not to
be written to by the FDW.  Not to say that’s doing anything wrong though.

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?

Yeah, good point.

I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip.  I don't see the newly added documentation telling FDW authors
to either define both or none.

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations.  Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

I should've mentioned es_opened_result_relations too, which contains the
non-routing result relations. So what I really meant was: if (proute), use
es_tuple_routing_result_relations, else es_opened_result_relations.
This should work as long as batching is only used for inserts.

Ah, right. That did the trick.

Attached is v9 with all of those tweaks, except for moving the BatchSize
call to BeginForeignModify - I tried that, but it did not seem like an
improvement, because we'd still need the checks for API callbacks in
ExecInsert for example. So I decided not to do that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-bulk-insert-for-foreign-tables-v9.patch (text/x-patch; charset=UTF-8)
From 825640430a5882edba7d9a0e21960e29922815ec Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Thu, 14 Jan 2021 15:59:49 +0100
Subject: [PATCH] v9

---
 contrib/postgres_fdw/deparse.c                |  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out    | 116 ++++++-
 contrib/postgres_fdw/option.c                 |  14 +
 contrib/postgres_fdw/postgres_fdw.c           | 298 ++++++++++++++----
 contrib/postgres_fdw/postgres_fdw.h           |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  91 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  89 +++++-
 doc/src/sgml/postgres-fdw.sgml                |  13 +
 src/backend/executor/nodeModifyTable.c        | 160 ++++++++++
 src/backend/nodes/list.c                      |  15 +
 src/include/foreign/fdwapi.h                  |  10 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pg_list.h                   |  15 +
 13 files changed, 809 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3eea..64698c4da3 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..85a072bc88 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length up to the end of the VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length up to the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetForeignModifyBatchSize = postgresGetForeignModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,23 @@ postgresExecForeignInsert(EState *estate,
 	return rslot;
 }
 
+/*
+ * postgresGetForeignModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	/* Disable batching with RETURNING or AFTER ROW INSERT triggers. */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc &&
+		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
 /*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
@@ -1873,8 +1948,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1967,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2010,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2087,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2097,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2723,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3620,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3629,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3564,7 +3656,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3711,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3759,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3815,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3825,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3853,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3866,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3931,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,29 +4042,41 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
 	fmstate->conn = NULL;
 }
 
+/*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
 /*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..1f67b4d9fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08b98..fd5abf2471 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2711,3 +2711,94 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..854913ae5f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> or <function>ExecForeignBatchInsert</function>,
+     <function>ExecForeignUpdate</function>, or
+     <function>ExecForeignDelete</function> will be called for the tuples to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions).
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>, or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetForeignModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert/ExecForeignBatchInsert</function>
+     subsequently called must throw error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd2143c1..97eeb64a02 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 921e695419..fc48d8cb75 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * Determine if the FDW supports batch insert and determine the batch
+		 * size (a FDW may support batching, but it may be disabled for the
+		 * server/table). Do this only once, at the beginning - we don't want
+		 * the batch size to change during execution.
+		 */
+		if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+			resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
+		if (resultRelInfo->ri_BatchSize == 0)
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 0);
+
+		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If this result relation's batch is already full, insert the
+			 * accumulated tuples as one batch before buffering the current
+			 * tuple.
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -698,6 +772,70 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		a RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1937,6 +2075,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	List				  *relinfos = NIL;
+	ListCell			  *lc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2152,6 +2293,25 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		relinfos = estate->es_tuple_routing_result_relations;
+	else
+		relinfos = estate->es_opened_result_relations;
+
+	foreach(lc, relinfos)
+	{
+		resultRelInfo = lfirst(lc);
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b053..dbf6b30233 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..248f78da45 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetForeignModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetForeignModifyBatchSize_function GetForeignModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..d65099c94a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd37ef..404e03f132 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.26.2
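The executor side of the patch above follows an accumulate-and-flush pattern: ExecInsert() copies each tuple into the result relation's ri_Slots array, calls ExecBatchInsert() once ri_NumSlots reaches ri_BatchSize, and ExecModifyTable() flushes whatever is left after the subplan is exhausted. A minimal standalone C model of that flow is sketched below, assuming plain ints in place of tuple slots; RelBatch, rel_add_row() and rel_flush() are hypothetical names, so this illustrates only the counting logic, not the executor API.

```c
#include <assert.h>

#define MAX_BATCH 64

/* Hypothetical per-relation batch state (models ri_BatchSize, ri_NumSlots, ri_Slots). */
typedef struct RelBatch
{
	int		batch_size;			/* like ri_BatchSize */
	int		num_slots;			/* like ri_NumSlots */
	int		rows[MAX_BATCH];	/* stands in for ri_Slots */
	int		round_trips;		/* number of simulated remote calls */
	int		rows_sent;			/* rows delivered to the "remote" side */
} RelBatch;

/* Send all buffered rows in one simulated call (cf. ExecBatchInsert). */
static void
rel_flush(RelBatch *rel)
{
	if (rel->num_slots == 0)
		return;
	rel->round_trips++;
	rel->rows_sent += rel->num_slots;
	rel->num_slots = 0;
}

/* Buffer one row, flushing first if the batch is already full (cf. ExecInsert). */
static void
rel_add_row(RelBatch *rel, int row)
{
	assert(rel->batch_size >= 1 && rel->batch_size <= MAX_BATCH);
	if (rel->num_slots == rel->batch_size)
		rel_flush(rel);
	rel->rows[rel->num_slots++] = row;
}
```

With batch_size 10, adding rows 1..31 and then calling rel_flush() once more (the role played by the final loop in ExecModifyTable) costs 4 simulated round trips instead of 31.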

#57tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#56)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Attached is v9 with all of those tweaks, except for moving the BatchSize call to
BeginForeignModify - I tried that, but it did not seem like an improvement,
because we'd still need the checks for API callbacks in ExecInsert for example.
So I decided not to do that.

Thanks, Tomas-san. The patch looks good again.

Amit-san, thank you for teaching us about es_tuple_routing_result_relations and es_opened_result_relations.

Regards
Takayuki Tsunakawa

#58Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#56)
Re: POC: postgres_fdw insert batching

On Fri, Jan 15, 2021 at 12:05 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Attached is v9 with all of those tweaks,

Thanks.

except for moving the BatchSize
call to BeginForeignModify - I tried that, but it did not seem like an
improvement, because we'd still need the checks for API callbacks in
ExecInsert for example. So I decided not to do that.

Okay, so maybe not moving the whole logic into the FDW's
BeginForeignModify(), but at least if we move this...

@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
+       /*
+        * Determine if the FDW supports batch insert and determine the batch
+        * size (a FDW may support batching, but it may be disabled for the
+        * server/table). Do this only once, at the beginning - we don't want
+        * the batch size to change during execution.
+        */
+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+           resultRelInfo->ri_BatchSize == 0)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

...into ExecInitModifyTable(), ExecInsert() only needs the following block:

       /*
+        * If the FDW supports batching, and batching is requested, accumulate
+        * rows and insert them in batches. Otherwise use the per-row inserts.
+        */
+       if (resultRelInfo->ri_BatchSize > 1)
+       {
+ ...

AFAICS, nothing will cause ri_BatchSize to become 0 once it is set, so
I don't see the point of checking whether it needs to be set again on
every ExecInsert() call. Also, maybe not that important, but shaving
off 3 comparisons for every tuple would add up nicely IMHO, especially
given that we're targeting bulk loads.

--
Amit Langote
EDB: http://www.enterprisedb.com
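The suggestion above amounts to hoisting a loop-invariant decision out of the per-tuple path: check once, at initialization, whether both GetForeignModifyBatchSize and ExecForeignBatchInsert are provided, store the resulting batch size (falling back to 1), and leave ExecInsert() with a single ri_BatchSize > 1 test. A sketch under simplified assumptions follows; ToyFdwRoutine, ToyRelInfo, init_batch_size(), use_batching() and the example callbacks are hypothetical stand-ins, and the real callback signatures are reduced to the minimum.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyRelInfo ToyRelInfo;

/* Hypothetical, stripped-down FdwRoutine: only the two batching callbacks. */
typedef struct ToyFdwRoutine
{
	int		(*GetForeignModifyBatchSize) (ToyRelInfo *rinfo);
	void	(*ExecForeignBatchInsert) (ToyRelInfo *rinfo);	/* simplified signature */
} ToyFdwRoutine;

/* Hypothetical, stripped-down ResultRelInfo. */
struct ToyRelInfo
{
	const ToyFdwRoutine *fdwroutine;	/* NULL for a local table */
	int		batch_size;					/* like ri_BatchSize */
};

/*
 * Decide the batch size once, at initialization. Batching is used only if
 * the FDW provides both callbacks; otherwise fall back to per-row inserts.
 */
static void
init_batch_size(ToyRelInfo *rinfo)
{
	const ToyFdwRoutine *r = rinfo->fdwroutine;

	if (r != NULL &&
		r->GetForeignModifyBatchSize != NULL &&
		r->ExecForeignBatchInsert != NULL)
		rinfo->batch_size = r->GetForeignModifyBatchSize(rinfo);
	else
		rinfo->batch_size = 1;

	assert(rinfo->batch_size >= 1);
}

/* Per-tuple path: one comparison, no callback lookups. */
static int
use_batching(const ToyRelInfo *rinfo)
{
	return rinfo->batch_size > 1;
}

/* Example callbacks for illustration only: a fixed batch size of 10. */
static int
fixed_batch_size(ToyRelInfo *rinfo)
{
	(void) rinfo;
	return 10;
}

static void
noop_batch_insert(ToyRelInfo *rinfo)
{
	(void) rinfo;
}
```

The assertion after the fallback mirrors the idea that, once initialization is done, the per-tuple code can rely on the batch size being at least 1.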

#59tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#58)
RE: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09@gmail.com>

Okay, so maybe not moving the whole logic into the FDW's
BeginForeignModify(), but at least if we move this...

@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
+       /*
+        * Determine if the FDW supports batch insert and determine the batch
+        * size (a FDW may support batching, but it may be disabled for the
+        * server/table). Do this only once, at the beginning - we don't want
+        * the batch size to change during execution.
+        */
+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+           resultRelInfo->ri_BatchSize == 0)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

...into ExecInitModifyTable(), ExecInsert() only needs the following block:

Does ExecInitModifyTable() know all the leaf partitions that the tuples produced by VALUES or SELECT go to? Isn't the target leaf partition found for the first time only in ExecInsert(), through the call to ExecPrepareTupleRouting()? Leaf partitions can have different batch_size settings.

Regards
Takayuki Tsunakawa

#60Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#59)
Re: POC: postgres_fdw insert batching

On Sat, Jan 16, 2021 at 12:00 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Okay, so maybe not moving the whole logic into the FDW's
BeginForeignModify(), but at least if we move this...

@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
+       /*
+        * Determine if the FDW supports batch insert and determine the batch
+        * size (a FDW may support batching, but it may be disabled for the
+        * server/table). Do this only once, at the beginning - we don't want
+        * the batch size to change during execution.
+        */
+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+           resultRelInfo->ri_BatchSize == 0)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

...into ExecInitModifyTable(), ExecInsert() only needs the following block:

Does ExecInitModifyTable() know all the leaf partitions that the tuples produced by VALUES or SELECT go to? Isn't the target leaf partition found for the first time only in ExecInsert(), through the call to ExecPrepareTupleRouting()? Leaf partitions can have different batch_size settings.

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all leaf partitions, it
only sees the root table whose batch_size doesn't really matter. So
it's really ExecInitRoutingInfo() that I would recommend to set
ri_BatchSize; right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
    partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
    partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a partition
when it is initialized after being inserted into for the first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().

--
Amit Langote
EDB: http://www.enterprisedb.com
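For partitioned targets the same decision has to be made per leaf partition, and lazily, because a partition's routing info is set up only when the first tuple is routed to it, and each foreign partition may carry its own batch_size. The sketch below models that with three partitions mirroring the regression test in the patch (batch_size 10, batch_size 1, and a local partition). ToyPartition, init_partition() and route_row() are hypothetical, and key % NPARTS merely stands in for real hash partitioning.

```c
#include <assert.h>
#include <stdbool.h>

#define NPARTS 3

/* Hypothetical per-partition state. */
typedef struct ToyPartition
{
	bool	initialized;			/* routing info set up yet? */
	int		configured_batch_size;	/* e.g. table option; 0 for a local table */
	int		batch_size;				/* decided once, on first routing */
	int		init_calls;				/* how many times init ran */
	int		rows_routed;			/* rows sent to this partition */
} ToyPartition;

/* Runs once per partition, like the setup done when it is first routed to. */
static void
init_partition(ToyPartition *p)
{
	p->batch_size = (p->configured_batch_size > 1) ? p->configured_batch_size : 1;
	p->initialized = true;
	p->init_calls++;
}

/* Route a row to a partition, initializing the target lazily. */
static ToyPartition *
route_row(ToyPartition *parts, int key)
{
	ToyPartition *p;

	assert(key >= 0);
	p = &parts[key % NPARTS];	/* stand-in for hash partitioning */
	if (!p->initialized)
		init_partition(p);
	p->rows_routed++;
	return p;
}
```

Routing keys 1..66 sends 22 rows to each partition, yet each partition's batch size is computed exactly once, when it first receives a row.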

#61tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#60)
1 attachment(s)
RE: POC: postgres_fdw insert batching

Tomas-san,

From: Amit Langote <amitlangote09@gmail.com>

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all leaf partitions, it
only sees the root table whose batch_size doesn't really matter. So
it's really ExecInitRoutingInfo() that I would recommend to set
ri_BatchSize; right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
    partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
    partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a partition
when it is initialized after being inserted into for the first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().

Attached is a patch that adds calls to GetForeignModifyBatchSize() in the two places above. The regression tests pass.

(FWIW, frankly, I prefer the previous version because the code is a bit smaller... Maybe we should refactor the code someday to reduce the duplicated processing between the partitioned and non-partitioned cases.)

Regards
Takayuki Tsunakawa
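On the postgres_fdw side, the v10 patch sends each batch as a single multi-row INSERT: deparseInsertSql() now reports values_end_len, the length of the statement text through its first VALUES tuple, and rebuildInsertSql() copies that prefix, appends further tuples with consecutively numbered parameters, and then copies the rest of the original text (such as ON CONFLICT DO NOTHING). A plain-C sketch of that string building follows, assuming a caller-supplied fixed-size char buffer instead of a StringInfo; build_multirow_insert() and the append helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append n bytes of s to buf, keeping it NUL-terminated; false on overflow. */
static bool
append_bytes(char *buf, size_t bufsize, size_t *len, const char *s, size_t n)
{
	if (*len + n + 1 > bufsize)
		return false;
	memcpy(buf + *len, s, n);
	*len += n;
	buf[*len] = '\0';
	return true;
}

static bool
append_str(char *buf, size_t bufsize, size_t *len, const char *s)
{
	return append_bytes(buf, bufsize, len, s, strlen(s));
}

/*
 * Build a multi-row INSERT from a single-row one: keep orig_query up to
 * values_end_len (the end of the first VALUES tuple), append extra_rows
 * tuples of num_cols parameters each, then copy the rest of orig_query.
 * Returns false if buf is too small.
 */
static bool
build_multirow_insert(char *buf, size_t bufsize, const char *orig_query,
					  size_t values_end_len, int num_cols, int extra_rows)
{
	size_t		len = 0;
	int			pindex = num_cols + 1;	/* $1..$num_cols belong to row 1 */
	char		param[16];

	if (!append_bytes(buf, bufsize, &len, orig_query, values_end_len))
		return false;

	for (int i = 0; i < extra_rows; i++)
	{
		if (!append_str(buf, bufsize, &len, ", ("))
			return false;
		for (int j = 0; j < num_cols; j++)
		{
			if (j > 0 && !append_str(buf, bufsize, &len, ", "))
				return false;
			snprintf(param, sizeof(param), "$%d", pindex++);
			if (!append_str(buf, bufsize, &len, param))
				return false;
		}
		if (!append_str(buf, bufsize, &len, ")"))
			return false;
	}

	/* Copy whatever followed the VALUES clause (e.g. ON CONFLICT ...). */
	return append_str(buf, bufsize, &len, orig_query + values_end_len);
}
```

For a two-column table and two extra rows, "INSERT INTO t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING" becomes "INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) ON CONFLICT DO NOTHING", and the caller then binds six parameters.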

Attachments:

v10-0001-Add-bulk-insert-for-foreign-tables.patch (application/octet-stream)
From 691c1457c21b8855b8ad23978b41956aeecc01c7 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Mon, 18 Jan 2021 15:36:44 +0900
Subject: [PATCH v10] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                 |  43 +++-
 contrib/postgres_fdw/expected/postgres_fdw.out | 116 +++++++++-
 contrib/postgres_fdw/option.c                  |  14 ++
 contrib/postgres_fdw/postgres_fdw.c            | 302 ++++++++++++++++++++-----
 contrib/postgres_fdw/postgres_fdw.h            |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  91 ++++++++
 doc/src/sgml/fdwhandler.sgml                   |  89 +++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  13 ++
 src/backend/executor/execPartition.c           |  12 +
 src/backend/executor/nodeModifyTable.c         | 157 +++++++++++++
 src/backend/nodes/list.c                       |  15 ++
 src/include/foreign/fdwapi.h                   |  10 +
 src/include/nodes/execnodes.h                  |   6 +
 src/include/nodes/pg_list.h                    |  15 ++
 14 files changed, 822 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4e..2d38ab2 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1764,6 +1765,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * The statement text is appended to buf, and we also create an integer List
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f..96bad17 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3..64698c4 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
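The validator above rejects values <= 0. Later, create_foreign_modify() resolves the effective value by checking the table option first, then the server option, and finally falling back to a default of 100. The following is a minimal sketch of that precedence. The helper names are hypothetical. The endptr check is a stricter variant than the patch's plain strtol() call, which would accept a string such as '10abc' as 10.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stricter parser: accept only a whole, positive int.
 * (The validator in the patch calls strtol() without checking endptr.)
 */
static bool
parse_positive_int(const char *s, int *out)
{
    char       *end;
    long        v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v <= 0 || v > INT_MAX)
        return false;
    *out = (int) v;
    return true;
}

/*
 * Precedence mirrored from create_foreign_modify(): table option, then
 * server option, then the default of 100.  NULL means "option not set";
 * invalid strings are skipped here (in practice the validator rejects them).
 */
static int
resolve_batch_size(const char *table_opt, const char *server_opt)
{
    int         v;

    if (table_opt != NULL && parse_positive_int(table_opt, &v))
        return v;
    if (server_opt != NULL && parse_positive_int(server_opt, &v))
        return v;
    return 100;
}
```

For example, with batch_size '30' on the table and '20' on the server, the table value 30 wins. With neither option set, the result is 100.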
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d1..b317942 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length up to the end of the VALUES clause, for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length up to the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetForeignModifyBatchSize = postgresGetForeignModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1864,6 +1922,27 @@ postgresExecForeignInsert(EState *estate,
 }
 
 /*
+ * postgresGetForeignModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	/* In EXPLAIN without ANALYZE, ri_FdwState is NULL */
+	if (resultRelInfo->ri_FdwState == NULL)
+		return 0;
+
+	/* Disable batching if we need RETURNING or have AFTER ROW INSERT triggers. */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc &&
+		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
+/*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
  */
@@ -1873,8 +1952,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1971,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2014,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2091,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2101,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2727,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3624,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3633,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
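@@ -3564,7 +3660,14 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+	{
+		/* execute_foreign_modify() may pfree and rebuild the query, so copy it */
+		fmstate->query = pstrdup(fmstate->query);
+		fmstate->orig_query = pstrdup(fmstate->query);
+	}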
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3715,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3763,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3819,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3829,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3857,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3870,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3935,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,23 +4046,7 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
@@ -3894,6 +4054,34 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 }
 
 /*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
+/*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
  *		UPDATE/DELETE .. RETURNING on a join directly
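Whenever the number of rows differs from the previous call, execute_foreign_modify() above deallocates the prepared INSERT, rebuilds it, and prepares it again. The following hypothetical model is not part of the patch. It counts, for a single INSERT statement, how many remote executions and (re)preparations occur for a given row count and batch_size. It assumes the executor always sends full batches and flushes the final partial batch at the end of the statement. That flushing code is outside this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of remote work for a single INSERT statement. */
typedef struct BatchTally
{
    int         executions;     /* prepared-statement executions, one per batch */
    int         prepares;       /* times the INSERT is (re)built and prepared */
} BatchTally;

static BatchTally
tally_batches(int nrows, int batch_size)
{
    BatchTally  t = {0, 0};
    int         num_slots = 1;      /* like fmstate->num_slots after setup */
    bool        prepared = false;   /* like fmstate->p_name != NULL */

    assert(nrows >= 0 && batch_size >= 1);
    while (nrows > 0)
    {
        int         n = (nrows < batch_size) ? nrows : batch_size;

        /* A different row count forces DEALLOCATE plus a rebuilt INSERT. */
        if (n != num_slots)
        {
            num_slots = n;
            prepared = false;
        }
        if (!prepared)
        {
            t.prepares++;
            prepared = true;
        }
        t.executions++;
        nrows -= n;
    }
    return t;
}
```

Under this model, the 21-row INSERT in the regression test (batch_size '10') executes three batches (10, 10, 1). It prepares the statement twice, because the final one-row batch forces a rebuild. For a 1M-row load with the default of 100, the model gives 10,000 remote executions instead of 1,000,000.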
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a..1f67b4d 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
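rebuildInsertSql() is declared here, but its body in deparse.c is not part of this excerpt. Its parameters, and the call in execute_foreign_modify() that passes fmstate->p_nums and *numSlots - 1, suggest that it appends extra parameter rows to the VALUES clause and then copies back whatever followed it. The following is a hypothetical standalone sketch of that idea. It uses plain C strings instead of StringInfo and is not the patch's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: expand a single-row INSERT into a multi-row one.
 * values_end_len is the length of orig_query up to the end of the first
 * VALUES row; num_rows is the number of *additional* rows to append.
 * Returns a malloc'd string (caller frees), or NULL on allocation failure.
 */
static char *
sketch_rebuild_insert_sql(const char *orig_query, int values_end_len,
                          int num_cols, int num_rows)
{
    size_t      orig_len = strlen(orig_query);
    /* Per extra row: ", (" + ")" plus at most ", $" and 10 digits per column. */
    size_t      cap = orig_len + 1 + (size_t) num_rows * (4 + (size_t) num_cols * 13);
    int         pindex = num_cols + 1;  /* first row already uses $1..$num_cols */
    size_t      len;
    char       *buf;

    assert(values_end_len > 0 && (size_t) values_end_len <= orig_len);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Copy everything up to the end of the first VALUES row. */
    memcpy(buf, orig_query, (size_t) values_end_len);
    len = (size_t) values_end_len;

    for (int i = 0; i < num_rows; i++)
    {
        len += (size_t) snprintf(buf + len, cap - len, ", (");
        for (int j = 0; j < num_cols; j++)
            len += (size_t) snprintf(buf + len, cap - len, "%s$%d",
                                     j > 0 ? ", " : "", pindex++);
        len += (size_t) snprintf(buf + len, cap - len, ")");
    }

    /* Append whatever followed the VALUES clause (e.g. ON CONFLICT DO NOTHING). */
    snprintf(buf + len, cap - len, "%s", orig_query + values_end_len);
    return buf;
}
```

The numbering matches how convert_prep_stmt_params() fills p_values row by row. For an INSERT (no ctid), column j of row i lands at index i * p_nums + j, which is parameter $(i * p_nums + j + 1). That is why PQsendQueryPrepared() is passed p_nums * numSlots parameters.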
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 25dbc08..fd5abf2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2711,3 +2711,94 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 COMMIT;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..854913a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> (or <function>ExecForeignBatchInsert</function>),
+     <function>ExecForeignUpdate</function> or <function>ExecForeignDelete</function>
+     will be called for each tuple (or batch of tuples) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has the
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetForeignModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for each
+     tuple (or batch of tuples) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or the subsequently called <function>ExecForeignInsert</function>
+     or <function>ExecForeignBatchInsert</function> must throw error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6fd214..97eeb64 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a..9349e2c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -993,6 +993,18 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (an FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (partRelInfo->ri_FdwRoutine != NULL &&
+		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		partRelInfo->ri_BatchSize =
+			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
+	Assert(partRelInfo->ri_BatchSize >= 0);
+
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	/*
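The execPartition.c hunk above only sets ri_BatchSize. The nodeModifyTable.c changes that follow buffer rows per result relation and flush a full batch before accepting the next row. The following is a hypothetical model of that accumulate-and-flush pattern, using ints in place of TupleTableSlots. The names are illustrative and do not come from the patch. The sketch assumes a final flush of any partial batch at the end of the statement, which is outside this excerpt.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ri_Slots / ri_NumSlots / ri_BatchSize. */
typedef struct RowBuffer
{
    int        *rows;           /* buffered rows (ints stand in for slots) */
    int         num_rows;       /* like ri_NumSlots */
    int         batch_size;     /* like ri_BatchSize */
    int         flushes;        /* batch inserts performed so far */
    int         last_flush;     /* size of the most recent batch */
} RowBuffer;

static void
flush_batch(RowBuffer *buf)
{
    if (buf->num_rows == 0)
        return;
    /* The real executor would call ExecForeignBatchInsert() here. */
    buf->flushes++;
    buf->last_flush = buf->num_rows;
    buf->num_rows = 0;
}

static void
buffer_row(RowBuffer *buf, int row)
{
    /* As in the patch, a full batch is flushed *before* buffering the new row. */
    if (buf->num_rows == buf->batch_size)
        flush_batch(buf);

    /* Allocate the buffer lazily, like the ri_Slots == NULL check. */
    if (buf->rows == NULL)
    {
        buf->rows = malloc(sizeof(int) * (size_t) buf->batch_size);
        assert(buf->rows != NULL);
    }
    buf->rows[buf->num_rows++] = row;
}
```

One consequence of flushing before buffering is that a full batch is sent only when the next row arrives. The last batch of a statement therefore always needs an explicit final flush.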
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 921e695..1412f97 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -442,6 +450,55 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If a certain number of tuples have already been accumulated,
+			 * or a tuple has come for a different relation than that for
+			 * the accumulated tuples, perform the batch insert
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -699,6 +756,70 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -1937,6 +2058,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	List				  *relinfos = NIL;
+	ListCell			  *lc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2153,6 +2277,25 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		relinfos = estate->es_tuple_routing_result_relations;
+	else
+		relinfos = estate->es_opened_result_relations;
+
+	foreach(lc, relinfos)
+	{
+		resultRelInfo = lfirst(lc);
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
@@ -2651,6 +2794,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	Assert(resultRelInfo->ri_BatchSize >= 0);
+
+	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
 	 * ExecPostprocessPlan.  (It'd actually work fine to add the primary
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b..dbf6b30 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499..248f78d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetForeignModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetForeignModifyBatchSize_function GetForeignModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f57..d65099c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd3..404e03f 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.10.1
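
The buffering that the ExecInsert() and ExecModifyTable() hunks above add can be reduced to a small standalone sketch. It keeps the same shape: flush when the buffer is already full, buffer the new row, and flush whatever is left once the input is exhausted. Rows are plain ints and the remote INSERT is only a counter. SketchBatch, sketch_insert() and sketch_flush() are hypothetical names for illustration, not executor APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-relation buffer; the executor keeps
 * copies of TupleTableSlots in ri_Slots / ri_PlanSlots instead. */
typedef struct SketchBatch
{
	int		batch_size;		/* like ri_BatchSize: max rows per remote INSERT */
	int		num_rows;		/* like ri_NumSlots: rows currently buffered */
	int		rows[64];		/* buffered rows (fixed capacity for the sketch) */
	int		flushes;		/* number of simulated remote round trips */
	int		rows_sent;		/* total rows handed to sketch_flush() */
} SketchBatch;

/* Stand-in for ExecBatchInsert(): send everything buffered at once. */
static void
sketch_flush(SketchBatch *b)
{
	if (b->num_rows == 0)
		return;
	b->flushes++;
	b->rows_sent += b->num_rows;
	b->num_rows = 0;
}

/* Stand-in for the batching branch of ExecInsert(): if the buffer is
 * already full, flush it first, then buffer the incoming row. */
static void
sketch_insert(SketchBatch *b, int row)
{
	assert(b->batch_size > 1 && b->batch_size <= 64);
	if (b->num_rows == b->batch_size)
		sketch_flush(b);
	b->rows[b->num_rows++] = row;
}
```

With batch_size 10 and 31 rows, three full batches go out while rows arrive and one row is still buffered. The final sketch_flush() call plays the role of the "Insert remaining tuples" loop in ExecModifyTable().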

#62Zhihong Yu
zyu@yugabyte.com
In reply to: tsunakawa.takay@fujitsu.com (#61)
Re: POC: postgres_fdw insert batching

Hi, Takayuki-san:

+           if (batch_size <= 0)
+               ereport(ERROR,
+                       (errcode(ERRCODE_SYNTAX_ERROR),
+                        errmsg("%s requires a non-negative integer value",

It seems the message doesn't match the check: "batch_size <= 0" also
rejects 0, while "non-negative" implies that 0 is allowed ("positive"
would match).
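A validator whose condition and message agree could look like the sketch below. This is not the patch's code. sketch_parse_batch_size() is a hypothetical helper, and it also rejects trailing garbage such as "10abc", which strtol(value, NULL, 10) silently accepts as 10.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical validator: accept only a string that parses completely as
 * an integer in [1, INT_MAX]. The matching error text would be
 * "%s requires a positive integer value", since zero is rejected.
 */
static bool
sketch_parse_batch_size(const char *value, int *result)
{
	char	   *end;
	long		v;

	errno = 0;
	v = strtol(value, &end, 10);
	if (end == value || *end != '\0' || errno == ERANGE)
		return false;			/* empty, trailing garbage, or overflow */
	if (v <= 0 || v > INT_MAX)
		return false;			/* zero and negative values are rejected */
	*result = (int) v;
	return true;
}
```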

+ int numInserted = numSlots;

Since numInserted is filled by ExecForeignBatchInsert(), the initialization
can be done with 0.

Cheers

On Sun, Jan 17, 2021 at 10:52 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:


Tomas-san,

From: Amit Langote <amitlangote09@gmail.com>

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all leaf partitions, it
only sees the root table whose batch_size doesn't really matter. So
it's really ExecInitRoutingInfo() that I would recommend to set
ri_BatchSize; right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
	partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
	partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a partition
when it is initialized after being inserted into for the first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().
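The point about partitions can be illustrated with a small sketch: each leaf gets its own batch size, computed once when the first row is routed to it, and the root table's setting is never consulted. SketchPartition, sketch_init_routing() and sketch_route() are hypothetical, and the plain modulo stands in for real hash partitioning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-partition state, loosely modeled on ResultRelInfo. */
typedef struct SketchPartition
{
	bool	initialized;		/* set when the first row is routed here */
	int		option_batch_size;	/* the partition's own batch_size option */
	int		batch_size;			/* like ri_BatchSize, 0 until initialized */
} SketchPartition;

static int	sketch_init_calls = 0;	/* counts lazy initializations */

/* Stand-in for ExecInitRoutingInfo(): runs once per partition. */
static void
sketch_init_routing(SketchPartition *part)
{
	sketch_init_calls++;
	part->batch_size = part->option_batch_size;
	part->initialized = true;
}

/* Route one row by key; plain modulo replaces the real hash routing. */
static SketchPartition *
sketch_route(SketchPartition *parts, int nparts, int key)
{
	SketchPartition *part;

	assert(nparts > 0 && key >= 0);
	part = &parts[key % nparts];
	if (!part->initialized)
		sketch_init_routing(part);
	return part;
}
```

Routing 66 rows across three partitions triggers exactly three initializations, no matter how many rows each partition receives.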

Attached is the patch that adds a call to GetModifyBatchSize() in the above
two places. The regression tests pass.

(FWIW, frankly, I prefer the previous version because the code is a bit
smaller... Maybe we should refactor the code someday to reduce the similar
processing in the partitioned and non-partitioned cases.)

Regards
Takayuki Tsunakawa

#63Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#61)
3 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/18/21 7:51 AM, tsunakawa.takay@fujitsu.com wrote:

Tomas-san,

From: Amit Langote <amitlangote09@gmail.com>

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all leaf partitions,
it only sees the root table whose batch_size doesn't really matter.
So it's really ExecInitRoutingInfo() that I would recommend to set
ri_BatchSize; right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
	partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
	partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a
partition when it is initialized after being inserted into for the
first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().

Attached is the patch that adds a call to GetModifyBatchSize() in the
above two places. The regression tests pass.

(FWIW, frankly, I prefer the previous version because the code is a
bit smaller... Maybe we should refactor the code someday to reduce the
similar processing in the partitioned and non-partitioned cases.)

Less code would be nice, but it's not always the right thing to do,
unfortunately :-(

I took a look at this - there's a bit of bitrot due to 708d165ddb92c, so
attached is a rebased patch (0001) fixing that.
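The core of 0001 on the remote side is rebuildInsertSql(). It copies the deparsed single-row INSERT up to values_end_len, which marks the end of the first "(...)" row. It then appends one "($n, ...)" group per extra row and copies the remainder, such as ON CONFLICT DO NOTHING. A plain-C sketch of the same idea, using a caller-supplied buffer instead of StringInfo, follows. sketch_rebuild_insert() and its extra_rows parameter are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical rendition of rebuildInsertSql(): keep the query up to the
 * end of the first VALUES row, append extra_rows further placeholder rows
 * numbered after the first row's parameters, then append the tail of the
 * original query. Returns 0 on success, -1 if buf is too small.
 */
static int
sketch_rebuild_insert(char *buf, size_t bufsize, const char *orig_query,
					  size_t values_end_len, int num_cols, int extra_rows)
{
	size_t		len;
	int			pindex = num_cols + 1;	/* first parameter after row one */
	int			tail;

	assert(num_cols > 0);
	if (values_end_len >= bufsize)
		return -1;
	memcpy(buf, orig_query, values_end_len);
	len = values_end_len;

	for (int i = 0; i < extra_rows; i++)
	{
		for (int j = 0; j < num_cols; j++)
		{
			int			n = snprintf(buf + len, bufsize - len, "%s$%d",
									 j == 0 ? ", (" : ", ", pindex++);

			if (n < 0 || (size_t) n >= bufsize - len)
				return -1;
			len += (size_t) n;
		}
		if (len + 1 >= bufsize)
			return -1;
		buf[len++] = ')';
	}

	/* Copy whatever followed the VALUES list in the original query. */
	tail = snprintf(buf + len, bufsize - len, "%s", orig_query + values_end_len);
	if (tail < 0 || (size_t) tail >= bufsize - len)
		return -1;
	return 0;
}
```

For a two-column table and two extra rows, "INSERT INTO public.t(a, b) VALUES ($1, $2) ON CONFLICT DO NOTHING" becomes "INSERT INTO public.t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6) ON CONFLICT DO NOTHING".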

0002 adds a couple of comments and minor tweaks.

0003 addresses a couple of shortcomings related to EXPLAIN: we haven't
been showing the batch size for EXPLAIN (VERBOSE), because without
ANALYZE there'd be no FdwState, so this tries to fix that. Furthermore,
there were no tests for EXPLAIN output with batch size, so I added a couple.
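One way to report a batch size without an FdwState is to resolve it from the options directly, keeping the rule from postgresGetForeignModifyBatchSize() that RETURNING or AFTER ROW INSERT triggers force row-by-row inserts. The sketch below assumes that the foreign-table option overrides the server option and that the default is 1 (no batching). Those assumptions, SketchBatchInputs and sketch_get_batch_size() are all hypothetical and are not taken from 0003.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; 0 in an option field means "option not set". */
typedef struct SketchBatchInputs
{
	bool	has_returning;			/* RETURNING must be evaluated per row */
	bool	has_after_row_trigger;	/* AFTER ROW INSERT trigger present */
	int		table_batch_size;		/* foreign table option, 0 if unset */
	int		server_batch_size;		/* foreign server option, 0 if unset */
} SketchBatchInputs;

static int
sketch_get_batch_size(const SketchBatchInputs *in)
{
	if (in->has_returning || in->has_after_row_trigger)
		return 1;					/* fall back to row-by-row inserts */
	if (in->table_batch_size > 0)
		return in->table_batch_size;	/* assumed: table overrides server */
	if (in->server_batch_size > 0)
		return in->server_batch_size;
	return 1;						/* assumed default: no batching */
}
```

Because this only reads option values, EXPLAIN (VERBOSE) without ANALYZE would have a value to print even though no modify state exists yet.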

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-bulk-insert-for-foreign-tables-v11.patch (text/x-patch; charset=UTF-8)
From 9425a8501543d4caf55a302a877745b0f83b6046 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 18 Jan 2021 15:18:14 +0100
Subject: [PATCH 1/3] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                |  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out    | 116 ++++++-
 contrib/postgres_fdw/option.c                 |  14 +
 contrib/postgres_fdw/postgres_fdw.c           | 302 ++++++++++++++----
 contrib/postgres_fdw/postgres_fdw.h           |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  91 ++++++
 doc/src/sgml/fdwhandler.sgml                  |  89 +++++-
 doc/src/sgml/postgres-fdw.sgml                |  13 +
 src/backend/executor/execPartition.c          |  12 +
 src/backend/executor/nodeModifyTable.c        | 157 +++++++++
 src/backend/nodes/list.c                      |  15 +
 src/include/foreign/fdwapi.h                  |  10 +
 src/include/nodes/execnodes.h                 |   6 +
 src/include/nodes/pg_list.h                   |  15 +
 14 files changed, 822 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 						 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1cad311436..8c0fdb5a9a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8923,7 +8923,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9112,3 +9112,117 @@ SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
  loopback2   | t
 (1 row)
 
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3eea..64698c4da3 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a non-negative integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..b317942596 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length till the end of VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length till the end of VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -530,6 +550,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetForeignModifyBatchSize = postgresGetForeignModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1687,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1775,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1799,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1821,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1837,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1856,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1874,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int 			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1884,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1863,6 +1921,27 @@ postgresExecForeignInsert(EState *estate,
 	return rslot;
 }
 
+/*
+ * postgresGetForeignModifyBatchSize
+ *		Report the maximum number of tuples that can be inserted in bulk
+ */
+static int
+postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	/* In EXPLAIN without ANALYZE, ri_FdwState is NULL */
+	if (resultRelInfo->ri_FdwState == NULL)
+		return 0;
+
+	/* Disable batching when we have to use RETURNING. */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc &&
+		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+}
+
 /*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
@@ -1873,8 +1952,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1971,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2014,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2091,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2101,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2727,9 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3624,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3538,6 +3633,7 @@ create_foreign_modify(EState *estate,
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Oid			userid;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	AttrNumber	n_params;
 	Oid			typefnoid;
@@ -3564,7 +3660,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3715,44 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+	{
+		/* Check the foreign table option. */
+		foreach(lc, table->options)
+		{
+			DefElem    *def = (DefElem *) lfirst(lc);
+
+			if (strcmp(def->defname, "batch_size") == 0)
+			{
+				fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+				break;
+			}
+		}
+
+		/* Check the foreign server option if the table option is not set. */
+		if (fmstate->batch_size == 0)
+		{
+			server = GetForeignServer(table->serverid);
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "batch_size") == 0)
+				{
+					fmstate->batch_size = strtol(defGetString(def), NULL, 10);
+					break;
+				}
+			}
+		}
+
+		/* If neither the table nor server option is set, set the default. */
+		if (fmstate->batch_size == 0)
+			fmstate->batch_size = 100;
+	}
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3763,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3819,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3829,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3857,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3870,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3935,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,29 +4046,41 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
 	fmstate->conn = NULL;
 }
 
+/*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
 /*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a1bc..1f67b4d9fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index ebf6eb10a6..f152e9f8ca 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2738,3 +2738,94 @@ COMMIT;
 -- should not be output because they should be closed at the end of
 -- the above transaction.
 SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c9293414c..854913ae5f 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> (or <function>ExecForeignBatchInsert</function>),
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                       ResultRelInfo *rinfo,
+                       TupleTableSlot **slots,
+                       TupleTableSlot **planSlots,
+                       int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions).
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetForeignModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor never passes more
+     than this many tuples to a single
+     <function>ExecForeignBatchInsert</function> call.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function, or the subsequently called <function>ExecForeignInsert</function>
+     or <function>ExecForeignBatchInsert</function>, must throw an error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 6a91926da8..33fac42512 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>100</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..9349e2c859 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -993,6 +993,18 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (an FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (partRelInfo->ri_FdwRoutine != NULL &&
+		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		partRelInfo->ri_BatchSize =
+			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
+	Assert(partRelInfo->ri_BatchSize >= 0);
+
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 921e695419..1412f97f20 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -441,6 +449,55 @@ ExecInsert(ModifyTableState *mtstate,
 			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
 									   CMD_INSERT);
 
+		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If the batch for this result relation is already full, send
+			 * the accumulated tuples with a batch insert before buffering
+			 * the new one.
+			 */
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
 		/*
 		 * insert into foreign table: let the FDW do it
 		 */
@@ -698,6 +755,70 @@ ExecInsert(ModifyTableState *mtstate,
 	return result;
 }
 
+/* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecDelete
  *
@@ -1937,6 +2058,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	List				  *relinfos = NIL;
+	ListCell			  *lc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2152,6 +2276,25 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		relinfos = estate->es_tuple_routing_result_relations;
+	else
+		relinfos = estate->es_opened_result_relations;
+
+	foreach(lc, relinfos)
+	{
+		resultRelInfo = lfirst(lc);
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2650,6 +2793,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (an FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	Assert(resultRelInfo->ri_BatchSize >= 0);
+
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b053..dbf6b30233 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499fb1..248f78da45 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetForeignModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetForeignModifyBatchSize_function GetForeignModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f570fa..d65099c94a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;	/* plan tuples matching ri_Slots */
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd37ef..404e03f132 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.26.2

0002-tweaks-v11.patch (text/x-patch; charset=UTF-8)
From fe7478d4ce624fd51497e0f4df3f6526ee2b58e4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 18 Jan 2021 17:40:51 +0100
Subject: [PATCH 2/3] tweaks

---
 contrib/postgres_fdw/deparse.c         | 8 ++++++++
 contrib/postgres_fdw/postgres_fdw.c    | 6 +++++-
 src/backend/executor/execPartition.c   | 3 ++-
 src/backend/executor/nodeModifyTable.c | 1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d38ab25cb..644b03fed5 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1705,6 +1705,9 @@ deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  * The statement text is appended to buf, and we also create an integer List
  * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
  * which is returned to *retrieved_attrs.
+ *
+ * This also stores end position of the VALUES clause, so that we can rebuild
+ * an INSERT for a batch of rows later.
  */
 void
 deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
@@ -1767,6 +1770,8 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 /*
  * rebuild remote INSERT statement
  *
+ * Provided a number of rows in a batch, builds INSERT statement with the
+ * right number of parameters.
  */
 void
 rebuildInsertSql(StringInfo buf, char *orig_query,
@@ -1777,6 +1782,9 @@ rebuildInsertSql(StringInfo buf, char *orig_query,
 	int			pindex;
 	bool		first;
 
+	/* Make sure the values_end_len is sensible */
+	Assert((values_end_len > 0) && (values_end_len <= strlen(orig_query)));
+
 	/* Copy up to the end of the first record from the original query */
 	appendBinaryStringInfo(buf, orig_query, values_end_len);
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b317942596..0f6d0705ec 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1923,7 +1923,11 @@ postgresExecForeignBatchInsert(EState *estate,
 
 /*
  * postgresGetForeignModifyBatchSize
- *		Report the maximum number of tuples that can be inserted in bulk
+ *		Determine the maximum number of tuples that can be inserted in bulk
+ *
+ * Returns the batch size specified for server or table. When batching is not
+ * allowed (e.g. for tables with AFTER ROW triggers or with RETURNING clause),
+ * returns 1.
  */
 static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 9349e2c859..448a321630 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -996,13 +996,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	/*
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
-	 * server/table).
+	 * server/table or for this particular query).
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
 			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
+
 	Assert(partRelInfo->ri_BatchSize >= 0);
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1412f97f20..48e4e2e34c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2805,6 +2805,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		resultRelInfo->ri_BatchSize =
 			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+
 	Assert(resultRelInfo->ri_BatchSize >= 0);
 
 	/*
-- 
2.26.2

0003-explain-v11.patch (text/x-patch; charset=UTF-8)
From 23ad7dd4050edbe731dce63a90f73da0c186e79a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 18 Jan 2021 18:11:09 +0100
Subject: [PATCH 3/3] explain

---
 .../postgres_fdw/expected/postgres_fdw.out    | 39 +++++++++--
 contrib/postgres_fdw/postgres_fdw.c           | 66 +++++++++++++++++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  2 +
 src/backend/executor/execPartition.c          |  6 +-
 src/backend/executor/nodeModifyTable.c        |  4 +-
 5 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 8c0fdb5a9a..b4a04d2c14 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3887,9 +3887,10 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft1       '::character(10), NULL::user_enum
-(4 rows)
+(5 rows)
 
 ALTER TABLE "S 1"."T 1" RENAME TO "T 0";
 ALTER FOREIGN TABLE ft1 OPTIONS (SET table_name 'T 0');
@@ -3920,9 +3921,10 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 0"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft1       '::character(10), NULL::user_enum
-(4 rows)
+(5 rows)
 
 ALTER TABLE "S 1"."T 0" RENAME TO "T 1";
 ALTER FOREIGN TABLE ft1 OPTIONS (SET table_name 'T 1');
@@ -4244,12 +4246,13 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Subquery Scan on "*SELECT*"
          Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2", NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10), NULL::user_enum
          ->  Foreign Scan on public.ft2 ft2_1
                Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
                Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint
-(7 rows)
+(8 rows)
 
 INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
 INSERT INTO ft2 (c1,c2,c3)
@@ -5360,9 +5363,10 @@ INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
  Insert on public.ft2
    Output: (ft2.tableoid)::regclass
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10), NULL::user_enum
-(5 rows)
+(6 rows)
 
 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
  tableoid 
@@ -6212,9 +6216,10 @@ INSERT INTO rw_view VALUES (0, 5);
 --------------------------------------------------------------------------------
  Insert on public.foreign_tbl
    Remote SQL: INSERT INTO public.base_tbl(a, b) VALUES ($1, $2) RETURNING a, b
+   Batch Size: 1
    ->  Result
          Output: 0, 5
-(4 rows)
+(5 rows)
 
 INSERT INTO rw_view VALUES (0, 5); -- should fail
 ERROR:  new row violates check option for view "rw_view"
@@ -6225,9 +6230,10 @@ INSERT INTO rw_view VALUES (0, 15);
 --------------------------------------------------------------------------------
  Insert on public.foreign_tbl
    Remote SQL: INSERT INTO public.base_tbl(a, b) VALUES ($1, $2) RETURNING a, b
+   Batch Size: 1
    ->  Result
          Output: 0, 15
-(4 rows)
+(5 rows)
 
 INSERT INTO rw_view VALUES (0, 15); -- ok
 SELECT * FROM foreign_tbl;
@@ -9177,6 +9183,17 @@ AND ftoptions @> array['batch_size=40'];
 ROLLBACK;
 CREATE TABLE batch_table ( x int );
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Insert on public.ftable
+   Remote SQL: INSERT INTO public.batch_table(x) VALUES ($1)
+   Batch Size: 10
+   ->  Function Scan on pg_catalog.generate_series i
+         Output: i.i
+         Function Call: generate_series(1, 10)
+(6 rows)
+
 INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
 INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
 INSERT INTO ftable VALUES (32);
@@ -9191,6 +9208,16 @@ TRUNCATE batch_table;
 DROP FOREIGN TABLE ftable;
 -- Disable batch insert
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Insert on public.ftable
+   Remote SQL: INSERT INTO public.batch_table(x) VALUES ($1)
+   Batch Size: 1
+   ->  Values Scan on "*VALUES*"
+         Output: "*VALUES*".column1
+(5 rows)
+
 INSERT INTO ftable VALUES (1), (2);
 SELECT COUNT(*) FROM ftable;
  count 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0f6d0705ec..20fcc88e63 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -525,6 +525,7 @@ static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
+static int get_batch_size_option(Relation rel);
 
 
 /*
@@ -1932,9 +1933,20 @@ postgresExecForeignBatchInsert(EState *estate,
 static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
-	/* In EXPLAIN without ANALYZE, ri_fdwstate is NULL */
-	if (resultRelInfo->ri_FdwState == NULL)
-		return 0;
+	int	batch_size;
+
+	/* should be called only once */
+	Assert(resultRelInfo->ri_BatchSize == 0);
+
+	/*
+	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to look up
+	 * the option directly in server/table options. Otherwise just use the
+	 * value we determined earlier.
+	 */
+	if (resultRelInfo->ri_FdwState)
+		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	else
+		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
 	/* Disable batching when we have to use RETURNING. */
 	if (resultRelInfo->ri_projectReturning != NULL ||
@@ -1943,7 +1955,7 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		return 1;
 
 	/* Otherwise use the batch size specified for server/table. */
-	return ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	return batch_size;
 }
 
 /*
@@ -2732,6 +2744,10 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 
 		ExplainPropertyText("Remote SQL", sql, es);
 
+		/*
+		 * For INSERT we should always have batch size >= 1, but UPDATE
+		 * and DELETE don't support batching so don't show the property.
+		 */
 		if (rinfo->ri_BatchSize > 0)
 			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
@@ -6769,3 +6785,45 @@ find_em_expr_for_input_target(PlannerInfo *root,
 	elog(ERROR, "could not find pathkey item to sort");
 	return NULL;				/* keep compiler quiet */
 }
+
+/*
+ * Determine batch size for a given foreign table. The option specified for
+ * a table has precedence.
+ */
+static int
+get_batch_size_option(Relation rel)
+{
+	Oid foreigntableid = RelationGetRelid(rel);
+	ForeignTable *table;
+	ForeignServer *server;
+	List	   *options;
+	ListCell   *lc;
+
+	/* we use 1 by default, which means "no batching" */
+	int batch_size = 1;
+
+	/*
+	 * Load options for table and server. We append server options after
+	 * table options, because table options take precedence.
+	 */
+	table = GetForeignTable(foreigntableid);
+	server = GetForeignServer(table->serverid);
+
+	options = NIL;
+	options = list_concat(options, table->options);
+	options = list_concat(options, server->options);
+
+	/* See if either table or server specifies batch_size. */
+	foreach(lc, options)
+	{
+		DefElem    *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "batch_size") == 0)
+		{
+			batch_size = strtol(defGetString(def), NULL, 10);
+			break;
+		}
+	}
+
+	return batch_size;
+}
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f152e9f8ca..28b82f5f9d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2788,6 +2788,7 @@ ROLLBACK;
 CREATE TABLE batch_table ( x int );
 
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
 INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
 INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
 INSERT INTO ftable VALUES (32);
@@ -2798,6 +2799,7 @@ DROP FOREIGN TABLE ftable;
 
 -- Disable batch insert
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
 INSERT INTO ftable VALUES (1), (2);
 SELECT COUNT(*) FROM ftable;
 DROP FOREIGN TABLE ftable;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 448a321630..1746cb8793 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -997,14 +997,18 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table or for this particular query).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
 	 */
 	if (partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
 			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
+	else
+		partRelInfo->ri_BatchSize = 1;
 
-	Assert(partRelInfo->ri_BatchSize >= 0);
+	Assert(partRelInfo->ri_BatchSize >= 1);
 
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 48e4e2e34c..9c36860704 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2805,8 +2805,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		resultRelInfo->ri_BatchSize =
 			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
 
-	Assert(resultRelInfo->ri_BatchSize >= 0);
+	Assert(resultRelInfo->ri_BatchSize >= 1);
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
-- 
2.26.2

#64tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#63)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I took a look at this - there's a bit of bitrot due to 708d165ddb92c, so attached is
a rebased patch (0001) fixing that.

0002 adds a couple comments and minor tweaks

0003 addresses a couple shortcomings related to explain - we haven't been
showing the batch size for EXPLAIN (VERBOSE), because there'd be no
FdwState, so this tries to fix that. Furthermore, there were no tests for EXPLAIN
output with batch size, so I added a couple.

Thank you, good additions. They all look good.
Only one point: I think the code for retrieving batch_size in create_foreign_modify() can be replaced with a call to the new function in 0003.

God bless us.

Regards
Takayuki Tsunakawa

#65tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Zhihong Yu (#62)
RE: POC: postgres_fdw insert batching

Tomas-san, Zhihong-san,

From: Zhihong Yu <zyu@yugabyte.com>

+           if (batch_size <= 0)
+               ereport(ERROR,
+                       (errcode(ERRCODE_SYNTAX_ERROR),
+                        errmsg("%s requires a non-negative integer value",

It seems the message doesn't match the check w.r.t. the batch size of 0.

Ah, "non-negative" should be "positive". The message for the existing fetch_size should be fixed too. Tomas-san, could you include this as well? I'm sorry to trouble you.

+ int numInserted = numSlots;

Since numInserted is filled by ExecForeignBatchInsert(), the initialization can be done with 0.

No, the code is correct, since the batch function requires the number of rows to insert as input.

Regards
Takayuki Tsunakawa

#66Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#64)
Re: POC: postgres_fdw insert batching

On 1/19/21 2:28 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

I took a look at this - there's a bit of bitrot due to
708d165ddb92c, so attached is a rebased patch (0001) fixing that.

0002 adds a couple comments and minor tweaks

0003 addresses a couple shortcomings related to explain - we
haven't been showing the batch size for EXPLAIN (VERBOSE), because
there'd be no FdwState, so this tries to fix that. Furthermore,
there were no tests for EXPLAIN output with batch size, so I added
a couple.

Thank you, good additions. They all look good. Only one point: I
think the code for retrieving batch_size in create_foreign_modify()
can be replaced with a call to the new function in 0003.

OK. Can you prepare a final patch, squashing all the commits into a
single one, and perhaps use the function in create_foreign_modify?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#67tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#66)
1 attachment(s)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

OK. Can you prepare a final patch, squashing all the commits into a
single one, and perhaps use the function in create_foreign_modify?

Attached, including the message fix pointed out by Zhihong-san.

Regards
Takayuki Tsunakawa

Attachments:

v12-0001-Add-bulk-insert-for-foreign-tables.patch (application/octet-stream)
From fd8a0238b29615dd69e3dd092b6bee185e1ae5a6 Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa <tsunakawa.takay@fujitsu.com>
Date: Tue, 19 Jan 2021 12:41:13 +0900
Subject: [PATCH v12] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                 |  51 +++-
 contrib/postgres_fdw/expected/postgres_fdw.out | 155 +++++++++++-
 contrib/postgres_fdw/option.c                  |  14 ++
 contrib/postgres_fdw/postgres_fdw.c            | 331 ++++++++++++++++++++-----
 contrib/postgres_fdw/postgres_fdw.h            |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  93 +++++++
 doc/src/sgml/fdwhandler.sgml                   |  89 ++++++-
 doc/src/sgml/postgres-fdw.sgml                 |  13 +
 src/backend/executor/execPartition.c           |  17 ++
 src/backend/executor/nodeModifyTable.c         | 160 ++++++++++++
 src/backend/nodes/list.c                       |  15 ++
 src/include/foreign/fdwapi.h                   |  10 +
 src/include/nodes/execnodes.h                  |   6 +
 src/include/nodes/pg_list.h                    |  15 ++
 14 files changed, 902 insertions(+), 72 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4e..644b03f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1705,13 +1705,16 @@ deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  * The statement text is appended to buf, and we also create an integer List
  * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
  * which is returned to *retrieved_attrs.
+ *
+ * This also stores end position of the VALUES clause, so that we can rebuild
+ * an INSERT for a batch of rows later.
  */
 void
 deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, bool doNothing,
 				 List *withCheckOptionList, List *returningList,
-				 List **retrieved_attrs)
+				 List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1757,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1764,6 +1768,51 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 }
 
 /*
+ * rebuild remote INSERT statement
+ *
+ * Provided a number of rows in a batch, builds INSERT statement with the
+ * right number of parameters.
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+				 int values_end_len, int num_cols,
+				 int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Make sure the values_end_len is sensible */
+	Assert((values_end_len > 0) && (values_end_len <= strlen(orig_query)));
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
+/*
  * deparse remote UPDATE statement
  *
  * The statement text is appended to buf, and we also create an integer List
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1cad311..b4a04d2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3887,9 +3887,10 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft1       '::character(10), NULL::user_enum
-(4 rows)
+(5 rows)
 
 ALTER TABLE "S 1"."T 1" RENAME TO "T 0";
 ALTER FOREIGN TABLE ft1 OPTIONS (SET table_name 'T 0');
@@ -3920,9 +3921,10 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7;
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft1
    Remote SQL: INSERT INTO "S 1"."T 0"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: NULL::integer, 1001, 101, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft1       '::character(10), NULL::user_enum
-(4 rows)
+(5 rows)
 
 ALTER TABLE "S 1"."T 0" RENAME TO "T 1";
 ALTER FOREIGN TABLE ft1 OPTIONS (SET table_name 'T 1');
@@ -4244,12 +4246,13 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Insert on public.ft2
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Subquery Scan on "*SELECT*"
          Output: "*SELECT*"."?column?", "*SELECT*"."?column?_1", NULL::integer, "*SELECT*"."?column?_2", NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10), NULL::user_enum
          ->  Foreign Scan on public.ft2 ft2_1
                Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
                Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" LIMIT 20::bigint
-(7 rows)
+(8 rows)
 
 INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
 INSERT INTO ft2 (c1,c2,c3)
@@ -5360,9 +5363,10 @@ INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
  Insert on public.ft2
    Output: (ft2.tableoid)::regclass
    Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+   Batch Size: 1
    ->  Result
          Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2       '::character(10), NULL::user_enum
-(5 rows)
+(6 rows)
 
 INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
  tableoid 
@@ -6212,9 +6216,10 @@ INSERT INTO rw_view VALUES (0, 5);
 --------------------------------------------------------------------------------
  Insert on public.foreign_tbl
    Remote SQL: INSERT INTO public.base_tbl(a, b) VALUES ($1, $2) RETURNING a, b
+   Batch Size: 1
    ->  Result
          Output: 0, 5
-(4 rows)
+(5 rows)
 
 INSERT INTO rw_view VALUES (0, 5); -- should fail
 ERROR:  new row violates check option for view "rw_view"
@@ -6225,9 +6230,10 @@ INSERT INTO rw_view VALUES (0, 15);
 --------------------------------------------------------------------------------
  Insert on public.foreign_tbl
    Remote SQL: INSERT INTO public.base_tbl(a, b) VALUES ($1, $2) RETURNING a, b
+   Batch Size: 1
    ->  Result
          Output: 0, 15
-(4 rows)
+(5 rows)
 
 INSERT INTO rw_view VALUES (0, 15); -- ok
 SELECT * FROM foreign_tbl;
@@ -8923,7 +8929,7 @@ DO $d$
     END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9112,3 +9118,138 @@ SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
  loopback2   | t
 (1 row)
 
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+BEGIN;
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
+CREATE TABLE batch_table ( x int );
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Insert on public.ftable
+   Remote SQL: INSERT INTO public.batch_table(x) VALUES ($1)
+   Batch Size: 10
+   ->  Function Scan on pg_catalog.generate_series i
+         Output: i.i
+         Function Call: generate_series(1, 10)
+(6 rows)
+
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+    34
+(1 row)
+
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Insert on public.ftable
+   Remote SQL: INSERT INTO public.batch_table(x) VALUES ($1)
+   Batch Size: 1
+   ->  Values Scan on "*VALUES*"
+         Output: "*VALUES*".column1
+(5 rows)
+
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+ count 
+-------
+     2
+(1 row)
+
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    66
+(1 row)
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
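The INSERT tests above depend on how rows get grouped into batches. The following is a rough model that assumes the executor (not part of this excerpt) buffers up to the table's batch size and flushes any partial batch at the end of the statement. create_foreign_modify() starts with num_slots = 1, and execute_foreign_modify() rebuilds (and, if needed, DEALLOCATEs and re-prepares) the INSERT whenever the batch size differs. Under that assumption, inserting 11..31 with batch_size '10' sends batches of 10, 10 and 1, and the statement text is rebuilt twice. The first rebuild normally happens before anything has been prepared. BatchStats and simulate_insert() are hypothetical, for illustration only.

```c
#include <assert.h>

/*
 * Hypothetical model of one INSERT statement against a foreign table:
 * rows go out in batches of at most batch_size (the last one possibly
 * partial), and the remote INSERT is rebuilt whenever the batch size
 * differs from the one the current statement text was built for.
 */
typedef struct BatchStats
{
	int			batches;		/* round trips to the remote server */
	int			rebuilds;		/* times the INSERT text is rebuilt */
	int			last_size;		/* rows in the final batch */
} BatchStats;

static BatchStats
simulate_insert(int nrows, int batch_size)
{
	BatchStats	st = {0, 0, 0};
	int			built_for = 1;	/* like fmstate->num_slots after setup */

	assert(batch_size >= 1);

	for (int done = 0; done < nrows; done += st.last_size)
	{
		int			n = nrows - done;

		if (n > batch_size)
			n = batch_size;

		/* same condition as num_slots != *numSlots in execute_foreign_modify() */
		if (n != built_for)
		{
			st.rebuilds++;
			built_for = n;
		}
		st.batches++;
		st.last_size = n;
	}
	return st;
}
```

For the four INSERTs in the test (10, 21, 1 and 2 rows with batch_size '10'), the model gives 1, 2, 0 and 1 rebuilds. A steady stream of full batches costs nothing extra, but a trailing partial batch does.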
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 1fec3c3..64698c4 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -142,6 +142,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 						 errmsg("%s requires a non-negative integer value",
 								def->defname)));
 		}
+		else if (strcmp(def->defname, "batch_size") == 0)
+		{
+			int			batch_size;
+
+			batch_size = strtol(defGetString(def), NULL, 10);
+			if (batch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a positive integer value",
+								def->defname)));
+		}
 		else if (strcmp(def->defname, "password_required") == 0)
 		{
 			bool		pw_required = defGetBoolean(def);
@@ -203,6 +214,9 @@ InitPgFdwOptions(void)
 		/* fetch_size is available on both server and table */
 		{"fetch_size", ForeignServerRelationId, false},
 		{"fetch_size", ForeignTableRelationId, false},
+		/* batch_size is available on both server and table */
+		{"batch_size", ForeignServerRelationId, false},
+		{"batch_size", ForeignTableRelationId, false},
 		{"password_required", UserMappingRelationId, false},
 
 		/*
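The batch_size check in postgres_fdw_validator() calls strtol() with a NULL end pointer. As a result, a value like '10abc' is accepted as 10, and an out-of-range value is not reliably rejected. A stricter check could look like the following sketch. parse_batch_size() is a hypothetical helper; inside the server, PostgreSQL's own parse_int() from utils/guc.h may be the better fit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical stricter parser for the batch_size option: returns the value
 * if the whole string is a positive integer that fits in an int, else -1.
 */
static int
parse_batch_size(const char *value)
{
	char	   *end;
	long		val;

	assert(value != NULL);

	errno = 0;
	val = strtol(value, &end, 10);

	/* no digits at all, or trailing characters such as "10abc" */
	if (end == value || *end != '\0')
		return -1;

	/* overflowed long, zero/negative, or too large for an int */
	if (errno == ERANGE || val <= 0 || val > INT_MAX)
		return -1;

	return (int) val;
}
```

A -1 from such a helper would then lead to the same ereport(ERROR, ...) as a non-positive value.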
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d1..fd3c1c1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -87,8 +87,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved by RETURNING, if any
+ * 3) Length up to the end of the VALUES clause for INSERT
+ *	  (-1 for a DELETE/UPDATE)
+ * 4) Boolean flag showing if the remote query has a RETURNING clause
+ * 5) Integer list of attribute numbers retrieved by RETURNING, if any
  */
 enum FdwModifyPrivateIndex
 {
@@ -96,6 +98,8 @@ enum FdwModifyPrivateIndex
 	FdwModifyPrivateUpdateSql,
 	/* Integer list of target attribute numbers for INSERT/UPDATE */
 	FdwModifyPrivateTargetAttnums,
+	/* Length up to the end of the VALUES clause (as an integer Value node) */
+	FdwModifyPrivateLen,
 	/* has-returning flag (as an integer Value node) */
 	FdwModifyPrivateHasReturning,
 	/* Integer list of attribute numbers retrieved by RETURNING */
@@ -176,7 +180,10 @@ typedef struct PgFdwModifyState
 
 	/* extracted fdw_private data */
 	char	   *query;			/* text of INSERT/UPDATE/DELETE command */
+	char	   *orig_query;		/* original text of INSERT command */
 	List	   *target_attrs;	/* list of target attribute numbers */
+	int			values_end;		/* length up to the end of VALUES */
+	int			batch_size;		/* value of FDW option "batch_size" */
 	bool		has_returning;	/* is there a RETURNING clause? */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 
@@ -185,6 +192,9 @@ typedef struct PgFdwModifyState
 	int			p_nums;			/* number of parameters to transmit */
 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
 
+	/* batch operation stuff */
+	int			num_slots;		/* number of slots to insert */
+
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
@@ -343,6 +353,12 @@ static TupleTableSlot *postgresExecForeignInsert(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
 												 TupleTableSlot *planSlot);
+static TupleTableSlot **postgresExecForeignBatchInsert(EState *estate,
+												 ResultRelInfo *resultRelInfo,
+												 TupleTableSlot **slots,
+												 TupleTableSlot **planSlots,
+												 int *numSlots);
+static int	postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo);
 static TupleTableSlot *postgresExecForeignUpdate(EState *estate,
 												 ResultRelInfo *resultRelInfo,
 												 TupleTableSlot *slot,
@@ -429,20 +445,24 @@ static PgFdwModifyState *create_foreign_modify(EState *estate,
 											   Plan *subplan,
 											   char *query,
 											   List *target_attrs,
+											   int len,
 											   bool has_returning,
 											   List *retrieved_attrs);
-static TupleTableSlot *execute_foreign_modify(EState *estate,
+static TupleTableSlot **execute_foreign_modify(EState *estate,
 											  ResultRelInfo *resultRelInfo,
 											  CmdType operation,
-											  TupleTableSlot *slot,
-											  TupleTableSlot *planSlot);
+											  TupleTableSlot **slots,
+											  TupleTableSlot **planSlots,
+											  int *numSlots);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
 											 ItemPointer tupleid,
-											 TupleTableSlot *slot);
+											 TupleTableSlot **slots,
+											 int numSlots);
 static void store_returning_result(PgFdwModifyState *fmstate,
 								   TupleTableSlot *slot, PGresult *res);
 static void finish_foreign_modify(PgFdwModifyState *fmstate);
+static void deallocate_query(PgFdwModifyState *fmstate);
 static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
@@ -505,6 +525,7 @@ static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 							  const PgFdwRelationInfo *fpinfo_o,
 							  const PgFdwRelationInfo *fpinfo_i);
+static int get_batch_size_option(Relation rel);
 
 
 /*
@@ -530,6 +551,8 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	routine->PlanForeignModify = postgresPlanForeignModify;
 	routine->BeginForeignModify = postgresBeginForeignModify;
 	routine->ExecForeignInsert = postgresExecForeignInsert;
+	routine->ExecForeignBatchInsert = postgresExecForeignBatchInsert;
+	routine->GetForeignModifyBatchSize = postgresGetForeignModifyBatchSize;
 	routine->ExecForeignUpdate = postgresExecForeignUpdate;
 	routine->ExecForeignDelete = postgresExecForeignDelete;
 	routine->EndForeignModify = postgresEndForeignModify;
@@ -1665,6 +1688,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 	List	   *returningList = NIL;
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
+	int			values_end_len = -1;
 
 	initStringInfo(&sql);
 
@@ -1752,7 +1776,7 @@ postgresPlanForeignModify(PlannerInfo *root,
 			deparseInsertSql(&sql, rte, resultRelation, rel,
 							 targetAttrs, doNothing,
 							 withCheckOptionList, returningList,
-							 &retrieved_attrs);
+							 &retrieved_attrs, &values_end_len);
 			break;
 		case CMD_UPDATE:
 			deparseUpdateSql(&sql, rte, resultRelation, rel,
@@ -1776,8 +1800,9 @@ postgresPlanForeignModify(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwModifyPrivateIndex, above.
 	 */
-	return list_make4(makeString(sql.data),
+	return list_make5(makeString(sql.data),
 					  targetAttrs,
+					  makeInteger(values_end_len),
 					  makeInteger((retrieved_attrs != NIL)),
 					  retrieved_attrs);
 }
@@ -1797,6 +1822,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 	char	   *query;
 	List	   *target_attrs;
 	bool		has_returning;
+	int			values_end_len;
 	List	   *retrieved_attrs;
 	RangeTblEntry *rte;
 
@@ -1812,6 +1838,8 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 							FdwModifyPrivateUpdateSql));
 	target_attrs = (List *) list_nth(fdw_private,
 									 FdwModifyPrivateTargetAttnums);
+	values_end_len = intVal(list_nth(fdw_private,
+									FdwModifyPrivateLen));
 	has_returning = intVal(list_nth(fdw_private,
 									FdwModifyPrivateHasReturning));
 	retrieved_attrs = (List *) list_nth(fdw_private,
@@ -1829,6 +1857,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
 									mtstate->mt_plans[subplan_index]->plan,
 									query,
 									target_attrs,
+									values_end_len,
 									has_returning,
 									retrieved_attrs);
 
@@ -1846,7 +1875,8 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *planSlot)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
-	TupleTableSlot *rslot;
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
 
 	/*
 	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
@@ -1855,7 +1885,36 @@ postgresExecForeignInsert(EState *estate,
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
-								   slot, planSlot);
+								   &slot, &planSlot, &numSlots);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot ? *rslot : NULL;
+}
+
+/*
+ * postgresExecForeignBatchInsert
+ *		Insert multiple rows into a foreign table
+ */
+static TupleTableSlot **
+postgresExecForeignBatchInsert(EState *estate,
+						  ResultRelInfo *resultRelInfo,
+						  TupleTableSlot **slots,
+						  TupleTableSlot **planSlots,
+						  int *numSlots)
+{
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot **rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+								   slots, planSlots, numSlots);
 	/* Revert that change */
 	if (fmstate->aux_fmstate)
 		resultRelInfo->ri_FdwState = fmstate;
@@ -1864,6 +1923,42 @@ postgresExecForeignInsert(EState *estate,
 }
 
 /*
+ * postgresGetForeignModifyBatchSize
+ *		Determine the maximum number of tuples that can be inserted in bulk
+ *
+ * Returns the batch size specified for the server or table. When batching is
+ * not allowed (e.g. for tables with AFTER ROW triggers or when a RETURNING
+ * clause is used), returns 1.
+ */
+static int
+postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
+{
+	int	batch_size;
+
+	/* should be called only once */
+	Assert(resultRelInfo->ri_BatchSize == 0);
+
+	/*
+	 * In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to look up
+	 * the option directly in the server/table options. Otherwise just use the
+	 * value we determined earlier.
+	 */
+	if (resultRelInfo->ri_FdwState)
+		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	else
+		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
+
+	/* Disable batching with RETURNING or AFTER ROW INSERT triggers. */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc &&
+		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+	return batch_size;
+}
+
+/*
  * postgresExecForeignUpdate
  *		Update one row in a foreign table
  */
@@ -1873,8 +1968,13 @@ postgresExecForeignUpdate(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_UPDATE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1887,8 +1987,13 @@ postgresExecForeignDelete(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
-								  slot, planSlot);
+	TupleTableSlot **rslot;
+	int			numSlots = 1;
+
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_DELETE,
+								  &slot, &planSlot, &numSlots);
+
+	return rslot ? rslot[0] : NULL;
 }
 
 /*
@@ -1925,6 +2030,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	RangeTblEntry *rte;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			attnum;
+	int			values_end_len;
 	StringInfoData sql;
 	List	   *targetAttrs = NIL;
 	List	   *retrieved_attrs = NIL;
@@ -2001,7 +2107,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	deparseInsertSql(&sql, rte, resultRelation, rel, targetAttrs, doNothing,
 					 resultRelInfo->ri_WithCheckOptions,
 					 resultRelInfo->ri_returningList,
-					 &retrieved_attrs);
+					 &retrieved_attrs, &values_end_len);
 
 	/* Construct an execution state. */
 	fmstate = create_foreign_modify(mtstate->ps.state,
@@ -2011,6 +2117,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									NULL,
 									sql.data,
 									targetAttrs,
+									values_end_len,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
@@ -2636,6 +2743,13 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
 										  FdwModifyPrivateUpdateSql));
 
 		ExplainPropertyText("Remote SQL", sql, es);
+
+		/*
+		 * For INSERT we should always have batch size >= 1, but UPDATE
+		 * and DELETE don't support batching so don't show the property.
+		 */
+		if (rinfo->ri_BatchSize > 0)
+			ExplainPropertyInteger("Batch Size", NULL, rinfo->ri_BatchSize, es);
 	}
 }
 
@@ -3530,6 +3644,7 @@ create_foreign_modify(EState *estate,
 					  Plan *subplan,
 					  char *query,
 					  List *target_attrs,
+					  int values_end,
 					  bool has_returning,
 					  List *retrieved_attrs)
 {
@@ -3564,7 +3679,10 @@ create_foreign_modify(EState *estate,
 
 	/* Set up remote query information. */
 	fmstate->query = query;
+	if (operation == CMD_INSERT)
+		fmstate->orig_query = pstrdup(fmstate->query);
 	fmstate->target_attrs = target_attrs;
+	fmstate->values_end = values_end;
 	fmstate->has_returning = has_returning;
 	fmstate->retrieved_attrs = retrieved_attrs;
 
@@ -3616,6 +3734,12 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Set batch_size from foreign server/table options. */
+	if (operation == CMD_INSERT)
+		fmstate->batch_size = get_batch_size_option(rel);
+
+	fmstate->num_slots = 1;
+
 	/* Initialize auxiliary state */
 	fmstate->aux_fmstate = NULL;
 
@@ -3626,26 +3750,50 @@ create_foreign_modify(EState *estate,
  * execute_foreign_modify
  *		Perform foreign-table modification as required, and fetch RETURNING
  *		result if any.  (This is the shared guts of postgresExecForeignInsert,
- *		postgresExecForeignUpdate, and postgresExecForeignDelete.)
+ *		postgresExecForeignBatchInsert, postgresExecForeignUpdate, and
+ *		postgresExecForeignDelete.)
  */
-static TupleTableSlot *
+static TupleTableSlot **
 execute_foreign_modify(EState *estate,
 					   ResultRelInfo *resultRelInfo,
 					   CmdType operation,
-					   TupleTableSlot *slot,
-					   TupleTableSlot *planSlot)
+					   TupleTableSlot **slots,
+					   TupleTableSlot **planSlots,
+					   int *numSlots)
 {
 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	ItemPointer ctid = NULL;
 	const char **p_values;
 	PGresult   *res;
 	int			n_rows;
+	StringInfoData sql;
 
 	/* The operation should be INSERT, UPDATE, or DELETE */
 	Assert(operation == CMD_INSERT ||
 		   operation == CMD_UPDATE ||
 		   operation == CMD_DELETE);
 
+	/*
+	 * If the existing query was deparsed and prepared for a different number
+	 * of rows, rebuild it for the proper number.
+	 */
+	if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
+	{
+		/* Destroy the prepared statement created previously */
+		if (fmstate->p_name)
+			deallocate_query(fmstate);
+
+		/*
+		 * Build INSERT string with numSlots records in its VALUES clause.
+		 */
+		initStringInfo(&sql);
+		rebuildInsertSql(&sql, fmstate->orig_query, fmstate->values_end,
+						 fmstate->p_nums, *numSlots - 1);
+		pfree(fmstate->query);
+		fmstate->query = sql.data;
+		fmstate->num_slots = *numSlots;
+	}
+
 	/* Set up the prepared statement on the remote server, if we didn't yet */
 	if (!fmstate->p_name)
 		prepare_foreign_modify(fmstate);
@@ -3658,7 +3806,7 @@ execute_foreign_modify(EState *estate,
 		Datum		datum;
 		bool		isNull;
 
-		datum = ExecGetJunkAttribute(planSlot,
+		datum = ExecGetJunkAttribute(planSlots[0],
 									 fmstate->ctidAttno,
 									 &isNull);
 		/* shouldn't ever get a null result... */
@@ -3668,14 +3816,14 @@ execute_foreign_modify(EState *estate,
 	}
 
 	/* Convert parameters needed by prepared statement to text form */
-	p_values = convert_prep_stmt_params(fmstate, ctid, slot);
+	p_values = convert_prep_stmt_params(fmstate, ctid, slots, *numSlots);
 
 	/*
 	 * Execute the prepared statement.
 	 */
 	if (!PQsendQueryPrepared(fmstate->conn,
 							 fmstate->p_name,
-							 fmstate->p_nums,
+							 fmstate->p_nums * (*numSlots),
 							 p_values,
 							 NULL,
 							 NULL,
@@ -3696,9 +3844,10 @@ execute_foreign_modify(EState *estate,
 	/* Check number of rows affected, and fetch RETURNING tuple if any */
 	if (fmstate->has_returning)
 	{
+		Assert(*numSlots == 1);
 		n_rows = PQntuples(res);
 		if (n_rows > 0)
-			store_returning_result(fmstate, slot, res);
+			store_returning_result(fmstate, slots[0], res);
 	}
 	else
 		n_rows = atoi(PQcmdTuples(res));
@@ -3708,10 +3857,12 @@ execute_foreign_modify(EState *estate,
 
 	MemoryContextReset(fmstate->temp_cxt);
 
+	*numSlots = n_rows;
+
 	/*
 	 * Return NULL if nothing was inserted/updated/deleted on the remote end
 	 */
-	return (n_rows > 0) ? slot : NULL;
+	return (n_rows > 0) ? slots : NULL;
 }
 
 /*
@@ -3771,52 +3922,64 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 static const char **
 convert_prep_stmt_params(PgFdwModifyState *fmstate,
 						 ItemPointer tupleid,
-						 TupleTableSlot *slot)
+						 TupleTableSlot **slots,
+						 int numSlots)
 {
 	const char **p_values;
+	int			i;
+	int			j;
 	int			pindex = 0;
 	MemoryContext oldcontext;
 
 	oldcontext = MemoryContextSwitchTo(fmstate->temp_cxt);
 
-	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums);
+	p_values = (const char **) palloc(sizeof(char *) * fmstate->p_nums * numSlots);
+
+	/* ctid is provided only for UPDATE/DELETE, which don't allow batching */
+	Assert(!(tupleid != NULL && numSlots > 1));
 
 	/* 1st parameter should be ctid, if it's in use */
 	if (tupleid != NULL)
 	{
+		Assert(numSlots == 1);
 		/* don't need set_transmission_modes for TID output */
 		p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
 											  PointerGetDatum(tupleid));
 		pindex++;
 	}
 
-	/* get following parameters from slot */
-	if (slot != NULL && fmstate->target_attrs != NIL)
+	/* get following parameters from slots */
+	if (slots != NULL && fmstate->target_attrs != NIL)
 	{
 		int			nestlevel;
 		ListCell   *lc;
 
 		nestlevel = set_transmission_modes();
 
-		foreach(lc, fmstate->target_attrs)
+		for (i = 0; i < numSlots; i++)
 		{
-			int			attnum = lfirst_int(lc);
-			Datum		value;
-			bool		isnull;
+			j = (tupleid != NULL) ? 1 : 0;
+			foreach(lc, fmstate->target_attrs)
+			{
+				int			attnum = lfirst_int(lc);
+				Datum		value;
+				bool		isnull;
 
-			value = slot_getattr(slot, attnum, &isnull);
-			if (isnull)
-				p_values[pindex] = NULL;
-			else
-				p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[pindex],
-													  value);
-			pindex++;
+				value = slot_getattr(slots[i], attnum, &isnull);
+				if (isnull)
+					p_values[pindex] = NULL;
+				else
+					p_values[pindex] = OutputFunctionCall(&fmstate->p_flinfo[j],
+														  value);
+				pindex++;
+				j++;
+			}
 		}
 
 		reset_transmission_modes(nestlevel);
 	}
 
-	Assert(pindex == fmstate->p_nums);
+	Assert(pindex == fmstate->p_nums * numSlots);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -3870,23 +4033,7 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 	Assert(fmstate != NULL);
 
 	/* If we created a prepared statement, destroy it */
-	if (fmstate->p_name)
-	{
-		char		sql[64];
-		PGresult   *res;
-
-		snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
-
-		/*
-		 * We don't use a PG_TRY block here, so be careful not to throw error
-		 * without releasing the PGresult.
-		 */
-		res = pgfdw_exec_query(fmstate->conn, sql);
-		if (PQresultStatus(res) != PGRES_COMMAND_OK)
-			pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
-		PQclear(res);
-		fmstate->p_name = NULL;
-	}
+	deallocate_query(fmstate);
 
 	/* Release remote connection */
 	ReleaseConnection(fmstate->conn);
@@ -3894,6 +4041,34 @@ finish_foreign_modify(PgFdwModifyState *fmstate)
 }
 
 /*
+ * deallocate_query
+ *		Deallocate a prepared statement for a foreign insert/update/delete
+ *		operation
+ */
+static void
+deallocate_query(PgFdwModifyState *fmstate)
+{
+	char		sql[64];
+	PGresult   *res;
+
+	/* do nothing if the query is not allocated */
+	if (!fmstate->p_name)
+		return;
+
+	snprintf(sql, sizeof(sql), "DEALLOCATE %s", fmstate->p_name);
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = pgfdw_exec_query(fmstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, fmstate->conn, true, sql);
+	PQclear(res);
+	fmstate->p_name = NULL;
+}
+
+/*
  * build_remote_returning
  *		Build a RETURNING targetlist of a remote query for performing an
  *		UPDATE/DELETE .. RETURNING on a join directly
@@ -6577,3 +6752,45 @@ find_em_expr_for_input_target(PlannerInfo *root,
 	elog(ERROR, "could not find pathkey item to sort");
 	return NULL;				/* keep compiler quiet */
 }
+
+/*
+ * Determine the batch size for a given foreign table. An option specified
+ * for the table takes precedence over one specified for its server.
+ */
+static int
+get_batch_size_option(Relation rel)
+{
+	Oid foreigntableid = RelationGetRelid(rel);
+	ForeignTable *table;
+	ForeignServer *server;
+	List	   *options;
+	ListCell   *lc;
+
+	/* we use 1 by default, which means "no batching" */
+	int batch_size = 1;
+
+	/*
+	 * Load options for table and server. We append server options after
+	 * table options, because table options take precedence.
+	 */
+	table = GetForeignTable(foreigntableid);
+	server = GetForeignServer(table->serverid);
+
+	options = NIL;
+	options = list_concat(options, table->options);
+	options = list_concat(options, server->options);
+
+	/* See if either table or server specifies batch_size. */
+	foreach(lc, options)
+	{
+		DefElem    *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "batch_size") == 0)
+		{
+			batch_size = strtol(defGetString(def), NULL, 10);
+			break;
+		}
+	}
+
+	return batch_size;
+}
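get_batch_size_option() concatenates the table options in front of the server options and takes the first batch_size it finds. A table-level setting therefore overrides the server-level one, and 1 (no batching) is the fallback. The following sketch reduces the same rule to plain arrays. The Option struct, find_option() and effective_batch_size() are hypothetical stand-ins for the DefElem lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem holding one FDW option. */
typedef struct Option
{
	const char *name;
	const char *value;
} Option;

/* Return the value of option 'name' in opts[0 .. n), or NULL if absent. */
static const char *
find_option(const Option *opts, int n, const char *name)
{
	for (int i = 0; i < n; i++)
	{
		if (strcmp(opts[i].name, name) == 0)
			return opts[i].value;
	}
	return NULL;
}

/*
 * Table options are searched before server options, so a table-level
 * batch_size wins; 1 ("no batching") is the default when neither sets it.
 */
static int
effective_batch_size(const Option *table, int ntable,
					 const Option *server, int nserver)
{
	const char *v;

	assert(ntable >= 0 && nserver >= 0);

	v = find_option(table, ntable, "batch_size");
	if (v == NULL)
		v = find_option(server, nserver, "batch_size");

	return v != NULL ? (int) strtol(v, NULL, 10) : 1;
}
```

In the regression test, table30 (batch_size '40') on server batch10 (batch_size '20') would get 40 under this rule.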
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 19ea27a..1f67b4d 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -161,7 +161,10 @@ extern void deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs, bool doNothing,
 							 List *withCheckOptionList, List *returningList,
-							 List **retrieved_attrs);
+							 List **retrieved_attrs, int *values_end_len);
+extern void rebuildInsertSql(StringInfo buf, char *orig_query,
+							 int values_end_len, int num_cols,
+							 int num_rows);
 extern void deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
 							 Index rtindex, Relation rel,
 							 List *targetAttrs,
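rebuildInsertSql() itself is not part of this excerpt. Its declaration and the call in execute_foreign_modify() pass p_nums columns and *numSlots - 1 additional rows. Going by that, one plausible implementation:

1. Keeps the text up to values_end_len.
2. Appends one placeholder tuple per extra row, numbering parameters row by row.
3. Re-appends whatever followed the VALUES clause.

This row-major numbering matches how convert_prep_stmt_params() fills p_values slot by slot. The following is a hypothetical sketch that assumes every target column is sent as a $n parameter. rebuild_insert_sql() returns a malloc'd string that the caller must free.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: extend a single-row INSERT by num_rows extra rows.
 * orig_query[0 .. values_end_len) must end with the first VALUES tuple,
 * e.g. "... VALUES ($1, $2)"; any tail (ON CONFLICT ..., RETURNING ...)
 * is copied after the new tuples.  Returns a malloc'd string or NULL.
 */
static char *
rebuild_insert_sql(const char *orig_query, int values_end_len,
				   int num_cols, int num_rows)
{
	size_t		orig_len = strlen(orig_query);
	size_t		cap;
	size_t		len;
	char	   *buf;

	assert(values_end_len >= 0 && (size_t) values_end_len <= orig_len);
	assert(num_cols >= 0 && num_rows >= 0);

	/* each extra placeholder needs at most ", (" + "$" + 10 digits + ")" */
	cap = orig_len + (size_t) num_rows * (size_t) num_cols * 15 + 1;
	buf = malloc(cap);
	if (buf == NULL)
		return NULL;

	/* keep "INSERT INTO ... VALUES (<first row>)" */
	memcpy(buf, orig_query, (size_t) values_end_len);
	len = (size_t) values_end_len;

	/* extra row r (1-based) uses $(r*num_cols + 1) .. $((r+1)*num_cols) */
	for (int r = 1; r <= num_rows; r++)
	{
		for (int c = 0; c < num_cols; c++)
			len += (size_t) snprintf(buf + len, cap - len, "%s$%d%s",
									 c == 0 ? ", (" : ", ",
									 r * num_cols + c + 1,
									 c == num_cols - 1 ? ")" : "");
	}

	/* re-append whatever followed the first VALUES tuple */
	snprintf(buf + len, cap - len, "%s", orig_query + values_end_len);
	return buf;
}
```

execute_foreign_modify() sends p_nums * numSlots parameters in a single PQsendQueryPrepared() call. The protocol's Bind message carries the parameter count as a 16-bit integer, so batch_size times the number of columns has to stay at or below 65535; larger products would not fit in a single message.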
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index ebf6eb1..28b82f5 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2738,3 +2738,96 @@ COMMIT;
 -- should not be output because they should be closed at the end of
 -- the above transaction.
 SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
+
+-- ===================================================================
+-- batch insert
+-- ===================================================================
+
+BEGIN;
+
+CREATE SERVER batch10 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( batch_size '10' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+ALTER SERVER batch10 OPTIONS( SET batch_size '20' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=10'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'batch10'
+AND srvoptions @> array['batch_size=20'];
+
+CREATE FOREIGN TABLE table30 ( x int ) SERVER batch10 OPTIONS ( batch_size '30' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+ALTER FOREIGN TABLE table30 OPTIONS ( SET batch_size '40');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=30'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30'::regclass
+AND ftoptions @> array['batch_size=40'];
+
+ROLLBACK;
+
+CREATE TABLE batch_table ( x int );
+
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '10' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(1, 10) i;
+INSERT INTO ftable SELECT * FROM generate_series(11, 31) i;
+INSERT INTO ftable VALUES (32);
+INSERT INTO ftable VALUES (33), (34);
+SELECT COUNT(*) FROM ftable;
+TRUNCATE batch_table;
+DROP FOREIGN TABLE ftable;
+
+-- Disable batch insert
+CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '1' );
+EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ftable VALUES (1), (2);
+INSERT INTO ftable VALUES (1), (2);
+SELECT COUNT(*) FROM ftable;
+DROP FOREIGN TABLE ftable;
+DROP TABLE batch_table;
+
+-- Use partitioning
+CREATE TABLE batch_table ( x int ) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0', batch_size '10');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1', batch_size '1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 3, REMAINDER 2);
+
+INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
+SELECT COUNT(*) FROM batch_table;
+
+-- Clean up
+DROP TABLE batch_table CASCADE;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 9c92934..854913a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -523,8 +523,9 @@ BeginForeignModify(ModifyTableState *mtstate,
      Begin executing a foreign table modification operation.  This routine is
      called during executor startup.  It should perform any initialization
      needed prior to the actual table modifications.  Subsequently,
-     <function>ExecForeignInsert</function>, <function>ExecForeignUpdate</function> or
-     <function>ExecForeignDelete</function> will be called for each tuple to be
+     <function>ExecForeignInsert</function> or <function>ExecForeignBatchInsert</function>,
+     <function>ExecForeignUpdate</function> or
+     <function>ExecForeignDelete</function> will be called for tuple(s) to be
      inserted, updated, or deleted.
     </para>
 
@@ -614,6 +615,81 @@ ExecForeignInsert(EState *estate,
 
     <para>
 <programlisting>
+TupleTableSlot **
+ExecForeignBatchInsert(EState *estate,
+                  ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  TupleTableSlot **planSlots,
+                  int *numSlots);
+</programlisting>
+
+     Insert multiple tuples in bulk into the foreign table.
+     The parameters are the same as for <function>ExecForeignInsert</function>
+     except that <literal>slots</literal> and <literal>planSlots</literal> contain
+     multiple tuples and <literal>*numSlots</literal> specifies the number of
+     tuples in those arrays.
+    </para>
+
+    <para>
+     The return value is an array of slots containing the data that was
+     actually inserted (this might differ from the data supplied, for
+     example as a result of trigger actions.)
+     The passed-in <literal>slots</literal> can be re-used for this purpose.
+     The number of successfully inserted tuples is returned in
+     <literal>*numSlots</literal>.
+    </para>
+
+    <para>
+     The data in the returned slots is used only if the <command>INSERT</command>
+     statement involves a view
+     <literal>WITH CHECK OPTION</literal>; or if the foreign table has
+     an <literal>AFTER ROW</literal> trigger.  Triggers require all columns,
+     but the FDW could choose to optimize away returning some or all columns
+     depending on the contents of the
+     <literal>WITH CHECK OPTION</literal> constraints.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+     This function is not used if the <command>INSERT</command> has a
+     <literal>RETURNING</literal> clause.
+    </para>
+
+    <para>
+     Note that this function is also called when inserting routed tuples into
+     a foreign-table partition.  See the callback functions
+     described below that allow the FDW to support that.
+    </para>
+
+    <para>
+<programlisting>
+int
+GetForeignModifyBatchSize(ResultRelInfo *rinfo);
+</programlisting>
+
+     Report the maximum number of tuples that a single
+     <function>ExecForeignBatchInsert</function> call can handle for
+     the specified foreign table.  That is, the executor passes at most
+     the number of tuples that this function returns to
+     <function>ExecForeignBatchInsert</function>.
+     <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
+     the target foreign table.
+     The FDW is expected to provide a foreign server and/or foreign
+     table option for the user to set this value, or some hard-coded value.
+    </para>
+
+    <para>
+     If the <function>ExecForeignBatchInsert</function> or
+     <function>GetForeignModifyBatchSize</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will
+     use <function>ExecForeignInsert</function>.
+    </para>
+
+    <para>
+<programlisting>
 TupleTableSlot *
 ExecForeignUpdate(EState *estate,
                   ResultRelInfo *rinfo,
@@ -741,8 +817,9 @@ BeginForeignInsert(ModifyTableState *mtstate,
      in both cases when it is the partition chosen for tuple routing and the
      target specified in a <command>COPY FROM</command> command.  It should
      perform any initialization needed prior to the actual insertion.
-     Subsequently, <function>ExecForeignInsert</function> will be called for
-     each tuple to be inserted into the foreign table.
+     Subsequently, <function>ExecForeignInsert</function> or
+     <function>ExecForeignBatchInsert</function> will be called for
+     tuple(s) to be inserted into the foreign table.
     </para>
 
     <para>
@@ -773,8 +850,8 @@ BeginForeignInsert(ModifyTableState *mtstate,
     <para>
      Note that if the FDW does not support routable foreign-table partitions
      and/or executing <command>COPY FROM</command> on foreign tables, this
-     function or <function>ExecForeignInsert</function> subsequently called
-     must throw error as needed.
+     function or <function>ExecForeignInsert</function>/<function>ExecForeignBatchInsert</function>
+     subsequently called must throw an error as needed.
     </para>
 
     <para>
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 6a91926..690f42b 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -354,6 +354,19 @@ OPTIONS (ADD password_required 'false');
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><literal>batch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</filename>
+       should insert in each insert operation. It can be specified for a
+       foreign table or a foreign server. The option specified on a table
+       overrides an option specified for the server.
+       The default is <literal>1</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
 
   </sect3>
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a..1746cb8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -993,6 +993,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table or for this particular query).
+	 *
+	 * If the FDW does not support batching, we set the batch size to 1.
+	 */
+	if (partRelInfo->ri_FdwRoutine != NULL &&
+		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		partRelInfo->ri_BatchSize =
+			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
+	else
+		partRelInfo->ri_BatchSize = 1;
+
+	Assert(partRelInfo->ri_BatchSize >= 1);
+
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 921e695..9c36860 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -58,6 +58,13 @@
 #include "utils/rel.h"
 
 
+static void ExecBatchInsert(ModifyTableState *mtstate,
+								 ResultRelInfo *resultRelInfo,
+								 TupleTableSlot **slots,
+								 TupleTableSlot **planSlots,
+								 int numSlots,
+								 EState *estate,
+								 bool canSetTag);
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 								 ResultRelInfo *resultRelInfo,
 								 ItemPointer conflictTid,
@@ -389,6 +396,7 @@ ExecInsert(ModifyTableState *mtstate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	MemoryContext oldContext;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -442,6 +450,55 @@ ExecInsert(ModifyTableState *mtstate,
 									   CMD_INSERT);
 
 		/*
+		 * If the FDW supports batching, and batching is requested, accumulate
+		 * rows and insert them in batches. Otherwise use the per-row inserts.
+		 */
+		if (resultRelInfo->ri_BatchSize > 1)
+		{
+			/*
+			 * If a full batch of tuples has already been accumulated for this
+			 * relation, insert it before buffering the current tuple.  (Each
+			 * result relation keeps its own buffer, so tuples routed to other
+			 * partitions are flushed separately.)
+			if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize)
+			{
+				ExecBatchInsert(mtstate, resultRelInfo,
+							   resultRelInfo->ri_Slots,
+							   resultRelInfo->ri_PlanSlots,
+							   resultRelInfo->ri_NumSlots,
+							   estate, canSetTag);
+				resultRelInfo->ri_NumSlots = 0;
+			}
+
+			oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+			if (resultRelInfo->ri_Slots == NULL)
+			{
+				resultRelInfo->ri_Slots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+				resultRelInfo->ri_PlanSlots = palloc(sizeof(TupleTableSlot *) *
+										   resultRelInfo->ri_BatchSize);
+			}
+
+			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+										 slot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+										 planSlot->tts_ops);
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
+			resultRelInfo->ri_NumSlots++;
+
+			MemoryContextSwitchTo(oldContext);
+
+			return NULL;
+		}
+
+		/*
 		 * insert into foreign table: let the FDW do it
 		 */
 		slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
@@ -699,6 +756,70 @@ ExecInsert(ModifyTableState *mtstate,
 }
 
 /* ----------------------------------------------------------------
+ *		ExecBatchInsert
+ *
+ *		Insert multiple tuples in an efficient way.
+ *		Currently, this handles inserting into a foreign table without
+ *		RETURNING clause.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecBatchInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
+		   TupleTableSlot **slots,
+		   TupleTableSlot **planSlots,
+		   int numSlots,
+		   EState *estate,
+		   bool canSetTag)
+{
+	int			i;
+	int			numInserted = numSlots;
+	TupleTableSlot *slot = NULL;
+	TupleTableSlot **rslots;
+
+	/*
+	 * insert into foreign table: let the FDW do it
+	 */
+	rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
+																 resultRelInfo,
+																 slots,
+																 planSlots,
+																 &numInserted);
+
+	for (i = 0; i < numInserted; i++)
+	{
+		slot = rslots[i];
+
+		/*
+		 * AFTER ROW Triggers or RETURNING expressions might reference the
+		 * tableoid column, so (re-)initialize tts_tableOid before evaluating
+		 * them.
+		 */
+		slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, NIL,
+							 mtstate->mt_transition_capture);
+
+		/*
+		 * Check any WITH CHECK OPTION constraints from parent views.  See the
+		 * comment in ExecInsert.
+		 */
+		if (resultRelInfo->ri_WithCheckOptions != NIL)
+			ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	}
+
+	if (canSetTag && numInserted > 0)
+		estate->es_processed += numInserted;
+
+	for (i = 0; i < numSlots; i++)
+	{
+		ExecDropSingleTupleTableSlot(slots[i]);
+		ExecDropSingleTupleTableSlot(planSlots[i]);
+	}
+}
+
+/* ----------------------------------------------------------------
  *		ExecDelete
  *
  *		DELETE is like UPDATE, except that we delete the tuple and no
@@ -1937,6 +2058,9 @@ ExecModifyTable(PlanState *pstate)
 	ItemPointerData tuple_ctid;
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
+	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+	List				  *relinfos = NIL;
+	ListCell			  *lc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2153,6 +2277,25 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
+	 * Insert remaining tuples for batch insert.
+	 */
+	if (proute)
+		relinfos = estate->es_tuple_routing_result_relations;
+	else
+		relinfos = estate->es_opened_result_relations;
+
+	foreach(lc, relinfos)
+	{
+		resultRelInfo = lfirst(lc);
+		if (resultRelInfo->ri_NumSlots > 0)
+			ExecBatchInsert(node, resultRelInfo,
+						   resultRelInfo->ri_Slots,
+						   resultRelInfo->ri_PlanSlots,
+						   resultRelInfo->ri_NumSlots,
+						   estate, node->canSetTag);
+	}
+
+	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
 	fireASTriggers(node);
@@ -2651,6 +2794,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
+	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
 	 * ExecPostprocessPlan.  (It'd actually work fine to add the primary
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index c4eba6b..dbf6b30 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -277,6 +277,21 @@ list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 	return list;
 }
 
+List *
+list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+				ListCell datum3, ListCell datum4, ListCell datum5)
+{
+	List	   *list = new_list(t, 5);
+
+	list->elements[0] = datum1;
+	list->elements[1] = datum2;
+	list->elements[2] = datum3;
+	list->elements[3] = datum4;
+	list->elements[4] = datum5;
+	check_list_invariants(list);
+	return list;
+}
+
 /*
  * Make room for a new head cell in the given (non-NIL) list.
  *
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 2953499..248f78d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -85,6 +85,14 @@ typedef TupleTableSlot *(*ExecForeignInsert_function) (EState *estate,
 													   TupleTableSlot *slot,
 													   TupleTableSlot *planSlot);
 
+typedef TupleTableSlot **(*ExecForeignBatchInsert_function) (EState *estate,
+													   ResultRelInfo *rinfo,
+													   TupleTableSlot **slots,
+													   TupleTableSlot **planSlots,
+													   int *numSlots);
+
+typedef int (*GetForeignModifyBatchSize_function) (ResultRelInfo *rinfo);
+
 typedef TupleTableSlot *(*ExecForeignUpdate_function) (EState *estate,
 													   ResultRelInfo *rinfo,
 													   TupleTableSlot *slot,
@@ -209,6 +217,8 @@ typedef struct FdwRoutine
 	PlanForeignModify_function PlanForeignModify;
 	BeginForeignModify_function BeginForeignModify;
 	ExecForeignInsert_function ExecForeignInsert;
+	ExecForeignBatchInsert_function ExecForeignBatchInsert;
+	GetForeignModifyBatchSize_function GetForeignModifyBatchSize;
 	ExecForeignUpdate_function ExecForeignUpdate;
 	ExecForeignDelete_function ExecForeignDelete;
 	EndForeignModify_function EndForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 48c3f57..d65099c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -446,6 +446,12 @@ typedef struct ResultRelInfo
 	/* true when modifying foreign table directly */
 	bool		ri_usesFdwDirectModify;
 
+	/* batch insert stuff */
+	int			ri_NumSlots;		/* number of slots in the array */
+	int			ri_BatchSize;		/* max slots inserted in a single batch */
+	TupleTableSlot **ri_Slots;		/* input tuples for batch insert */
+	TupleTableSlot **ri_PlanSlots;
+
 	/* list of WithCheckOption's to be checked */
 	List	   *ri_WithCheckOptions;
 
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 710dcd3..404e03f 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -213,6 +213,10 @@ list_length(const List *l)
 #define list_make4(x1,x2,x3,x4) \
 	list_make4_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
 					list_make_ptr_cell(x3), list_make_ptr_cell(x4))
+#define list_make5(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_List, list_make_ptr_cell(x1), list_make_ptr_cell(x2), \
+					list_make_ptr_cell(x3), list_make_ptr_cell(x4), \
+					list_make_ptr_cell(x5))
 
 #define list_make1_int(x1) \
 	list_make1_impl(T_IntList, list_make_int_cell(x1))
@@ -224,6 +228,10 @@ list_length(const List *l)
 #define list_make4_int(x1,x2,x3,x4) \
 	list_make4_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
 					list_make_int_cell(x3), list_make_int_cell(x4))
+#define list_make5_int(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_IntList, list_make_int_cell(x1), list_make_int_cell(x2), \
+					list_make_int_cell(x3), list_make_int_cell(x4), \
+					list_make_int_cell(x5))
 
 #define list_make1_oid(x1) \
 	list_make1_impl(T_OidList, list_make_oid_cell(x1))
@@ -235,6 +243,10 @@ list_length(const List *l)
 #define list_make4_oid(x1,x2,x3,x4) \
 	list_make4_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
 					list_make_oid_cell(x3), list_make_oid_cell(x4))
+#define list_make5_oid(x1,x2,x3,x4,x5) \
+	list_make5_impl(T_OidList, list_make_oid_cell(x1), list_make_oid_cell(x2), \
+					list_make_oid_cell(x3), list_make_oid_cell(x4), \
+					list_make_oid_cell(x5))
 
 /*
  * Locate the n'th cell (counting from 0) of the list.
@@ -520,6 +532,9 @@ extern List *list_make3_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3);
 extern List *list_make4_impl(NodeTag t, ListCell datum1, ListCell datum2,
 							 ListCell datum3, ListCell datum4);
+extern List *list_make5_impl(NodeTag t, ListCell datum1, ListCell datum2,
+							 ListCell datum3, ListCell datum4,
+							 ListCell datum5);
 
 extern pg_nodiscard List *lappend(List *list, void *datum);
 extern pg_nodiscard List *lappend_int(List *list, int datum);
-- 
2.10.1
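
To illustrate the buffering scheme in the ExecInsert/ExecModifyTable changes above, here is a minimal standalone sketch (not PostgreSQL code). It assumes plain ints in place of TupleTableSlots and a hypothetical fake_batch_insert() in place of the FDW's ExecForeignBatchInsert callback; BatchBuffer, buffer_row() and flush_batch() are likewise made-up names. Rows are buffered until a full batch is present, a full batch is sent before the next row is buffered, and any leftover partial batch is sent at the end, so N rows with batch size B cost ceil(N/B) simulated round trips instead of N.

```c
#include <assert.h>

/*
 * Standalone sketch (not PostgreSQL code) of per-relation batch buffering.
 * Ints stand in for TupleTableSlots; field names loosely follow
 * ri_BatchSize / ri_NumSlots / ri_Slots / es_processed.
 */
#define MAX_BATCH 64

typedef struct BatchBuffer
{
	int			batch_size;			/* like ri_BatchSize, 1..MAX_BATCH */
	int			num_slots;			/* like ri_NumSlots */
	int			rows[MAX_BATCH];	/* like ri_Slots */
	int			round_trips;		/* simulated remote INSERT statements */
	long		processed;			/* like es_processed */
} BatchBuffer;

/*
 * Hypothetical stand-in for ExecForeignBatchInsert: *numSlots is in/out,
 * holding the number of rows passed in and, on return, the number inserted.
 * Here every row "succeeds", so *numSlots is left unchanged.
 */
static void
fake_batch_insert(const int *rows, int *numSlots)
{
	(void) rows;
	(void) numSlots;
}

/* Send whatever is buffered as one batch, then empty the buffer. */
static void
flush_batch(BatchBuffer *buf)
{
	int			n = buf->num_slots;

	if (n == 0)
		return;
	fake_batch_insert(buf->rows, &n);
	buf->processed += n;
	buf->round_trips++;
	buf->num_slots = 0;
}

/* Like ExecInsert: flush a full batch first, then buffer the new row. */
static void
buffer_row(BatchBuffer *buf, int row)
{
	assert(buf->batch_size >= 1 && buf->batch_size <= MAX_BATCH);

	if (buf->num_slots == buf->batch_size)
		flush_batch(buf);
	buf->rows[buf->num_slots++] = row;
}
```

For example, buffering 25 rows with batch_size = 10 and then calling flush_batch() once at the end (the analogue of the "insert remaining tuples" loop in ExecModifyTable) gives 25 processed rows in 3 round trips, while batch_size = 1 needs 25.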

#68Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#67)
Re: POC: postgres_fdw insert batching

Tsunakawa-san,

On Tue, Jan 19, 2021 at 12:50 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

OK. Can you prepare a final patch, squashing all the commits into a
single one, and perhaps use the function in create_foreign_modify?

Attached, including the message fix pointed out by Zhihong-san.

Thanks for adopting my suggestions regarding GetForeignModifyBatchSize().

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.

    resultRelInfo = mtstate->resultRelInfo;
    i = 0;
    forboth(l, node->resultRelations, l1, node->plans)
    {
        ...
        /* Also let FDWs init themselves for foreign-table result rels */
        if (!resultRelInfo->ri_usesFdwDirectModify &&
            resultRelInfo->ri_FdwRoutine != NULL &&
            resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
        {
            List       *fdw_private = (List *) list_nth(node->fdwPrivLists, i);

            resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
                                                             resultRelInfo,
                                                             fdw_private,
                                                             i,
                                                             eflags);
        }

Maybe it's fine today because we only care about inserts and there's
always only one entry in the resultRelations list in that case, but
that may not remain the case in the future.

--
Amit Langote
EDB: http://www.enterprisedb.com

#69tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#68)
RE: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09@gmail.com>

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.

Actually, I tried to do it (adding the GetForeignModifyBatchSize() call after BeginForeignModify()), but it failed, because postgresGetForeignModifyBatchSize() wants to know if RETURNING is specified, and ResultRelInfo->ri_projectReturning is created after the above part. Considering that GetForeignModifyBatchSize() implementations may want to know the surrounding context, I placed the call as late as possible in the initialization phase. As for the future(?) multi-target DML statements, I think we can change this together with the many(?) other parts that assume a single target table.
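
The batch-size rules discussed in this thread (a table-level batch_size overrides the server-level one, the default is 1, batching is not used when the INSERT has RETURNING, and the executor falls back to per-row ExecForeignInsert when either batch callback is NULL) can be summarized in a small standalone sketch. This is not postgres_fdw's actual code: choose_batch_size() and BatchOptions are hypothetical, and encoding "option not set" as 0 is an assumption of the sketch. A real implementation would inspect the ResultRelInfo (e.g. ri_projectReturning) and the foreign table/server options instead of taking flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the batch_size option; 0 means "not set". */
typedef struct BatchOptions
{
	int			server_batch_size;	/* batch_size on the foreign server */
	int			table_batch_size;	/* batch_size on the foreign table */
} BatchOptions;

static int
choose_batch_size(const BatchOptions *opts, bool has_returning,
				  bool fdw_has_batch_callbacks)
{
	int			batch_size = 1;		/* documented default */

	/* The executor uses ExecForeignInsert if either batch callback is NULL. */
	if (!fdw_has_batch_callbacks)
		return 1;

	/* RETURNING needs per-row results, so this sketch disables batching. */
	if (has_returning)
		return 1;

	if (opts->server_batch_size > 0)
		batch_size = opts->server_batch_size;
	if (opts->table_batch_size > 0)
		batch_size = opts->table_batch_size;	/* table overrides server */

	assert(batch_size >= 1);	/* mirrors Assert(ri_BatchSize >= 1) */
	return batch_size;
}
```

Keeping the decision in one function called late in initialization is what lets it see whether RETURNING is present, which is the constraint described above.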

Regards
Takayuki Tsunakawa

#70Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#69)
Re: POC: postgres_fdw insert batching

On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.

Actually, I tried to do it (adding the GetForeignModifyBatchSize() call after BeginForeignModify()), but it failed, because postgresGetForeignModifyBatchSize() wants to know if RETURNING is specified, and ResultRelInfo->ri_projectReturning is created after the above part. Considering that GetForeignModifyBatchSize() implementations may want to know the surrounding context, I placed the call as late as possible in the initialization phase. As for the future(?) multi-target DML statements, I think we can change this together with the many(?) other parts that assume a single target table.

Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModifyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1]. Thanks for the tip about the dependency on how RETURNING is
handled. I will remember it when rebasing my patch over this.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/31/2621/

#71Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#70)
Re: POC: postgres_fdw insert batching

On 1/19/21 7:23 AM, Amit Langote wrote:

On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.

Actually, I tried to do it (adding the GetForeignModifyBatchSize() call after BeginForeignModify()), but it failed, because postgresGetForeignModifyBatchSize() wants to know if RETURNING is specified, and ResultRelInfo->ri_projectReturning is created after the above part. Considering that GetForeignModifyBatchSize() implementations may want to know the surrounding context, I placed the call as late as possible in the initialization phase. As for the future(?) multi-target DML statements, I think we can change this together with the many(?) other parts that assume a single target table.

Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModifyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1]. Thanks for the tip about the dependency on how RETURNING is
handled. I will remember it when rebasing my patch over this.

Thanks. The last version (v12) should be addressing all the comments and
seems fine to me, so barring objections I'll get that pushed shortly.

One thing that seems a bit annoying is that with the partitioned table
the explain (verbose) looks like this:

                     QUERY PLAN
-----------------------------------------------------
 Insert on public.batch_table
   ->  Function Scan on pg_catalog.generate_series i
         Output: i.i
         Function Call: generate_series(1, 66)
(4 rows)

That is, there's no information about the batch size :-( But AFAICS
that's due to how EXPLAIN shows (or rather does not show) partitions in
this type of plan.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#72Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#71)
Re: POC: postgres_fdw insert batching

On Wed, Jan 20, 2021 at 1:01 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/19/21 7:23 AM, Amit Langote wrote:

On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.takay@fujitsu.com

Actually, I tried to do it (adding the GetForeignModifyBatchSize() call after BeginForeignModify()), but it failed, because postgresGetForeignModifyBatchSize() wants to know if RETURNING is specified, and ResultRelInfo->ri_projectReturning is created after the above part. Considering that GetForeignModifyBatchSize() implementations may want to know the surrounding context, I placed the call as late as possible in the initialization phase. As for the future(?) multi-target DML statements, I think we can change this together with the many(?) other parts that assume a single target table.

Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModifyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1]. Thanks for the tip about the dependency on how RETURNING is
handled. I will remember it when rebasing my patch over this.

Thanks. The last version (v12) should be addressing all the comments and
seems fine to me, so barring objections I'll get that pushed shortly.

+1

One thing that seems a bit annoying is that with the partitioned table
the explain (verbose) looks like this:

                     QUERY PLAN
-----------------------------------------------------
 Insert on public.batch_table
   ->  Function Scan on pg_catalog.generate_series i
         Output: i.i
         Function Call: generate_series(1, 66)
(4 rows)

That is, there's no information about the batch size :-( But AFAICS
that's due to how EXPLAIN shows (or rather does not show) partitions in
this type of plan.

Yeah. Partition result relations are always lazily allocated for
INSERT, so EXPLAIN (without ANALYZE) has no idea what to show for
them, nor does it know which partitions will be used in the first
place. With ANALYZE however, you could get them from
es_tuple_routing_result_relations and maybe list them if you want, but
that sounds like a project on its own.

--
Amit Langote
EDB: http://www.enterprisedb.com

#73Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#72)
Re: POC: postgres_fdw insert batching

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#74Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#73)
1 attachment(s)
Re: POC: postgres_fdw insert batching

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

It's an i386 machine running FreeBSD, so not sure what exactly it's picky
about. But when I tried running this under valgrind, I get some strange
failures in the new chunk in ExecInitModifyTable:

    /*
     * Determine if the FDW supports batch insert and determine the batch
     * size (a FDW may support batching, but it may be disabled for the
     * server/table).
     */
    if (!resultRelInfo->ri_usesFdwDirectModify &&
        operation == CMD_INSERT &&
        resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;

    Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like
that. I wouldn't be surprised if the 32-bit machine was pickier and
failing because of that.

A sample of the valgrind log is attached. It's pretty much just
repetitions of these three reports.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

valgrind.log (text/x-log; charset=UTF-8)
#75Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#73)
Re: POC: postgres_fdw insert batching

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

OK, pushed after a little bit of additional polishing (mostly comments).
Thanks everyone!

florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...

regards, tom lane

#76Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#74)
Re: POC: postgres_fdw insert batching

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

    if (junk_filter_needed)
    {
        resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly?

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a6a814454d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
      * size (a FDW may support batching, but it may be disabled for the
      * server/table).
      */
-    if (!resultRelInfo->ri_usesFdwDirectModify &&
+    if (!mtstate->resultRelInfo->ri_usesFdwDirectModify &&
         operation == CMD_INSERT &&
-        resultRelInfo->ri_FdwRoutine != NULL &&
-        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-        resultRelInfo->ri_BatchSize =
-            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+        mtstate->resultRelInfo->ri_FdwRoutine != NULL &&
+        mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+        mtstate->resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+        mtstate->resultRelInfo->ri_BatchSize =
+            mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(mtstate->resultRelInfo);
     else
-        resultRelInfo->ri_BatchSize = 1;
+        mtstate->resultRelInfo->ri_BatchSize = 1;
 
-    Assert(resultRelInfo->ri_BatchSize >= 1);
+    Assert(mtstate->resultRelInfo->ri_BatchSize >= 1);
 
     /*
      * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it

Cheers

On Wed, Jan 20, 2021 at 3:52 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:


Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

It's an i386 machine running FreeBSD, so not sure what exactly it's picky
about. But when I tried running this under valgrind, I get some strange
failures in the new chunk in ExecInitModifyTable:

    /*
     * Determine if the FDW supports batch insert and determine the batch
     * size (a FDW may support batching, but it may be disabled for the
     * server/table).
     */
    if (!resultRelInfo->ri_usesFdwDirectModify &&
        operation == CMD_INSERT &&
        resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;

    Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like
that. I wouldn't be surprised if the 32-bit machine was pickier and
failing because of that.

A sample of the valgrind log is attached. It's pretty much just
repetitions of these three reports.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#77Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#74)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/21/21 12:52 AM, Tomas Vondra wrote:

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

It's an i386 machine running FreeBSD, so not sure what exactly it's picky
about. But when I tried running this under valgrind, I get some strange
failures in the new chunk in ExecInitModifyTable:

    /*
     * Determine if the FDW supports batch insert and determine the batch
     * size (a FDW may support batching, but it may be disabled for the
     * server/table).
     */
    if (!resultRelInfo->ri_usesFdwDirectModify &&
        operation == CMD_INSERT &&
        resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;

    Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like
that. I wouldn't be surprised if the 32-bit machine was pickier and
failing because of that.

A sample of the valgrind log is attached. It's pretty much just
repetitions of these three reports.

OK, it's definitely accessing uninitialized memory, because the
resultRelInfo (on line 2801, i.e. the "if" condition) looks like this:

(gdb) p resultRelInfo
$1 = (ResultRelInfo *) 0xe595988
(gdb) p *resultRelInfo
$2 = {type = 2139062142, ri_RangeTableIndex = 2139062143,
  ri_RelationDesc = 0x7f7f7f7f7f7f7f7f, ri_NumIndices = 2139062143,
  ri_IndexRelationDescs = 0x7f7f7f7f7f7f7f7f,
  ri_IndexRelationInfo = 0x7f7f7f7f7f7f7f7f,
  ri_TrigDesc = 0x7f7f7f7f7f7f7f7f, ri_TrigFunctions = 0x7f7f7f7f7f7f7f7f,
  ri_TrigWhenExprs = 0x7f7f7f7f7f7f7f7f,
  ri_TrigInstrument = 0x7f7f7f7f7f7f7f7f,
  ri_ReturningSlot = 0x7f7f7f7f7f7f7f7f, ri_TrigOldSlot = 0x7f7f7f7f7f7f7f7f,
  ri_TrigNewSlot = 0x7f7f7f7f7f7f7f7f, ri_FdwRoutine = 0x7f7f7f7f7f7f7f7f,
  ri_FdwState = 0x7f7f7f7f7f7f7f7f, ri_usesFdwDirectModify = 127,
  ri_NumSlots = 2139062143, ri_BatchSize = 2139062143,
  ri_Slots = 0x7f7f7f7f7f7f7f7f, ri_PlanSlots = 0x7f7f7f7f7f7f7f7f,
  ri_WithCheckOptions = 0x7f7f7f7f7f7f7f7f,
  ri_WithCheckOptionExprs = 0x7f7f7f7f7f7f7f7f,
  ri_ConstraintExprs = 0x7f7f7f7f7f7f7f7f,
  ri_GeneratedExprs = 0x7f7f7f7f7f7f7f7f,
  ri_NumGeneratedNeeded = 2139062143, ri_junkFilter = 0x7f7f7f7f7f7f7f7f,
  ri_returningList = 0x7f7f7f7f7f7f7f7f,
  ri_projectReturning = 0x7f7f7f7f7f7f7f7f,
  ri_onConflictArbiterIndexes = 0x7f7f7f7f7f7f7f7f,
  ri_onConflict = 0x7f7f7f7f7f7f7f7f,
  ri_PartitionCheckExpr = 0x7f7f7f7f7f7f7f7f,
  ri_PartitionRoot = 0x7f7f7f7f7f7f7f7f,
  ri_RootToPartitionMap = 0x8, ri_PartitionTupleSlot = 0x8,
  ri_ChildToRootMap = 0xe5952b0,
  ri_CopyMultiInsertBuffer = 0xe596740}
(gdb)
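The values in that dump decode neatly: 2139062143 is 0x7f7f7f7f and the pointers are 0x7f7f7f7f7f7f7f7f, i.e. every byte is 0x7F, which is the byte assert-enabled PostgreSQL builds (CLOBBER_FREED_MEMORY) use to wipe freed memory. The low byte of `type` (2139062142 = 0x7f7f7f7e) is 0x7E instead, which on a little-endian machine is at least consistent with the sentinel byte that MEMORY_CONTEXT_CHECKING builds write just past the end of an allocation. The standalone sketch below only checks that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: an int32 field whose four bytes all equal `fill`. */
static int32_t
int32_from_fill_byte(unsigned char fill)
{
    int32_t value;

    memset(&value, fill, sizeof(value));    /* byte order is irrelevant here */
    return value;
}

/* Hypothetical helper: a 64-bit pointer-sized field filled the same way. */
static uint64_t
uint64_from_fill_byte(unsigned char fill)
{
    uint64_t value;

    memset(&value, fill, sizeof(value));
    return value;
}

/*
 * Hypothetical helper: decode four bytes as a little-endian uint32, the
 * byte order of the i386/x86_64 machines discussed in this thread.
 */
static uint32_t
decode_le32(const unsigned char b[4])
{
    return (uint32_t) b[0] |
           ((uint32_t) b[1] << 8) |
           ((uint32_t) b[2] << 16) |
           ((uint32_t) b[3] << 24);
}
```

So `ri_BatchSize = 2139062143` is not a batch size at all, just wiped memory read through a pointer that does not point at a live ResultRelInfo.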

I may be wrong, but the most likely explanation seems to be this is due
to the junk filter initialization, which simply moves past the end of
the mtstate->resultRelInfo array.
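The pointer walk described above can be modeled in a few lines of standalone C. `FakeResultRelInfo` and `cursor_after_per_plan_loop()` are hypothetical stand-ins, not executor code; the point is only that a loop doing `resultRelInfo++` once per subplan leaves the cursor one element past the end of the array, which is a legal address to compute and compare but not to dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo: just the field the batching code sets. */
typedef struct FakeResultRelInfo
{
    int ri_BatchSize;
} FakeResultRelInfo;

/*
 * Models the junk-filter loop: one iteration per subplan, each ending with
 * resultRelInfo++.  Returns the cursor as the loop leaves it.
 */
static FakeResultRelInfo *
cursor_after_per_plan_loop(FakeResultRelInfo *first, int nplans)
{
    FakeResultRelInfo *resultRelInfo = first;

    for (int i = 0; i < nplans; i++)
    {
        /* ... per-subplan junk filter setup would go here ... */
        resultRelInfo++;
    }

    /* Now first + nplans: valid to compare against, not to dereference. */
    return resultRelInfo;
}
```

Code that reads resultRelInfo after such a loop, as the batch-size block did, has to either run before the loop or reset the cursor to the start of the array first.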

It kinda seems the GetForeignModifyBatchSize call should happen before
that block. The attached patch fixes this for me (i.e. regression tests
pass with no valgrind reports).

Or did I get that wrong?

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

modifytable-fix.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..2ac4999dc8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2672,6 +2672,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* select first subplan */
 	mtstate->mt_whichplan = 0;
 	subplan = (Plan *) linitial(node->plans);
@@ -2793,23 +2810,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Determine if the FDW supports batch insert and determine the batch
-	 * size (a FDW may support batching, but it may be disabled for the
-	 * server/table).
-	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
-
-	Assert(resultRelInfo->ri_BatchSize >= 1);
-
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
#78Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tom Lane (#75)
Re: POC: postgres_fdw insert batching

On 1/21/21 12:59 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

OK, pushed after a little bit of additional polishing (mostly comments).
Thanks everyone!

florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...

I know, although it seems more like an access to uninitialized memory.
I've already posted a patch that resolves that for me on 64-bit (per
valgrind; I suppose it's the same issue).

I'm working on reproducing it on 32-bit; hopefully it won't take long.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#79Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Zhihong Yu (#76)
Re: POC: postgres_fdw insert batching

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

        if (junk_filter_needed)
        {
            resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

I'm pretty amazed this did not crash during any of the many regression
runs I did recently.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#80Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#79)
Re: POC: postgres_fdw insert batching

Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes between the start
of the loop and the end of the loop:

* Determine if the FDW supports batch insert and determine the batch
* size (a FDW may support batching, but it may be disabled for the
* server/table).

My patch would reflect that change. I guess this was the reason the if /
else block was placed there in the first place.

Cheers

On Wed, Jan 20, 2021 at 4:56 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

I'm pretty amazed this did not crash during any of the many regression
runs I did recently.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#81Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Zhihong Yu (#80)
Re: POC: postgres_fdw insert batching

On 1/21/21 2:02 AM, Zhihong Yu wrote:

Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes between the
start of the loop and the end of the loop:

     * Determine if the FDW supports batch insert and determine the batch
     * size (a FDW may support batching, but it may be disabled for the
     * server/table).

My patch would reflect that change. I guess this was the reason the if /
else block was placed there in the first place.

But can it change? All the loop does is extract junk attributes from
the plans; it does not modify anything related to the batching. Or maybe
I just don't understand what you mean.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#82Zhihong Yu
zyu@yugabyte.com
In reply to: Tomas Vondra (#81)
Re: POC: postgres_fdw insert batching

Hi,
Do we need to consider how this part of code inside ExecInitModifyTable()
would evolve ?

I think placing the compound condition toward the end
of ExecInitModifyTable() is reasonable because it checks the latest
information.

Regards

On Wed, Jan 20, 2021 at 5:11 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

On 1/21/21 2:02 AM, Zhihong Yu wrote:

Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes between the
start of the loop and the end of the loop:

* Determine if the FDW supports batch insert and determine the batch
* size (a FDW may support batching, but it may be disabled for the
* server/table).

My patch would reflect that change. I guess this was the reason the if /
else block was placed there in the first place.

But can it change? All the loop does is extract junk attributes from
the plans; it does not modify anything related to the batching. Or maybe
I just don't understand what you mean.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#77)
Re: POC: postgres_fdw insert batching

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

I may be wrong, but the most likely explanation seems to be this is due
to the junk filter initialization, which simply moves past the end of
the mtstate->resultRelInfo array.

resultRelInfo is certainly pointing at garbage at that point.

It kinda seems the GetForeignModifyBatchSize call should happen before
that block. The attached patch fixes this for me (i.e. regression tests
pass with no valgrind reports).

Or did I get that wrong?

Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one? That is, this new code needs to be
somewhere inside a loop over the result rels.
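Per-element initialization along the lines Tom suggests might look like the sketch below. It is a simplified standalone model with hypothetical types (`FakeFdwRoutine`, `FakeResultRelInfo`): the batch-size callback takes no arguments, a `bool` stands in for `ExecForeignBatchInsert != NULL`, and the `ri_usesFdwDirectModify` / `CMD_INSERT` checks are left out. A partitioned target can mix local and foreign partitions, so each element gets its own value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for FdwRoutine. */
typedef struct FakeFdwRoutine
{
    int  (*GetForeignModifyBatchSize)(void);  /* NULL if not provided */
    bool has_batch_insert;                     /* models ExecForeignBatchInsert != NULL */
} FakeFdwRoutine;

/* Hypothetical, simplified stand-in for ResultRelInfo. */
typedef struct FakeResultRelInfo
{
    const FakeFdwRoutine *ri_FdwRoutine;       /* NULL for a local table */
    int ri_BatchSize;
} FakeResultRelInfo;

/* Set a batch size on every result rel, restarting from the first element. */
static void
init_batch_sizes(FakeResultRelInfo *rels, int nrels)
{
    FakeResultRelInfo *resultRelInfo = rels;

    for (int i = 0; i < nrels; i++, resultRelInfo++)
    {
        const FakeFdwRoutine *fdw = resultRelInfo->ri_FdwRoutine;

        if (fdw != NULL &&
            fdw->GetForeignModifyBatchSize != NULL &&
            fdw->has_batch_insert)
            resultRelInfo->ri_BatchSize = fdw->GetForeignModifyBatchSize();
        else
            resultRelInfo->ri_BatchSize = 1;

        assert(resultRelInfo->ri_BatchSize >= 1);
    }
}

/* Hypothetical FDW that always asks for batches of 100 rows. */
static int batch_of_100(void) { return 100; }
static const FakeFdwRoutine batching_fdw = { batch_of_100, true };
```

A local table or an FDW without batch support gets 1, so the `>= 1` invariant holds for every element, not just the first.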

regards, tom lane

#84Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#79)
Re: POC: postgres_fdw insert batching

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do. The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
 * Initialize RETURNING projections if needed.
 */
if (node->returningLists)
{
    ....
    /*
     * Build a projection for each result rel.
     */
    resultRelInfo = mtstate->resultRelInfo;
    foreach(l, node->returningLists)
    {
        List *rlist = (List *) lfirst(l);

        resultRelInfo->ri_returningList = rlist;
        resultRelInfo->ri_projectReturning =
            ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
                                    resultRelInfo->ri_RelationDesc->rd_att);
        resultRelInfo++;
    }
}
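Why the placement matters can be shown with a hypothetical callback. The sketch assumes, as a simplification, that the FDW falls back to single-row inserts whenever a RETURNING projection exists; `FakeResultRelInfo` and `fake_get_batch_size()` are invented for illustration and are not postgres_fdw code. Called before `ri_projectReturning` is set up, the callback reports the configured batch size; called after, it reports 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct FakeResultRelInfo
{
    void *ri_projectReturning;   /* non-NULL once RETURNING is set up */
    int   ri_BatchSize;
} FakeResultRelInfo;

/*
 * Hypothetical batch-size callback.  ASSUMPTION: the FDW cannot batch
 * when it has to return rows, so it needs an already-initialized
 * ri_projectReturning to give the right answer.
 */
static int
fake_get_batch_size(const FakeResultRelInfo *rri, int configured)
{
    if (rri->ri_projectReturning != NULL)
        return 1;
    return configured;
}
```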

--
Amit Langote
EDB: http://www.enterprisedb.com

#85tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Zhihong Yu (#82)
RE: POC: postgres_fdw insert batching

From: Zhihong Yu <zyu@yugabyte.com>

Do we need to consider how this part of code inside ExecInitModifyTable() would evolve ?

I think placing the compound condition toward the end of ExecInitModifyTable() is reasonable because it checks the latest information.

+1 for Zaihong-san's idea. But instead of rewriting every resultRelInfo to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately before the if block.

Thanks a lot, all, for helping to solve the problem quickly!

Regards
Takayuki Tsunakawa

#86Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#84)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do. The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
* Initialize RETURNING projections if needed.
*/
if (node->returningLists)
{
....
/*
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
foreach(l, node->returningLists)
{
List *rlist = (List *) lfirst(l);

resultRelInfo->ri_returningList = rlist;
resultRelInfo->ri_projectReturning =
ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
resultRelInfo->ri_RelationDesc->rd_att);
resultRelInfo++;
}
}

Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

modifytable-fix-2.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..10febcae8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	resultRelInfo = mtstate->resultRelInfo;
+	for (i = 0; i < nplans; i++)
+	{
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			operation == CMD_INSERT &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		else
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+		resultRelInfo++;
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
#87Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tom Lane (#83)
Re: POC: postgres_fdw insert batching

On 1/21/21 2:22 AM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

I may be wrong, but the most likely explanation seems to be this is due
to the junk filter initialization, which simply moves past the end of
the mtstate->resultRelInfo array.

resultRelInfo is certainly pointing at garbage at that point.

Yup. It's pretty amazing the x86 machines seem to be mostly OK with it.

It kinda seems the GetForeignModifyBatchSize call should happen before
that block. The attached patch fixes this for me (i.e. regression tests
pass with no valgrind reports).

Or did I get that wrong?

Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one? That is, this new code needs to be
somewhere inside a loop over the result rels.

Yeah, I think you're right. That's an embarrassing oversight :-(

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#88Zhihong Yu
zyu@yugabyte.com
In reply to: tsunakawa.takay@fujitsu.com (#85)
Re: POC: postgres_fdw insert batching

Hi, Takayuki-san:
My first name is Zhihong.

You can call me Ted if you want to save some typing :-)

Cheers

On Wed, Jan 20, 2021 at 5:37 PM tsunakawa.takay@fujitsu.com <
tsunakawa.takay@fujitsu.com> wrote:

From: Zhihong Yu <zyu@yugabyte.com>

Do we need to consider how this part of code inside
ExecInitModifyTable() would evolve ?

I think placing the compound condition toward the end of
ExecInitModifyTable() is reasonable because it checks the latest
information.

+1 for Zaihong-san's idea. But instead of rewriting every resultRelInfo
to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to
suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately
before the if block.

Thanks a lot, all, for helping to solve the problem quickly!

Regards

Takayuki Tsunakawa

#89tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#86)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right. But I think Tom is right this should initialize ri_BatchSize for all the
resultRelInfo elements, not just the first one. Per the attached patch, which
resolves the issue both on x86_64 and armv7l for me.

I think your patch is perfect in the sense that it's ready for future multi-target DML support. +1

Just for learning, could anyone tell me what this loop is for? I thought current Postgres DML supports only a single target table, so handling the first element of mtstate->resultRelInfo is enough. That's why Amit-san and I agreed not to put the if block in the for loop yet.

Regards
Takayuki Tsunakawa

#90Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#86)
Re: POC: postgres_fdw insert batching

On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do. The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
* Initialize RETURNING projections if needed.
*/
if (node->returningLists)
{
....
/*
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
foreach(l, node->returningLists)
{
List *rlist = (List *) lfirst(l);

resultRelInfo->ri_returningList = rlist;
resultRelInfo->ri_projectReturning =
ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
resultRelInfo->ri_RelationDesc->rd_att);
resultRelInfo++;
}
}

Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.

+1 in general. To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.
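A minimal model of the suggested outer guard, with hypothetical names (`FakeCmdType`, `FakeResultRelInfo`, `init_batch_sizes()`): for UPDATE/DELETE the loop never runs and every `ri_BatchSize` keeps its zeroed initial value, while for INSERT each element ends up with at least 1.

```c
#include <assert.h>

/* Hypothetical stand-in for CmdType. */
typedef enum FakeCmdType
{
    FAKE_CMD_INSERT,
    FAKE_CMD_UPDATE,
    FAKE_CMD_DELETE
} FakeCmdType;

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct FakeResultRelInfo
{
    int fdw_batch_size;   /* what the FDW would report; 0 = no batching support */
    int ri_BatchSize;     /* 0 = never initialized (node starts zeroed) */
} FakeResultRelInfo;

static void
init_batch_sizes(FakeCmdType operation, FakeResultRelInfo *rels, int nplans)
{
    /* Batching is INSERT-only, so skip the whole loop otherwise. */
    if (operation != FAKE_CMD_INSERT)
        return;

    for (int i = 0; i < nplans; i++)
    {
        FakeResultRelInfo *rri = &rels[i];

        rri->ri_BatchSize = rri->fdw_batch_size > 0 ? rri->fdw_batch_size : 1;
        assert(rri->ri_BatchSize >= 1);
    }
}
```

Inside the guard, re-testing `operation == CMD_INSERT` for each element adds nothing.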

--
Amit Langote
EDB: http://www.enterprisedb.com

#91tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Zhihong Yu (#88)
RE: POC: postgres_fdw insert batching

From: Zhihong Yu <zyu@yugabyte.com>

My first name is Zhihong.

You can call me Ted if you want to save some typing :-)

Ah, I'm very sorry. Thank you, let me call you Ted then. That can't be mistaken.

Regards
Takayuki Tsunakawa

#92Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#90)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On 1/21/21 2:53 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

IMO the issue is that code iterates over all plans and moves to the next
for each one:

resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do. The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
* Initialize RETURNING projections if needed.
*/
if (node->returningLists)
{
....
/*
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
foreach(l, node->returningLists)
{
List *rlist = (List *) lfirst(l);

resultRelInfo->ri_returningList = rlist;
resultRelInfo->ri_projectReturning =
ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
resultRelInfo->ri_RelationDesc->rd_att);
resultRelInfo++;
}
}

Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.

+1 in general. To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the
right fix to push.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

modifytable-fix-3.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a4870d621a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2797,18 +2797,30 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
+	 *
+	 * We only do this for INSERT, so that for UPDATE/DELETE the value
+	 * remains set to 0.
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	if (operation == CMD_INSERT)
+	{
+		resultRelInfo = mtstate->resultRelInfo;
+		for (i = 0; i < nplans; i++)
+		{
+			if (!resultRelInfo->ri_usesFdwDirectModify &&
+				operation == CMD_INSERT &&
+				resultRelInfo->ri_FdwRoutine != NULL &&
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+				resultRelInfo->ri_BatchSize =
+					resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+			else
+				resultRelInfo->ri_BatchSize = 1;
+
+			Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+			resultRelInfo++;
+		}
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: tsunakawa.takay@fujitsu.com (#89)
Re: POC: postgres_fdw insert batching

"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:

Just for learning, could anyone tell me what this loop is for? I thought current Postgres DML supports only a single target table, so handling the first element of mtstate->resultRelInfo is enough.

The "single target table" could be partitioned, in which case there'll be
multiple resultrelinfos, some of which could be foreign tables.

regards, tom lane

#94tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tomas Vondra (#92)
RE: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right fix to
push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary.

Regards
Takayuki Tsunakawa

#95tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Tom Lane (#93)
RE: POC: postgres_fdw insert batching

From: Tom Lane <tgl@sss.pgh.pa.us>

The "single target table" could be partitioned, in which case there'll be
multiple resultrelinfos, some of which could be foreign tables.

Thank you. I thought so at first, but later I found that ExecInsert() only handles one element in mtstate->resultRelInfo, so I assumed only the first element is processed in the INSERT case.

My understanding (a guess) is that the for loop is for UPDATE and DELETE. EXPLAIN (without ANALYZE) of an UPDATE/DELETE on a partitioned table shows the partitions, which would be the elements of mtstate->resultRelInfo. EXPLAIN of an INSERT doesn't show partitions, so I think INSERT finds the relevant partitions based on the input rows during execution.

Regards
Takayuki Tsunakawa

#96Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: tsunakawa.takay@fujitsu.com (#94)
Re: POC: postgres_fdw insert batching

On 1/21/21 3:09 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right fix to
push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary.

Right. I've pushed the fix, hopefully buildfarm will get happy again.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#97Ian Lawrence Barwick
barwick@gmail.com
In reply to: Tomas Vondra (#73)
1 attachment(s)
Re: POC: postgres_fdw insert batching

Hi

On Thu, Jan 21, 2021 at 8:00 Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

There's a minor typo in the docs' version of the ExecForeignBatchInsert()
declaration. It currently reads:

TupleTableSlot **
ExecForeignBatchInsert(EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot **slots,
TupleTableSlot *planSlots,
int *numSlots);

should be:

TupleTableSlot **
ExecForeignBatchInsert(EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot **slots,
TupleTableSlot **planSlots,
int *numSlots);

(Trivial patch attached).

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

Attachments:

doc-fdw-batch-insert-fix.v1.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913ae5f..2e73d296d2 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -619,7 +619,7 @@ TupleTableSlot **
 ExecForeignBatchInsert(EState *estate,
                   ResultRelInfo *rinfo,
                   TupleTableSlot **slots,
-                  TupleTableSlot *planSlots,
+                  TupleTableSlot **planSlots,
                   int *numSlots);
 </programlisting>
 
#98Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#96)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 3:09 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right fix to
push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary.

Right. I've pushed the fix, hopefully buildfarm will get happy again.

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions. Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching. However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching. As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin
  raise notice '% % on %', tg_when, tg_op, tg_relname;
  if tg_op = 'DELETE' then return old; end if;
  return new;
end; $$ language plpgsql;
create trigger trig before update on fp1 for each row
  execute function report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1; -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations, where mtstate->operation would be
CMD_UPDATE.
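The proposed guard can be sketched as a standalone model. `FakePartition` and `init_routing_batch_size()` are hypothetical; `fdw_state_batch_size` stands in for the `batch_size` stored in the FDW state, which (per the description above) is 0 when that state was set up for the original UPDATE. With the operation check, a row moved by a cross-partition update gets a batch size of 1 instead of tripping the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CmdType. */
typedef enum FakeCmdType
{
    FAKE_CMD_INSERT,
    FAKE_CMD_UPDATE
} FakeCmdType;

/* Hypothetical stand-in for a routed partition's ResultRelInfo. */
typedef struct FakePartition
{
    bool is_foreign_with_batching;  /* FDW provides both batching callbacks */
    int  fdw_state_batch_size;      /* 0 when the FDW state was built for UPDATE */
    int  ri_BatchSize;
} FakePartition;

static void
init_routing_batch_size(FakeCmdType operation, FakePartition *part)
{
    /* Only a genuine INSERT may use the FDW's batch size. */
    if (operation == FAKE_CMD_INSERT && part->is_foreign_with_batching)
        part->ri_BatchSize = part->fdw_state_batch_size;
    else
        part->ri_BatchSize = 1;

    assert(part->ri_BatchSize >= 1);
}
```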

I've attached a patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

0001-Prevent-FDW-insert-batching-during-cross-partition-u.patch (application/octet-stream)
From 868e7ca8f660a75949e095cbaa56386e0c4655f2 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH] Prevent FDW insert batching during cross-partition updates

A cross-partition update of a partitioned table is internally
implemented as delete+insert.  As things stand now, even those
inserts inadvertently end up using batching, because
ExecInitRoutingInfo(), which is in charge of initializing
ri_BatchSize for partitions, doesn't check the original operation,
which would be CMD_UPDATE.  That may pose a problem if the target
of such an insert is a foreign partition, because its FDW may not
be able to handle both the original update and the batched insert
being performed at the same time.

For now, prevent batching in such cases.
---
 contrib/postgres_fdw/postgres_fdw.c  | 13 +++++++++++--
 src/backend/executor/execPartition.c |  3 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8648be0b81..a9eeeb87f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,26 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+							(PgFdwModifyState *) resultRelInfo->ri_FdwState :
+							NULL;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Should never get called when the insert is being performed as part of
+	 * a row movement operation.
+	 */
+	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
-	if (resultRelInfo->ri_FdwState)
-		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	if (fmstate)
+		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1746cb8793..882397bc30 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1000,7 +1000,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 *
 	 * If the FDW does not support batching, we set the batch size to 1.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
+	if (mtstate->operation == CMD_INSERT &&
+		partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
-- 
2.24.1

#99Zhihong Yu
zyu@yugabyte.com
In reply to: Amit Langote (#98)
Re: POC: postgres_fdw insert batching

Amit:
Good catch.

bq. ExecInitRoutingInfo() that is in the charge of initialing

Should be 'ExecInitRoutingInfo() that is in charge of initializing'

Cheers

On Sat, Jan 23, 2021 at 12:31 AM Amit Langote <amitlangote09@gmail.com>
wrote:


On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 3:09 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right fix to
push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary.

Right. I've pushed the fix, hopefully buildfarm will get happy again.

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions. Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching. However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching. As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1; -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations, where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

#100Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#98)
Re: POC: postgres_fdw insert batching

On 1/23/21 9:31 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/21/21 3:09 AM, tsunakawa.takay@fujitsu.com wrote:

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right fix to
push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary.

Right. I've pushed the fix, hopefully buildfarm will get happy again.

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions. Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching. However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching. As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1; -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations, where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.

Thanks for catching this. I think it'd be good if the fix included a
regression test. The example seems like a good starting point, not sure
if it can be simplified further.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#101Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#100)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On Sun, Jan 24, 2021 at 2:17 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 1/23/21 9:31 AM, Amit Langote wrote:

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions. Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching. However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching. As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1; -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations, where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.

Thanks for catching this. I think it'd be good if the fix included a
regression test. The example seems like a good starting point, not sure
if it can be simplified further.

Yes, it can be simplified by using a local join to prevent the update
of the foreign partition from being pushed to the remote server, for
which my example in the previous email used a local trigger. Note
that the update of the foreign partition to be done locally is a
prerequisite for this bug to occur.

I've added that simplified test case in the attached updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Prevent-FDW-insert-batching-during-cross-partitio.patch (application/octet-stream)
From baf9b08711a8854ee48bbeef7e66c372aba61933 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v2] Prevent FDW insert batching during cross-partition updates

A cross-partition update of a partitioned table is internally
implemented as delete+insert.  As things stand now, even those
inserts inadvertently end up using batching, because
ExecInitRoutingInfo() that is in charge of initializing ri_BatchSize
for partitions doesn't check the original operation, which would be
CMD_UPDATE.  That may pose a problem if the target of such an insert
is a foreign partition, because its FDW may not be able to handle
both the original update and the batched insert being performed at
the same time.

Prevent insert batching in such cases, for now.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 23 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c           | 13 +++++++++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 19 ++++++++++++++-
 src/backend/executor/execPartition.c          |  3 ++-
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b4a04d2c14..666e597f54 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9251,5 +9251,26 @@ SELECT COUNT(*) FROM batch_table;
     66
 (1 row)
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+       tableoid       | a 
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(2 rows)
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 8648be0b81..a9eeeb87f4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,26 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+							(PgFdwModifyState *) resultRelInfo->ri_FdwState :
+							NULL;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Should never get called when the insert is being performed as part of
+	 * a row movement operation.
+	 */
+	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
-	if (resultRelInfo->ri_FdwState)
-		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	if (fmstate)
+		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 28b82f5f9d..8435817ea2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2829,5 +2829,22 @@ CREATE TABLE batch_table_p2
 INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
 SELECT COUNT(*) FROM batch_table;
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1746cb8793..882397bc30 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1000,7 +1000,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 *
 	 * If the FDW does not support batching, we set the batch size to 1.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
+	if (mtstate->operation == CMD_INSERT &&
+		partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
-- 
2.24.1

#102tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#101)
RE: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09@gmail.com>

Yes, it can be simplified by using a local join to prevent the update of the foreign
partition from being pushed to the remote server, for which my example in the
previous email used a local trigger. Note that the update of the foreign
partition to be done locally is a prerequisite for this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it partway. Good catch (and my bad miss.)

+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+							(PgFdwModifyState *) resultRelInfo->ri_FdwState :
+							NULL;

This can be written as:

+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
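The shorter form is equivalent because, in C, converting a null pointer to another pointer type yields a null pointer of that type, so the conditional adds nothing. Here is a minimal sketch. `ModifyState` and `RelInfo` are hypothetical stand-ins for `PgFdwModifyState` and `ResultRelInfo`, assuming only that `ri_FdwState` is a `void *` member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: in the real code ri_FdwState is a void * member of
 * ResultRelInfo and postgres_fdw stores a PgFdwModifyState there. */
typedef struct ModifyState
{
    int         batch_size;
} ModifyState;

typedef struct RelInfo
{
    void       *ri_FdwState;
} RelInfo;

/* Form used in the patch: test for NULL, then cast. */
static ModifyState *
state_with_ternary(const RelInfo *rinfo)
{
    return rinfo->ri_FdwState ? (ModifyState *) rinfo->ri_FdwState : NULL;
}

/* Suggested form: converting a null pointer already yields a null pointer. */
static ModifyState *
state_with_cast(const RelInfo *rinfo)
{
    return (ModifyState *) rinfo->ri_FdwState;
}

/* Both forms agree whether or not the FDW state has been set up. */
static void
check_equivalent(const RelInfo *rinfo)
{
    assert(state_with_ternary(rinfo) == state_with_cast(rinfo));
}
```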

Regards
Takayuki Tsunakawa

#103Ian Lawrence Barwick
barwick@gmail.com
In reply to: Ian Lawrence Barwick (#97)
Re: POC: postgres_fdw insert batching

On Fri, Jan 22, 2021 at 14:50, Ian Lawrence Barwick <barwick@gmail.com> wrote:

Hi

On Thu, Jan 21, 2021 at 8:00, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

There's a minor typo in the documentation's version of the
ExecForeignBatchInsert() declaration, which currently reads:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                           ResultRelInfo *rinfo,
                           TupleTableSlot **slots,
                           TupleTableSlot *planSlots,
                           int *numSlots);

should be:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                           ResultRelInfo *rinfo,
                           TupleTableSlot **slots,
                           TupleTableSlot **planSlots,
                           int *numSlots);
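The pointer-to-pointer form matters because `planSlots`, like `slots`, is indexed per row. Handing an array of slot pointers to a parameter declared `TupleTableSlot *` would be an incompatible pointer type. This is a minimal sketch: `Slot` and `sketch_exec_batch_insert()` are hypothetical stand-ins for `TupleTableSlot` and an FDW's batch-insert callback, and the body is illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-in for TupleTableSlot. */
typedef struct Slot
{
    int         value;
} Slot;

/*
 * Same parameter shape as the corrected declaration: slots and planSlots are
 * parallel arrays of *numSlots slot pointers, so row i's plan output is
 * planSlots[i].  The body just copies a value from each plan slot into the
 * matching slot and returns the slots array, matching the TupleTableSlot **
 * return type.
 */
static Slot **
sketch_exec_batch_insert(Slot **slots, Slot **planSlots, int *numSlots)
{
    assert(*numSlots >= 0);

    for (int i = 0; i < *numSlots; i++)
        slots[i]->value = planSlots[i]->value;

    return slots;
}
```

With the `TupleTableSlot *planSlots` spelling, `planSlots[i]` would name a slot struct rather than a pointer to one.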

(Trivial patch attached).

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

#104Ian Lawrence Barwick
barwick@gmail.com
In reply to: Ian Lawrence Barwick (#97)
Re: POC: postgres_fdw insert batching

On Fri, Jan 22, 2021 at 14:50, Ian Lawrence Barwick <barwick@gmail.com> wrote:

Hi

On Thu, Jan 21, 2021 at 8:00, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

There's a minor typo in the documentation's version of the
ExecForeignBatchInsert() declaration, which currently reads:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                           ResultRelInfo *rinfo,
                           TupleTableSlot **slots,
                           TupleTableSlot *planSlots,
                           int *numSlots);

should be:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                           ResultRelInfo *rinfo,
                           TupleTableSlot **slots,
                           TupleTableSlot **planSlots,
                           int *numSlots);

(Trivial patch attached).

Forgot to mention the relevant doc link:

https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-UPDATE

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

#105Amit Langote
amitlangote09@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#102)
1 attachment(s)
Re: POC: postgres_fdw insert batching

Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Yes, it can be simplified by using a local join to prevent the update of the foreign
partition from being pushed to the remote server, for which my example in the
previous email used a local trigger. Note that the update of the foreign
partition to be done locally is a prerequisite for this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it partway. Good catch (and my bad miss.)

It appears I had missed your reply, sorry.

+       PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+                                                       (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+                                                       NULL;

This can be written as:

+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated. Thanks for the review.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v3-0001-Prevent-FDW-insert-batching-during-cross-partitio.patch (application/octet-stream)
From 46eefa1d915cb16abc48666e5098babc9cb94150 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v3] Prevent FDW insert batching during cross-partition updates

A cross-partition update of a partitioned table is internally
implemented as delete+insert.  As things stand now, even those
inserts inadvertently end up using batching, because
ExecInitRoutingInfo() that is in charge of initializing ri_BatchSize
for partitions doesn't check the original operation, which would be
CMD_UPDATE.  That may pose a problem if the target of such an insert
is a foreign partition, because its FDW may not be able to handle
both the original update and the batched insert being performed at
the same time.

Prevent insert batching in such cases, for now.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 23 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c           | 12 ++++++++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 19 ++++++++++++++-
 src/backend/executor/execPartition.c          |  3 ++-
 4 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b09dce63f5..f2540c614a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9396,5 +9396,26 @@ SELECT COUNT(*) FROM batch_table;
     66
 (1 row)
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+       tableoid       | a 
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(2 rows)
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ce42ce3f1..993c6eb9d7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,25 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate =
+		(PgFdwModifyState *) resultRelInfo->ri_FdwState;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Should never get called when the insert is being performed as part of
+	 * a row movement operation.
+	 */
+	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
-	if (resultRelInfo->ri_FdwState)
-		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	if (fmstate)
+		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 319c15d635..6329c33946 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2897,5 +2897,22 @@ CREATE TABLE batch_table_p2
 INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
 SELECT COUNT(*) FROM batch_table;
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 746cd1e9d7..5bba6c42c1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1000,7 +1000,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 *
 	 * If the FDW does not support batching, we set the batch size to 1.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
+	if (mtstate->operation == CMD_INSERT &&
+		partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
-- 
2.24.1

#106tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Amit Langote (#105)
RE: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09@gmail.com>

It appears I had missed your reply, sorry.

+       PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+                                                       (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+                                                       NULL;

This can be written as:

+       PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated. Thanks for the review.

Thank you for picking this up.

Regards
Takayuki Tsunakawa

#107Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ian Lawrence Barwick (#104)
Re: POC: postgres_fdw insert batching

On 2/5/21 2:55 AM, Ian Lawrence Barwick wrote:

...

There's a minor typo in the documentation's version of the
ExecForeignBatchInsert() declaration, which currently reads:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                      ResultRelInfo *rinfo,
                      TupleTableSlot **slots,
                      TupleTableSlot *planSlots,
                      int *numSlots);

should be:

    TupleTableSlot **
    ExecForeignBatchInsert(EState *estate,
                      ResultRelInfo *rinfo,
                      TupleTableSlot **slots,
                      TupleTableSlot **planSlots,
                      int *numSlots);

(Trivial patch attached).

Forgot to mention the relevant doc link:

   
https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-UPDATE

Thanks, I'll get this fixed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#108Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#105)
Re: POC: postgres_fdw insert batching

On 2/5/21 3:52 AM, Amit Langote wrote:

Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Yes, it can be simplified by using a local join to prevent the update of the foreign
partition from being pushed to the remote server, for which my example in the
previous email used a local trigger. Note that the update of the foreign
partition to be done locally is a prerequisite for this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it partway. Good catch (and my bad miss.)

It appears I had missed your reply, sorry.

+       PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+                                                       (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+                                                       NULL;

This can be written as:

+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated. Thanks for the review.

Thanks for the patch, it seems fine to me. I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#109Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#108)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/5/21 3:52 AM, Amit Langote wrote:

Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Yes, it can be simplified by using a local join to prevent the update of the foreign
partition from being pushed to the remote server, for which my example in the
previous email used a local trigger. Note that the update of the foreign
partition to be done locally is a prerequisite for this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it partway. Good catch (and my bad miss.)

It appears I had missed your reply, sorry.

+       PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+                                                       (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+                                                       NULL;

This can be written as:

+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated. Thanks for the review.

Thanks for the patch, it seems fine to me.

Thanks for checking.

I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.

You're right. Please check the message in the updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v4-0001-Fix-tuple-routing-to-initialize-batching-only-for.patch (application/x-patch)
From b1d470fc764279ba12787271a04015a123d20b4f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v4] Fix tuple routing to initialize batching only for inserts

Currently, the insert component of a cross-partition update of a
partitioned table (internally implemented as a delete followed by an
insert) inadvertently ends up using batching for the insert. That may
pose a problem if the insert target is a foreign partition, because
the partition's FDW may not be able to handle both the original update
operation and the batched insert operation being performed at the same
time.  So tighten up the check in ExecInitRoutingInfo() to initialize
batching only if the query's original operation is also INSERT.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 23 ++++++++++++++++++-
 contrib/postgres_fdw/postgres_fdw.c           | 13 +++++++++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 19 ++++++++++++++-
 src/backend/executor/execPartition.c          |  3 ++-
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e115d6..3326f1b542 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9414,5 +9414,26 @@ SELECT COUNT(*) FROM batch_table;
     66
 (1 row)
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+       tableoid       | a 
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(2 rows)
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d9d1..35b48575c5 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,26 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+							(PgFdwModifyState *) resultRelInfo->ri_FdwState :
+							NULL;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Should never get called when the insert is being performed as part of
+	 * a row movement operation.
+	 */
+	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
-	if (resultRelInfo->ri_FdwState)
-		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	if (fmstate)
+		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1834..2b525ea44a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2909,5 +2909,22 @@ CREATE TABLE batch_table_p2
 INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
 SELECT COUNT(*) FROM batch_table;
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b9e4f2d80b..b8da4c5967 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1000,7 +1000,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 *
 	 * If the FDW does not support batching, we set the batch size to 1.
 	 */
-	if (partRelInfo->ri_FdwRoutine != NULL &&
+	if (mtstate->operation == CMD_INSERT &&
+		partRelInfo->ri_FdwRoutine != NULL &&
 		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
 		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
 		partRelInfo->ri_BatchSize =
-- 
2.24.1

#110Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#109)
Re: POC: postgres_fdw insert batching

On 2/16/21 10:25 AM, Amit Langote wrote:

On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/5/21 3:52 AM, Amit Langote wrote:

Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:

From: Amit Langote <amitlangote09@gmail.com>

Yes, it can be simplified by using a local join to prevent the update of the foreign
partition from being pushed to the remote server, for which my example in the
previous email used a local trigger. Note that the update of the foreign
partition to be done locally is a prerequisite for this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it partway. Good catch (and my miss).

It appears I had missed your reply, sorry.

+       PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+                                                       (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+                                                       NULL;

This can be written as:

+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated. Thanks for the review.

Thanks for the patch, it seems fine to me.

Thanks for checking.

I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.

You're right. Please check the message in the updated patch.

Thanks. I'm not sure I understand what "FDW may not be able to handle
both the original update operation and the batched insert operation
being performed at the same time" means. I mean, if we translate the
UPDATE into DELETE+INSERT, then we don't run both the update and insert
at the same time, right? What exactly is the problem with allowing
batching for inserts in cross-partition updates?

On a closer look, it seems the problem actually lies in a small
inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
former only set batch_size for CMD_INSERT, while the latter called
GetForeignModifyBatchSize() for all operations, expecting a result >= 1. So we may either
relax create_foreign_modify and set batch_size for all DML, or make
ExecInitRoutingInfo stricter (which is what the patches here do).
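A minimal C sketch of the inconsistency described above, using hypothetical, heavily simplified stand-ins (`SketchModifyState`, `sketch_create_foreign_modify()` and the other `sketch_` names are not real PostgreSQL identifiers). It models only the control flow: the FDW records a batch size for INSERT and leaves 0 otherwise, while the routing code asks for the batch size for every command unless the call is gated on CMD_INSERT, which is what the stricter variant does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the real executor types. */
typedef enum { SKETCH_CMD_INSERT, SKETCH_CMD_UPDATE } SketchCmdType;

typedef struct SketchModifyState
{
    int batch_size;     /* 0 means "not set up for batching" */
} SketchModifyState;

/* Models create_foreign_modify(): batch_size is set only for INSERT. */
static void
sketch_create_foreign_modify(SketchModifyState *fmstate, SketchCmdType op,
                             int batch_size_option)
{
    fmstate->batch_size = (op == SKETCH_CMD_INSERT) ? batch_size_option : 0;
}

/* Models the FDW callback: it reports whatever create_foreign_modify stored. */
static int
sketch_get_batch_size(const SketchModifyState *fmstate)
{
    return fmstate->batch_size;
}

/*
 * Models the caller in ExecInitRoutingInfo().  With require_insert = false
 * (behavior before the fix) the callback is consulted for every command, so
 * an UPDATE gets 0 back although the caller expects >= 1.  With
 * require_insert = true (the stricter check) non-INSERT commands get 1.
 */
static int
sketch_init_routing_batch_size(const SketchModifyState *fmstate,
                               SketchCmdType op, bool require_insert)
{
    if (!require_insert || op == SKETCH_CMD_INSERT)
        return sketch_get_batch_size(fmstate);
    return 1;
}
```

The alternative Tomas mentions, relaxing create_foreign_modify, would correspond to setting `batch_size` for every `op` in the first function instead.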

Is there a reason not to do the first thing, allowing batching of
inserts during cross-partition updates? I tried to do that, but it
dawned on me that we can't mix batched and un-batched operations, e.g.
DELETE + INSERT, because that'd break the order of execution, leading to
bogus results in case the same row is modified repeatedly, etc.

Am I getting this right?
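A toy C model of the ordering hazard raised above, assuming the remote table is a plain array and batched INSERTs sit in a client-side buffer until an explicit flush. All names are hypothetical; the point is only that an unbatched operation issued after a buffered insert cannot see the buffered row.

```c
#include <assert.h>

#define SKETCH_MAX_ROWS 8

/* Hypothetical remote table: a small array of integer keys. */
typedef struct SketchRemote
{
    int keys[SKETCH_MAX_ROWS];
    int nkeys;
} SketchRemote;

/* Hypothetical client-side insert buffer, flushed only on demand. */
typedef struct SketchBuffer
{
    int keys[SKETCH_MAX_ROWS];
    int nkeys;
} SketchBuffer;

/* Batched insert: the row is only queued locally. */
static void
sketch_buffered_insert(SketchBuffer *buf, int key)
{
    assert(buf->nkeys < SKETCH_MAX_ROWS);
    buf->keys[buf->nkeys++] = key;
}

/* Send every queued row to the remote table. */
static void
sketch_flush(SketchBuffer *buf, SketchRemote *remote)
{
    for (int i = 0; i < buf->nkeys; i++)
    {
        assert(remote->nkeys < SKETCH_MAX_ROWS);
        remote->keys[remote->nkeys++] = buf->keys[i];
    }
    buf->nkeys = 0;
}

/* Unbatched update: runs immediately against what the remote side holds now. */
static int
sketch_immediate_update(SketchRemote *remote, int old_key, int new_key)
{
    int updated = 0;

    for (int i = 0; i < remote->nkeys; i++)
    {
        if (remote->keys[i] == old_key)
        {
            remote->keys[i] = new_key;
            updated++;
        }
    }
    return updated;
}
```

In this model, an update issued between the buffered insert and the flush reports 0 rows, and the row later arrives with its old value, i.e. the update is silently lost.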

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#111Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#1)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/16/21 10:25 AM, Amit Langote wrote:

On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra

Thanks for the patch, it seems fine to me.

Thanks for checking.

I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.

You're right. Please check the message in the updated patch.

Thanks. I'm not sure I understand what "FDW may not be able to handle
both the original update operation and the batched insert operation
being performed at the same time" means. I mean, if we translate the
UPDATE into DELETE+INSERT, then we don't run both the update and insert
at the same time, right? What exactly is the problem with allowing
batching for inserts in cross-partition updates?

Sorry, I hadn't shared enough details of my investigations when I
originally ran into this. For example, I had considered making these
inserts use batching too, but had given up.

Now that you mention it, I think I gave an unconvincing reason for
why we should avoid doing it at all. It would have been more accurate
to say that it is the core code, not necessarily the FDWs, that
currently fails to deal with the use of batching by the insert
component of a cross-partition update. Those failures could be
addressed as I'll describe below.

For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
to simply use the PgFdwModifyState that is installed to handle the
insert component of a cross-partition update (reachable via the
aux_fmstate field of the original PgFdwModifyState). However, even
though that's fine for postgres_fdw to do, what worries (had worried)
me is that it also results in scribbling on ri_BatchSize that the core
code may see to determine what to do with a particular tuple, and I
just have to hope that nodeModifyTable.c doesn't end up doing anything
unwarranted with the original update based on seeing a non-zero
ri_BatchSize. AFAICS, we are fine on that front.

That said, there are some deficiencies in the code that have to be
addressed before we can let postgres_fdw do as mentioned above. For
example, the code in ExecModifyTable() that runs after breaking out of
the loop to insert any remaining batched tuples appears to miss the
tuples batched by such inserts. Apparently, that is because the
ResultRelInfos used by those inserts are not present in
es_tuple_routing_result_relations. Turns out I had forgotten that
execPartition.c doesn't add the ResultRelInfos to that list if they
are made by ExecInitModifyTable() for the original update operation
and simply reused by ExecFindPartition() when tuples were routed to
those partitions. It can be "fixed" by reverting to the original
design in Tsunakawa-san's patch where the tuple routing result
relations were obtained from the PartitionTupleRouting data structure,
which fortunately stores all tuple routing result relations. (Sorry,
I gave wrong advice in [1] in retrospect.)
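A simplified C sketch of the flush-at-end problem just described, assuming a ResultRelInfo can be reduced to a count of buffered tuples (`SketchRelInfo` and the helper functions are hypothetical). Flushing from a list that omits the ResultRelInfos reused from the UPDATE subplans leaves their tuples behind, whereas walking every relation that tuple routing touched sends them all. The loop also skips NULL entries, as the v5 loop must for es_result_relations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo: just the pending tuple count. */
typedef struct SketchRelInfo
{
    const char *name;
    int         num_slots;   /* tuples buffered but not yet sent */
} SketchRelInfo;

/* Stand-in for ExecBatchInsert(): "sends" buffered tuples, returns how many. */
static int
sketch_flush_one(SketchRelInfo *rel)
{
    int sent = rel->num_slots;

    rel->num_slots = 0;
    return sent;
}

/*
 * Flush every relation in an array that may contain NULL holes
 * (es_result_relations is indexed by range table position, so not every
 * entry is a result relation).  Returns the total number of tuples sent.
 */
static int
sketch_flush_remaining(SketchRelInfo **relinfos, int nrels)
{
    int total = 0;

    for (int i = 0; i < nrels; i++)
    {
        SketchRelInfo *rel = relinfos[i];

        if (rel != NULL && rel->num_slots > 0)
            total += sketch_flush_one(rel);
    }
    return total;
}
```

Flushing from a list that contains only the relation built for routing leaves the reused relation's tuples unsent; flushing from an array holding both relations sends the rest.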

On a closer look, it seems the problem actually lies in a small
inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
former only set batch_size for CMD_INSERT, while the latter called
GetForeignModifyBatchSize() for all operations, expecting a result >= 1. So we may either
relax create_foreign_modify and set batch_size for all DML, or make
ExecInitRoutingInfo stricter (which is what the patches here do).

I think we should be fine if we make
postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
as described above. We can be sure that we are not mixing the
information used by the batched insert with that of the original
unbatched update.
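A sketch of the postgresGetForeignModifyBatchSize() change proposed above, using a hypothetical trimmed-down state struct that keeps only `batch_size` and `aux_fmstate`. `sketch_batch_size_option()` stands in for the real option lookup; its return value of 10 is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified PgFdwModifyState: only the fields the idea needs. */
typedef struct SketchFdwModifyState
{
    int batch_size;                            /* 0 unless set up for INSERT */
    struct SketchFdwModifyState *aux_fmstate;  /* insert state for row movement */
} SketchFdwModifyState;

/* Hypothetical stand-in for reading the batch_size server/table option. */
static int
sketch_batch_size_option(void)
{
    return 10;
}

static int
sketch_get_modify_batch_size(SketchFdwModifyState *fmstate)
{
    /*
     * If this relation is also an UPDATE target, the insert component of a
     * cross-partition update runs through the auxiliary state, so use it.
     */
    if (fmstate != NULL && fmstate->aux_fmstate != NULL)
        fmstate = fmstate->aux_fmstate;

    /* No state at all (e.g. EXPLAIN without ANALYZE): read the option. */
    if (fmstate == NULL)
        return sketch_batch_size_option();
    return fmstate->batch_size;
}
```

The design choice is that the batched insert only ever reads the insert's own state, so the update's unbatched state is never consulted for batching.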

Is there a reason not to do the first thing, allowing batching of
inserts during cross-partition updates? I tried to do that, but it
dawned on me that we can't mix batched and un-batched operations, e.g.
DELETE + INSERT, because that'd break the order of execution, leading to
bogus results in case the same row is modified repeatedly, etc.

Actually, postgres_fdw only supports moving a row into a partition (as
part of a cross-partition update, that is) if it has already finished
performing any updates on it. So there is no risk that rows moved into
a partition will subsequently be updated by the original command.

The attached patch implements the changes necessary to make these
inserts use batching too.

[1] /messages/by-id/CA+HiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ@mail.gmail.com

Oops, I had mistakenly not hit "Reply All". Attaching the patch again.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v5-0001-Allow-batching-of-inserts-to-occur-in-some-cases.patch (application/octet-stream)
From cdb9fc24e192ff50c48dbb874742c51abb83c1cd Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v5] Allow batching of inserts to occur in some cases

Currently, the insert component of a cross-partition update of a
partitioned table inadvertently tries to use batching but it doesn't
work for two main reasons:

a) postgresGetForeignModifyBatchSize() looks at the wrong
PgFdwModifyState, one that does not belong to the insert, so has no
batching information.

b) ExecModifyTable(), when inserting any remaining batched tuples,
would fail to look at the ResultRelInfos used by such inserts because
they are not present in es_tuple_routing_result_relations, which
would result in those tuples not actually getting inserted.

This commit fixes both (a) and (b) so that those inserts can
correctly use batching.

To fix (a), postgresGetForeignModifyBatchSize() now uses the
PgFdwModifyState that actually belongs to the insert which contains
the information necessary to perform batching.

To fix (b), ExecModifyTable() now gets the ResultRelInfos to insert
any remaining batched tuples from the PartitionTupleRouting data
structure which does contain the ResultRelInfo that would be used
by the batched insert.  To implement this, this commit exposes the
definition of PartitionTupleRouting which was previously local to
execPartition.c.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 23 +++++-
 contrib/postgres_fdw/postgres_fdw.c           | 12 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     | 19 ++++-
 src/backend/executor/execPartition.c          | 69 ------------------
 src/backend/executor/nodeModifyTable.c        | 38 +++++++---
 src/include/executor/execPartition.h          | 72 ++++++++++++++++++-
 6 files changed, 150 insertions(+), 83 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e115d6..3326f1b542 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9414,5 +9414,26 @@ SELECT COUNT(*) FROM batch_table;
     66
 (1 row)
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+       tableoid       | a 
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(2 rows)
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d9d1..53472cf73c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,25 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+	 * details on what it represents.
+	 */
+	if (fmstate && fmstate->aux_fmstate != NULL)
+		fmstate = fmstate->aux_fmstate;
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
-	if (resultRelInfo->ri_FdwState)
-		batch_size = ((PgFdwModifyState *) resultRelInfo->ri_FdwState)->batch_size;
+	if (fmstate)
+		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 151f4f1834..2b525ea44a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2909,5 +2909,22 @@ CREATE TABLE batch_table_p2
 INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
 SELECT COUNT(*) FROM batch_table;
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b9e4f2d80b..811997b4c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
 #include "utils/ruleutils.h"
 
 
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- *		The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- *		Array of 'max_dispatch' elements containing a pointer to a
- *		PartitionDispatch object for every partitioned table touched by tuple
- *		routing.  The entry for the target partitioned table is *always*
- *		present in the 0th element of this array.  See comment for
- *		PartitionDispatchData->indexes for details on how this array is
- *		indexed.
- *
- * nonleaf_partitions
- *		Array of 'max_dispatch' elements containing pointers to fake
- *		ResultRelInfo objects for nonleaf partitions, useful for checking
- *		the partition constraint.
- *
- * num_dispatch
- *		The current number of items stored in the 'partition_dispatch_info'
- *		array.  Also serves as the index of the next free array element for
- *		new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- *		The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- *		Array of 'max_partitions' elements containing a pointer to a
- *		ResultRelInfo for every leaf partitions touched by tuple routing.
- *		Some of these are pointers to ResultRelInfos which are borrowed out of
- *		'subplan_resultrel_htab'.  The remainder have been built especially
- *		for tuple routing.  See comment for PartitionDispatchData->indexes for
- *		details on how this array is indexed.
- *
- * num_partitions
- *		The current number of items stored in the 'partitions' array.  Also
- *		serves as the index of the next free array element for new
- *		ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- *		The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- *		Hash table to store subplan ResultRelInfos by Oid.  This is used to
- *		cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
- *		NULL in other cases.  Some of these may be useful for tuple routing
- *		to save having to build duplicates.
- *
- * memcxt
- *		Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
-	Relation	partition_root;
-	PartitionDispatch *partition_dispatch_info;
-	ResultRelInfo **nonleaf_partitions;
-	int			num_dispatch;
-	int			max_dispatch;
-	ResultRelInfo **partitions;
-	int			num_partitions;
-	int			max_partitions;
-	HTAB	   *subplan_resultrel_htab;
-	MemoryContext memcxt;
-};
-
 /*-----------------------
  * PartitionDispatch - information about one partitioned table in a partition
  * hierarchy required to route a tuple to any of its partitions.  A
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..e38d7472dc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2059,8 +2059,9 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
-	List				  *relinfos = NIL;
-	ListCell			  *lc;
+	ResultRelInfo **relinfos;
+	int			nrels;
+	int			i;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -2277,17 +2278,38 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/*
-	 * Insert remaining tuples for batch insert.
+	 * Insert any remaining batched tuples.
+	 *
+	 * If the query's main target relation is a partitioned table, any inserts
+	 * would have been performed using tuple routing, so use the appropriate
+	 * set of target relations.  Note that this also covers any inserts
+	 * performed during cross-partition UPDATEs that would have occurred
+	 * through tuple routing.
 	 */
 	if (proute)
-		relinfos = estate->es_tuple_routing_result_relations;
+	{
+		Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+			   RELKIND_PARTITIONED_TABLE);
+		relinfos = proute->partitions;
+		nrels = proute->num_partitions;
+	}
 	else
-		relinfos = estate->es_opened_result_relations;
+	{
+		/* Otherwise, use result relations from the range table. */
+		relinfos = estate->es_result_relations;
+		nrels = estate->es_range_table_size;
+	}
 
-	foreach(lc, relinfos)
+	for (i = 0; i < nrels; i++)
 	{
-		resultRelInfo = lfirst(lc);
-		if (resultRelInfo->ri_NumSlots > 0)
+		resultRelInfo = relinfos[i];
+
+		/*
+		 * es_result_relations array is same length as the range table though
+		 * not all relations in it are necessarily result relations, so some
+		 * entries in it may be NULL.
+		 */
+		if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
 			ExecBatchInsert(node, resultRelInfo,
 						   resultRelInfo->ri_Slots,
 						   resultRelInfo->ri_PlanSlots,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
 #include "nodes/plannodes.h"
 #include "partitioning/partprune.h"
 
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
 typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ *		The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ *		Array of 'max_dispatch' elements containing a pointer to a
+ *		PartitionDispatch object for every partitioned table touched by tuple
+ *		routing.  The entry for the target partitioned table is *always*
+ *		present in the 0th element of this array.  See comment for
+ *		PartitionDispatchData->indexes for details on how this array is
+ *		indexed.
+ *
+ * nonleaf_partitions
+ *		Array of 'max_dispatch' elements containing pointers to fake
+ *		ResultRelInfo objects for nonleaf partitions, useful for checking
+ *		the partition constraint.
+ *
+ * num_dispatch
+ *		The current number of items stored in the 'partition_dispatch_info'
+ *		array.  Also serves as the index of the next free array element for
+ *		new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ *		The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ *		Array of 'max_partitions' elements containing a pointer to a
+ *		ResultRelInfo for every leaf partitions touched by tuple routing.
+ *		Some of these are pointers to ResultRelInfos which are borrowed out of
+ *		'subplan_resultrel_htab'.  The remainder have been built especially
+ *		for tuple routing.  See comment for PartitionDispatchData->indexes for
+ *		details on how this array is indexed.
+ *
+ * num_partitions
+ *		The current number of items stored in the 'partitions' array.  Also
+ *		serves as the index of the next free array element for new
+ *		ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ *		The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ *		Hash table to store subplan ResultRelInfos by Oid.  This is used to
+ *		cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ *		NULL in other cases.  Some of these may be useful for tuple routing
+ *		to save having to build duplicates.
+ *
+ * memcxt
+ *		Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+	Relation	partition_root;
+	PartitionDispatch *partition_dispatch_info;
+	ResultRelInfo **nonleaf_partitions;
+	int			num_dispatch;
+	int			max_dispatch;
+	ResultRelInfo **partitions;
+	int			num_partitions;
+	int			max_partitions;
+	HTAB	   *subplan_resultrel_htab;
+	MemoryContext memcxt;
+} PartitionTupleRouting;
 
 /*
  * PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
-- 
2.24.1

#112Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Langote (#111)
Re: POC: postgres_fdw insert batching

On 2/17/21 9:51 AM, Amit Langote wrote:

On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/16/21 10:25 AM, Amit Langote wrote:

On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra

Thanks for the patch, it seems fine to me.

Thanks for checking.

I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.

You're right. Please check the message in the updated patch.

Thanks. I'm not sure I understand what "FDW may not be able to handle
both the original update operation and the batched insert operation
being performed at the same time" means. I mean, if we translate the
UPDATE into DELETE+INSERT, then we don't run both the update and insert
at the same time, right? What exactly is the problem with allowing
batching for inserts in cross-partition updates?

Sorry, I hadn't shared enough details of my investigations when I
originally ran into this. For example, I had considered making these
inserts use batching too, but had given up.

Now that you mention it, I think I gave an unconvincing reason for
why we should avoid doing it at all. It would have been more accurate
to say that it is the core code, not necessarily the FDWs, that
currently fails to deal with the use of batching by the insert
component of a cross-partition update. Those failures could be
addressed as I'll describe below.

For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
to simply use the PgFdwModifyState that is installed to handle the
insert component of a cross-partition update (reachable via the
aux_fmstate field of the original PgFdwModifyState). However, even
though that's fine for postgres_fdw to do, what worries (had worried)
me is that it also results in scribbling on ri_BatchSize that the core
code may see to determine what to do with a particular tuple, and I
just have to hope that nodeModifyTable.c doesn't end up doing anything
unwarranted with the original update based on seeing a non-zero
ri_BatchSize. AFAICS, we are fine on that front.

That said, there are some deficiencies in the code that have to be
addressed before we can let postgres_fdw do as mentioned above. For
example, the code in ExecModifyTable() that runs after breaking out of
the loop to insert any remaining batched tuples appears to miss the
tuples batched by such inserts. Apparently, that is because the
ResultRelInfos used by those inserts are not present in
es_tuple_routing_result_relations. Turns out I had forgotten that
execPartition.c doesn't add the ResultRelInfos to that list if they
are made by ExecInitModifyTable() for the original update operation
and simply reused by ExecFindPartition() when tuples were routed to
those partitions. It can be "fixed" by reverting to the original
design in Tsunakawa-san's patch where the tuple routing result
relations were obtained from the PartitionTupleRouting data structure,
which fortunately stores all tuple routing result relations. (Sorry,
I gave wrong advice in [1] in retrospect.)

On a closer look, it seems the problem actually lies in a small
inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
former only set batch_size for CMD_INSERT, while the latter called
GetForeignModifyBatchSize() for all operations, expecting a result >= 1. So we may either
relax create_foreign_modify and set batch_size for all DML, or make
ExecInitRoutingInfo stricter (which is what the patches here do).

I think we should be fine if we make
postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
as described above. We can be sure that we are not mixing the
information used by the batched insert with that of the original
unbatched update.

Is there a reason not to do the first thing, allowing batching of
inserts during cross-partition updates? I tried to do that, but it
dawned on me that we can't mix batched and un-batched operations, e.g.
DELETE + INSERT, because that'd break the order of execution, leading to
bogus results in case the same row is modified repeatedly, etc.

Actually, postgres_fdw only supports moving a row into a partition (as
part of a cross-partition update, that is) if it has already finished
performing any updates on it. So there is no risk that rows moved into
a partition will subsequently be updated by the original command.

The attached patch implements the changes necessary to make these
inserts use batching too.

[1] /messages/by-id/CA+HiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ@mail.gmail.com

Oops, I had mistakenly not hit "Reply All". Attaching the patch again.

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

One thing that surprised me is that we only move the rows *to* the
foreign partition, not from it (even on pg13, i.e. before the batching
etc.). I mean, using the example you posted earlier, with one foreign
and one local partition, consider this:

delete from p;
insert into p values (2);

test=# select * from p2;
a
---
2
(1 row)

test=# update p set a = 1;
UPDATE 1

test=# select * from p1;
a
---
1
(1 row)

OK, so it was moved to the foreign partition, which is for rows with
value in (1). So far so good. Let's do another update:

test=# update p set a = 2;
UPDATE 1
test=# select * from p1;
a
---
2
(1 row)

So now the foreign partition contains the value 2, which is wrong
with respect to the partitioning rules: it should be in p2, the local
partition. This causes a pretty annoying issue:

test=# explain analyze select * from p where a = 2;

QUERY PLAN
---------------------------------------------------------------
Seq Scan on p2 p  (cost=0.00..41.88 rows=13 width=4) (actual time=0.024..0.028 rows=0 loops=1)
  Filter: (a = 2)
Planning Time: 0.355 ms
Execution Time: 0.089 ms
(4 rows)

That is, we fail to find the row, because we eliminate the partition.
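A simplified model of why the query above returns nothing follows. It is plain C with hypothetical names, and real pruning happens in the planner rather than in a loop like this. A partition is skipped purely because its declared bound cannot match, so a row that physically sits in the wrong partition is never examined. In the scenario above, p1 only admits the value 1 but holds the value 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a list partition with a single accepted value. */
typedef struct {
    const char *name;
    int         bound;   /* the only value the partition is declared to hold */
    const int  *rows;    /* values physically stored in the partition */
    size_t      nrows;
} ToyPartition;

/*
 * Count rows matching "a = value", pruning first: only partitions whose
 * declared bound admits the value are scanned at all.
 */
static size_t count_matching(const ToyPartition *parts, size_t nparts, int value)
{
    size_t found = 0;

    for (size_t i = 0; i < nparts; i++) {
        if (parts[i].bound != value)
            continue;           /* pruned: contents are never looked at */
        for (size_t j = 0; j < parts[i].nrows; j++)
            if (parts[i].rows[j] == value)
                found++;
    }
    return found;
}

/* Same count, but scanning every partition regardless of its bound. */
static size_t count_matching_no_pruning(const ToyPartition *parts, size_t nparts, int value)
{
    size_t found = 0;

    for (size_t i = 0; i < nparts; i++)
        for (size_t j = 0; j < parts[i].nrows; j++)
            if (parts[i].rows[j] == value)
                found++;
    return found;
}
```

With p1 = {bound 1, rows {2}} and an empty p2 = {bound 2}, the pruned count for value 2 is 0 while the unpruned count is 1, which matches the rows=0 in the plan above.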

Now, maybe this is expected, but it seems like a rather mind-boggling
violation of POLA (the principle of least astonishment). I've checked
whether the postgres_fdw docs mention this somewhere, but all I see about
row movement is this:

Note also that postgres_fdw supports row movement invoked by UPDATE
statements executed on partitioned tables, but it currently does not
handle the case where a remote partition chosen to insert a moved row
into is also an UPDATE target partition that will be updated later.

and if I understand that correctly, that's about something different.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#113 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#112)
Re: POC: postgres_fdw insert batching

On 2/17/21 8:36 PM, Tomas Vondra wrote:

...

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

I've pushed the v4 fix, adding the CMD_INSERT to execPartition.

I think we may need to revise the relationship between the FDW and the
places that (may) call GetForeignModifyBatchSize. Currently, these places
need to be quite well synchronized; in a way, the issue was (partially)
due to postgres_fdw and core not agreeing on the details.

In particular, create_foreign_modify sets batch_size for CMD_INSERT and
leaves it 0 otherwise. So GetForeignModifyBatchSize() returned 0 later,
triggering an assert.

It's probably better to require GetForeignModifyBatchSize() to always
return a valid batch size (>= 1). If batching is not supported, just
return 1.
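A sketch of that contract follows. It uses cut-down, hypothetical stand-ins rather than the real ResultRelInfo/PgFdwModifyState, and it mirrors the direction of the patch in the reply below without being that code. The per-relation state gets the option value only for INSERT and 1 otherwise, and the batch-size function returns 1 whenever batching must be disabled, so it never returns 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for CmdType and PgFdwModifyState. */
typedef enum { TOY_CMD_INSERT, TOY_CMD_UPDATE, TOY_CMD_DELETE } ToyCmdType;

typedef struct {
    int batch_size;     /* always >= 1 once initialized */
} ToyModifyState;

/* Like create_foreign_modify in the patch: only INSERT uses the option. */
static void toy_init_modify_state(ToyModifyState *st, ToyCmdType op, int batch_option)
{
    assert(batch_option >= 1);  /* assumes the option was validated as >= 1 */
    st->batch_size = (op == TOY_CMD_INSERT) ? batch_option : 1;
}

/* The contract for GetForeignModifyBatchSize: 1 means "no batching", never 0. */
static int toy_get_batch_size(const ToyModifyState *st,
                              bool has_returning, bool has_after_row_trigger)
{
    if (has_returning || has_after_row_trigger)
        return 1;
    assert(st->batch_size >= 1);
    return st->batch_size;
}
```

A caller can then assert on the result unconditionally instead of having to know which commands the FDW initialized batch_size for.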

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#114 Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#113)
1 attachment(s)
Re: POC: postgres_fdw insert batching

On Thu, Feb 18, 2021 at 8:38 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/17/21 8:36 PM, Tomas Vondra wrote:

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

I've pushed the v4 fix, adding the CMD_INSERT to execPartition.

I think we may need to revise the relationship between the FDW and the
places that (may) call GetForeignModifyBatchSize. Currently, these places
need to be quite well synchronized; in a way, the issue was (partially)
due to postgres_fdw and core not agreeing on the details.

Agreed.

In particular, create_foreign_modify sets batch_size for CMD_INSERT and
leaves it 0 otherwise. So GetForeignModifyBatchSize() returned 0 later,
triggering an assert.

It's probably better to require GetForeignModifyBatchSize() to always
return a valid batch size (>= 1). If batching is not supported, just
return 1.

That makes sense.

How about the attached?

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

ForeignModifyBatchSize-sanity.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..75e303d083 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,27 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
-	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
-							(PgFdwModifyState *) resultRelInfo->ri_FdwState :
-							NULL;
+	PgFdwModifyState *fmstate;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Disable batching when there is a RETURNING clause or if there are AFTER
+	 * ROW triggers on the relation.
+	 */
+	if (resultRelInfo->ri_projectReturning != NULL ||
+		(resultRelInfo->ri_TrigDesc &&
+		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
+		return 1;
+
+	/* Otherwise use the batch size specified for server/table. */
+
 	/*
 	 * Should never get called when the insert is being performed as part of
 	 * a row movement operation.
 	 */
+	fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
 	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
 
 	/*
@@ -1956,14 +1966,8 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		batch_size = fmstate->batch_size;
 	else
 		batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
+	Assert(batch_size >= 1);
 
-	/* Disable batching when we have to use RETURNING. */
-	if (resultRelInfo->ri_projectReturning != NULL ||
-		(resultRelInfo->ri_TrigDesc &&
-		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
-		return 1;
-
-	/* Otherwise use the batch size specified for server/table. */
 	return batch_size;
 }
 
@@ -3753,9 +3757,14 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
-	/* Set batch_size from foreign server/table options. */
+	/*
+	 * Set batch_size from foreign server/table options, although only inserts
+	 * support batching, so disable otherwise.
+	 */
 	if (operation == CMD_INSERT)
 		fmstate->batch_size = get_batch_size_option(rel);
+	else
+		fmstate->batch_size = 1;
 
 	fmstate->num_slots = 1;
 
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 2e73d296d2..1887859e34 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -678,7 +678,9 @@ GetForeignModifyBatchSize(ResultRelInfo *rinfo);
      <literal>rinfo</literal> is the <structname>ResultRelInfo</structname> struct describing
      the target foreign table.
      The FDW is expected to provide a foreign server and/or foreign
-     table option for the user to set this value, or some hard-coded value.
+     table option for the user to set this value, or return some hard-coded
+     value.  In any case, the returned value must be &gt;= 1.  Note that a
+     return value of 1 will instruct the core executor to disable batching.
     </para>
 
     <para>
#115 Amit Langote
amitlangote09@gmail.com
In reply to: Tomas Vondra (#112)
Re: POC: postgres_fdw insert batching

On Thu, Feb 18, 2021 at 4:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/17/21 9:51 AM, Amit Langote wrote:

On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09@gmail.com> wrote:

Sorry, I hadn't shared enough details of my investigations when I
originally ran into this, such as that I had considered implementing
the use of batching for these inserts too but had given up.

Now that you mention it, I think I gave a less convincing reason for
why we should avoid doing it at all. Maybe it would have been more
accurate to say that it is the core code, not necessarily the FDWs, that
currently fails to deal with the use of batching by the insert
component of a cross-partition update. Those failures could be
addressed as I'll describe below.

For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
to simply use the PgFdwModifyState that is installed to handle the
insert component of a cross-partition update (one can get that one via
the aux_fmstate field of the original PgFdwModifyState). However, even
though that's fine for postgres_fdw to do, what worries (had worried)
me is that it also results in scribbling on ri_BatchSize that the core
code may see to determine what to do with a particular tuple, and I
just have to hope that nodeModifyTable.c doesn't end up doing anything
unwarranted with the original update based on seeing a non-zero
ri_BatchSize. AFAICS, we are fine on that front.
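The aux_fmstate idea can be sketched as below. ToyFdwState is a hypothetical two-field stand-in for PgFdwModifyState, and this shows only the proposal described here. The postgresGetForeignModifyBatchSize() in the patch above instead asserts that aux_fmstate is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for PgFdwModifyState. */
typedef struct ToyFdwState {
    int                 batch_size;   /* 1 means "no batching" */
    struct ToyFdwState *aux_fmstate;  /* state for rows moved into this partition */
} ToyFdwState;

/*
 * Pick the state that actually describes the INSERT: the auxiliary one when
 * the relation is also an UPDATE target, otherwise the state itself.
 */
static int toy_batch_size_for_insert(const ToyFdwState *fmstate)
{
    const ToyFdwState *st = (fmstate->aux_fmstate != NULL)
        ? fmstate->aux_fmstate
        : fmstate;

    assert(st->batch_size >= 1);
    return st->batch_size;
}
```

The design point is that the insert-side batch size never comes from the UPDATE's own state, so the two sets of information are not mixed.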

That said, there are some deficiencies in the code that have to be
addressed before we can let postgres_fdw do as mentioned above. For
example, the code in ExecModifyTable() that runs after breaking out of
the loop to insert any remaining batched tuples appears to miss the
tuples batched by such inserts. Apparently, that is because the
ResultRelInfos used by those inserts are not present in
es_tuple_routing_result_relations. Turns out I had forgotten that
execPartition.c doesn't add the ResultRelInfos to that list if they
are made by ExecInitModifyTable() for the original update operation
and simply reused by ExecFindPartition() when tuples were routed to
those partitions. It can be "fixed" by reverting to the original
design in Tsunakawa-san's patch where the tuple routing result
relations were obtained from the PartitionTupleRouting data structure,
which fortunately stores all tuple routing result relations. (Sorry,
I gave wrong advice in [1] in retrospect.)
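A toy model of this failure mode follows. The types are hypothetical, and this is not executor code. The final flush walks one list of result relations, so a relation that still has buffered tuples but was never added to that list keeps them. Passing a list that records every routing target, which the paragraph above says PartitionTupleRouting does, would flush everything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ResultRelInfo that can buffer a batch. */
typedef struct {
    int pending;    /* tuples buffered but not yet sent */
    int flushed;    /* tuples actually sent to the remote server */
} ToyResultRel;

/* End-of-scan flush: send whatever is still buffered for each listed rel. */
static void toy_flush_all(ToyResultRel **rels, size_t nrels)
{
    for (size_t i = 0; i < nrels; i++) {
        rels[i]->flushed += rels[i]->pending;
        rels[i]->pending = 0;
    }
}

/*
 * One routing target was created by tuple routing and registered in the
 * list; another was reused from the UPDATE's own result relations and was
 * not registered.  Returns how many tuples remain unsent.
 */
static int toy_unsent_after_incomplete_flush(void)
{
    ToyResultRel  created = { 3, 0 };
    ToyResultRel  reused  = { 2, 0 };
    ToyResultRel *registered[] = { &created };   /* misses "reused" */

    toy_flush_all(registered, 1);
    return created.pending + reused.pending;
}
```

Here toy_unsent_after_incomplete_flush() reports the two tuples buffered by the reused relation as never sent.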

On a closer look, it seems the problem actually lies in a small
inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
former only set batch_size for CMD_INSERT, while the latter called
GetForeignModifyBatchSize() for all operations, expecting a result >= 1. So we may either
relax create_foreign_modify and set batch_size for all DML, or make
ExecInitRoutingInfo stricter (which is what the patches here do).
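The "make the caller stricter" option might look roughly like this. ToyFdwRoutine and its two pointers are hypothetical stand-ins for the FdwRoutine members GetForeignModifyBatchSize and ExecForeignBatchInsert. The point is only that the callback is consulted for INSERT alone, with 1 as the default.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_CMD_INSERT, TOY_CMD_UPDATE } ToyCmdType;

/* Hypothetical cut-down FdwRoutine: just the two batching callbacks. */
typedef struct {
    int  (*get_batch_size)(void);   /* stands in for GetForeignModifyBatchSize */
    void (*batch_insert)(void);     /* stands in for ExecForeignBatchInsert */
} ToyFdwRoutine;

static int toy_routing_batch_size(ToyCmdType operation, const ToyFdwRoutine *routine)
{
    int batch_size = 1;     /* default: no batching */

    /* Only ask the FDW for INSERT, and only if it can actually batch. */
    if (operation == TOY_CMD_INSERT &&
        routine != NULL &&
        routine->get_batch_size != NULL &&
        routine->batch_insert != NULL)
        batch_size = routine->get_batch_size();

    assert(batch_size >= 1);
    return batch_size;
}

/* Example callbacks for a hypothetical FDW configured with batch size 100. */
static int  toy_size_100(void) { return 100; }
static void toy_noop_batch_insert(void) { }
```

With this shape, a non-INSERT routing target always gets 1 without the FDW being asked.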

I think we should be fine if we make
postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
as described above. We can be sure that we are not mixing the
information used by the batched insert with that of the original
unbatched update.

Is there a reason not to do the first thing, allowing batching of
inserts during cross-partition updates? I tried to do that, but it
dawned on me that we can't mix batched and un-batched operations, e.g.
DELETE + INSERT, because that'd break the order of execution, leading to
bogus results in case the same row is modified repeatedly, etc.

Actually, postgres_fdw only supports moving a row into a partition (as
part of a cross-partition update, that is) if it has already finished
performing any updates on it. So there is no worry of rows that are
moved into a partition subsequently getting updated due to the
original command.

The attached patch implements the changes necessary to make these
inserts use batching too.

[1] /messages/by-id/CA+HiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ@mail.gmail.com

Oops, I had mistakenly not hit "Reply All". Attaching the patch again.

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

Okay will post the rebased patch to a new thread.

One thing that surprised me is that we only move the rows *to* the
foreign partition, not from it (even on pg13, i.e. before the batching
etc.). I mean, using the example you posted earlier, with one foreign
and one local partition, consider this:

delete from p;
insert into p values (2);

test=# select * from p2;
a
---
2
(1 row)

test=# update p set a = 1;
UPDATE 1

test=# select * from p1;
a
---
1
(1 row)

OK, so it was moved to the foreign partition, which is for rows with
value in (1). So far so good. Let's do another update:

test=# update p set a = 2;
UPDATE 1
test=# select * from p1;
a
---
2
(1 row)

So now the foreign partition contains value (2), which is, however, wrong
with respect to the partitioning rules: it should be in p2, the local
partition.

This causes a pretty annoying issue:

test=# explain analyze select * from p where a = 2;

QUERY PLAN
---------------------------------------------------------------
Seq Scan on p2 p  (cost=0.00..41.88 rows=13 width=4) (actual time=0.024..0.028 rows=0 loops=1)
  Filter: (a = 2)
Planning Time: 0.355 ms
Execution Time: 0.089 ms
(4 rows)

That is, we fail to find the row, because we eliminate the partition.

Now, maybe this is expected, but it seems like a rather mind-boggling
violation of POLA (the principle of least astonishment).

Yeah, I think we knowingly allow this behavior. The documentation
states that a foreign table's constraints are not enforced by the core
server nor by the FDW, but I suppose we still allow declaring them
mostly for the planner's consumption:

https://www.postgresql.org/docs/current/sql-createforeigntable.html

"Constraints on foreign tables (such as CHECK or NOT NULL clauses) are
not enforced by the core PostgreSQL system, and most foreign data
wrappers do not attempt to enforce them either; that is, the
constraint is simply assumed to hold true. There would be little point
in such enforcement since it would only apply to rows inserted or
updated via the foreign table, and not to rows modified by other
means, such as directly on the remote server. Instead, a constraint
attached to a foreign table should represent a constraint that is
being enforced by the remote server."

Partitioning constraints are not treated any differently for those
reasons. It's a good idea to declare a CHECK constraint on the remote
table matching the local partition constraint, though IIRC we
don't mention that advice anywhere in our documentation.

I've checked whether the postgres_fdw docs mention this
somewhere, but all I see about row movement is this:

Note also that postgres_fdw supports row movement invoked by UPDATE
statements executed on partitioned tables, but it currently does not
handle the case where a remote partition chosen to insert a moved row
into is also an UPDATE target partition that will be updated later.

and if I understand that correctly, that's about something different.

Yeah, that's a note saying that while we do support moving a row from
a local partition to a postgres_fdw foreign partition, it's only
allowed if the foreign partition won't subsequently be updated. So to
reiterate, the cases we don't support:

* Moving a row from a foreign partition to a local one

* Moving a row from a local partition to a foreign one if the latter
will be updated subsequent to moving a row into it

postgres_fdw detects the second case with the following code in
postgresBeginForeignInsert():

	/*
	 * If the foreign table we are about to insert routed rows into is also an
	 * UPDATE subplan result rel that will be updated later, proceeding with
	 * the INSERT will result in the later UPDATE incorrectly modifying those
	 * routed rows, so prevent the INSERT --- it would be nice if we could
	 * handle this case; but for now, throw an error for safety.
	 */
	if (plan && plan->operation == CMD_UPDATE &&
		(resultRelInfo->ri_usesFdwDirectModify ||
		 resultRelInfo->ri_FdwState) &&
		resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot route tuples into foreign table to be updated \"%s\"",
						RelationGetRelationName(rel))));

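The pointer comparison at the end of that condition works because the UPDATE's result relations sit in one array in subplan order. It therefore asks whether the routing target's own subplan has yet to run. Below is a hypothetical index-based restatement. Here target_update_idx == -1 stands in for "not one of the UPDATE's result relations", which is roughly what the real code infers from ri_usesFdwDirectModify / ri_FdwState.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the guard in postgresBeginForeignInsert().
 *   is_update         - the ModifyTable operation is an UPDATE
 *   target_update_idx - index of the routing target among the UPDATE's
 *                       result relations, or -1 if it is not one of them
 *   current_subplan   - index of the subplan now executing (mt_whichplan)
 * Returns true when moving a row into the target must be refused, because
 * the target's own UPDATE subplan would run later and see the moved row.
 */
static bool toy_must_reject_routing(bool is_update, int target_update_idx,
                                    int current_subplan)
{
    return is_update &&
           target_update_idx >= 0 &&
           target_update_idx > current_subplan;
}
```

Routing into a partition whose subplan already ran (index below current_subplan) is allowed, which matches the "only if the foreign partition won't subsequently be updated" rule described above.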
--
Amit Langote
EDB: http://www.enterprisedb.com