postgres_fdw batching vs. (re)creating the tuple slots

Started by Tomas Vondra · over 4 years ago · 15 messages
#1 Tomas Vondra
tomas.vondra@enterprisedb.com

Hi,

while looking at the other thread related to postgres_fdw batching [1]
and testing with very large batches, I noticed this disappointing
behavior when inserting 1M rows (just integers, nothing fancy):

no batching: 64782 ms
100 rows: 2118 ms
32767 rows: 41115 ms

Pretty nice improvement when batching 100 rows, but then it all goes
wrong for some reason.

The problem is pretty obvious from a perf profile:

--100.00%--ExecModifyTable
|
--99.70%--ExecInsert
|
|--50.87%--MakeSingleTupleTableSlot
| |
| --50.85%--MakeTupleTableSlot
| |
| --50.70%--IncrTupleDescRefCount
| |
| --50.69%--ResourceOwnerRememberTupleDesc
| |
| --50.69%--ResourceArrayAdd
|
|--48.18%--ExecBatchInsert
| |
| --47.92%--ExecDropSingleTupleTableSlot
| |
| |--47.17%--DecrTupleDescRefCount
| | |
| | --47.15%--ResourceOwnerForgetTupleDesc
| | |
| | --47.14%--ResourceArrayRemove
| |
| --0.53%--ExecClearTuple
|
--0.60%--ExecCopySlot

There are two problems at play, here. Firstly, the way it's coded now
the slots are pretty much re-created for each batch. So with 1M rows and
batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful,
and it shouldn't be too difficult to keep the slots across batches. (We
can't initialize all the slots in advance, because we don't know how
many will be needed, but we don't have to release them between batches.)
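
Roughly, I'm thinking of something like this in ExecInsert (untested sketch;
ri_BatchInitialized would be a new counter in ResultRelInfo tracking how many
slots have been created so far):

    /* create the slots for this batch position only on first use */
    if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
    {
        resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
            MakeSingleTupleTableSlot(slot->tts_tupleDescriptor, slot->tts_ops);
        resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
            MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor, planSlot->tts_ops);
        resultRelInfo->ri_BatchInitialized++;
    }

    /* copy the current tuples into the (new or reused) slots */
    ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots], slot);
    ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots], planSlot);

    resultRelInfo->ri_NumSlots++;

with the ExecDropSingleTupleTableSlot() calls moved out of ExecBatchInsert()
and into executor shutdown (ExecEndModifyTable).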

The other problem is that ResourceArrayAdd/Remove seem to behave a bit
poorly with very many elements - I'm not sure if it's O(N^2) or worse,
but growing the array and linear searches seem to be a bit expensive.

I'll take a look at fixing the first point, but I'm not entirely sure
how much that will improve the situation.

regards

[1]: /messages/by-id/OS0PR01MB571603973C0AC2874AD6BF2594299@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2 Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#1)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Hi,

On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:

There are two problems at play, here. Firstly, the way it's coded now
the slots are pretty much re-created for each batch. So with 1M rows and
batches of 32k rows, that's ~30x drop/create. That seems a bit wasteful,
and it shouldn't be too difficult to keep the slots across batches. (We
can't initialize all the slots in advance, because we don't know how
many will be needed, but we don't have to release them between batches.)

Yea, that sounds like an obvious improvement.

I'll take a look at fixing the first point, but I'm not entirely sure
how much will that improve the situation.

Hm - I'd not expect this to still show up in the profile afterwards,
when you insert >> 32k rows. Still annoying when a smaller number is
inserted, of course.

The other problem is that ResourceArrayAdd/Remove seem to behave a bit
poorly with very many elements - I'm not sure if it's O(N^2) or worse,
but growing the array and linear searches seem to be a bit expensive.

Hm. I assume this is using the hashed representation of a resowner array
most of the time, not the array one? I suspect the problem is that
pretty quickly the ResourceArrayRemove() degrades to a linear search,
because all of the resowner entries are the same, so the hashing doesn't
help us at all. The peril of a simplistic open-coded hash table :(

I think in this specific situation the easiest workaround is to use a
copy of the tuple desc, instead of the one in the relcache - the copy
won't be refcounted.
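
I.e. something like this (untested, just to illustrate the idea; "batchslot"
here stands for whichever slot gets created for the batch):

    /*
     * CreateTupleDescCopy() returns a descriptor with tdrefcount == -1,
     * so the slot won't register/deregister it with the resource owner.
     */
    TupleDesc   tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);

    batchslot = MakeSingleTupleTableSlot(tdesc, slot->tts_ops);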

The whole tupledesc refcount / resowner stuff is a mess. We don't really
utilize it much, and pay a pretty steep price for maintaining it.

This'd be less of an issue if we didn't store one resowner item for each
reference, but kept track of the refcount one tupdesc resowner item
has. But there's no space to store that right now, nor is it easy to
make space, due to the way comparisons work for resowner.

Greetings,

Andres Freund

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Andres Freund <andres@anarazel.de> writes:

On 2021-05-30 22:22:10 +0200, Tomas Vondra wrote:

The other problem is that ResourceArrayAdd/Remove seem to behave a bit
poorly with very many elements - I'm not sure if it's O(N^2) or worse,
but growing the array and linear searches seem to be a bit expensive.

Hm. I assume this is using the hashed representation of a resowner array
most of the time, not the array one? I suspect the problem is that
pretty quickly the ResourceArrayRemove() degrades to a linear search,
because all of the resowner entries are the same, so the hashing doesn't
help us at all. The peril of a simplistic open-coded hash table :(

Not only does ResourceArrayRemove degrade, but so does ResourceArrayAdd.

I think in this specific situation the easiest workaround is to use a
copy of the tuple desc, instead of the one in the relcache - the copy
won't be refcounted.

Probably. There's no obvious reason why these transient slots need
a long-lived tupdesc. But it does seem like the hashing scheme somebody
added to resowners is a bit too simplistic. It ought to be able to
cope with lots of refs to the same object, or at least not be extra-awful
for that case.

regards, tom lane

#4 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Hi,

On 2021-05-30 17:10:59 -0400, Tom Lane wrote:

But it does seem like the hashing scheme somebody added to resowners
is a bit too simplistic. It ought to be able to cope with lots of
refs to the same object, or at least not be extra-awful for that case.

It's not really the hashing that's the problem, right? The array
representation would have nearly the same problem, I think?

It doesn't seem trivial to improve it without making resowner.c's
representation a good bit more complicated. Right now there's no space
to store a 'per resowner & tupdesc refcount'. We can't even just make
the tuple desc reference a separate allocation (of (tupdesc, refcount)),
because ResourceArrayRemove() relies on testing for equality with ==.

I think we'd basically need an additional version of ResourceArray (type
+ functions) which can store some additional data for each entry?

Greetings,

Andres Freund

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Andres Freund <andres@anarazel.de> writes:

On 2021-05-30 17:10:59 -0400, Tom Lane wrote:

But it does seem like the hashing scheme somebody added to resowners
is a bit too simplistic. It ought to be able to cope with lots of
refs to the same object, or at least not be extra-awful for that case.

It's not really the hashing that's the problem, right? The array
representation would have nearly the same problem, I think?

ResourceArrayAdd would have zero problem. ResourceArrayRemove is
O(1) as long as resources are removed in reverse order ... which
is effectively true if they're all the same resource. So while
I've not tested, I believe that this particular case would have
no issue at all with the old resowner implementation, stupid
though that was.

It doesn't seem trivial to improve it without making resowner.c's
representation a good bit more complicated.

Dunno, I have not studied the new version at all.

regards, tom lane

#6 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tom Lane (#5)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Hi,

Here's a set of WIP patches that fixes the regression for me. The first part
is from [1], to make large batches work; 0002 just creates a copy of the
tupledesc to not cause issues in the resource owner, and 0003 ensures we only
initialize the slots once (not per batch).

With the patches applied, the timings look like this:

batch        timing
----------------------
    1   64194.942 ms
   10    7233.785 ms
  100    2244.255 ms
  32k    1372.175 ms

which seems fine. I still need to get this properly tested etc. and make
sure nothing is left over.

regards

[1]: /messages/by-id/OS0PR01MB571603973C0AC2874AD6BF2594299@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#6)
3 attachment(s)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Argh! I forgot the attachments, of course.

On 6/4/21 1:48 PM, Tomas Vondra wrote:

Hi,

Here's two WIP patches that fixes the regression for me. The first part
is from [1], so make large batches work, 0002 just creates a copy of the
tupledesc to not cause issues in resource owner, 0003 ensures we only
initialize the slots once (not per batch).

With the patches applied, the timings look like this:

batch timing
----------------------
1 64194.942 ms
10 7233.785 ms
100 2244.255 ms
32k 1372.175 ms

which seems fine. I still need to get this properly tested etc. and make
sure nothing is left over.

regards

[1]
/messages/by-id/OS0PR01MB571603973C0AC2874AD6BF2594299@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Add-PQ_QUERY_PARAM_MAX_LIMIT.patch (text/x-patch)
From f3bd790af94c89d5b61a9d9c23a4b05a309d73bf Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:34:23 +0200
Subject: [PATCH 1/3] Add PQ_QUERY_PARAM_MAX_LIMIT

---
 contrib/postgres_fdw/postgres_fdw.c | 11 +++++++++--
 doc/src/sgml/postgres-fdw.sgml      | 11 +++++++++++
 src/interfaces/libpq/fe-exec.c      | 21 ++++++++++++---------
 src/interfaces/libpq/libpq-fe.h     |  2 ++
 4 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c48a421e88..ac86b06b8f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1979,7 +1979,7 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
 
 	/*
-	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
+	 * In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determined earlier.
 	 */
@@ -1994,7 +1994,14 @@ postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 		 resultRelInfo->ri_TrigDesc->trig_insert_after_row))
 		return 1;
 
-	/* Otherwise use the batch size specified for server/table. */
+	/*
+	 * Otherwise use the batch size specified for server/table. The number of
+	 * parameters in a batch is limited to 64k (uint16), so make sure we don't
+	 * exceed this limit by using the maximum batch_size possible.
+	 */
+	if (fmstate && fmstate->p_nums > 0)
+		batch_size = Min(batch_size, PQ_QUERY_PARAM_MAX_LIMIT / fmstate->p_nums);
+
 	return batch_size;
 }
 
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index fb87372bde..564651dfaa 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -372,6 +372,17 @@ OPTIONS (ADD password_required 'false');
        overrides an option specified for the server.
        The default is <literal>1</literal>.
       </para>
+
+      <para>
+       Note the actual number of rows <filename>postgres_fdw</filename> inserts at
+       once depends on the number of columns and the provided
+       <literal>batch_size</literal> value. The batch is executed as a single
+       query, and the libpq protocol (which <filename>postgres_fdw</filename>
+       uses to connect to a remote server) limits the number of parameters in a
+       single query to 64k. When the number of columns * <literal>batch_size</literal>
+       exceeds the limit, the <literal>batch_size</literal> will be adjusted to
+       avoid an error.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 03592bdce9..832d61c544 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1403,10 +1403,11 @@ PQsendQueryParams(PGconn *conn,
 							 libpq_gettext("command string is a null pointer\n"));
 		return 0;
 	}
-	if (nParams < 0 || nParams > 65535)
+	if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-							 libpq_gettext("number of parameters must be between 0 and 65535\n"));
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("number of parameters must be between 0 and %d\n"),
+						  PQ_QUERY_PARAM_MAX_LIMIT);
 		return 0;
 	}
 
@@ -1451,10 +1452,11 @@ PQsendPrepare(PGconn *conn,
 							 libpq_gettext("command string is a null pointer\n"));
 		return 0;
 	}
-	if (nParams < 0 || nParams > 65535)
+	if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-							 libpq_gettext("number of parameters must be between 0 and 65535\n"));
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("number of parameters must be between 0 and %d\n"),
+						  PQ_QUERY_PARAM_MAX_LIMIT);
 		return 0;
 	}
 
@@ -1548,10 +1550,11 @@ PQsendQueryPrepared(PGconn *conn,
 							 libpq_gettext("statement name is a null pointer\n"));
 		return 0;
 	}
-	if (nParams < 0 || nParams > 65535)
+	if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-							 libpq_gettext("number of parameters must be between 0 and 65535\n"));
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("number of parameters must be between 0 and %d\n"),
+						  PQ_QUERY_PARAM_MAX_LIMIT);
 		return 0;
 	}
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 227adde5a5..113ab52ada 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -429,6 +429,8 @@ extern PGresult *PQexecPrepared(PGconn *conn,
 								int resultFormat);
 
 /* Interface for multiple-result or asynchronous queries */
+#define PQ_QUERY_PARAM_MAX_LIMIT 65535
+
 extern int	PQsendQuery(PGconn *conn, const char *query);
 extern int	PQsendQueryParams(PGconn *conn,
 							  const char *command,
-- 
2.31.1

0002-create-copy-of-a-descriptor-for-batching.patch (text/x-patch)
From e1f083732a7f02b1093fc2435c40596d959549d5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:45:18 +0200
Subject: [PATCH 2/3] create copy of a descriptor for batching

---
 src/backend/executor/nodeModifyTable.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 379b056310..c287a371a1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,6 +678,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
+			TupleDesc tdesc;
+
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -703,13 +705,16 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
+			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+
 			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 slot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
 						 slot);
+
 			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 planSlot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
 						 planSlot);
-- 
2.31.1

0003-initialize-slots-only-once-for-batching.patch (text/x-patch)
From 605a29661db9f89dfb21692cc6c918a439d14252 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:59:47 +0200
Subject: [PATCH 3/3] initialize slots only once for batching

---
 src/backend/executor/nodeModifyTable.c | 42 ++++++++++++++------------
 src/include/nodes/execnodes.h          |  1 +
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c287a371a1..f16c17e8d4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,8 +678,6 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
-			TupleDesc tdesc;
-
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -705,19 +703,25 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
-			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+			/* initialize the slot, if not already done */
+			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
+			{
+				TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
 
-			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 slot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-						 slot);
+				resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 slot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+							 slot);
 
-			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 planSlot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-						 planSlot);
+				resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 planSlot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+							 planSlot);
+
+				resultRelInfo->ri_BatchInitialized++;
+			}
 
 			resultRelInfo->ri_NumSlots++;
 
@@ -1039,12 +1043,6 @@ ExecBatchInsert(ModifyTableState *mtstate,
 
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
-
-	for (i = 0; i < numSlots; i++)
-	{
-		ExecDropSingleTupleTableSlot(slots[i]);
-		ExecDropSingleTupleTableSlot(planSlots[i]);
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -3174,6 +3172,12 @@ ExecEndModifyTable(ModifyTableState *node)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
 														   resultRelInfo);
+
+		for (i = 0; i < resultRelInfo->ri_NumSlots; i++)
+		{
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[i]);
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[i]);
+		}
 	}
 
 	/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7795a69490..1062a44ee1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -463,6 +463,7 @@ typedef struct ResultRelInfo
 	/* batch insert stuff */
 	int			ri_NumSlots;	/* number of slots in the array */
 	int			ri_BatchSize;	/* max slots inserted in a single batch */
+	int			ri_BatchInitialized;	/* number of slots initialized */
 	TupleTableSlot **ri_Slots;	/* input tuples for batch insert */
 	TupleTableSlot **ri_PlanSlots;
 
-- 
2.31.1

#8 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#7)
2 attachment(s)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-create-copy-of-a-descriptor-for-batching-v2.patch (text/x-patch)
From 494018fd3f2b983be474a85fc12fe3a4dbefa76b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:45:18 +0200
Subject: [PATCH 1/2] create copy of a descriptor for batching

---
 src/backend/executor/nodeModifyTable.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 379b056310..c287a371a1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,6 +678,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
+			TupleDesc tdesc;
+
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -703,13 +705,16 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
+			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+
 			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 slot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
 						 slot);
+
 			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 planSlot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
 						 planSlot);
-- 
2.31.1

0002-initialize-slots-only-once-for-batching-v2.patch (text/x-patch)
From 22a7a2c295ccce850da5f02d6239975057345b32 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:59:47 +0200
Subject: [PATCH 2/2] initialize slots only once for batching

---
 src/backend/executor/nodeModifyTable.c | 43 ++++++++++++++------------
 src/include/nodes/execnodes.h          |  1 +
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c287a371a1..81b3b522c2 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,8 +678,6 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
-			TupleDesc tdesc;
-
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -705,19 +703,25 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
-			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+			/* initialize the slot, if not already done */
+			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
+			{
+				TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
 
-			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 slot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-						 slot);
+				resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 slot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+							 slot);
 
-			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 planSlot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-						 planSlot);
+				resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 planSlot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+							 planSlot);
+
+				resultRelInfo->ri_BatchInitialized++;
+			}
 
 			resultRelInfo->ri_NumSlots++;
 
@@ -1039,12 +1043,6 @@ ExecBatchInsert(ModifyTableState *mtstate,
 
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
-
-	for (i = 0; i < numSlots; i++)
-	{
-		ExecDropSingleTupleTableSlot(slots[i]);
-		ExecDropSingleTupleTableSlot(planSlots[i]);
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -3167,6 +3165,7 @@ ExecEndModifyTable(ModifyTableState *node)
 	 */
 	for (i = 0; i < node->mt_nrels; i++)
 	{
+		int j;
 		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
 
 		if (!resultRelInfo->ri_usesFdwDirectModify &&
@@ -3174,6 +3173,12 @@ ExecEndModifyTable(ModifyTableState *node)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
 														   resultRelInfo);
+
+		for (j = 0; j < resultRelInfo->ri_NumSlots; j++)
+		{
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[i]);
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[i]);
+		}
 	}
 
 	/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7795a69490..1062a44ee1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -463,6 +463,7 @@ typedef struct ResultRelInfo
 	/* batch insert stuff */
 	int			ri_NumSlots;	/* number of slots in the array */
 	int			ri_BatchSize;	/* max slots inserted in a single batch */
+	int			ri_BatchInitialized;	/* number of slots initialized */
 	TupleTableSlot **ri_Slots;	/* input tuples for batch insert */
 	TupleTableSlot **ri_PlanSlots;
 
-- 
2.31.1

#9 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tomas Vondra (#8)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

I applied these patches and ran make check in the postgres_fdw contrib
module, and I saw a server crash. Is it the same failure you mentioned
above?

With Regards,
Bharath Rupireddy.

#10 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Bharath Rupireddy (#9)
2 attachment(s)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On 6/9/21 12:50 PM, Bharath Rupireddy wrote:

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?

Nope, that was causing an infinite loop. This is just a silly mistake on my
side - I forgot to replace the i/j variable inside the loop. Here's v3.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-create-copy-of-a-descriptor-for-batching-v3.patch (text/x-patch)
From 494018fd3f2b983be474a85fc12fe3a4dbefa76b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:45:18 +0200
Subject: [PATCH 1/2] create copy of a descriptor for batching

---
 src/backend/executor/nodeModifyTable.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 379b056310..c287a371a1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,6 +678,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
+			TupleDesc tdesc;
+
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -703,13 +705,16 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
+			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+
 			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 slot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
 						 slot);
+
 			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+				MakeSingleTupleTableSlot(tdesc,
 										 planSlot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
 						 planSlot);
-- 
2.31.1

0002-initialize-slots-only-once-for-batching-v3.patch (text/x-patch)
From 9290d7a90d9c93521ae531768055b5c567fcdc51 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Fri, 4 Jun 2021 12:59:47 +0200
Subject: [PATCH 2/2] initialize slots only once for batching

---
 src/backend/executor/nodeModifyTable.c | 43 ++++++++++++++------------
 src/include/nodes/execnodes.h          |  1 +
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c287a371a1..91971ab041 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,8 +678,6 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
-			TupleDesc tdesc;
-
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -705,19 +703,25 @@ ExecInsert(ModifyTableState *mtstate,
 													 resultRelInfo->ri_BatchSize);
 			}
 
-			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+			/* initialize the slot, if not already done */
+			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
+			{
+				TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
 
-			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 slot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-						 slot);
+				resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 slot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+							 slot);
 
-			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-				MakeSingleTupleTableSlot(tdesc,
-										 planSlot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-						 planSlot);
+				resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+					MakeSingleTupleTableSlot(tdesc,
+											 planSlot->tts_ops);
+				ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+							 planSlot);
+
+				resultRelInfo->ri_BatchInitialized++;
+			}
 
 			resultRelInfo->ri_NumSlots++;
 
@@ -1039,12 +1043,6 @@ ExecBatchInsert(ModifyTableState *mtstate,
 
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
-
-	for (i = 0; i < numSlots; i++)
-	{
-		ExecDropSingleTupleTableSlot(slots[i]);
-		ExecDropSingleTupleTableSlot(planSlots[i]);
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -3167,6 +3165,7 @@ ExecEndModifyTable(ModifyTableState *node)
 	 */
 	for (i = 0; i < node->mt_nrels; i++)
 	{
+		int j;
 		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
 
 		if (!resultRelInfo->ri_usesFdwDirectModify &&
@@ -3174,6 +3173,12 @@ ExecEndModifyTable(ModifyTableState *node)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
 														   resultRelInfo);
+
+		for (j = 0; j < resultRelInfo->ri_NumSlots; j++)
+		{
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+		}
 	}
 
 	/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7795a69490..1062a44ee1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -463,6 +463,7 @@ typedef struct ResultRelInfo
 	/* batch insert stuff */
 	int			ri_NumSlots;	/* number of slots in the array */
 	int			ri_BatchSize;	/* max slots inserted in a single batch */
+	int			ri_BatchInitialized;	/* number of slots initialized */
 	TupleTableSlot **ri_Slots;	/* input tuples for batch insert */
 	TupleTableSlot **ri_PlanSlots;
 
-- 
2.31.1

#11 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tomas Vondra (#10)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On Wed, Jun 9, 2021 at 4:38 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 6/9/21 12:50 PM, Bharath Rupireddy wrote:

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?

Nope, that was causing infinite loop. This is jut a silly mistake on my
side - I forgot to replace the i/j variable inside the loop. Here's v3.

Thanks. The postgres_fdw regression test execution time does not
increase much with the patches, even with the test case added by the
commit below. With and without the patches attached in this thread,
the execution times are 5 sec and 17 sec respectively, so these
patches actually reduce the execution time for that test case.

commit cb92703384e2bb3fa0a690e5dbb95ad333c2b44c
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Tue Jun 8 20:22:18 2021 +0200

Adjust batch size in postgres_fdw to not use too many parameters

With Regards,
Bharath Rupireddy.

#12 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#10)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On 6/9/21 1:08 PM, Tomas Vondra wrote:

On 6/9/21 12:50 PM, Bharath Rupireddy wrote:

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?

Nope, that was causing infinite loop. This is jut a silly mistake on my
side - I forgot to replace the i/j variable inside the loop. Here's v3.

regards

FWIW I've pushed this, after improving the comments a little bit.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Tomas Vondra (#12)
1 attachment(s)
Re: postgres_fdw batching vs. (re)creating the tuple slots

Tomas Vondra wrote on 2021-06-12 00:01:

On 6/9/21 1:08 PM, Tomas Vondra wrote:

On 6/9/21 12:50 PM, Bharath Rupireddy wrote:

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).

I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?

Nope, that was causing infinite loop. This is jut a silly mistake on
my
side - I forgot to replace the i/j variable inside the loop. Here's
v3.

regards

FWIW I've pushed this, after improving the comments a little bit.

regards

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri Jun 11 20:19:48 2021 +0200

Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes.

Earlier the case worked as expected, inserting 1000 tuples. Now it fails
with

ERROR: duplicate key value violates unique constraint "p0_pkey"
DETAIL: Key (x)=(1) already exists.
CONTEXT: remote SQL command: INSERT INTO public.batch_table_p0(x,
field1, field2) VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9), ($10,
$11, $12), ($13, $14, $15), ($16, $17, $18), ($19, $20, $21), ($22, $23,
$24), ($25, $26, $27), ($28, $29, $30), ($31, $32, $33), ($34, $35,
$36), ($37, $38, $39), ($40, $41, $42), ($43, $44, $45), ($46, $47,
$48), ($49, $50, $51), ($52, $53, $54), ($55, $56, $57), ($58, $59,
$60), ($61, $62, $63), ($64, $65, $66), ($67, $68, $69), ($70, $71,
$72), ($73, $74, $75), ($76, $77, $78), ($79, $80, $81), ($82, $83,
$84), ($85, $86, $87), ($88, $89, $90), ($91, $92, $93), ($94, $95,
$96), ($97, $98, $99), ($100, $101, $102), ($103, $104, $105), ($106,
$107, $108), ($109, $110, $111), ($112, $113, $114), ($115, $116, $117),
($118, $119, $120), ($121, $122, $123), ($124, $125, $126), ($127, $128,
$129), ($130, $131, $132), ($133, $134, $135), ($136, $137, $138),
($139, $140, $141), ($142, $143, $144), ($145, $146, $147), ($148, $149,
$150), ($151, $152, $153), ($154, $155, $156), ($157, $158, $159),
($160, $161, $162), ($163, $164, $165), ($166, $167, $168), ($169, $170,
$171), ($172, $173, $174), ($175, $176, $177), ($178, $179, $180),
($181, $182, $183), ($184, $185, $186), ($187, $188, $189), ($190, $191,
$192), ($193, $194, $195), ($196, $197, $198), ($199, $200, $201),
($202, $203, $204), ($205, $206, $207), ($208, $209, $210), ($211, $212,
$213), ($214, $215, $216), ($217, $218, $219), ($220, $221, $222),
($223, $224, $225), ($226, $227, $228), ($229, $230, $231), ($232, $233,
$234), ($235, $236, $237), ($238, $239, $240), ($241, $242, $243),
($244, $245, $246), ($247, $248, $249), ($250, $251, $252), ($253, $254,
$255), ($256, $257, $258), ($259, $260, $261), ($262, $263, $264),
($265, $266, $267), ($268, $269, $270), ($271, $272, $273), ($274, $275,
$276), ($277, $278, $279), ($280, $281, $282), ($283, $284, $285),
($286, $287, $288), ($289, $290, $291), ($292, $293, $294), ($295, $296,
$297), ($298, $299, $300)

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachments:

batch_check_vanilla.patch (text/x-diff)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f6..4c280f1e777 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3083,7 +3083,46 @@ UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;
 
 -- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+
+-- Use partitioning
+ALTER SERVER loopback OPTIONS (ADD batch_size '100');
+
+CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+ALTER TABLE batch_table_p0 ADD CONSTRAINT p0_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+ALTER TABLE batch_table_p1 ADD CONSTRAINT p1_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1');
+
+CREATE TABLE batch_table_p2
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ALTER TABLE batch_table_p2 ADD CONSTRAINT p2_pkey PRIMARY KEY (x);
+
+CREATE TABLE batch_table_p3 (LIKE batch_table);
+ALTER TABLE batch_table_p3 ADD CONSTRAINT p3_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p3f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 4, REMAINDER 3)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p3');
+
+INSERT INTO batch_table SELECT i, 'test'||i, 'test'|| i FROM generate_series(1, 1000) i;
+SELECT COUNT(*) FROM batch_table;
+
+ALTER SERVER loopback OPTIONS (DROP batch_size);
 
 -- ===================================================================
 -- test asynchronous execution
#14 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alexander Pyhalov (#13)
1 attachment(s)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On 6/16/21 2:36 PM, Alexander Pyhalov wrote:

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Fri Jun 11 20:19:48 2021 +0200

    Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes.

Thanks for the report and reproducer!

Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-(
The data is copied into the slots only in the branch that initializes
them, so the subsequent batches just insert the same data over and over.

The attached patch fixes that, and adds a regression test (a bit smaller
version of your reproducer). I'll get this committed shortly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

batch-fix.patch (text/x-patch)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1fb26639fc..858e5d4a38 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9761,7 +9761,87 @@ SELECT tableoid::regclass, * FROM batch_cp_upd_test;
 (2 rows)
 
 -- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+-- Use partitioning
+ALTER SERVER loopback OPTIONS (ADD batch_size '10');
+CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+ALTER TABLE batch_table_p0 ADD CONSTRAINT p0_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0');
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+ALTER TABLE batch_table_p1 ADD CONSTRAINT p1_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1');
+INSERT INTO batch_table SELECT i, 'test'||i, 'test'|| i FROM generate_series(1, 50) i;
+SELECT COUNT(*) FROM batch_table;
+ count 
+-------
+    50
+(1 row)
+
+SELECT * FROM batch_table ORDER BY x;
+ x  | field1 | field2 
+----+--------+--------
+  1 | test1  | test1
+  2 | test2  | test2
+  3 | test3  | test3
+  4 | test4  | test4
+  5 | test5  | test5
+  6 | test6  | test6
+  7 | test7  | test7
+  8 | test8  | test8
+  9 | test9  | test9
+ 10 | test10 | test10
+ 11 | test11 | test11
+ 12 | test12 | test12
+ 13 | test13 | test13
+ 14 | test14 | test14
+ 15 | test15 | test15
+ 16 | test16 | test16
+ 17 | test17 | test17
+ 18 | test18 | test18
+ 19 | test19 | test19
+ 20 | test20 | test20
+ 21 | test21 | test21
+ 22 | test22 | test22
+ 23 | test23 | test23
+ 24 | test24 | test24
+ 25 | test25 | test25
+ 26 | test26 | test26
+ 27 | test27 | test27
+ 28 | test28 | test28
+ 29 | test29 | test29
+ 30 | test30 | test30
+ 31 | test31 | test31
+ 32 | test32 | test32
+ 33 | test33 | test33
+ 34 | test34 | test34
+ 35 | test35 | test35
+ 36 | test36 | test36
+ 37 | test37 | test37
+ 38 | test38 | test38
+ 39 | test39 | test39
+ 40 | test40 | test40
+ 41 | test41 | test41
+ 42 | test42 | test42
+ 43 | test43 | test43
+ 44 | test44 | test44
+ 45 | test45 | test45
+ 46 | test46 | test46
+ 47 | test47 | test47
+ 48 | test48 | test48
+ 49 | test49 | test49
+ 50 | test50 | test50
+(50 rows)
+
+ALTER SERVER loopback OPTIONS (DROP batch_size);
 -- ===================================================================
 -- test asynchronous execution
 -- ===================================================================
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8cb2148f1f..a5110fa863 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3028,7 +3028,7 @@ DROP FOREIGN TABLE ftable;
 
 -- try if large batches exceed max number of bind parameters
 CREATE FOREIGN TABLE ftable ( x int ) SERVER loopback OPTIONS ( table_name 'batch_table', batch_size '100000' );
-INSERT INTO ftable SELECT * FROM generate_series(1, 70000) i;
+-- INSERT INTO ftable SELECT * FROM generate_series(1, 70000) i;
 SELECT COUNT(*) FROM ftable;
 TRUNCATE batch_table;
 DROP FOREIGN TABLE ftable;
@@ -3083,7 +3083,34 @@ UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;
 
 -- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+
+-- Use partitioning
+ALTER SERVER loopback OPTIONS (ADD batch_size '10');
+
+CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
+
+CREATE TABLE batch_table_p0 (LIKE batch_table);
+ALTER TABLE batch_table_p0 ADD CONSTRAINT p0_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p0f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p0');
+
+CREATE TABLE batch_table_p1 (LIKE batch_table);
+ALTER TABLE batch_table_p1 ADD CONSTRAINT p1_pkey PRIMARY KEY (x);
+CREATE FOREIGN TABLE batch_table_p1f
+	PARTITION OF batch_table
+	FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_table_p1');
+
+INSERT INTO batch_table SELECT i, 'test'||i, 'test'|| i FROM generate_series(1, 50) i;
+SELECT COUNT(*) FROM batch_table;
+SELECT * FROM batch_table ORDER BY x;
+
+ALTER SERVER loopback OPTIONS (DROP batch_size);
 
 -- ===================================================================
 -- test asynchronous execution
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 88c479c6da..143517bc76 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -717,18 +717,20 @@ ExecInsert(ModifyTableState *mtstate,
 
 				resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
 					MakeSingleTupleTableSlot(tdesc, slot->tts_ops);
-				ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-							 slot);
 
 				resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
 					MakeSingleTupleTableSlot(tdesc, planSlot->tts_ops);
-				ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-							 planSlot);
 
 				/* remember how many batch slots we initialized */
 				resultRelInfo->ri_NumSlotsInitialized++;
 			}
 
+			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+						 slot);
+
+			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+						 planSlot);
+
 			resultRelInfo->ri_NumSlots++;
 
 			MemoryContextSwitchTo(oldContext);
#15 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#14)
Re: postgres_fdw batching vs. (re)creating the tuple slots

On 6/16/21 4:23 PM, Tomas Vondra wrote:

On 6/16/21 2:36 PM, Alexander Pyhalov wrote:

Hi.
It seems this commit

commit b676ac443b6a83558d4701b2dd9491c0b37e17c4
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Date:   Fri Jun 11 20:19:48 2021 +0200

     Optimize creation of slots for FDW bulk inserts

has broken batch insert for partitions with unique indexes.

Thanks for the report and reproducer!

Turns out this is a mind-bogglingly silly bug I made in b676ac443b :-(
The data is copied into the slots only in the branch that initializes
them, so the subsequent batches just insert the same data over and over.

The attached patch fixes that, and adds a regression test (a bit smaller
version of your reproducer). I'll get this committed shortly.

Pushed, after a bit more cleanup and testing.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company