Allow batched insert during cross-partition updates
Based on the discussion at:
/messages/by-id/6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
I'm posting the patch for $subject here in this new thread and I'll
add it to the next CF per Tomas' advice.
With 927f453a94106 committed earlier today, we limit insert batching
only to the cases where the query's main command is also insert,
because allowing it to be used in other cases can hit some limitations
of the current code.
One such case is cross-partition updates of a partitioned table which
internally uses insert. postgres_fdw supports some cases where a row
is moved from a local partition to a foreign partition. When doing
so, the moved row is inserted into the latter, but those inserts can't
use batching due to the aforementioned commit.
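To illustrate, here is a minimal sketch of the affected scenario (hypothetical
names; a working postgres_fdw foreign server and its remote base table are
assumed):
CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE parent_local PARTITION OF parent FOR VALUES IN (1);
CREATE FOREIGN TABLE parent_remote PARTITION OF parent FOR VALUES IN (2)
  SERVER loopback OPTIONS (table_name 'parent_remote_base', batch_size '10');
INSERT INTO parent VALUES (1), (1);
-- Moves both rows from parent_local to parent_remote; the inserts performed
-- on the foreign partition during this row movement are the ones that
-- currently cannot use batching.
UPDATE parent SET a = 2 WHERE a = 1;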
As described in the thread linked above, to make batching possible for
those internal inserts, we'd need to make some changes to both the
core code and postgres_fdw, which the attached patch implements.
Details are in the commit message.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v1-0001-Allow-batching-of-inserts-during-cross-partition-.patch
From 2032eb6a4cfa4b3edd1435280628d6e69fe8395f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v1] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
partition allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates to use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that need fixing to correctly handle batching
in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c to look inside
PartitionTupleRouting, its definition is now exposed whereas it was
previously local to execPartition.c.
---
contrib/postgres_fdw/postgres_fdw.c | 11 ++--
src/backend/executor/execPartition.c | 72 +-------------------------
src/backend/executor/nodeModifyTable.c | 50 +++++++++++++-----
src/include/executor/execPartition.h | 72 +++++++++++++++++++++++++-
4 files changed, 112 insertions(+), 93 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..53472cf73c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,18 +1934,17 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5967..811997b4c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partitions touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * 'subplan_resultrel_htab'. The remainder have been built especially
- * for tuple routing. See comment for PartitionDispatchData->indexes for
- * details on how this array is indexed.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- * Hash table to store subplan ResultRelInfos by Oid. This is used to
- * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
- * NULL in other cases. Some of these may be useful for tuple routing
- * to save having to build duplicates.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- int num_partitions;
- int max_partitions;
- HTAB *subplan_resultrel_htab;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -1000,8 +931,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..5f92854c94 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2059,8 +2059,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2277,22 +2275,46 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ {
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partitions touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * 'subplan_resultrel_htab'. The remainder have been built especially
+ * for tuple routing. See comment for PartitionDispatchData->indexes for
+ * details on how this array is indexed.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ * Hash table to store subplan ResultRelInfos by Oid. This is used to
+ * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ * NULL in other cases. Some of these may be useful for tuple routing
+ * to save having to build duplicates.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ int num_partitions;
+ int max_partitions;
+ HTAB *subplan_resultrel_htab;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
On Thu, Feb 18, 2021 at 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
Based on the discussion at:
/messages/by-id/6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
I'm posting the patch for $subject here in this new thread and I'll
add it to the next CF per Tomas' advice.
Done: https://commitfest.postgresql.org/32/2992/
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
thanks for the patch. I had a first look and played around with the code.
The code seems clean, complete, and does what it says on the tin. I will
need a bit more time to acclimatise with all the use cases for a more
thorough review.
A small question though is why expose PartitionTupleRouting and not add
a couple of functions to get the necessary info? If I have read the code
correctly the only members actually needed to be exposed are num_partitions
and partitions. Not a critique, I am just curious.
Cheers,
//Georgios
Hi,
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, February 18, 2021 10:54 AM, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Feb 18, 2021 at 6:52 PM Amit Langote amitlangote09@gmail.com wrote:
Based on the discussion at:
/messages/by-id/6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
I'm posting the patch for $subject here in this new thread and I'll
add it to the next CF per Tomas' advice.
Done: https://commitfest.postgresql.org/32/2992/
--------------------------------------------------
apparently I did not receive the review comment I sent via the commitfest app.
Apologies for the chatter. Find the message-id here:
Amit Langote
EDB: http://www.enterprisedb.com
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, March 10, 2021 1:23 PM, Georgios <gkokolatos@protonmail.com> wrote:
Hi,
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, February 18, 2021 10:54 AM, Amit Langote amitlangote09@gmail.com wrote:
On Thu, Feb 18, 2021 at 6:52 PM Amit Langote amitlangote09@gmail.com wrote:
Based on the discussion at:
/messages/by-id/6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
I'm posting the patch for $subject here in this new thread and I'll
add it to the next CF per Tomas' advice.
apparently I did not receive the review comment I sent via the commitfest app.
Apologies for the chatter. Find the message-id here:
/messages/by-id/161530518971.29967.9368488207318158252.pgcf@coridan.postgresql.org
I continued looking a bit at the patch, yet I am either failing to see the fix or I am
looking at the wrong thing. Please find attached a small repro of what my expectations
were.
As you can see in the repro, I would expect the
UPDATE local_root_remote_partitions SET a = 2;
to move the tuples to remote_partition_2 on the same transaction.
However this is not the case, with or without the patch.
Is my expectation of this patch wrong?
Cheers,
//Georgios
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Hi Georgios,
On Wed, Mar 10, 2021 at 12:54 AM Georgios Kokolatos
<gkokolatos@protonmail.com> wrote:
Hi,
thanks for the patch. I had a first look and played around with the code.
The code seems clean, complete, and does what it says on the tin. I will
need a bit more time to acclimatise with all the use cases for a more
thorough review.
Thanks for checking.
A small question though is why expose PartitionTupleRouting and not add
a couple of functions to get the necessary info? If I have read the code
correctly the only members actually needed to be exposed are num_partitions
and partitions. Not a critique, I am just curious.
I had implemented accessor functions in an earlier unposted version of
the patch, but just exposing PartitionTupleRouting does not sound so
harmful, so I switched to that approach. Maybe if others agree with
you that accessor functions would be better, I will change the patch
that way.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi Georgios,
On Wed, Mar 10, 2021 at 9:30 PM Georgios <gkokolatos@protonmail.com> wrote:
I continued looking a bit at the patch, yet I am either failing to see the fix or I am
looking at the wrong thing. Please find attached a small repro of what my expectations
were.
As you can see in the repro, I would expect the
UPDATE local_root_remote_partitions SET a = 2;
to move the tuples to remote_partition_2 on the same transaction.
However this is not the case, with or without the patch.
Is my expectation of this patch wrong?
I think yes. We currently don't have the feature you are looking for
-- moving tuples from one remote partition to another remote
partition. This patch is not for adding that feature.
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.
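In terms of the SQL sent to the remote server, the change is roughly the
following (illustrative only; 'remote_tab' is a placeholder and the actual
deparsed statements depend on the foreign table's definition):
-- without batching: one command per moved row
INSERT INTO remote_tab VALUES ($1);
-- with batching (batch_size >= number of moved rows): a single command
INSERT INTO remote_tab VALUES ($1), ($2), ($3);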
--
Amit Langote
EDB: http://www.enterprisedb.com
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, March 11, 2021 9:42 AM, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Georgios,
On Wed, Mar 10, 2021 at 9:30 PM Georgios gkokolatos@protonmail.com wrote:
I continued looking a bit at the patch, yet I am either failing to see fix or I am
looking at the wrong thing. Please find attached a small repro of what my expectetions
were.
As you can see in the repro, I would expect the
UPDATE local_root_remote_partitions SET a = 2;
to move the tuples to remote_partition_2 on the same transaction.
However this is not the case, with or without the patch.
Is my expectation of this patch wrong?
I think yes. We currently don't have the feature you are looking for
-- moving tuples from one remote partition to another remote
partition. This patch is not for adding that feature.
Thank you for correcting me.
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.
So, if I understand correctly then in my previously attached repro I
should have written instead:
CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
CREATE TABLE
local_root_local_partition_1
PARTITION OF
local_root_remote_partitions FOR VALUES IN (1);
CREATE FOREIGN TABLE
local_root_remote_partition_2
PARTITION OF
local_root_remote_partitions FOR VALUES IN (2)
SERVER
remote_server
OPTIONS (
table_name 'remote_partition_2',
batch_size '10'
);
INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- Everything should be on local_root_local_partition_1 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
UPDATE local_root_remote_partitions SET a = 2;
-- Everything should be on remote_partition_2 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
I am guessing that I am still wrong because the UPDATE operation above will
fail due to the restrictions imposed in postgresBeginForeignInsert regarding
UPDATES.
Would it be too much to ask for the addition of a test case that will
demonstrate the change of behaviour found in the patch?
Cheers,
//Georgios
Amit Langote
EDB: http://www.enterprisedb.com
On Thu, Mar 11, 2021 at 8:36 PM <gkokolatos@pm.me> wrote:
On Thursday, March 11, 2021 9:42 AM, Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Mar 10, 2021 at 9:30 PM Georgios gkokolatos@protonmail.com wrote:
I continued looking a bit at the patch, yet I am either failing to see fix or I am
looking at the wrong thing. Please find attached a small repro of what my expectetions
were.
As you can see in the repro, I would expect the
UPDATE local_root_remote_partitions SET a = 2;
to move the tuples to remote_partition_2 on the same transaction.
However this is not the case, with or without the patch.
Is my expectation of this patch wrong?
I think yes. We currently don't have the feature you are looking for
-- moving tuples from one remote partition to another remote
partition. This patch is not for adding that feature.
Thank you for correcting me.
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.
So, if I understand correctly then in my previously attached repro I
should have written instead:
CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
CREATE TABLE
local_root_local_partition_1
PARTITION OF
local_root_remote_partitions FOR VALUES IN (1);
CREATE FOREIGN TABLE
local_root_remote_partition_2
PARTITION OF
local_root_remote_partitions FOR VALUES IN (2)
SERVER
remote_server
OPTIONS (
table_name 'remote_partition_2',
batch_size '10'
);
INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- Everything should be on local_root_local_partition_1 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
UPDATE local_root_remote_partitions SET a = 2;
-- Everything should be on remote_partition_2 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
I am guessing that I am still wrong because the UPDATE operation above will
fail due to the restrictions imposed in postgresBeginForeignInsert regarding
UPDATES.
Yeah, for the move to work without hitting the restriction you
mention, you will need to write the UPDATE query such that
local_root_remote_partition_2 is not updated. For example, as
follows:
UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
With this query, the remote partition is not one of the result
relations to be updated, so is able to escape that restriction.
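So, sticking with your repro's names (and assuming the same remote_server and
remote base table setup), the sequence that exercises the batched insert would
look roughly like this:
CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST (a);
CREATE TABLE local_root_local_partition_1
    PARTITION OF local_root_remote_partitions FOR VALUES IN (1);
CREATE FOREIGN TABLE local_root_remote_partition_2
    PARTITION OF local_root_remote_partitions FOR VALUES IN (2)
    SERVER remote_server
    OPTIONS (table_name 'remote_partition_2', batch_size '10');
INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- The WHERE clause keeps the foreign partition out of the UPDATE's result
-- relations, so postgresBeginForeignInsert's restriction is not hit.
UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;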
Would it be too much to ask for the addition of a test case that will
demonstrate the change of behaviour found in the patch.
Hmm, I don't think there's a way to display whether the INSERT done on
the remote partition as part of a (tuple-moving) UPDATE used
batching or not. That's because that INSERT's state is hidden from
EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
INSERT's state (especially its batch size) under the original UPDATE
node, but I am not sure.
By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.
--
Amit Langote
EDB: http://www.enterprisedb.com
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, March 12, 2021 3:45 AM, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 11, 2021 at 8:36 PM gkokolatos@pm.me wrote:
On Thursday, March 11, 2021 9:42 AM, Amit Langote amitlangote09@gmail.com wrote:
On Wed, Mar 10, 2021 at 9:30 PM Georgios gkokolatos@protonmail.com wrote:
I continued looking a bit at the patch, yet I am either failing to see fix or I am
looking at the wrong thing. Please find attached a small repro of what my expectetions
were.
As you can see in the repro, I would expect the
UPDATE local_root_remote_partitions SET a = 2;
to move the tuples to remote_partition_2 on the same transaction.
However this is not the case, with or without the patch.
Is my expectation of this patch wrong?I think yes. We currently don't have the feature you are looking for
-- moving tuples from one remote partition to another remote
partition. This patch is not for adding that feature.Thank you for correcting me.
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.So, if I understand correctly then in my previously attached repro I
should have written instead:CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
CREATE TABLE
local_root_local_partition_1
PARTITION OF
local_root_remote_partitions FOR VALUES IN (1);CREATE FOREIGN TABLE
local_root_remote_partition_2
PARTITION OF
local_root_remote_partitions FOR VALUES IN (2)
SERVER
remote_server
OPTIONS (
table_name 'remote_partition_2',
batch_size '10'
);INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- Everything should be on local_root_local_partition_1 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;UPDATE local_root_remote_partitions SET a = 2;
-- Everything should be on remote_partition_2 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;I am guessing that I am still wrong because the UPDATE operation above will
fail due to the restrictions imposed in postgresBeginForeignInsert regarding
UPDATES.Yeah, for the move to work without hitting the restriction you
mention, you will need to write the UPDATE query such that
local_root_remote_partition_2 is not updated. For example, as
follows:UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
With this query, the remote partition is not one of the result
relations to be updated, so is able to escape that restriction.
Excellent. Thank you for the explanation and patience.
Would it be too much to ask for the addition of a test case that will
demonstrate the change of behaviour found in the patch.
Hmm, I don't think there's a way to display whether the INSERT done on
the remote partition as part of a (tuple-moving) UPDATE used
batching or not. That's because that INSERT's state is hidden from
EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
INSERT's state (especially its batch size) under the original UPDATE
node, but I am not sure.
Yeah, there does not seem to be a way for EXPLAIN to show that information
with the current code.
By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.
I never doubted that. However, there is a difference. The current patch
changes the query to be executed on the remote from:
INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);
When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.
I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received multiple insert queries,
each with a single value, or a single insert query with multiple values.
Or is this a wrong assumption of mine?
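For example, something like the following check against the remote table is
what I have in mind (a sketch using the repro's names):
-- Run against the foreign partition's remote base table after the UPDATE.
SELECT cmin, a FROM remote_partition_2;
-- If all moved rows share the same cmin, they arrived in a single multi-row
-- INSERT (batched); distinct cmin values mean one INSERT per row.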
Cheers,
//Georgios
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Hi Georgios,
On Fri, Mar 12, 2021 at 7:59 PM <gkokolatos@pm.me> wrote:
On Friday, March 12, 2021 3:45 AM, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 11, 2021 at 8:36 PM gkokolatos@pm.me wrote:
On Thursday, March 11, 2021 9:42 AM, Amit Langote amitlangote09@gmail.com wrote:
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.So, if I understand correctly then in my previously attached repro I
should have written instead:CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
CREATE TABLE
local_root_local_partition_1
PARTITION OF
local_root_remote_partitions FOR VALUES IN (1);CREATE FOREIGN TABLE
local_root_remote_partition_2
PARTITION OF
local_root_remote_partitions FOR VALUES IN (2)
SERVER
remote_server
OPTIONS (
table_name 'remote_partition_2',
batch_size '10'
);INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- Everything should be on local_root_local_partition_1 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;UPDATE local_root_remote_partitions SET a = 2;
-- Everything should be on remote_partition_2 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;I am guessing that I am still wrong because the UPDATE operation above will
fail due to the restrictions imposed in postgresBeginForeignInsert regarding
UPDATES.Yeah, for the move to work without hitting the restriction you
mention, you will need to write the UPDATE query such that
local_root_remote_partition_2 is not updated. For example, as
follows:UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
With this query, the remote partition is not one of the result
relations to be updated, so is able to escape that restriction.Excellent. Thank you for the explanation and patience.
Would it be too much to ask for the addition of a test case that will
demonstrate the change of behaviour found in patch.Hmm, I don't think there's a way to display whether the INSERT done on
the remote partition as a part of an (tuple-moving) UPDATE used
batching or not. That's because that INSERT's state is hidden from
EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
INSERT's state (especially its batch size) under the original UPDATE
node, but I am not sure.Yeah, there does not seem to be a way for explain to do show that information
with the current code.By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.I never doubted that. However, there is a difference. The current patch
changes the query to be executed in the remote from:INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.
The test case "works" both before and after the patch, with the
difference being in the form of the remote query. It seems to me
though that you do get that.
I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received a multiple insert queries
with a single value, or a single insert query with multiple values.Or is this a wrong assumption of mine?
No, I think you have a good idea here.
I've adjusted that test case to confirm that the batching indeed works
by checking cmin of the moved rows, as you suggest. Please check the
attached updated patch.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Allow-batching-of-inserts-during-cross-partition-.patch
From 9adaa32ff081af79176e0bc08a60bf873970c7b8 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v2] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
partition allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates to use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that need fixing to correctly handle batching
in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c to look inside
PartitionTupleRouting, its definition is now exposed whereas it was
previously local to execPartition.c.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 16 ++++-
contrib/postgres_fdw/postgres_fdw.c | 11 ++-
contrib/postgres_fdw/sql/postgres_fdw.sql | 11 ++-
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 50 +++++++++----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 133 insertions(+), 99 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..a415cb6786 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9414,8 +9414,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9425,7 +9425,7 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
-- The following moves a row from the local partition to the foreign one
UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
SELECT tableoid::regclass, * FROM batch_cp_upd_test;
@@ -9435,5 +9435,15 @@ SELECT tableoid::regclass, * FROM batch_cp_upd_test;
batch_cp_upd_test1_f | 1
(2 rows)
+-- To confirm that they were indeed inserted in batch mode, that is, with a
+-- single command, check cmin of the moved rows in the foreign partition's base
+-- table
+SELECT cmin, * FROM batch_cp_upd_test1;
+ cmin | a
+------+---
+ 1 | 1
+ 1 | 1
+(2 rows)
+
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..53472cf73c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,18 +1934,17 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..68dd6b1011 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2909,8 +2909,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -2920,11 +2920,16 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
-- The following moves a row from the local partition to the foreign one
UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- To confirm that they were indeed inserted in batch mode, that is, with a
+-- single command, check cmin of the moved rows in the foreign partition's base
+-- table
+SELECT cmin, * FROM batch_cp_upd_test1;
+
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5967..811997b4c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partitions touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * 'subplan_resultrel_htab'. The remainder have been built especially
- * for tuple routing. See comment for PartitionDispatchData->indexes for
- * details on how this array is indexed.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- * Hash table to store subplan ResultRelInfos by Oid. This is used to
- * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
- * NULL in other cases. Some of these may be useful for tuple routing
- * to save having to build duplicates.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- int num_partitions;
- int max_partitions;
- HTAB *subplan_resultrel_htab;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -1000,8 +931,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..5f92854c94 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2059,8 +2059,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2277,22 +2275,46 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ {
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partitions touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * 'subplan_resultrel_htab'. The remainder have been built especially
+ * for tuple routing. See comment for PartitionDispatchData->indexes for
+ * details on how this array is indexed.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ * Hash table to store subplan ResultRelInfos by Oid. This is used to
+ * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ * NULL in other cases. Some of these may be useful for tuple routing
+ * to save having to build duplicates.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ int num_partitions;
+ int max_partitions;
+ HTAB *subplan_resultrel_htab;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 16, 2021 6:13 AM, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Georgios,
On Fri, Mar 12, 2021 at 7:59 PM gkokolatos@pm.me wrote:
On Friday, March 12, 2021 3:45 AM, Amit Langote amitlangote09@gmail.com wrote:
On Thu, Mar 11, 2021 at 8:36 PM gkokolatos@pm.me wrote:
On Thursday, March 11, 2021 9:42 AM, Amit Langote amitlangote09@gmail.com wrote:
What we do support however is moving rows from a local partition to a
remote partition and that involves performing an INSERT on the latter.
This patch is for teaching those INSERTs to use batched mode if
allowed, which is currently prohibited. So with this patch, if an
UPDATE moves 10 rows from a local partition to a remote partition,
then they will be inserted with a single INSERT command containing all
10 rows, instead of 10 separate INSERT commands.So, if I understand correctly then in my previously attached repro I
should have written instead:CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
CREATE TABLE
local_root_local_partition_1
PARTITION OF
local_root_remote_partitions FOR VALUES IN (1);CREATE FOREIGN TABLE
local_root_remote_partition_2
PARTITION OF
local_root_remote_partitions FOR VALUES IN (2)
SERVER
remote_server
OPTIONS (
table_name 'remote_partition_2',
batch_size '10'
);INSERT INTO local_root_remote_partitions VALUES (1), (1);
-- Everything should be on local_root_local_partition_1 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;UPDATE local_root_remote_partitions SET a = 2;
-- Everything should be on remote_partition_2 and on the same transaction
SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;I am guessing that I am still wrong because the UPDATE operation above will
fail due to the restrictions imposed in postgresBeginForeignInsert regarding
UPDATES.Yeah, for the move to work without hitting the restriction you
mention, you will need to write the UPDATE query such that
local_root_remote_partition_2 is not updated. For example, as
follows:
UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
With this query, the remote partition is not one of the result
relations to be updated, so is able to escape that restriction.Excellent. Thank you for the explanation and patience.
Would it be too much to ask for the addition of a test case that will
demonstrate the change of behaviour found in patch.Hmm, I don't think there's a way to display whether the INSERT done on
the remote partition as a part of an (tuple-moving) UPDATE used
batching or not. That's because that INSERT's state is hidden from
EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
INSERT's state (especially its batch size) under the original UPDATE
node, but I am not sure.Yeah, there does not seem to be a way for explain to do show that information
with the current code.By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.I never doubted that. However, there is a difference. The current patch
changes the query to be executed in the remote from:
INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);
When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.The test case "works" both before and after the patch, with the
difference being in the form of the remote query. It seems to me
though that you do get that.I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received a multiple insert queries
with a single value, or a single insert query with multiple values.
Or is this a wrong assumption of mine?No, I think you have a good idea here.
Thank you.
I've adjusted that test case to confirm that the batching indeed works
by checking cmin of the moved rows, as you suggest. Please check the
attached updated patch.
Excellent. The patch in the current version with the added test seems
ready to me.
I would still vote to have accessor functions instead of exposing the
whole PartitionTupleRouting struct, but I am not going to hold a too
strong stance about it.
If you agree with me, please provide an updated version of the patch.
Otherwise let it be known and I will flag the patch as RfC in the
commitfest app.
Cheers,
//Georgios
Amit Langote
EDB: http://www.enterprisedb.com
Hi Georgios,
On Tue, Mar 16, 2021 at 5:12 PM <gkokolatos@pm.me> wrote:
On Tuesday, March 16, 2021 6:13 AM, Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Mar 12, 2021 at 7:59 PM gkokolatos@pm.me wrote:
On Friday, March 12, 2021 3:45 AM, Amit Langote amitlangote09@gmail.com wrote:
By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.I never doubted that. However, there is a difference. The current patch
changes the query to be executed in the remote from:
INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);
When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.The test case "works" both before and after the patch, with the
difference being in the form of the remote query. It seems to me
though that you do get that.I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received a multiple insert queries
with a single value, or a single insert query with multiple values.
Or is this a wrong assumption of mine?No, I think you have a good idea here.
I've adjusted that test case to confirm that the batching indeed works
by checking cmin of the moved rows, as you suggest. Please check the
attached updated patch.Excellent. The patch in the current version with the added test seems
ready to me.
Thanks for quickly checking that.
I would still vote to have accessor functions instead of exposing the
whole PartitionTupleRouting struct, but I am not going to hold a too
strong stance about it.
I as well, although I would wait for others to chime in before
updating the patch that way.
--
Amit Langote
EDB: http://www.enterprisedb.com
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 16, 2021 9:59 AM, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Georgios,
On Tue, Mar 16, 2021 at 5:12 PM gkokolatos@pm.me wrote:
On Tuesday, March 16, 2021 6:13 AM, Amit Langote amitlangote09@gmail.com wrote:
On Fri, Mar 12, 2021 at 7:59 PM gkokolatos@pm.me wrote:
On Friday, March 12, 2021 3:45 AM, Amit Langote amitlangote09@gmail.com wrote:
By the way, the test case added by commit 927f453a94106 does exercise
the code added by this patch, but as I said in the last paragraph, we
can't verify that by inspecting EXPLAIN output.I never doubted that. However, there is a difference. The current patch
changes the query to be executed in the remote from:
INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);
When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.
The test case "works" both before and after the patch, with the
difference being in the form of the remote query. It seems to me
though that you do get that.
I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received a multiple insert queries
with a single value, or a single insert query with multiple values.
Or is this a wrong assumption of mine?
No, I think you have a good idea here.
I've adjusted that test case to confirm that the batching indeed works
by checking cmin of the moved rows, as you suggest. Please check the
attached updated patch.
Excellent. The patch in the current version with the added test seems
ready to me.
Thanks for quickly checking that.
A pleasure.
I would still vote to have accessor functions instead of exposing the
whole PartitionTupleRouting struct, but I am not going to hold a too
strong stance about it.
I as well, although I would wait for others to chime in before
updating the patch that way.
Fair enough.
Status updated to RfC in the commitfest app.
On Tue, Mar 16, 2021 at 6:13 PM <gkokolatos@pm.me> wrote:
Status updated to RfC in the commitfest app.
Patch fails to apply per cfbot, so rebased.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Allow-batching-of-inserts-during-cross-partition-.patchapplication/octet-stream; name=v3-0001-Allow-batching-of-inserts-during-cross-partition-.patchDownload
From d0424e8613ceed8b8868287b2a7592143f282653 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v3] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
partition allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition use batching too, essentially
reverting a check that 927f453a94106 put in place.
There are two places that need fixing to correctly handle batching
in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition is now exposed whereas it was
previously local to execPartition.c.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 16 ++++-
contrib/postgres_fdw/postgres_fdw.c | 11 ++-
contrib/postgres_fdw/sql/postgres_fdw.sql | 11 ++-
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 50 +++++++++----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 133 insertions(+), 99 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 59e4e27ffb..c503615a28 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9402,8 +9402,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9413,7 +9413,7 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
-- The following moves a row from the local partition to the foreign one
UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
@@ -9424,6 +9424,16 @@ SELECT tableoid::regclass, * FROM batch_cp_upd_test;
batch_cp_up_test1 | 2
(2 rows)
+-- To confirm that they were indeed inserted in batch mode, that is, with a
+-- single command, check cmin of the moved rows in the foreign partition's base
+-- table
+SELECT cmin, * FROM batch_cp_upd_test1;
+ cmin | a
+------+---
+ 1 | 1
+ 1 | 1
+(2 rows)
+
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
-- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 16c2979f2d..eeb52ca4d9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1959,18 +1959,17 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 107d1c0e03..f4fdf07549 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2924,8 +2924,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -2935,12 +2935,17 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
-- The following moves a row from the local partition to the foreign one
UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- To confirm that they were indeed inserted in batch mode, that is, with a
+-- single command, check cmin of the moved rows in the foreign partition's base
+-- table
+SELECT cmin, * FROM batch_cp_upd_test1;
+
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 558060e080..75a01ba7e0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partitions touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * 'subplan_resultrel_htab'. The remainder have been built especially
- * for tuple routing. See comment for PartitionDispatchData->indexes for
- * details on how this array is indexed.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- * Hash table to store subplan ResultRelInfos by Oid. This is used to
- * cache ResultRelInfos from targets of an UPDATE ModifyTable node;
- * NULL in other cases. Some of these may be useful for tuple routing
- * to save having to build duplicates.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- int num_partitions;
- int max_partitions;
- HTAB *subplan_resultrel_htab;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -1001,8 +932,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bf65785e64..813161c9fc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2158,8 +2158,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2438,22 +2436,46 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ {
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partitions touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * 'subplan_resultrel_htab'. The remainder have been built especially
+ * for tuple routing. See comment for PartitionDispatchData->indexes for
+ * details on how this array is indexed.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ * Hash table to store subplan ResultRelInfos by Oid. This is used to
+ * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ * NULL in other cases. Some of these may be useful for tuple routing
+ * to save having to build duplicates.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ int num_partitions;
+ int max_partitions;
+ HTAB *subplan_resultrel_htab;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
Hi,
In the description:
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
'which' should be dropped since 'because' should start a sentence.
+-- Check that batched inserts also works for inserts made during
inserts also works -> inserts also work
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
The level of nested field accesses is quite deep. If the assertion fails,
it would be hard to know which field is null.
Maybe use several assertions:
Assert(node->rootResultRelInfo)
Assert(node->rootResultRelInfo->ri_RelationDesc)
Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
...
Cheers
On Mon, Apr 5, 2021 at 1:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
In the description:
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
'which' should be dropped since 'because' should start a sentence.
+-- Check that batched inserts also works for inserts made during
inserts also works -> inserts also work
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
The level of nested field accesses is quite deep. If the assertion fails,
it would be hard to know which field is null.
Maybe use several assertions:
Assert(node->rootResultRelInfo)
Assert(node->rootResultRelInfo->ri_RelationDesc)
Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind == ...
Thanks for taking a look at this.
While I agree about having the 1st Assert you suggest, I don't think
this code needs the 2nd one, because its failure would very likely be
due to a problem in some totally unrelated code.
Updated patch attached. I had to adjust the test case a little bit to
account for the changes of 86dc90056d, something I failed to notice
yesterday. Also, I expanded the test case added in postgres_fdw.sql a
bit to show the batching in action more explicitly.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v4-0001-Allow-batching-of-inserts-during-cross-partition-.patchapplication/octet-stream; name=v4-0001-Allow-batching-of-inserts-during-cross-partition-.patchDownload
From d90ea2d0178e1bd6e4e0c3c92ecc932c92f423b3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v4] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition use batching too, essentially
reverting a check that 927f453a94106 put in place.
There are two places that need fixing to correctly handle batching
in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition is now exposed whereas it was
previously local to execPartition.c.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 60 +++++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 11 ++-
contrib/postgres_fdw/sql/postgres_fdw.sql | 33 +++++++--
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 51 +++++++++----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 191 insertions(+), 108 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 59e4e27ffb..30712f6b26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9402,8 +9402,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9411,18 +9411,60 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2), (2);
+-- The following moves rows from the local partition to the foreign one that
+-- has insert batching enabled
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
SELECT tableoid::regclass, * FROM batch_cp_upd_test;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(3 rows)
+
+-- cmin should be the same in all the rows, because they would've been
+-- inserted by one command
+SELECT cmin, * FROM batch_cp_upd_test1;
+ cmin | a
+------+---
+ 0 | 1
+ 0 | 1
+ 0 | 1
+(3 rows)
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+ tableoid | a
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(6 rows)
+
+-- cmin shoud be different across rows, because each one would be inserted
+-- by its own command
+SELECT cmin, * FROM batch_cp_upd_test3;
+ cmin | a
+------+---
+ 0 | 3
+ 1 | 3
+ 2 | 3
+(3 rows)
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 16c2979f2d..eeb52ca4d9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1959,18 +1959,17 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 107d1c0e03..22362ec52c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2924,8 +2924,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched inserts also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -2933,13 +2933,34 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2), (2);
+
+-- The following moves rows from the local partition to the foreign one that
+-- has insert batching enabled
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+
+-- cmin should be the same in all the rows, because they would've been
+-- inserted by one command
+SELECT cmin, * FROM batch_cp_upd_test1;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- cmin shoud be different across rows, because each one would be inserted
+-- by its own command
+SELECT cmin, * FROM batch_cp_upd_test3;
-- Clean up
DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 558060e080..75a01ba7e0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partitions touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * 'subplan_resultrel_htab'. The remainder have been built especially
- * for tuple routing. See comment for PartitionDispatchData->indexes for
- * details on how this array is indexed.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- * Hash table to store subplan ResultRelInfos by Oid. This is used to
- * cache ResultRelInfos from targets of an UPDATE ModifyTable node;
- * NULL in other cases. Some of these may be useful for tuple routing
- * to save having to build duplicates.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- int num_partitions;
- int max_partitions;
- HTAB *subplan_resultrel_htab;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -1001,8 +932,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bf65785e64..0460e2704a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2158,8 +2158,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2438,22 +2436,47 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ {
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partitions touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * 'subplan_resultrel_htab'. The remainder have been built especially
+ * for tuple routing. See comment for PartitionDispatchData->indexes for
+ * details on how this array is indexed.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ * Hash table to store subplan ResultRelInfos by Oid. This is used to
+ * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ * NULL in other cases. Some of these may be useful for tuple routing
+ * to save having to build duplicates.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ int num_partitions;
+ int max_partitions;
+ HTAB *subplan_resultrel_htab;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
On Tue, Apr 6, 2021 at 3:08 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Apr 5, 2021 at 1:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
In the description:
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo() which initializes the insert target
'which' should be dropped since 'because' should start a sentence.
+-- Check that batched inserts also works for inserts made during
inserts also works -> inserts also work
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
The level of nested field accesses is quite deep. If the assertion fails,
it would be hard to know which field is null.
Maybe use several assertions:
Assert(node->rootResultRelInfo)
Assert(node->rootResultRelInfo->ri_RelationDesc)
Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind == ...
Thanks for taking a look at this.
While I agree about having the 1st Assert you suggest, I don't think
this code needs the 2nd one, because its failure would very likely be
due to a problem in some totally unrelated code.
Updated patch attached. I had to adjust the test case a little bit to
account for the changes of 86dc90056d, something I failed to notice
yesterday. Also, I expanded the test case added in postgres_fdw.sql a
bit to show the batching in action more explicitly.
Some minor comments:
1) don't we need order by clause for the selects in the tests added?
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+SELECT cmin, * FROM batch_cp_upd_test1;
2) typo - it is "should" not "shoud"
+-- cmin shoud be different across rows, because each one would be inserted
3) will the cmin in the output always be the same?
+SELECT cmin, * FROM batch_cp_upd_test3;
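For what it's worth, cmin only records the command's position within its own
transaction, so the exact numbers shift if anything else runs earlier in the same
transaction. A minimal local sketch (table name made up):
BEGIN;
CREATE TABLE cmin_shift_demo (a int);      -- this is already command 0 here
INSERT INTO cmin_shift_demo VALUES (1);    -- command 1, so cmin shows 1, not 0
SELECT cmin, a FROM cmin_shift_demo;
ROLLBACK;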
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 6, 2021 at 6:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Apr 6, 2021 at 3:08 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patch attached. I had to adjust the test case a little bit to
account for the changes of 86dc90056d, something I failed to notice
yesterday. Also, I expanded the test case added in postgres_fdw.sql a
bit to show the batching in action more explicitly.
Some minor comments:
Thanks for the review.
1) don't we need order by clause for the selects in the tests added?
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
Good point. It wasn't necessary before, but it is after the test
expansion, so added.
3) will the cmin in the output always be the same?
+SELECT cmin, * FROM batch_cp_upd_test3;
TBH, I am not so sure. Maybe it's not a good idea to rely on cmin
after all. I've rewritten the tests to use a different method of
determining if a single or multiple insert commands were used in
moving rows into foreign partitions.
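For reference, the replacement method counts INSERT statements with an
AFTER ... FOR EACH STATEMENT trigger on the remote base table: one batched INSERT logs
a single row, whereas row-by-row inserts log one row each. A self-contained sketch on
plain local tables (all names below are made up; the patch's version attaches the
trigger to the foreign partitions' base tables):
CREATE TABLE demo_cmdlog (cmd text);
CREATE TABLE demo_target (a int);
CREATE FUNCTION demo_log_stmt() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN INSERT INTO demo_cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
$$;
CREATE TRIGGER demo_stmt_trig AFTER INSERT ON demo_target
  FOR EACH STATEMENT EXECUTE FUNCTION demo_log_stmt();
INSERT INTO demo_target VALUES (1), (2);   -- one statement, logs one row
INSERT INTO demo_target VALUES (3);        -- a second statement, logs another row
SELECT * FROM demo_cmdlog;                 -- two rows, one per INSERT statement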
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v5-0001-Allow-batching-of-inserts-during-cross-partition-.patchapplication/octet-stream; name=v5-0001-Allow-batching-of-inserts-during-cross-partition-.patchDownload
From a81fe61315d269068f93d532cb62643e2e2d6a4f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v5] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition use batching too, essentially
reverting a check that 927f453a94106 put in place.
There are two places that need fixing to correctly handle batching
in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition is now exposed whereas it was
previously local to execPartition.c.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 11 ++-
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 51 +++++++++----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 191 insertions(+), 112 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 59e4e27ffb..97d16be11f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9402,8 +9402,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9411,21 +9411,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog CASCADE;
-- ===================================================================
-- test asynchronous execution
-- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 16c2979f2d..eeb52ca4d9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1959,18 +1959,17 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 107d1c0e03..e6e4b264ef 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2924,8 +2924,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -2933,16 +2933,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog CASCADE;
-- ===================================================================
-- test asynchronous execution
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 558060e080..75a01ba7e0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partitions touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * 'subplan_resultrel_htab'. The remainder have been built especially
- * for tuple routing. See comment for PartitionDispatchData->indexes for
- * details on how this array is indexed.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * subplan_resultrel_htab
- * Hash table to store subplan ResultRelInfos by Oid. This is used to
- * cache ResultRelInfos from targets of an UPDATE ModifyTable node;
- * NULL in other cases. Some of these may be useful for tuple routing
- * to save having to build duplicates.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- int num_partitions;
- int max_partitions;
- HTAB *subplan_resultrel_htab;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -1001,8 +932,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bf65785e64..0460e2704a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2158,8 +2158,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2438,22 +2436,47 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ {
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..846261ef88 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*-----------------------
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partitions touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * 'subplan_resultrel_htab'. The remainder have been built especially
+ * for tuple routing. See comment for PartitionDispatchData->indexes for
+ * details on how this array is indexed.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * subplan_resultrel_htab
+ * Hash table to store subplan ResultRelInfos by Oid. This is used to
+ * cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ * NULL in other cases. Some of these may be useful for tuple routing
+ * to save having to build duplicates.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ int num_partitions;
+ int max_partitions;
+ HTAB *subplan_resultrel_htab;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
On Tue, Apr 6, 2021 at 6:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
3) will the cmin in the output always be the same?
+SELECT cmin, * FROM batch_cp_upd_test3;
TBH, I am not so sure. Maybe it's not a good idea to rely on cmin
after all. I've rewritten the tests to use a different method of
determining if a single or multiple insert commands were used in
moving rows into foreign partitions.
Thanks! It looks good!
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 6, 2021 at 10:52 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Apr 6, 2021 at 6:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
3) will the cmin in the output always be the same?
+SELECT cmin, * FROM batch_cp_upd_test3;
TBH, I am not so sure. Maybe it's not a good idea to rely on cmin
after all. I've rewritten the tests to use a different method of
determining if a single or multiple insert commands were used in
moving rows into foreign partitions.
Thanks! It looks good!
Thanks for checking. I'll mark this as RfC.
--
Amit Langote
EDB: http://www.enterprisedb.com
Thanks! It looks good!
Thanks for checking. I'll mark this as RfC.
Hi,
The patch cannot be applied to the latest HEAD branch; it would be nice if you could rebase it.
And when looking into the patch, I have some comments on it.
1)
IIRC, after commit c5b7ba4, the initialization of mt_partition_tuple_routing was postponed out of ExecInitModifyTable().
So the following if-test uses "proute", which is initialized at the beginning of ExecModifyTable() and could therefore be out of date.
And the postgres_fdw regression test failed with the patch after commit c5b7ba4.
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
if (proute)
...
It seems we should get the mt_partition_tuple_routing just before the if-test.
2)
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo &&
I am not sure whether we need to check if resultRelInfo == NULL, because the existing code did not check it.
And if we do need the check, it might be better to use "if (resultRelInfo == NULL &&..."
3)
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
It might be better to write it as "if (fmstate != NULL && fmstate->aux_fmstate != NULL)".
Best regards,
houzj
On Fri, May 7, 2021 at 6:39 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
Thanks! It looks good!
Thanks for checking. I'll mark this as RfC.
Hi,
The patch cannot be applied to the latest HEAD branch; it would be nice if you could rebase it.
Thanks, done.
And when looking into the patch, I have some comments on it.
1)
IIRC, after commit c5b7ba4, the initialization of mt_partition_tuple_routing was postponed out of ExecInitModifyTable().
So the following if-test uses "proute", which is initialized at the beginning of ExecModifyTable() and could therefore be out of date.
And the postgres_fdw regression test failed with the patch after commit c5b7ba4.
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations.  Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
 */
if (proute)
...
It seems we should get the mt_partition_tuple_routing just before the if-test.
That's a good observation. Fixed.
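To make that concrete, the flush step in ExecModifyTable() now reads roughly like this -- a condensed sketch of the corresponding hunk in the attached v6 patch, fetching the routing state only at this point rather than at the top of the function:

    /* Flush any remaining batched tuples of leaf partitions touched by routing. */
    if (node->mt_partition_tuple_routing)
    {
        /* Read this here, since its initialization is deferred after c5b7ba4. */
        PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
        int         i;

        for (i = 0; i < proute->num_partitions; i++)
        {
            resultRelInfo = proute->partitions[i];
            if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
                ExecBatchInsert(node, resultRelInfo,
                                resultRelInfo->ri_Slots,
                                resultRelInfo->ri_PlanSlots,
                                resultRelInfo->ri_NumSlots,
                                estate, node->canSetTag);
        }
    }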
2)
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ if (resultRelInfo &&
I am not sure whether we need to check if resultRelInfo == NULL, because the existing code did not check it.
And if we do need the check, it might be better to use "if (resultRelInfo == NULL &&..."
I don't quite remember why I added that test, because nowhere do we
add a NULL value to es_opened_result_relations. Actually, we can even
Assert(resultRelInfo != NULL) here.
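For reference, the non-routing branch in the attached patch now looks roughly like this, with the NULL test replaced by an assertion:

    /* Otherwise, flush batched tuples of the result relations opened so far. */
    foreach(lc, estate->es_opened_result_relations)
    {
        resultRelInfo = lfirst(lc);
        Assert(resultRelInfo != NULL);
        if (resultRelInfo->ri_NumSlots > 0)
            ExecBatchInsert(node, resultRelInfo,
                            resultRelInfo->ri_Slots,
                            resultRelInfo->ri_PlanSlots,
                            resultRelInfo->ri_NumSlots,
                            estate, node->canSetTag);
    }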
3)
+ if (fmstate && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
It might be better to write it as "if (fmstate != NULL && fmstate->aux_fmstate != NULL)".
Sure, done.  Actually, there's an if (fmstate) statement right below
the code being added, which I fixed to match the style used by the new
code.
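So, the relevant part of postgresGetForeignModifyBatchSize() now reads roughly as follows (a sketch of the hunk in the attached patch):

    PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

    /*
     * Use the auxiliary state if any, since that is what carries the
     * batching information for the insert done as part of a
     * cross-partition update; see postgresBeginForeignInsert().
     */
    if (fmstate != NULL && fmstate->aux_fmstate != NULL)
        fmstate = fmstate->aux_fmstate;

    if (fmstate != NULL)
        batch_size = fmstate->batch_size;
    else
        batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);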
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v6-0001-Allow-batching-of-inserts-during-cross-partition-.patch (application/octet-stream)
From fcabb1a384ba54c71848a66da88af880a33ce50b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v6] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that needed to be fixed in order to correctly
handle batching in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 13 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 56 ++++++++++-----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 195 insertions(+), 115 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6f533c745d..983fc3f940 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9616,8 +9616,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9625,21 +9625,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog CASCADE;
-- ===================================================================
-- test asynchronous execution
-- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4ff58d9c27..1a39a2ebe3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1965,25 +1965,24 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of
- * a row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
- if (fmstate)
+ if (fmstate != NULL)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 000e2534fc..aaa3532ec7 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3029,8 +3029,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3038,16 +3038,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog CASCADE;
-- ===================================================================
-- test asynchronous execution
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8afddca73a..471eb2d54b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partition touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * the owning ModifyTableState node. The remainder have been built
- * especially for tuple routing. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * is_borrowed_rel
- * Array of 'max_partitions' booleans recording whether a given entry
- * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
- * ModifyTableState node, rather than being built here.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- bool *is_borrowed_rel;
- int num_partitions;
- int max_partitions;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -918,8 +849,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c5a2a9a054..f766d25859 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2334,9 +2334,6 @@ ExecModifyTable(PlanState *pstate)
ItemPointerData tuple_ctid;
HeapTupleData oldtupdata;
HeapTuple oldtuple;
- PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2591,22 +2588,49 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
- if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ if (node->mt_partition_tuple_routing)
+ {
+ PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ Assert(resultRelInfo != NULL);
+ if (resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..29d485e914 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partition touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * the owning ModifyTableState node. The remainder have been built
+ * especially for tuple routing. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * is_borrowed_rel
+ * Array of 'max_partitions' booleans recording whether a given entry
+ * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
+ * ModifyTableState node, rather than being built here.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ bool *is_borrowed_rel;
+ int num_partitions;
+ int max_partitions;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
Amit Langote <amitlangote09@gmail.com> writes:
[ v6-0001-Allow-batching-of-inserts-during-cross-partition-.patch ]
Per the cfbot, this isn't applying anymore, so I'm setting it back
to Waiting on Author.
regards, tom lane
On Fri, Jul 2, 2021 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
[ v6-0001-Allow-batching-of-inserts-during-cross-partition-.patch ]
Per the cfbot, this isn't applying anymore, so I'm setting it back
to Waiting on Author.
Rebased patch attached. Thanks for the reminder.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v7-0001-Allow-batching-of-inserts-during-cross-partition-.patch (application/octet-stream)
From e95aa9a127e29953a6260567496d1f3c80038964 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v7] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that needed to be fixed in order to correctly
handle batching in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 13 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 56 ++++++++++-----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 195 insertions(+), 115 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 31b5de91ad..e729b7b00b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9741,8 +9741,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9750,21 +9750,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fafbab6b02..dd9f1c5274 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2016,25 +2016,24 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
- if (fmstate)
+ if (fmstate != NULL)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 286dd99573..36569c0896 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3065,8 +3065,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3074,16 +3074,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 606c920b06..2a4b5a85eb 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partition touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * the owning ModifyTableState node. The remainder have been built
- * especially for tuple routing. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * is_borrowed_rel
- * Array of 'max_partitions' booleans recording whether a given entry
- * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
- * ModifyTableState node, rather than being built here.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- bool *is_borrowed_rel;
- int num_partitions;
- int max_partitions;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -923,8 +854,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c24684aa6f..8fcbdb89ff 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2358,9 +2358,6 @@ ExecModifyTable(PlanState *pstate)
ItemPointerData tuple_ctid;
HeapTupleData oldtupdata;
HeapTuple oldtuple;
- PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2615,22 +2612,49 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
- if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ if (node->mt_partition_tuple_routing)
+ {
+ PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ Assert(resultRelInfo != NULL);
+ if (resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..29d485e914 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partition touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * the owning ModifyTableState node. The remainder have been built
+ * especially for tuple routing. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * is_borrowed_rel
+ * Array of 'max_partitions' booleans recording whether a given entry
+ * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
+ * ModifyTableState node, rather than being built here.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ bool *is_borrowed_rel;
+ int num_partitions;
+ int max_partitions;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
On Fri, Jul 2, 2021 at 7:35 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jul 2, 2021 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
[ v6-0001-Allow-batching-of-inserts-during-cross-partition-.patch ]
Per the cfbot, this isn't applying anymore, so I'm setting it back
to Waiting on Author.
Rebased patch attached. Thanks for the reminder.
One of the tests, postgres_fdw, has failed; can you please post an
updated patch with a fix:
test postgres_fdw ... FAILED (test process exited with
exit code 2) 7264 ms
Regards,
Vignesh
On Thu, Jul 22, 2021 at 2:18 PM vignesh C <vignesh21@gmail.com> wrote:
On Fri, Jul 2, 2021 at 7:35 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jul 2, 2021 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
[ v6-0001-Allow-batching-of-inserts-during-cross-partition-.patch ]
Per the cfbot, this isn't applying anymore, so I'm setting it back
to Waiting on Author.
Rebased patch attached. Thanks for the reminder.
One of the tests, postgres_fdw, has failed; can you please post an
updated patch with a fix:
test postgres_fdw ... FAILED (test process exited with
exit code 2) 7264 ms
Thanks Vignesh.
I found a problem with the underlying batching code that caused this
failure and have just reported it here:
/messages/by-id/CA+HiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95=JcwUnyV=f0rAKQ@mail.gmail.com
Here's v8, including the patch for the above problem.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v8-0002-Allow-batching-of-inserts-during-cross-partition-.patch (application/octet-stream)
From 72d33b12941f29d26ea6a07ded0a839486523ddb Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v8 2/2] Allow batching of inserts during cross-partition
updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that needed to be fixed in order to correctly
handle batching in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 13 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 56 ++++++++++-----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 195 insertions(+), 115 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ed25e7a743..9522a9f80c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9763,8 +9763,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9772,21 +9772,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f15c97ad7a..634d278097 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2008,25 +2008,24 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
- if (fmstate)
+ if (fmstate != NULL)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 02a6b15a13..ffd5a037d8 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3071,8 +3071,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3080,16 +3080,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..3dc80f7399 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partition touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * the owning ModifyTableState node. The remainder have been built
- * especially for tuple routing. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * is_borrowed_rel
- * Array of 'max_partitions' booleans recording whether a given entry
- * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
- * ModifyTableState node, rather than being built here.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- bool *is_borrowed_rel;
- int num_partitions;
- int max_partitions;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -923,8 +854,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dad459a467..2a97b719b6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2360,9 +2360,6 @@ ExecModifyTable(PlanState *pstate)
ItemPointerData tuple_ctid;
HeapTupleData oldtupdata;
HeapTuple oldtuple;
- PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2617,22 +2614,49 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
- if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ if (node->mt_partition_tuple_routing)
+ {
+ PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ Assert(resultRelInfo != NULL);
+ if (resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..29d485e914 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partition touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * the owning ModifyTableState node. The remainder have been built
+ * especially for tuple routing. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * is_borrowed_rel
+ * Array of 'max_partitions' booleans recording whether a given entry
+ * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
+ * ModifyTableState node, rather than being built here.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ bool *is_borrowed_rel;
+ int num_partitions;
+ int max_partitions;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
v8-0001-Fix-a-thinko-in-b676ac443b6.patch (application/octet-stream)
From dab04f1cc39a53b4afd122a3b21073f97252a225 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 27 Jul 2021 11:28:51 +0900
Subject: [PATCH v8 1/2] Fix a thinko in b676ac443b6
---
src/backend/executor/nodeModifyTable.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c24684aa6f..dad459a467 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -714,12 +714,14 @@ ExecInsert(ModifyTableState *mtstate,
if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_NumSlotsInitialized)
{
TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+ TupleDesc plan_tdesc =
+ CreateTupleDescCopy(planSlot->tts_tupleDescriptor);
resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
MakeSingleTupleTableSlot(tdesc, slot->tts_ops);
resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
- MakeSingleTupleTableSlot(tdesc, planSlot->tts_ops);
+ MakeSingleTupleTableSlot(plan_tdesc, planSlot->tts_ops);
/* remember how many batch slots we initialized */
resultRelInfo->ri_NumSlotsInitialized++;
--
2.24.1
On Tue, Jul 27, 2021 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jul 22, 2021 at 2:18 PM vignesh C <vignesh21@gmail.com> wrote:
One of the tests, postgres_fdw, has failed; can you please post an
updated patch for the fix:
test postgres_fdw ... FAILED (test process exited with
exit code 2) 7264 ms
Thanks Vignesh.
I found a problem with the underlying batching code that caused this
failure and have just reported it here:
/messages/by-id/CA+HiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95=JcwUnyV=f0rAKQ@mail.gmail.com
Here's v8, including the patch for the above problem.
Tomas committed the bug-fix, so attaching a rebased version of the
patch for $subject.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v9-0001-Allow-batching-of-inserts-during-cross-partition-.patch (application/octet-stream)
From 8b7a620b15d9f90e54894bb9aa3064bb6a569d9c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v9] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates to use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that needed to be fixed in order to correctly
handle batching in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 13 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 56 ++++++++++-----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 195 insertions(+), 115 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e3ee30f1aa..370a5896b2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9954,8 +9954,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -9963,21 +9963,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9d443baf02..d59c49b605 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2009,25 +2009,24 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
- if (fmstate)
+ if (fmstate != NULL)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 30b5175da5..150f839aa0 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3127,8 +3127,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3136,16 +3136,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..3dc80f7399 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partition touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * the owning ModifyTableState node. The remainder have been built
- * especially for tuple routing. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * is_borrowed_rel
- * Array of 'max_partitions' booleans recording whether a given entry
- * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
- * ModifyTableState node, rather than being built here.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- bool *is_borrowed_rel;
- int num_partitions;
- int max_partitions;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -923,8 +854,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d328856ae5..1c2e2ecbac 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2360,9 +2360,6 @@ ExecModifyTable(PlanState *pstate)
ItemPointerData tuple_ctid;
HeapTupleData oldtupdata;
HeapTuple oldtuple;
- PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2617,22 +2614,49 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
- if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ if (node->mt_partition_tuple_routing)
+ {
+ PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ Assert(resultRelInfo != NULL);
+ if (resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..29d485e914 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partition touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * the owning ModifyTableState node. The remainder have been built
+ * especially for tuple routing. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * is_borrowed_rel
+ * Array of 'max_partitions' booleans recording whether a given entry
+ * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
+ * ModifyTableState node, rather than being built here.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ bool *is_borrowed_rel;
+ int num_partitions;
+ int max_partitions;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
Hi,
On 2021-08-24 12:03:59 +0900, Amit Langote wrote:
Tomas committed the bug-fix, so attaching a rebased version of the
patch for $subject.
This patch is in the current CF, but doesn't apply: http://cfbot.cputube.org/patch_37_2992.log
marked the entry as waiting-on-author.
Greetings,
Andres Freund
On Tue, Mar 22, 2022 at 9:30 AM Andres Freund <andres@anarazel.de> wrote:
On 2021-08-24 12:03:59 +0900, Amit Langote wrote:
Tomas committed the bug-fix, so attaching a rebased version of the
patch for $subject.
This patch is in the current CF, but doesn't apply: http://cfbot.cputube.org/patch_37_2992.log
marked the entry as waiting-on-author.
Thanks for the heads up.
Rebased to fix a minor conflict with some recently committed
nodeModifyTable.c changes.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v10-0001-Allow-batching-of-inserts-during-cross-partition.patch (application/octet-stream)
From a942e77a9eb8acf23265c21cb9ef6d648288b29e Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v10] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates to use batching too,
essentially reverting a check that 927f453a94106 put in place.
There are two places that needed to be fixed in order to correctly
handle batching in such cases:
* postgresGetForeignModifyBatchSize() must look at the appropriate
PgFdwModifyState, that is, one that belongs to the insert, not to the
original update operation, because only the former would be set up to
contain the information necessary to perform batching.
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 53 +++++++++++---
contrib/postgres_fdw/postgres_fdw.c | 13 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 44 +++++++++---
src/backend/executor/execPartition.c | 72 +------------------
src/backend/executor/nodeModifyTable.c | 56 ++++++++++-----
src/include/executor/execPartition.h | 72 ++++++++++++++++++-
6 files changed, 195 insertions(+), 115 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f210f91188..93101a7856 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10014,8 +10014,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -10023,21 +10023,52 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8..08fc006745 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2022,25 +2022,24 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
/*
* In EXPLAIN without ANALYZE, ri_FdwState is NULL, so we have to lookup
* the option directly in server/table options. Otherwise just use the
* value we determined earlier.
*/
- if (fmstate)
+ if (fmstate != NULL)
batch_size = fmstate->batch_size;
else
batch_size = get_batch_size_option(resultRelInfo->ri_RelationDesc);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 95b6b7192e..4cb2f718e8 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3154,8 +3154,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3163,16 +3163,44 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching enabled, so 1 INSERT for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- This update moves rows from the local partition to the foreign one that
+-- has insert batching disabled, so separate INSERTs for each row.
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test, cmdlog, batch_table_p0, batch_table_p1 CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..d54e9719dd 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -35,75 +35,6 @@
#include "utils/ruleutils.h"
-/*-----------------------
- * PartitionTupleRouting - Encapsulates all information required to
- * route a tuple inserted into a partitioned table to one of its leaf
- * partitions.
- *
- * partition_root
- * The partitioned table that's the target of the command.
- *
- * partition_dispatch_info
- * Array of 'max_dispatch' elements containing a pointer to a
- * PartitionDispatch object for every partitioned table touched by tuple
- * routing. The entry for the target partitioned table is *always*
- * present in the 0th element of this array. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * nonleaf_partitions
- * Array of 'max_dispatch' elements containing pointers to fake
- * ResultRelInfo objects for nonleaf partitions, useful for checking
- * the partition constraint.
- *
- * num_dispatch
- * The current number of items stored in the 'partition_dispatch_info'
- * array. Also serves as the index of the next free array element for
- * new PartitionDispatch objects that need to be stored.
- *
- * max_dispatch
- * The current allocated size of the 'partition_dispatch_info' array.
- *
- * partitions
- * Array of 'max_partitions' elements containing a pointer to a
- * ResultRelInfo for every leaf partition touched by tuple routing.
- * Some of these are pointers to ResultRelInfos which are borrowed out of
- * the owning ModifyTableState node. The remainder have been built
- * especially for tuple routing. See comment for
- * PartitionDispatchData->indexes for details on how this array is
- * indexed.
- *
- * is_borrowed_rel
- * Array of 'max_partitions' booleans recording whether a given entry
- * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
- * ModifyTableState node, rather than being built here.
- *
- * num_partitions
- * The current number of items stored in the 'partitions' array. Also
- * serves as the index of the next free array element for new
- * ResultRelInfo objects that need to be stored.
- *
- * max_partitions
- * The current allocated size of the 'partitions' array.
- *
- * memcxt
- * Memory context used to allocate subsidiary structs.
- *-----------------------
- */
-struct PartitionTupleRouting
-{
- Relation partition_root;
- PartitionDispatch *partition_dispatch_info;
- ResultRelInfo **nonleaf_partitions;
- int num_dispatch;
- int max_dispatch;
- ResultRelInfo **partitions;
- bool *is_borrowed_rel;
- int num_partitions;
- int max_partitions;
- MemoryContext memcxt;
-};
-
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
* hierarchy required to route a tuple to any of its partitions. A
@@ -923,8 +854,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 701fe05296..f335db1dcf 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2662,9 +2662,6 @@ ExecModifyTable(PlanState *pstate)
HeapTupleData oldtupdata;
HeapTuple oldtuple;
ItemPointer tupleid;
- PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
- List *relinfos = NIL;
- ListCell *lc;
CHECK_FOR_INTERRUPTS();
@@ -2925,22 +2922,49 @@ ExecModifyTable(PlanState *pstate)
}
/*
- * Insert remaining tuples for batch insert.
+ * Insert any remaining batched tuples.
+ *
+ * If the query's main target relation is a partitioned table, any inserts
+ * would have been performed using tuple routing, so use the appropriate
+ * set of target relations. Note that this also covers any inserts
+ * performed during cross-partition UPDATEs that would have occurred
+ * through tuple routing.
*/
- if (proute)
- relinfos = estate->es_tuple_routing_result_relations;
- else
- relinfos = estate->es_opened_result_relations;
+ if (node->mt_partition_tuple_routing)
+ {
+ PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
+ int i;
- foreach(lc, relinfos)
+ Assert(node->rootResultRelInfo);
+ Assert(node->rootResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ resultRelInfo = proute->partitions[i];
+ if (resultRelInfo && resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
+ }
+ else
{
- resultRelInfo = lfirst(lc);
- if (resultRelInfo->ri_NumSlots > 0)
- ExecBatchInsert(node, resultRelInfo,
- resultRelInfo->ri_Slots,
- resultRelInfo->ri_PlanSlots,
- resultRelInfo->ri_NumSlots,
- estate, node->canSetTag);
+ ListCell *lc;
+
+ /* Otherwise, use result relations from the range table. */
+ foreach(lc, estate->es_opened_result_relations)
+ {
+ resultRelInfo = lfirst(lc);
+ Assert(resultRelInfo != NULL);
+ if (resultRelInfo->ri_NumSlots > 0)
+ ExecBatchInsert(node, resultRelInfo,
+ resultRelInfo->ri_Slots,
+ resultRelInfo->ri_PlanSlots,
+ resultRelInfo->ri_NumSlots,
+ estate, node->canSetTag);
+ }
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..0e032ca426 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -18,9 +18,77 @@
#include "nodes/plannodes.h"
#include "partitioning/partprune.h"
-/* See execPartition.c for the definitions. */
+/* See execPartition.c for the definition. */
typedef struct PartitionDispatchData *PartitionDispatch;
-typedef struct PartitionTupleRouting PartitionTupleRouting;
+
+/*
+ * PartitionTupleRouting - Encapsulates all information required to
+ * route a tuple inserted into a partitioned table to one of its leaf
+ * partitions.
+ *
+ * partition_root
+ * The partitioned table that's the target of the command.
+ *
+ * partition_dispatch_info
+ * Array of 'max_dispatch' elements containing a pointer to a
+ * PartitionDispatch object for every partitioned table touched by tuple
+ * routing. The entry for the target partitioned table is *always*
+ * present in the 0th element of this array. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * nonleaf_partitions
+ * Array of 'max_dispatch' elements containing pointers to fake
+ * ResultRelInfo objects for nonleaf partitions, useful for checking
+ * the partition constraint.
+ *
+ * num_dispatch
+ * The current number of items stored in the 'partition_dispatch_info'
+ * array. Also serves as the index of the next free array element for
+ * new PartitionDispatch objects that need to be stored.
+ *
+ * max_dispatch
+ * The current allocated size of the 'partition_dispatch_info' array.
+ *
+ * partitions
+ * Array of 'max_partitions' elements containing a pointer to a
+ * ResultRelInfo for every leaf partition touched by tuple routing.
+ * Some of these are pointers to ResultRelInfos which are borrowed out of
+ * the owning ModifyTableState node. The remainder have been built
+ * especially for tuple routing. See comment for
+ * PartitionDispatchData->indexes for details on how this array is
+ * indexed.
+ *
+ * is_borrowed_rel
+ * Array of 'max_partitions' booleans recording whether a given entry
+ * in 'partitions' is a ResultRelInfo pointer borrowed from the owning
+ * ModifyTableState node, rather than being built here.
+ *
+ * num_partitions
+ * The current number of items stored in the 'partitions' array. Also
+ * serves as the index of the next free array element for new
+ * ResultRelInfo objects that need to be stored.
+ *
+ * max_partitions
+ * The current allocated size of the 'partitions' array.
+ *
+ * memcxt
+ * Memory context used to allocate subsidiary structs.
+ *-----------------------
+ */
+typedef struct PartitionTupleRouting
+{
+ Relation partition_root;
+ PartitionDispatch *partition_dispatch_info;
+ ResultRelInfo **nonleaf_partitions;
+ int num_dispatch;
+ int max_dispatch;
+ ResultRelInfo **partitions;
+ bool *is_borrowed_rel;
+ int num_partitions;
+ int max_partitions;
+ MemoryContext memcxt;
+} PartitionTupleRouting;
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
--
2.24.1
Amit-san,
On Tue, Mar 22, 2022 at 10:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
Rebased to fix a minor conflict with some recently committed
nodeModifyTable.c changes.
Apologies for not having reviewed the patch. Here are some review comments:
* The patch conflicts with commit ffbb7e65a, so please update the
patch. (That commit would cause an API break, so I am planning to
apply a fix to HEAD as well [1].)  That commit fixed the handling of
pending inserts, which I think would eliminate the need to do this:
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
* In postgresGetForeignModifyBatchSize():
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
I might be missing something, but I think we should leave the Assert
as-is, because we still disallow moving rows to another foreign-table
partition that is also an UPDATE subplan result relation, which means
that any fmstate should have fmstate->aux_fmstate=NULL.  (The
disallowed case is sketched just after these comments.)
* Also in that function:
- if (fmstate)
+ if (fmstate != NULL)
This is correct, but I would vote for leaving that as-is, to make
back-patching easy.
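For reference, a minimal sketch of the disallowed case, reusing the
regression test's tables; this is essentially what the pre-patch test
exercised:

-- Without the s.a = 2 restriction, the foreign partition for a = 1 is
-- itself among the UPDATE's result relations, so routing the moved rows
-- into it is rejected before batching is even considered:
UPDATE batch_cp_upd_test t SET a = 1
  FROM (VALUES (1), (2)) s(a)
  WHERE t.a = s.a;
-- ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"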
That is all I have for now. I will mark this as Waiting on Author.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/CAPmGK17rmXEY3BL=Aq71L8UZv5f-mz=uxJkz1kMnfSSY+pFe-A@mail.gmail.com
Hi Fujita-san,
On Tue, Dec 6, 2022 at 6:47 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Tue, Mar 22, 2022 at 10:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
Rebased to fix a minor conflict with some recently committed
nodeModifyTable.c changes.
Apologies for not having reviewed the patch. Here are some review comments:
No problem and thanks for taking a look.
* The patch conflicts with commit ffbb7e65a, so please update the
patch. (That commit would cause an API break, so I am planning to
apply a fix to HEAD as well [1].) That commit fixed the handling of
pending inserts, which I think would eliminate the need to do this:
* ExecModifyTable(), when inserting any remaining batched tuples,
must look at the correct set of ResultRelInfos that would've been
used by such inserts, because failing to do so would result in those
tuples not actually getting inserted. To fix, ExecModifyTable() is
now made to get the ResultRelInfos from the PartitionTupleRouting
data structure which contains the ResultRelInfo that would be used by
those internal inserts. To allow nodeModifyTable.c look inside
PartitionTupleRouting, its definition, which was previously local to
execPartition.c, is exposed via execPartition.h.
Ah, I see. Removed those hunks.
* In postgresGetForeignModifyBatchSize():
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
*/
- Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+ if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+ fmstate = fmstate->aux_fmstate;
I might be missing something, but I think we should leave the Assert
as-is, because we still disallow moving rows to another foreign-table
partition that is also an UPDATE subplan result relation, which means
that any fmstate should have fmstate->aux_fmstate=NULL.
Hmm, yes. I forgot that 86dc90056df effectively disabled *all*
attempts to insert into foreign partitions that are also UPDATE
target relations, so you are correct that fmstate->aux_fmstate would
never be set when entering this function.
That means this functionality is only useful for foreign partitions
that are not also being updated by the original UPDATE.
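For illustration, a minimal sketch of the case that does still benefit,
modeled on the test added by the patch (same loopback server and tables):

-- The s.a = 2 condition restricts the UPDATE to the local partition, so
-- batch_cp_upd_test1_f is not among the UPDATE's result relations; per the
-- test's expected output, the two moved rows are then flushed to it in a
-- single batched INSERT (its batch_size is '10'), which the statement
-- trigger in the test confirms.
UPDATE batch_cp_upd_test t SET a = 1
  FROM (VALUES (1), (2)) s(a)
  WHERE t.a = s.a AND s.a = 2;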
I've reinstated the Assert, removed the if block as it's useless, and
updated the comment to clarify the restriction a bit.
* Also in that function:
- if (fmstate)
+ if (fmstate != NULL)
This is correct, but I would vote for leaving that as-is, to make
back-patching easy.
Removed this hunk.
That is all I have for now. I will mark this as Waiting on Author.
Updated patch attached.
Thanks,
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v11-0001-Allow-batching-of-inserts-during-cross-partition.patch (application/octet-stream)
From f7eb527499686a2b49e470cfc1fe9b2a2dcac9df Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 16 Mar 2021 14:03:05 +0900
Subject: [PATCH v11] Allow batching of inserts during cross-partition updates
Currently, the insert that runs internally as part of
cross-partition update of partitioned tables can't use batching
because ExecInitRoutingInfo(), which initializes the insert target
partition, allows batching only if the query's original operation is
CMD_INSERT. This commit loosens that restriction to allow the
internal inserts of cross-partition updates to use batching too,
essentially reverting a check that 927f453a94106 put in place.
While at it, also update the comment in
postgresGetForeignModifyBatchSize() a bit to clarify which inserts
are disallowed during cross-partition updates.
This also adjusts the test case added by 927f453a94106 to confirm
that the insert performed when moving the rows into the foreign
partition indeed uses batched mode.
---
.../postgres_fdw/expected/postgres_fdw.out | 56 +++++++++++++++----
contrib/postgres_fdw/postgres_fdw.c | 10 ++--
contrib/postgres_fdw/sql/postgres_fdw.sql | 46 ++++++++++++---
src/backend/executor/execPartition.c | 3 +-
4 files changed, 89 insertions(+), 26 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2ab3f1efaa..e75d4f629a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10345,8 +10345,8 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for some inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -10354,21 +10354,55 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test1', one that has insert batching enabled,
+-- so a single INSERTs for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+-- This one moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test3', one that has insert batching
+-- disabled, so separate INSERTs for the two rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3 as
+-- described above.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1, batch_cp_upd_test, cmdlog CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1ceac2e0cf..1f6d236426 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2017,16 +2017,16 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Should never get called when the insert is being performed on a table
+ * that is also among the target relations of an UPDATE operation,
+ * because postgresBeginForeignInsert() currently rejects such insert
+ * attempts.
*/
Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 51560429e0..3495666d07 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3327,8 +3327,8 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Check that batched mode also works for some inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3336,16 +3336,46 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt () RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test1', one that has insert batching enabled,
+-- so a single INSERTs for both rows.
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+-- This one moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test3', one that has insert batching
+-- disabled, so separate INSERTs for the two rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3 as
+-- described above.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1, batch_cp_upd_test, cmdlog CASCADE;
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88d0ea3adb..7a13c4dc9e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1018,8 +1018,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
--
2.35.3
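The verification strategy in the test above relies on the fact that an
AFTER INSERT ... FOR EACH STATEMENT trigger on the remote table fires once
per INSERT statement, no matter how many rows that statement carries, so
the number of rows in cmdlog reflects how many INSERTs actually reached
each remote table. A minimal sketch of inspecting the log, using the
cmdlog table from the test (the aggregate query itself is illustrative,
not part of the patch):

-- With batch_size '10', both moved rows arrive in a single INSERT on
-- batch_cp_upd_test1 (one log row); with batch_size '1', they arrive as
-- two separate INSERTs on batch_cp_upd_test3 (two log rows).
SELECT cmd, count(*) FROM cmdlog GROUP BY cmd ORDER BY cmd;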
Amit-san,
On Thu, Dec 8, 2022 at 5:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Dec 6, 2022 at 6:47 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
* In postgresGetForeignModifyBatchSize():
 /*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Use the auxiliary state if any; see postgresBeginForeignInsert() for
+ * details on what it represents.
  */
-    Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+    if (fmstate != NULL && fmstate->aux_fmstate != NULL)
+        fmstate = fmstate->aux_fmstate;

I might be missing something, but I think we should leave the Assert
as-is, because we still disallow moving rows to another foreign-table
partition that is also an UPDATE subplan result relation, which means
that any fmstate should have fmstate->aux_fmstate=NULL.

Hmm, yes. I forgot that 86dc90056df effectively disabled *all*
attempts to insert into foreign partitions that are also UPDATE
target relations, so you are correct that fmstate->aux_fmstate would
never be set when entering this function.

That means this functionality is only useful for foreign partitions
that are not also being updated by the original UPDATE.
Yeah, I think so too.
I've reinstated the Assert, removed the if block as it's useless, and
updated the comment to clarify the restriction a bit.
Looks good to me.
* Also in that function:
-   if (fmstate)
+   if (fmstate != NULL)

This is correct, but I would vote for leaving that as-is, to make
back-patching easy.

Removed this hunk.
Thanks!
Updated patch attached.
Thanks for the patch! I will review the patch a bit more, but I think
it would be committable.
Best regards,
Etsuro Fujita
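To make the remaining restriction concrete: row movement into a foreign
partition can only be batched when that partition is not itself among the
UPDATE's target relations. A minimal sketch, assuming a postgres_fdw
loopback server named 'loopback' as in the regression tests; the table
names here are illustrative and not part of the patch:

CREATE TABLE cp_demo (a int) PARTITION BY LIST (a);
CREATE TABLE cp_demo_local PARTITION OF cp_demo FOR VALUES IN (1);
CREATE TABLE cp_demo_remote (a int);
CREATE FOREIGN TABLE cp_demo_foreign
  PARTITION OF cp_demo FOR VALUES IN (2)
  SERVER loopback OPTIONS (table_name 'cp_demo_remote', batch_size '10');
INSERT INTO cp_demo VALUES (1), (1);

-- Allowed, and with the patch batched: the qual prunes cp_demo_foreign,
-- so it is not an UPDATE target relation, and both moved rows are routed
-- into it with a single batched INSERT.
UPDATE cp_demo SET a = 2 WHERE a = 1;

-- Still disallowed: without the pruning qual, cp_demo_foreign is also an
-- UPDATE target relation, and routing moved rows into it fails with
-- "cannot route tuples into foreign table to be updated".
-- UPDATE cp_demo SET a = 2;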
On Thu, Dec 8, 2022 at 8:01 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Thu, Dec 8, 2022 at 5:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patch attached.
I will review the patch a bit more, but I think
it would be committable.
One thing I noticed is this bit:
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0,
batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1,
batch_cp_upd_test, cmdlog CASCADE;
This would be nitpicking, but this as-proposed will not remove remote
tables created for foreign-table partitions of the partitioned table
‘batch_cp_upd_test’. So I modified this a bit further to remove them
as well. Also, I split this into two, for readability. Another thing
is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
I fixed it as well. Other than that, the patch looks good to me.
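Spelled out, the cleanup drops the remote-side tables explicitly, roughly
as follows (a sketch of the idea; the exact statements are in the attached
patch):

-- Dropping the partitioned table also drops its foreign-table partitions,
-- but not the plain tables on the remote side that they pointed at, so
-- those are dropped separately.
DROP TABLE batch_cp_upd_test;
DROP TABLE batch_cp_upd_test1;
DROP TABLE batch_cp_upd_test3;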
Attached is an updated patch. If there are no objections, I will
commit the patch.
Best regards,
Etsuro Fujita
Attachments:
v11-0001-Allow-batching-of-inserts-during-cross-partition-efujita.patchapplication/octet-stream; name=v11-0001-Allow-batching-of-inserts-during-cross-partition-efujita.patchDownload
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2ab3f1efaa..e536c0e5b5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10345,8 +10345,12 @@ SELECT COUNT(*) FROM batch_table;
66
(1 row)
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Clean up
+DROP TABLE batch_table;
+DROP TABLE batch_table_p0;
+DROP TABLE batch_table_p1;
+-- Check that batched mode also works for some inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -10354,21 +10358,59 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-ERROR: cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt() RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+-- This update moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test1', one that has insert batching
+-- enabled, so a single INSERT for both rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+-- This one moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test3', one that has insert batching
+-- disabled, so separate INSERTs for the two rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
tableoid | a
----------------------+---
batch_cp_upd_test1_f | 1
- batch_cp_up_test1 | 2
-(2 rows)
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test3_f | 3
+ batch_cp_upd_test3_f | 3
+(4 rows)
+
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3 as
+-- described above.
+SELECT * FROM cmdlog ORDER BY 1;
+ cmd
+------------------------------
+ INSERT on batch_cp_upd_test1
+ INSERT on batch_cp_upd_test3
+ INSERT on batch_cp_upd_test3
+(3 rows)
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_cp_upd_test;
+DROP TABLE batch_cp_upd_test1;
+DROP TABLE batch_cp_upd_test3;
+DROP TABLE cmdlog;
+DROP FUNCTION log_stmt();
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
CREATE TABLE batch_table ( x int, field1 text, field2 text) PARTITION BY HASH (x);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1ceac2e0cf..1f6d236426 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2017,16 +2017,16 @@ static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
int batch_size;
- PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
- (PgFdwModifyState *) resultRelInfo->ri_FdwState :
- NULL;
+ PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
/* should be called only once */
Assert(resultRelInfo->ri_BatchSize == 0);
/*
- * Should never get called when the insert is being performed as part of a
- * row movement operation.
+ * Should never get called when the insert is being performed on a table
+ * that is also among the target relations of an UPDATE operation,
+ * because postgresBeginForeignInsert() currently rejects such insert
+ * attempts.
*/
Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 51560429e0..f18d2291f0 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3327,8 +3327,13 @@ CREATE TABLE batch_table_p2
INSERT INTO batch_table SELECT * FROM generate_series(1, 66) i;
SELECT COUNT(*) FROM batch_table;
--- Check that enabling batched inserts doesn't interfere with cross-partition
--- updates
+-- Clean up
+DROP TABLE batch_table;
+DROP TABLE batch_table_p0;
+DROP TABLE batch_table_p1;
+
+-- Check that batched mode also works for some inserts made during
+-- cross-partition updates
CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
CREATE FOREIGN TABLE batch_cp_upd_test1_f
@@ -3336,16 +3341,50 @@ CREATE FOREIGN TABLE batch_cp_upd_test1_f
FOR VALUES IN (1)
SERVER loopback
OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
-CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+CREATE TABLE batch_cp_upd_test2 PARTITION OF batch_cp_upd_test
FOR VALUES IN (2);
-INSERT INTO batch_cp_upd_test VALUES (1), (2);
+CREATE TABLE batch_cp_upd_test3 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test3_f
+ PARTITION OF batch_cp_upd_test
+ FOR VALUES IN (3)
+ SERVER loopback
+ OPTIONS (table_name 'batch_cp_upd_test3', batch_size '1');
+
+-- Create statement triggers on remote tables that "log" any INSERTs
+-- performed on them.
+CREATE TABLE cmdlog (cmd text);
+CREATE FUNCTION log_stmt() RETURNS TRIGGER LANGUAGE plpgsql AS $$
+ BEGIN INSERT INTO public.cmdlog VALUES (TG_OP || ' on ' || TG_RELNAME); RETURN NULL; END;
+$$;
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test1
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+CREATE TRIGGER stmt_trig AFTER INSERT ON batch_cp_upd_test3
+ FOR EACH STATEMENT EXECUTE FUNCTION log_stmt();
+
+-- This update moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test1', one that has insert batching
+-- enabled, so a single INSERT for both rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+-- This one moves rows from the local partition 'batch_cp_upd_test2' to the
+-- foreign partition 'batch_cp_upd_test3', one that has insert batching
+-- disabled, so separate INSERTs for the two rows.
+INSERT INTO batch_cp_upd_test VALUES (2), (2);
+UPDATE batch_cp_upd_test t SET a = 3 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND s.a = 2;
+
+SELECT tableoid::regclass, * FROM batch_cp_upd_test ORDER BY 1;
--- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
-SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+-- Should see 1 INSERT on batch_cp_upd_test1 and 2 on batch_cp_upd_test3 as
+-- described above.
+SELECT * FROM cmdlog ORDER BY 1;
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_cp_upd_test;
+DROP TABLE batch_cp_upd_test1;
+DROP TABLE batch_cp_upd_test3;
+DROP TABLE cmdlog;
+DROP FUNCTION log_stmt();
-- Use partitioning
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76d79b9741..fce9d6316e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1018,8 +1018,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
*
* If the FDW does not support batching, we set the batch size to 1.
*/
- if (mtstate->operation == CMD_INSERT &&
- partRelInfo->ri_FdwRoutine != NULL &&
+ if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
partRelInfo->ri_BatchSize =
On Wed, Dec 14, 2022 at 6:45 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Thu, Dec 8, 2022 at 8:01 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Thu, Dec 8, 2022 at 5:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patch attached.
I will review the patch a bit more, but I think
it would be committable.

One thing I noticed is this bit:

-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1, batch_cp_upd_test, cmdlog CASCADE;

This would be nitpicking, but this as-proposed will not remove remote
tables created for foreign-table partitions of the partitioned table
‘batch_cp_upd_test’. So I modified this a bit further to remove them
as well. Also, I split this into two, for readability. Another thing
is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
I fixed it as well. Other than that, the patch looks good to me.
Attached is an updated patch. If there are no objections, I will
commit the patch.
Thanks for the changes. LGTM.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Hi Amit-san,
On Wed, Dec 14, 2022 at 10:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Dec 14, 2022 at 6:45 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
One thing I noticed is this bit:
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1, batch_cp_upd_test, cmdlog CASCADE;

This would be nitpicking, but this as-proposed will not remove remote
tables created for foreign-table partitions of the partitioned table
‘batch_cp_upd_test’. So I modified this a bit further to remove them
as well. Also, I split this into two, for readability. Another thing
is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
I fixed it as well. Other than that, the patch looks good to me.
Attached is an updated patch. If there are no objections, I will
commit the patch.

LGTM.
Cool! Pushed.
Best regards,
Etsuro Fujita
On Tue, Dec 20, 2022 at 7:18 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
On Wed, Dec 14, 2022 at 10:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Dec 14, 2022 at 6:45 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
One thing I noticed is this bit:
-- Clean up
-DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0, batch_table_p1 CASCADE;
+DROP TABLE batch_table, batch_table_p0, batch_table_p1, batch_cp_upd_test, cmdlog CASCADE;

This would be nitpicking, but this as-proposed will not remove remote
tables created for foreign-table partitions of the partitioned table
‘batch_cp_upd_test’. So I modified this a bit further to remove them
as well. Also, I split this into two, for readability. Another thing
is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
I fixed it as well. Other than that, the patch looks good to me.
Attached is an updated patch. If there are no objections, I will
commit the patch.

LGTM.
Cool! Pushed.
Thank you, Fujita-san.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com