non-bulk inserts and tuple routing

Started by Amit Langote · about 8 years ago · 43 messages
#1 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
4 attachment(s)

Hi.

I have a patch that rearranges the code around partition tuple routing so
that allocation of per-partition objects (ResultRelInfo,
TupleConversionMap, etc.) is delayed until a given partition is actually
inserted into (i.e., a tuple is routed to it). I see a good win for
non-bulk inserts with the patch, and it is implemented such that it
doesn't affect the bulk-insert case much.
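
The idea can be sketched outside the executor: keep an array of per-partition slots that starts out empty and populate an entry only the first time a tuple is routed to it. The names below are illustrative stand-ins, not the actual PostgreSQL structures:

```python
# Illustrative sketch of the lazy per-partition initialization the patch
# implements; PartitionState stands in for ResultRelInfo + TupleConversionMap.
class PartitionState:
    init_count = 0          # counts how many partitions were actually set up

    def __init__(self, oid):
        PartitionState.init_count += 1
        self.oid = oid      # in the real code: open the relation, build maps, etc.

class Router:
    def __init__(self, partition_oids):
        # Cheap setup: only an array of empty slots, one per partition.
        self.oids = partition_oids
        self.partitions = [None] * len(partition_oids)

    def route(self, key):
        idx = hash(key) % len(self.oids)   # stand-in for ExecFindPartition
        if self.partitions[idx] is None:   # first touch: initialize lazily
            self.partitions[idx] = PartitionState(self.oids[idx])
        return self.partitions[idx]

router = Router(list(range(256)))
router.route(42)   # only this one partition's state gets built
```

With 256 partitions and a single routed tuple, only one PartitionState is ever constructed; repeated routing to the same partition reuses it.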

Performance numbers:

* Uses following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...
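
The elided partition DDL follows mechanically from the modulus/remainder pattern above; a hypothetical generator (the modulus of 8 is only an example, not what the benchmark fixed):

```python
# Illustrative helper that emits the repeated DDL elided ("...") above for a
# hash-partitioned table with the given modulus; names are hypothetical.
def hash_partition_ddl(parent, modulus):
    stmts = [f"create table {parent} (a int, b int) partition by hash (a);"]
    for r in range(modulus):
        stmts.append(
            f"create table {parent}_{r} partition of {parent} "
            f"for values with (modulus {modulus}, remainder {r});"
        )
    return stmts

print("\n".join(hash_partition_ddl("t1", 8)))
```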

* Non-bulk insert uses the following code (insert 100,000 rows one-by-one):

do $$
begin
  for i in 1..100000 loop
    insert into t1 values (i, i+1);
  end loop;
end; $$;

* Times in milliseconds:

#parts        HEAD     Patched

     8    6216.300    4977.670
    16    9061.388    6360.093
    32   14081.656    8752.405
    64   24887.110   13919.384
   128   45926.251   24582.411
   256   88088.084   45490.894

As you can see, the patch can make inserts as much as 2x faster, although
the time taken still increases as the number of partitions grows, because
we still lock *all* partitions at the beginning.
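
A toy cost model (numbers purely illustrative, not measured) shows why both curves behave this way: each single-row INSERT still pays a per-partition locking cost for all N partitions, while the patched per-partition setup cost is paid only for partitions actually hit by that statement:

```python
# Toy cost model, purely illustrative: 'lock' is paid for every partition on
# every statement; 'setup' is paid per partition eagerly (HEAD) or only per
# partition actually receiving a tuple (patched/lazy).
def total_cost(nparts, nstmts, parts_hit_per_stmt,
               lock=1.0, setup=10.0, lazy=False):
    per_stmt_setup = (parts_hit_per_stmt if lazy else nparts) * setup
    return nstmts * (nparts * lock + per_stmt_setup)

for n in (8, 64, 256):
    head = total_cost(n, 1000, 1, lazy=False)
    patched = total_cost(n, 1000, 1, lazy=True)
    print(n, head, patched)
```

The patched cost stays well below HEAD, yet still grows with the partition count through the locking term, matching the shape of the measurements above.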

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts      HEAD   Patched

     8   458.301   450.875
    16   409.271   510.723
    32   500.960   612.003
    64   430.687   795.046
   128   449.314   565.786
   256   493.171   490.187

Not much harm here, although the numbers are a bit noisy.

The patch is divided into 4 parts, the first 3 of which are refactoring patches.

I know this patch will conflict severely with [1] and [2], so it's fine if
we consider applying these later. Will add this to the next CF.

Thanks,
Amit

[1]: https://commitfest.postgresql.org/16/1023/

[2]: https://commitfest.postgresql.org/16/1184/

Attachments:

0001-Teach-CopyFrom-to-use-ModifyTableState-for-tuple-rou.patch (text/plain)
From a87be8a84d467d65cc0b6cf02655fc3b2b9a458f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 10:43:45 +0900
Subject: [PATCH 1/4] Teach CopyFrom to use ModifyTableState for tuple-routing

This removes all fields of CopyStateData that were meant for
tuple routing and instead uses ModifyTableState that has all those
fields, including transition_tupconv_maps.  In COPY's case,
transition_tupconv_maps is only required if tuple routing is being
used, so it's safe.
---
 src/backend/commands/copy.c | 79 ++++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 254be28ae4..c82103e1c5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -166,14 +166,7 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 
-	PartitionDispatch *partition_dispatch_info;
-	int			num_dispatch;	/* Number of entries in the above array */
-	int			num_partitions; /* Number of members in the following arrays */
-	ResultRelInfo **partitions; /* Per partition result relation pointers */
-	TupleConversionMap **partition_tupconv_maps;
-	TupleTableSlot *partition_tuple_slot;
 	TransitionCaptureState *transition_capture;
-	TupleConversionMap **transition_tupconv_maps;
 
 	/*
 	 * These variables are used to reduce overhead in textual COPY FROM.
@@ -2289,6 +2282,7 @@ CopyFrom(CopyState cstate)
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *saved_resultRelInfo = NULL;
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
+	ModifyTableState *mtstate = makeNode(ModifyTableState);
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
@@ -2478,22 +2472,28 @@ CopyFrom(CopyState cstate)
 		TupleTableSlot *partition_tuple_slot;
 		int			num_parted,
 					num_partitions;
-
-		ExecSetupPartitionTupleRouting(NULL,
+		ModifyTable *node = makeNode(ModifyTable);
+
+		/* Just need make this field appear valid. */
+		node->nominalRelation = 1;
+		mtstate->ps.plan = (Plan *) node;
+		mtstate->ps.state = estate;
+		mtstate->resultRelInfo = resultRelInfo;
+		ExecSetupPartitionTupleRouting(mtstate,
 									   cstate->rel,
-									   1,
+									   node->nominalRelation,
 									   estate,
 									   &partition_dispatch_info,
 									   &partitions,
 									   &partition_tupconv_maps,
 									   &partition_tuple_slot,
 									   &num_parted, &num_partitions);
-		cstate->partition_dispatch_info = partition_dispatch_info;
-		cstate->num_dispatch = num_parted;
-		cstate->partitions = partitions;
-		cstate->num_partitions = num_partitions;
-		cstate->partition_tupconv_maps = partition_tupconv_maps;
-		cstate->partition_tuple_slot = partition_tuple_slot;
+		mtstate->mt_partition_dispatch_info = partition_dispatch_info;
+		mtstate->mt_num_dispatch = num_parted;
+		mtstate->mt_partitions = partitions;
+		mtstate->mt_num_partitions = num_partitions;
+		mtstate->mt_partition_tupconv_maps = partition_tupconv_maps;
+		mtstate->mt_partition_tuple_slot = partition_tuple_slot;
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2505,12 +2505,13 @@ CopyFrom(CopyState cstate)
 		{
 			int			i;
 
-			cstate->transition_tupconv_maps = (TupleConversionMap **)
-				palloc0(sizeof(TupleConversionMap *) * cstate->num_partitions);
-			for (i = 0; i < cstate->num_partitions; ++i)
+			mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
+										palloc0(sizeof(TupleConversionMap *) *
+												mtstate->mt_num_partitions);
+			for (i = 0; i < mtstate->mt_num_partitions; ++i)
 			{
-				cstate->transition_tupconv_maps[i] =
-					convert_tuples_by_name(RelationGetDescr(cstate->partitions[i]->ri_RelationDesc),
+				mtstate->mt_transition_tupconv_maps[i] =
+					convert_tuples_by_name(RelationGetDescr(mtstate->mt_partitions[i]->ri_RelationDesc),
 										   RelationGetDescr(cstate->rel),
 										   gettext_noop("could not convert row type"));
 			}
@@ -2530,7 +2531,7 @@ CopyFrom(CopyState cstate)
 	if ((resultRelInfo->ri_TrigDesc != NULL &&
 		 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
 		  resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) ||
-		cstate->partition_dispatch_info != NULL ||
+		mtstate->mt_partition_dispatch_info != NULL ||
 		cstate->volatile_defexprs)
 	{
 		useHeapMultiInsert = false;
@@ -2605,7 +2606,7 @@ CopyFrom(CopyState cstate)
 		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 
 		/* Determine the partition to heap_insert the tuple into */
-		if (cstate->partition_dispatch_info)
+		if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			int			leaf_part_index;
 			TupleConversionMap *map;
@@ -2619,11 +2620,11 @@ CopyFrom(CopyState cstate)
 			 * partition, respectively.
 			 */
 			leaf_part_index = ExecFindPartition(resultRelInfo,
-												cstate->partition_dispatch_info,
+										mtstate->mt_partition_dispatch_info,
 												slot,
 												estate);
 			Assert(leaf_part_index >= 0 &&
-				   leaf_part_index < cstate->num_partitions);
+				   leaf_part_index < mtstate->mt_num_partitions);
 
 			/*
 			 * If this tuple is mapped to a partition that is not same as the
@@ -2641,7 +2642,8 @@ CopyFrom(CopyState cstate)
 			 * to the selected partition.
 			 */
 			saved_resultRelInfo = resultRelInfo;
-			resultRelInfo = cstate->partitions[leaf_part_index];
+			resultRelInfo = mtstate->mt_partitions[leaf_part_index];
+			Assert(resultRelInfo != NULL);
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
@@ -2671,7 +2673,7 @@ CopyFrom(CopyState cstate)
 					 */
 					cstate->transition_capture->tcs_original_insert_tuple = NULL;
 					cstate->transition_capture->tcs_map =
-						cstate->transition_tupconv_maps[leaf_part_index];
+						mtstate->mt_transition_tupconv_maps[leaf_part_index];
 				}
 				else
 				{
@@ -2688,7 +2690,7 @@ CopyFrom(CopyState cstate)
 			 * We might need to convert from the parent rowtype to the
 			 * partition rowtype.
 			 */
-			map = cstate->partition_tupconv_maps[leaf_part_index];
+			map = mtstate->mt_partition_tupconv_maps[leaf_part_index];
 			if (map)
 			{
 				Relation	partrel = resultRelInfo->ri_RelationDesc;
@@ -2700,7 +2702,7 @@ CopyFrom(CopyState cstate)
 				 * point on.  Use a dedicated slot from this point on until
 				 * we're finished dealing with the partition.
 				 */
-				slot = cstate->partition_tuple_slot;
+				slot = mtstate->mt_partition_tuple_slot;
 				Assert(slot != NULL);
 				ExecSetSlotDescriptor(slot, RelationGetDescr(partrel));
 				ExecStoreTuple(tuple, slot, InvalidBuffer, true);
@@ -2852,7 +2854,7 @@ CopyFrom(CopyState cstate)
 	ExecCloseIndices(resultRelInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
-	if (cstate->partition_dispatch_info)
+	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		int			i;
 
@@ -2862,23 +2864,26 @@ CopyFrom(CopyState cstate)
 		 * the main target table of COPY that will be closed eventually by
 		 * DoCopy().  Also, tupslot is NULL for the root partitioned table.
 		 */
-		for (i = 1; i < cstate->num_dispatch; i++)
+		for (i = 1; i < mtstate->mt_num_dispatch; i++)
 		{
-			PartitionDispatch pd = cstate->partition_dispatch_info[i];
+			PartitionDispatch pd = mtstate->mt_partition_dispatch_info[i];
 
 			heap_close(pd->reldesc, NoLock);
 			ExecDropSingleTupleTableSlot(pd->tupslot);
 		}
-		for (i = 0; i < cstate->num_partitions; i++)
+		for (i = 0; i < mtstate->mt_num_partitions; i++)
 		{
-			ResultRelInfo *resultRelInfo = cstate->partitions[i];
+			ResultRelInfo *resultRelInfo = mtstate->mt_partitions[i];
 
-			ExecCloseIndices(resultRelInfo);
-			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+			if (resultRelInfo)
+			{
+				ExecCloseIndices(resultRelInfo);
+				heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+			}
 		}
 
 		/* Release the standalone partition tuple descriptor */
-		ExecDropSingleTupleTableSlot(cstate->partition_tuple_slot);
+		ExecDropSingleTupleTableSlot(mtstate->mt_partition_tuple_slot);
 	}
 
 	/* Close any trigger target relations */
-- 
2.11.0

0002-ExecFindPartition-refactoring.patch (text/plain)
From 3e251d46de5105581acf620773568bb9cdecdf0b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 13:56:25 +0900
Subject: [PATCH 2/4] ExecFindPartition refactoring

---
 src/backend/commands/copy.c            |  5 +----
 src/backend/executor/execPartition.c   | 14 ++++++--------
 src/backend/executor/nodeModifyTable.c |  5 +----
 src/include/executor/execPartition.h   |  5 +----
 4 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c82103e1c5..280d449dec 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2619,10 +2619,7 @@ CopyFrom(CopyState cstate)
 			 * will get us the ResultRelInfo and TupleConversionMap for the
 			 * partition, respectively.
 			 */
-			leaf_part_index = ExecFindPartition(resultRelInfo,
-										mtstate->mt_partition_dispatch_info,
-												slot,
-												estate);
+			leaf_part_index = ExecFindPartition(mtstate, slot);
 			Assert(leaf_part_index >= 0 &&
 				   leaf_part_index < mtstate->mt_num_partitions);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d545af2b67..a40c174230 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -155,11 +155,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 }
 
 /*
- * ExecFindPartition -- Find a leaf partition in the partition tree rooted
- * at parent, for the heap tuple contained in *slot
- *
- * estate must be non-NULL; we'll need it to compute any expressions in the
- * partition key(s)
+ * ExecFindPartition -- Find a leaf partition for tuple contained in slot
  *
  * If no leaf partition is found, this routine errors out with the appropriate
  * error message, else it returns the leaf partition sequence number
@@ -167,14 +163,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
  * the partition tree.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
-				  TupleTableSlot *slot, EState *estate)
+ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 {
+	EState	   *estate = mtstate->ps.state;
 	int			result;
 	Datum		values[PARTITION_MAX_KEYS];
 	bool		isnull[PARTITION_MAX_KEYS];
 	Relation	rel;
-	PartitionDispatch parent;
+	PartitionDispatch  *pd = mtstate->mt_partition_dispatch_info,
+						parent;
+	ResultRelInfo *resultRelInfo = mtstate->resultRelInfo;
 	ExprContext *ecxt = GetPerTupleExprContext(estate);
 	TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index afb83ed3ae..f836dd3703 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -292,10 +292,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * the ResultRelInfo and TupleConversionMap for the partition,
 		 * respectively.
 		 */
-		leaf_part_index = ExecFindPartition(resultRelInfo,
-											mtstate->mt_partition_dispatch_info,
-											slot,
-											estate);
+		leaf_part_index = ExecFindPartition(mtstate, slot);
 		Assert(leaf_part_index >= 0 &&
 			   leaf_part_index < mtstate->mt_num_partitions);
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 86a199d169..19e3b9d233 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,9 +58,6 @@ extern void ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 							   TupleConversionMap ***tup_conv_maps,
 							   TupleTableSlot **partition_tuple_slot,
 							   int *num_parted, int *num_partitions);
-extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-				  PartitionDispatch *pd,
-				  TupleTableSlot *slot,
-				  EState *estate);
+extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
 #endif							/* EXECPARTITION_H */
-- 
2.11.0

0003-ExecSetupPartitionTupleRouting-refactoring.patch (text/plain)
From 6ea3100c3df46ee131ea3d7590eaba378536c320 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH 3/4] ExecSetupPartitionTupleRouting refactoring

---
 src/backend/commands/copy.c            | 22 +----------
 src/backend/executor/execPartition.c   | 69 +++++++++++++++-------------------
 src/backend/executor/nodeModifyTable.c | 25 +-----------
 src/include/executor/execPartition.h   |  9 +----
 4 files changed, 33 insertions(+), 92 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 280d449dec..e7fe020fa7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2466,12 +2466,6 @@ CopyFrom(CopyState cstate)
 	 */
 	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
-		PartitionDispatch *partition_dispatch_info;
-		ResultRelInfo **partitions;
-		TupleConversionMap **partition_tupconv_maps;
-		TupleTableSlot *partition_tuple_slot;
-		int			num_parted,
-					num_partitions;
 		ModifyTable *node = makeNode(ModifyTable);
 
 		/* Just need make this field appear valid. */
@@ -2479,21 +2473,7 @@ CopyFrom(CopyState cstate)
 		mtstate->ps.plan = (Plan *) node;
 		mtstate->ps.state = estate;
 		mtstate->resultRelInfo = resultRelInfo;
-		ExecSetupPartitionTupleRouting(mtstate,
-									   cstate->rel,
-									   node->nominalRelation,
-									   estate,
-									   &partition_dispatch_info,
-									   &partitions,
-									   &partition_tupconv_maps,
-									   &partition_tuple_slot,
-									   &num_parted, &num_partitions);
-		mtstate->mt_partition_dispatch_info = partition_dispatch_info;
-		mtstate->mt_num_dispatch = num_parted;
-		mtstate->mt_partitions = partitions;
-		mtstate->mt_num_partitions = num_partitions;
-		mtstate->mt_partition_tupconv_maps = partition_tupconv_maps;
-		mtstate->mt_partition_tuple_slot = partition_tuple_slot;
+		ExecSetupPartitionTupleRouting(mtstate, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index a40c174230..a495b165bd 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -41,42 +41,19 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  * ExecSetupPartitionTupleRouting - set up information needed during
  * tuple routing for partitioned tables
  *
- * Output arguments:
- * 'pd' receives an array of PartitionDispatch objects with one entry for
- *		every partitioned table in the partition tree
- * 'partitions' receives an array of ResultRelInfo* objects with one entry for
- *		every leaf partition in the partition tree
- * 'tup_conv_maps' receives an array of TupleConversionMap objects with one
- *		entry for every leaf partition (required to convert input tuple based
- *		on the root table's rowtype to a leaf partition's rowtype after tuple
- *		routing is done)
- * 'partition_tuple_slot' receives a standalone TupleTableSlot to be used
- *		to manipulate any given leaf partition's rowtype after that partition
- *		is chosen by tuple-routing.
- * 'num_parted' receives the number of partitioned tables in the partition
- *		tree (= the number of entries in the 'pd' output array)
- * 'num_partitions' receives the number of leaf partitions in the partition
- *		tree (= the number of entries in the 'partitions' and 'tup_conv_maps'
- *		output arrays
- *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
  */
 void
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel,
-							   Index resultRTindex,
-							   EState *estate,
-							   PartitionDispatch **pd,
-							   ResultRelInfo ***partitions,
-							   TupleConversionMap ***tup_conv_maps,
-							   TupleTableSlot **partition_tuple_slot,
-							   int *num_parted, int *num_partitions)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	Index		resultRTindex = node->nominalRelation;
 	ResultRelInfo *leaf_part_rri;
 
 	/*
@@ -84,23 +61,35 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	 * partitions.
 	 */
 	(void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
-	*pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts);
-	*num_partitions = list_length(leaf_parts);
-	*partitions = (ResultRelInfo **) palloc(*num_partitions *
-											sizeof(ResultRelInfo *));
-	*tup_conv_maps = (TupleConversionMap **) palloc0(*num_partitions *
-													 sizeof(TupleConversionMap *));
+	mtstate->mt_partition_dispatch_info =
+				RelationGetPartitionDispatchInfo(rel,
+												 &mtstate->mt_num_dispatch,
+												 &leaf_parts);
+	mtstate->mt_num_partitions = list_length(leaf_parts);
 
 	/*
+	 * Allocate an array of ResultRelInfo pointers, but actual
+	 * ResultRelInfo's will be allocated if and when needed.  See
+	 * ExecFindPartition where it's done.
+	 */
+	mtstate->mt_partitions = (ResultRelInfo **)
+										 palloc0(sizeof(ResultRelInfo *) *
+												 mtstate->mt_num_partitions);
+	/* Ditto. */
+	mtstate->mt_partition_tupconv_maps =
+							(TupleConversionMap **)
+										palloc0(sizeof(TupleConversionMap *) *
+												mtstate->mt_num_partitions);
+	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
 	 * given partition's rowtype.  It is attached to the caller-specified node
 	 * (such as ModifyTableState) and released when the node finishes
 	 * processing.
 	 */
-	*partition_tuple_slot = MakeTupleTableSlot();
+	mtstate->mt_partition_tuple_slot = MakeTupleTableSlot();
 
-	leaf_part_rri = (ResultRelInfo *) palloc0(*num_partitions *
-											  sizeof(ResultRelInfo));
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo) *
+											  mtstate->mt_num_partitions);
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -119,8 +108,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 * Save a tuple conversion map to convert a tuple routed to this
 		 * partition from the parent's type to the partition's.
 		 */
-		(*tup_conv_maps)[i] = convert_tuples_by_name(tupDesc, part_tupdesc,
-													 gettext_noop("could not convert row type"));
+		mtstate->mt_partition_tupconv_maps[i] =
+								convert_tuples_by_name(tupDesc,
+													   part_tupdesc,
+									gettext_noop("could not convert row type"));
 
 		InitResultRelInfo(leaf_part_rri,
 						  partrel,
@@ -149,7 +140,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		estate->es_leaf_result_relations =
 			lappend(estate->es_leaf_result_relations, leaf_part_rri);
 
-		(*partitions)[i] = leaf_part_rri++;
+		mtstate->mt_partitions[i] = leaf_part_rri++;
 		i++;
 	}
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f836dd3703..6a3b171587 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1942,30 +1942,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	/* Build state for INSERT tuple routing */
 	if (operation == CMD_INSERT &&
 		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		PartitionDispatch *partition_dispatch_info;
-		ResultRelInfo **partitions;
-		TupleConversionMap **partition_tupconv_maps;
-		TupleTableSlot *partition_tuple_slot;
-		int			num_parted,
-					num_partitions;
-
-		ExecSetupPartitionTupleRouting(mtstate,
-									   rel,
-									   node->nominalRelation,
-									   estate,
-									   &partition_dispatch_info,
-									   &partitions,
-									   &partition_tupconv_maps,
-									   &partition_tuple_slot,
-									   &num_parted, &num_partitions);
-		mtstate->mt_partition_dispatch_info = partition_dispatch_info;
-		mtstate->mt_num_dispatch = num_parted;
-		mtstate->mt_partitions = partitions;
-		mtstate->mt_num_partitions = num_partitions;
-		mtstate->mt_partition_tupconv_maps = partition_tupconv_maps;
-		mtstate->mt_partition_tuple_slot = partition_tuple_slot;
-	}
+		ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 19e3b9d233..c3ddf879b9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -50,14 +50,7 @@ typedef struct PartitionDispatchData
 typedef struct PartitionDispatchData *PartitionDispatch;
 
 extern void ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel,
-							   Index resultRTindex,
-							   EState *estate,
-							   PartitionDispatch **pd,
-							   ResultRelInfo ***partitions,
-							   TupleConversionMap ***tup_conv_maps,
-							   TupleTableSlot **partition_tuple_slot,
-							   int *num_parted, int *num_partitions);
+							   Relation rel);
 extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
 #endif							/* EXECPARTITION_H */
-- 
2.11.0

0004-During-tuple-routing-initialize-per-partition-object.patch (text/plain)
From ed8469d38a0747fe1b3d1fb3bb8c45b4cb2a2b45 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 1 Nov 2017 10:31:21 +0900
Subject: [PATCH 4/4] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  15 +--
 src/backend/executor/execPartition.c   | 225 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 108 ++--------------
 src/include/nodes/execnodes.h          |   1 +
 4 files changed, 180 insertions(+), 169 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7fe020fa7..3674aea9b3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2479,23 +2479,14 @@ CopyFrom(CopyState cstate)
 		 * If we are capturing transition tuples, they may need to be
 		 * converted from partition format back to partitioned table format
 		 * (this is only ever necessary if a BEFORE trigger modifies the
-		 * tuple).
+		 * tuple).  Note that we don't allocate the actual maps here; they'll
+		 * be allocated by ExecInitPartitionResultRelInfo() if and when
+		 * needed.
 		 */
 		if (cstate->transition_capture != NULL)
-		{
-			int			i;
-
 			mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
 										palloc0(sizeof(TupleConversionMap *) *
 												mtstate->mt_num_partitions);
-			for (i = 0; i < mtstate->mt_num_partitions; ++i)
-			{
-				mtstate->mt_transition_tupconv_maps[i] =
-					convert_tuples_by_name(RelationGetDescr(mtstate->mt_partitions[i]->ri_RelationDesc),
-										   RelationGetDescr(cstate->rel),
-										   gettext_noop("could not convert row type"));
-			}
-		}
 	}
 
 	/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index a495b165bd..3e2226e5f8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -23,6 +23,8 @@
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
 
+static void ExecInitPartitionResultRelInfo(ModifyTableState *mtstate,
+					int partidx);
 static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
 								 int *num_parted, List **leaf_part_oids);
 static void get_partition_dispatch_recurse(Relation rel, Relation parent,
@@ -47,14 +49,9 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 void
 ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	EState	   *estate = mtstate->ps.state;
-	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-	Index		resultRTindex = node->nominalRelation;
-	ResultRelInfo *leaf_part_rri;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -66,6 +63,11 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 												 &mtstate->mt_num_dispatch,
 												 &leaf_parts);
 	mtstate->mt_num_partitions = list_length(leaf_parts);
+	mtstate->mt_partition_oids = (Oid *) palloc0(sizeof(Oid) *
+												 mtstate->mt_num_partitions);
+	i = 0;
+	foreach (cell, leaf_parts)
+		mtstate->mt_partition_oids[i++] = lfirst_oid(cell);
 
 	/*
 	 * Allocate an array of ResultRelInfo pointers, but actual
@@ -87,62 +89,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 * processing.
 	 */
 	mtstate->mt_partition_tuple_slot = MakeTupleTableSlot();
-
-	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo) *
-											  mtstate->mt_num_partitions);
-	i = 0;
-	foreach(cell, leaf_parts)
-	{
-		Relation	partrel;
-		TupleDesc	part_tupdesc;
-
-		/*
-		 * We locked all the partitions above including the leaf partitions.
-		 * Note that each of the relations in *partitions are eventually
-		 * closed by the caller.
-		 */
-		partrel = heap_open(lfirst_oid(cell), NoLock);
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		mtstate->mt_partition_tupconv_maps[i] =
-								convert_tuples_by_name(tupDesc,
-													   part_tupdesc,
-									gettext_noop("could not convert row type"));
-
-		InitResultRelInfo(leaf_part_rri,
-						  partrel,
-						  resultRTindex,
-						  rel,
-						  estate->es_instrument);
-
-		/*
-		 * Verify result relation is a valid target for INSERT.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
-
-		estate->es_leaf_result_relations =
-			lappend(estate->es_leaf_result_relations, leaf_part_rri);
-
-		mtstate->mt_partitions[i] = leaf_part_rri++;
-		i++;
-	}
 }
 
 /*
@@ -257,11 +203,168 @@ ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 				 val_desc ? errdetail("Partition key of the failing row contains %s.", val_desc) : 0));
 	}
 
+	/* Initialize the partition result rel, if not done already. */
+	ExecInitPartitionResultRelInfo(mtstate, result);
 	ecxt->ecxt_scantuple = ecxt_scantuple_old;
 	return result;
 }
 
 /*
+ * ExecInitPartitionResultRelInfo
+ *		Initialize ResultRelInfo for a partition if not done already
+ */
+static void
+ExecInitPartitionResultRelInfo(ModifyTableState *mtstate, int partidx)
+{
+	EState	   *estate = mtstate->ps.state;
+	Relation	rootrel = mtstate->resultRelInfo->ri_RelationDesc;
+	Index		resultRTindex = mtstate->resultRelInfo->ri_RangeTableIndex;
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	Relation	partrel;
+	TupleDesc	tupDesc = RelationGetDescr(rootrel),
+				part_tupdesc;
+
+	/* Nothing to do if already set.*/
+	if (mtstate->mt_partitions[partidx])
+		return;
+
+	mtstate->mt_partitions[partidx] = (ResultRelInfo *)
+											palloc0(sizeof(ResultRelInfo));
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(mtstate->mt_partition_oids[partidx], NoLock);
+	part_tupdesc = RelationGetDescr(partrel);
+	InitResultRelInfo(mtstate->mt_partitions[partidx],
+					  partrel,
+					  resultRTindex,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for INSERT.
+	 */
+	CheckValidResultRel(mtstate->mt_partitions[partidx], CMD_INSERT);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		mtstate->mt_partitions[partidx]->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(mtstate->mt_partitions[partidx],
+						mtstate->mt_onconflict != ONCONFLICT_NONE);
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	mtstate->mt_partition_tupconv_maps[partidx] =
+							convert_tuples_by_name(tupDesc, part_tupdesc,
+											   gettext_noop("could not convert row type"));
+
+	/*
+	 * Also, if needed, the map to convert from partition's rowtype to the
+	 * parent's that is needed to store the partition's tuples into the
+	 * transition tuplestore which only accepts tuples of parent's rowtype.
+	 */
+	if (mtstate->mt_transition_tupconv_maps)
+		mtstate->mt_transition_tupconv_maps[partidx] =
+							convert_tuples_by_name(part_tupdesc, tupDesc,
+											   gettext_noop("could not convert row type"));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
+	 * that we didn't build the withCheckOptionList for each partition within
+	 * the planner, but simple translation of the varattnos for each partition
+	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
+	 * cases are handled above.
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *mapped_wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		/*
+		 * In case of INSERT on partitioned tables, there is only one plan.
+		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
+		 * partition.  We make a copy of the WCO qual for each partition; note
+		 * that, if there are SubPlans in there, they all end up attached to
+		 * the one parent Plan node.
+		 */
+		Assert(mtstate->operation == CMD_INSERT &&
+			   list_length(node->withCheckOptionLists) == 1 &&
+			   mtstate->mt_nplans == 1);
+		wcoList = linitial(node->withCheckOptionLists);
+		mapped_wcoList = map_partition_varattnos(wcoList,
+												 resultRTindex,
+												 partrel, rootrel, NULL);
+		foreach(ll, mapped_wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		mtstate->mt_partitions[partidx]->ri_WithCheckOptions = mapped_wcoList;
+		mtstate->mt_partitions[partidx]->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build a projection for each leaf partition rel.  Note that we
+	 * didn't build the returningList for each partition within the
+	 * planner, but simple translation of the varattnos for each partition
+	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
+	 * are handled above.
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		List	   *rlist;
+
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Initialize result tuple slot and assign its rowtype using the first
+		 * RETURNING list.  We assume the rest will look the same.
+		 */
+		tupDesc = ExecTypeFromTL(returningList, false);
+
+		/* Set up a slot for the output of the RETURNING projection(s) */
+		ExecInitResultTupleSlot(estate, &mtstate->ps);
+		ExecAssignResultType(&mtstate->ps, tupDesc);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+
+		/* Need an econtext too */
+		if (mtstate->ps.ps_ExprContext == NULL)
+			ExecAssignExprContext(estate, &mtstate->ps);
+		econtext = mtstate->ps.ps_ExprContext;
+
+		rlist = map_partition_varattnos(returningList,
+										resultRTindex,
+										partrel, rootrel, NULL);
+		mtstate->mt_partitions[partidx]->ri_projectReturning =
+				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+										part_tupdesc);
+	}
+
+	/* Note that the entries in this list appear in no predetermined order. */
+	estate->es_leaf_result_relations =
+								lappend(estate->es_leaf_result_relations,
+										mtstate->mt_partitions[partidx]);
+}
+
+/*
  * RelationGetPartitionDispatchInfo
  *		Returns information necessary to route tuples down a partition tree
  *
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6a3b171587..8b45fdaeb7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1511,23 +1511,14 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 		mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
 			palloc0(sizeof(TupleConversionMap *) * numResultRelInfos);
 
-		/* Choose the right set of partitions */
+		/*
+		 * If partition tuple-routing is active, we can't have partition
+		 * ResultRelInfo's just yet, so return in that case.  Instead,
+		 * the conversion map will be initialized in
+		 * ExecInitPartitionResultRelInfo() if and when needed.
+		 */
 		if (mtstate->mt_partition_dispatch_info != NULL)
-		{
-			/*
-			 * For tuple routing among partitions, we need TupleDescs based on
-			 * the partition routing table.
-			 */
-			ResultRelInfo **resultRelInfos = mtstate->mt_partitions;
-
-			for (i = 0; i < numResultRelInfos; ++i)
-			{
-				mtstate->mt_transition_tupconv_maps[i] =
-					convert_tuples_by_name(RelationGetDescr(resultRelInfos[i]->ri_RelationDesc),
-										   RelationGetDescr(targetRelInfo->ri_RelationDesc),
-										   gettext_noop("could not convert row type"));
-			}
-		}
+			return;
 		else
 		{
 			/* Otherwise we need the ResultRelInfo for each subplan. */
@@ -1978,65 +1969,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
-	 * cases are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && mtstate->mt_num_partitions > 0)
-	{
-		List	   *wcoList;
-		PlanState  *plan;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition.  We make a copy of the WCO qual for each partition; note
-		 * that, if there are SubPlans in there, they all end up attached to
-		 * the one parent Plan node.
-		 */
-		Assert(operation == CMD_INSERT &&
-			   list_length(node->withCheckOptionLists) == 1 &&
-			   mtstate->mt_nplans == 1);
-		wcoList = linitial(node->withCheckOptionLists);
-		plan = mtstate->mt_plans[0];
-		for (i = 0; i < mtstate->mt_num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = mtstate->mt_partitions[i];
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/* varno = node->nominalRelation */
-			mapped_wcoList = map_partition_varattnos(wcoList,
-													 node->nominalRelation,
-													 partrel, rel, NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   plan);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *returningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2068,31 +2006,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
-		 * are handled above.
-		 */
-		returningList = linitial(node->returningLists);
-		for (i = 0; i < mtstate->mt_num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = mtstate->mt_partitions[i];
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/* varno = node->nominalRelation */
-			rlist = map_partition_varattnos(returningList,
-											node->nominalRelation,
-											partrel, rel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
@@ -2367,8 +2280,11 @@ ExecEndModifyTable(ModifyTableState *node)
 	{
 		ResultRelInfo *resultRelInfo = node->mt_partitions[i];
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptor, if any */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1a35c5c9ad..988a374a74 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -982,6 +982,7 @@ typedef struct ModifyTableState
 	int			mt_num_dispatch;	/* Number of entries in the above array */
 	int			mt_num_partitions;	/* Number of members in the following
 									 * arrays */
+	Oid		   *mt_partition_oids;	/* Per partition OIDs */
 	ResultRelInfo **mt_partitions;	/* Per partition result relation pointers */
 	TupleConversionMap **mt_partition_tupconv_maps;
 	/* Per partition tuple conversion map */
-- 
2.11.0

#2Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#1)
Re: non-bulk inserts and tuple routing

On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts HEAD Patched

8 458.301 450.875
16 409.271 510.723
32 500.960 612.003
64 430.687 795.046
128 449.314 565.786
256 493.171 490.187

While the earlier numbers were monotonically increasing with the number
of partitions, these numbers aren't. For example, the number on HEAD with
8 partitions is higher than that with 128 partitions. That's kind of
weird; maybe something is wrong with the measurement. Do we see
similar instability when bulk-inserting into an unpartitioned table?
Also, the numbers for 64 partitions are really bad; that's almost
2x slower.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#2)
Re: non-bulk inserts and tuple routing

Hi Ashutosh.

On 2017/12/19 19:12, Ashutosh Bapat wrote:

On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts HEAD Patched

8 458.301 450.875
16 409.271 510.723
32 500.960 612.003
64 430.687 795.046
128 449.314 565.786
256 493.171 490.187

While the earlier numbers were monotonically increasing with the number
of partitions, these numbers aren't. For example, the number on HEAD with
8 partitions is higher than that with 128 partitions. That's kind of
weird; maybe something is wrong with the measurement.

In the bulk-insert case, we initialize partitions only once, because the
COPY that loads those 100,000 rows is executed just once.

Whereas in the non-bulk insert case, we initialize partitions (lock,
allocate various objects) 100,000 times, because that's how many times the
INSERT is executed, once for each of 100,000 rows to be inserted.

Without the patch, the object initialization occurs N times, where N is the
number of partitions. With the patch it occurs just once -- only for the
partition to which the row was routed. The time required, although smaller
with the patch, still increases monotonically, because the patch doesn't
do anything about locking all partitions.

Does that make sense?

Do we see
similar instability when bulk-inserting into an unpartitioned table?
Also, the numbers for 64 partitions are really bad; that's almost
2x slower.

Sorry, as I said, the numbers I initially posted were a bit noisy. I just
re-ran that COPY against the patched version and got the following numbers:

#parts Patched

8 441.852
16 417.510
32 435.276
64 486.497
128 436.473
256 446.312

Thanks,
Amit

#4Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#1)
4 attachment(s)
Re: non-bulk inserts and tuple routing

On 2017/12/19 19:06, Amit Langote wrote:

Hi.

I have a patch that rearranges the code around partition tuple-routing,
such that allocation of per-partition objects (ResultRelInfo,
TupleConversionMap, etc.) is delayed until a given partition is actually
inserted into (i.e., a tuple is routed to it). I can see good win for
non-bulk inserts with the patch and the patch is implemented such that it
doesn't affect the bulk-insert case much.

Performance numbers:

* Uses following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one-by-one):

do $$
begin
for i in 1..100000 loop
insert into t1 values (i, i+1);
end loop;
end; $$;

* Times in milliseconds:

#parts HEAD Patched

8 6216.300 4977.670
16 9061.388 6360.093
32 14081.656 8752.405
64 24887.110 13919.384
128 45926.251 24582.411
256 88088.084 45490.894

As you can see the performance can be as much as 2x faster with the patch,
although time taken still increases as the number of partitions increases,
because we still lock *all* partitions at the beginning.

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

* Times in milliseconds:

#parts HEAD Patched

8 458.301 450.875
16 409.271 510.723
32 500.960 612.003
64 430.687 795.046
128 449.314 565.786
256 493.171 490.187

Not much harm here, although numbers are a bit noisy.

Patch is divided into 4, first 3 of which are refactoring patches.

I know this patch will conflict severely with [1] and [2], so it's fine if
we consider applying these later. Will add this to next CF.

I rebased the patches, since they started conflicting with a recently
committed patch [1].

Thanks,
Amit

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cc6337d2fed5

Attachments:

v2-0001-Teach-CopyFrom-to-use-ModifyTableState-for-tuple-.patch
From 187683f6ea153e8be1a5c067b3546e70b15ccd47 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 10:43:45 +0900
Subject: [PATCH v2 1/4] Teach CopyFrom to use ModifyTableState for
 tuple-routing

This removes all fields of CopyStateData that were meant for
tuple routing and instead uses ModifyTableState that has all those
fields, including transition_tupconv_maps.  In COPY's case,
transition_tupconv_maps is only required if tuple routing is being
used, so it's safe.
---
 src/backend/commands/copy.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6bfca2a4af..242dc56d87 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -166,11 +166,7 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 
-	/* Tuple-routing support info */
-	PartitionTupleRouting *partition_tuple_routing;
-
 	TransitionCaptureState *transition_capture;
-	TupleConversionMap **transition_tupconv_maps;
 
 	/*
 	 * These variables are used to reduce overhead in textual COPY FROM.
@@ -2286,6 +2282,7 @@ CopyFrom(CopyState cstate)
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *saved_resultRelInfo = NULL;
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
+	ModifyTableState *mtstate = makeNode(ModifyTableState);
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
@@ -2304,6 +2301,8 @@ CopyFrom(CopyState cstate)
 	Size		bufferedTuplesSize = 0;
 	int			firstBufferedLineNo = 0;
 
+	PartitionTupleRouting *proute = NULL;
+
 	Assert(cstate->rel);
 
 	/*
@@ -2469,10 +2468,15 @@ CopyFrom(CopyState cstate)
 	 */
 	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
-		PartitionTupleRouting *proute;
+		ModifyTable *node = makeNode(ModifyTable);
 
-		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+		/* Just need to make this field appear valid. */
+		node->nominalRelation = 1;
+		mtstate->ps.plan = (Plan *) node;
+		mtstate->ps.state = estate;
+		mtstate->resultRelInfo = resultRelInfo;
+		proute = mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2484,11 +2488,11 @@ CopyFrom(CopyState cstate)
 		{
 			int			i;
 
-			cstate->transition_tupconv_maps = (TupleConversionMap **)
+			mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
 				palloc0(sizeof(TupleConversionMap *) * proute->num_partitions);
 			for (i = 0; i < proute->num_partitions; ++i)
 			{
-				cstate->transition_tupconv_maps[i] =
+				mtstate->mt_transition_tupconv_maps[i] =
 					convert_tuples_by_name(RelationGetDescr(proute->partitions[i]->ri_RelationDesc),
 										   RelationGetDescr(cstate->rel),
 										   gettext_noop("could not convert row type"));
@@ -2509,7 +2513,7 @@ CopyFrom(CopyState cstate)
 	if ((resultRelInfo->ri_TrigDesc != NULL &&
 		 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
 		  resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) ||
-		cstate->partition_tuple_routing != NULL ||
+		mtstate->mt_partition_tuple_routing != NULL ||
 		cstate->volatile_defexprs)
 	{
 		useHeapMultiInsert = false;
@@ -2584,11 +2588,10 @@ CopyFrom(CopyState cstate)
 		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 
 		/* Determine the partition to heap_insert the tuple into */
-		if (cstate->partition_tuple_routing)
+		if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			int			leaf_part_index;
 			TupleConversionMap *map;
-			PartitionTupleRouting *proute = cstate->partition_tuple_routing;
 
 			/*
 			 * Away we go ... If we end up not finding a partition after all,
@@ -2651,7 +2654,7 @@ CopyFrom(CopyState cstate)
 					 */
 					cstate->transition_capture->tcs_original_insert_tuple = NULL;
 					cstate->transition_capture->tcs_map =
-						cstate->transition_tupconv_maps[leaf_part_index];
+						mtstate->mt_transition_tupconv_maps[leaf_part_index];
 				}
 				else
 				{
@@ -2832,8 +2835,8 @@ CopyFrom(CopyState cstate)
 	ExecCloseIndices(resultRelInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
-	if (cstate->partition_tuple_routing)
-		ExecCleanupTupleRouting(cstate->partition_tuple_routing);
+	if (proute)
+		ExecCleanupTupleRouting(proute);
 
 	/* Close any trigger target relations */
 	ExecCleanUpTriggerState(estate);
-- 
2.11.0

v2-0002-ExecFindPartition-refactoring.patch
From c05bbc58f7e994d21df210a16985911bc4954f50 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 13:56:25 +0900
Subject: [PATCH v2 2/4] ExecFindPartition refactoring

---
 src/backend/commands/copy.c            |  5 +----
 src/backend/executor/execPartition.c   | 15 +++++++--------
 src/backend/executor/nodeModifyTable.c |  5 +----
 src/include/executor/execPartition.h   |  5 +----
 4 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 242dc56d87..8c724b8695 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2601,10 +2601,7 @@ CopyFrom(CopyState cstate)
 			 * will get us the ResultRelInfo and TupleConversionMap for the
 			 * partition, respectively.
 			 */
-			leaf_part_index = ExecFindPartition(resultRelInfo,
-												proute->partition_dispatch_info,
-												slot,
-												estate);
+			leaf_part_index = ExecFindPartition(mtstate, slot);
 			Assert(leaf_part_index >= 0 &&
 				   leaf_part_index < proute->num_partitions);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8c0d2df63c..21d230881a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -140,11 +140,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 }
 
 /*
- * ExecFindPartition -- Find a leaf partition in the partition tree rooted
- * at parent, for the heap tuple contained in *slot
- *
- * estate must be non-NULL; we'll need it to compute any expressions in the
- * partition key(s)
+ * ExecFindPartition -- Find a leaf partition for tuple contained in slot
  *
  * If no leaf partition is found, this routine errors out with the appropriate
  * error message, else it returns the leaf partition sequence number
@@ -152,14 +148,17 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
  * the partition tree.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
-				  TupleTableSlot *slot, EState *estate)
+ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 {
+	EState	   *estate = mtstate->ps.state;
 	int			result;
 	Datum		values[PARTITION_MAX_KEYS];
 	bool		isnull[PARTITION_MAX_KEYS];
 	Relation	rel;
-	PartitionDispatch parent;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	PartitionDispatch  *pd = proute->partition_dispatch_info,
+						parent;
+	ResultRelInfo *resultRelInfo = mtstate->resultRelInfo;
 	ExprContext *ecxt = GetPerTupleExprContext(estate);
 	TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c5eca1bb74..9a8f667d72 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -292,10 +292,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * get us the ResultRelInfo and TupleConversionMap for the partition,
 		 * respectively.
 		 */
-		leaf_part_index = ExecFindPartition(resultRelInfo,
-											proute->partition_dispatch_info,
-											slot,
-											estate);
+		leaf_part_index = ExecFindPartition(mtstate, slot);
 		Assert(leaf_part_index >= 0 &&
 			   leaf_part_index < proute->num_partitions);
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index b5df357acd..79dad58828 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -86,10 +86,7 @@ typedef struct PartitionTupleRouting
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 							   Relation rel, Index resultRTindex,
 							   EState *estate);
-extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-				  PartitionDispatch *pd,
-				  TupleTableSlot *slot,
-				  EState *estate);
 extern void ExecCleanupTupleRouting(PartitionTupleRouting *proute);
+extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
 #endif							/* EXECPARTITION_H */
-- 
2.11.0

v2-0003-ExecSetupPartitionTupleRouting-refactoring.patch
From cb5aeacf77cd11dc460fe5d2e2a4398359188e43 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v2 3/4] ExecSetupPartitionTupleRouting refactoring

---
 src/backend/commands/copy.c            |  2 +-
 src/backend/executor/execPartition.c   | 14 +++++++++++---
 src/backend/executor/nodeModifyTable.c |  4 +---
 src/include/executor/execPartition.h   |  3 +--
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8c724b8695..13f1b5b3e1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2476,7 +2476,7 @@ CopyFrom(CopyState cstate)
 		mtstate->ps.state = estate;
 		mtstate->resultRelInfo = resultRelInfo;
 		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 21d230881a..be15189f1d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -46,14 +46,15 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  * RowExclusiveLock mode upon return from this function.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	Index		resultRTindex = node->nominalRelation;
 	ResultRelInfo *leaf_part_rri;
 	PartitionTupleRouting *proute;
 
@@ -67,8 +68,15 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
 	proute->num_partitions = list_length(leaf_parts);
+
+	/*
+	 * Allocate an array of ResultRelInfo pointers, but actual
+	 * ResultRelInfo's will be allocated if and when needed.  See
+	 * ExecFindPartition where it's done.
+	 */
 	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
 												   sizeof(ResultRelInfo *));
+	/* Here too, actual TupleConversionMap's will be allocated later. */
 	proute->partition_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9a8f667d72..0bd47ef3ab 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1934,9 +1934,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
+			ExecSetupPartitionTupleRouting(mtstate, rel);
 		num_partitions = proute->num_partitions;
 	}
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 79dad58828..b3517e2ee0 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -84,8 +84,7 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern void ExecCleanupTupleRouting(PartitionTupleRouting *proute);
 extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
-- 
2.11.0

v2-0004-During-tuple-routing-initialize-per-partition-obj.patch
From ce6ae497836a6e46b8b769e7da31fc6b0bca0c3d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 1 Nov 2017 10:31:21 +0900
Subject: [PATCH v2 4/4] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  16 +--
 src/backend/executor/execPartition.c   | 239 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 111 ++-------------
 src/include/executor/execPartition.h   |   1 +
 4 files changed, 188 insertions(+), 179 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 13f1b5b3e1..a6a5409770 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2482,22 +2482,13 @@ CopyFrom(CopyState cstate)
 		 * If we are capturing transition tuples, they may need to be
 		 * converted from partition format back to partitioned table format
 		 * (this is only ever necessary if a BEFORE trigger modifies the
-		 * tuple).
+		 * tuple).  Note that we don't allocate the actual maps here; they'll
+		 * be allocated by ExecInitPartitionResultRelInfo() if and when
+		 * needed.
 		 */
 		if (cstate->transition_capture != NULL)
-		{
-			int			i;
-
 			mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
 				palloc0(sizeof(TupleConversionMap *) * proute->num_partitions);
-			for (i = 0; i < proute->num_partitions; ++i)
-			{
-				mtstate->mt_transition_tupconv_maps[i] =
-					convert_tuples_by_name(RelationGetDescr(proute->partitions[i]->ri_RelationDesc),
-										   RelationGetDescr(cstate->rel),
-										   gettext_noop("could not convert row type"));
-			}
-		}
 	}
 
 	/*
@@ -2622,6 +2613,7 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			Assert(resultRelInfo != NULL);
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index be15189f1d..0654490e9c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -23,6 +23,8 @@
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
 
+static void ExecInitPartitionResultRelInfo(ModifyTableState *mtstate,
+					int partidx);
 static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
 								 int *num_parted, List **leaf_part_oids);
 static void get_partition_dispatch_recurse(Relation rel, Relation parent,
@@ -48,15 +50,10 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 PartitionTupleRouting *
 ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	EState	   *estate = mtstate->ps.state;
-	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-	Index		resultRTindex = node->nominalRelation;
-	ResultRelInfo *leaf_part_rri;
-	PartitionTupleRouting *proute;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -68,19 +65,23 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
 	proute->num_partitions = list_length(leaf_parts);
+	proute->partition_oids = (Oid *) palloc0(proute->num_partitions *
+											 sizeof(Oid));
+	i = 0;
+	foreach (cell, leaf_parts)
+		proute->partition_oids[i++] = lfirst_oid(cell);
 
 	/*
 	 * Allocate an array of ResultRelInfo pointers, but actual
 	 * ResultRelInfo's will be allocated if and when needed.  See
 	 * ExecFindPartition where it's done.
 	 */
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+													sizeof(ResultRelInfo *));
 	/* Here too, actual TupleConversionMap's will be allocated later. */
 	proute->partition_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
-
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
 	 * given partition's rowtype.  It is attached to the caller-specified node
@@ -89,61 +90,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot();
 
-	leaf_part_rri = (ResultRelInfo *) palloc0(proute->num_partitions *
-											  sizeof(ResultRelInfo));
-	i = 0;
-	foreach(cell, leaf_parts)
-	{
-		Relation	partrel;
-		TupleDesc	part_tupdesc;
-
-		/*
-		 * We locked all the partitions above including the leaf partitions.
-		 * Note that each of the relations in proute->partitions are
-		 * eventually closed by the caller.
-		 */
-		partrel = heap_open(lfirst_oid(cell), NoLock);
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->partition_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		InitResultRelInfo(leaf_part_rri,
-						  partrel,
-						  resultRTindex,
-						  rel,
-						  estate->es_instrument);
-
-		/*
-		 * Verify result relation is a valid target for INSERT.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
-
-		estate->es_leaf_result_relations =
-			lappend(estate->es_leaf_result_relations, leaf_part_rri);
-
-		proute->partitions[i] = leaf_part_rri++;
-		i++;
-	}
-
 	return proute;
 }
 
@@ -261,6 +207,8 @@ ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 				 val_desc ? errdetail("Partition key of the failing row contains %s.", val_desc) : 0));
 	}
 
+	/* Initialize the partition result rel, if not done already. */
+	ExecInitPartitionResultRelInfo(mtstate, result);
 	ecxt->ecxt_scantuple = ecxt_scantuple_old;
 	return result;
 }
@@ -295,8 +243,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting * proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptor, if any */
@@ -305,6 +256,162 @@ ExecCleanupTupleRouting(PartitionTupleRouting * proute)
 }
 
 /*
+ * ExecInitPartitionResultRelInfo
+ *		Initialize ResultRelInfo for a partition if not done already
+ */
+static void
+ExecInitPartitionResultRelInfo(ModifyTableState *mtstate, int partidx)
+{
+	EState	   *estate = mtstate->ps.state;
+	Relation	rootrel = mtstate->resultRelInfo->ri_RelationDesc;
+	Index		resultRTindex = mtstate->resultRelInfo->ri_RangeTableIndex;
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	Relation	partrel;
+	TupleDesc	tupDesc = RelationGetDescr(rootrel),
+				part_tupdesc;
+
+	/* Nothing to do if already set. */
+	if (proute->partitions[partidx])
+		return;
+
+	proute->partitions[partidx] = (ResultRelInfo *)
+											palloc0(sizeof(ResultRelInfo));
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	part_tupdesc = RelationGetDescr(partrel);
+	InitResultRelInfo(proute->partitions[partidx],
+					  partrel,
+					  resultRTindex,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for INSERT.
+	 */
+	CheckValidResultRel(proute->partitions[partidx], CMD_INSERT);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		proute->partitions[partidx]->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(proute->partitions[partidx],
+						mtstate->mt_onconflict != ONCONFLICT_NONE);
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->partition_tupconv_maps[partidx] =
+							convert_tuples_by_name(tupDesc, part_tupdesc,
+											   gettext_noop("could not convert row type"));
+
+	/*
+	 * Also, if needed, the map to convert from partition's rowtype to the
+	 * parent's that is needed to store the partition's tuples into the
+	 * transition tuplestore which only accepts tuples of parent's rowtype.
+	 */
+	if (mtstate->mt_transition_tupconv_maps)
+		mtstate->mt_transition_tupconv_maps[partidx] =
+							convert_tuples_by_name(part_tupdesc, tupDesc,
+											   gettext_noop("could not convert row type"));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
+	 * that we didn't build the withCheckOptionList for each partition within
+	 * the planner, but simple translation of the varattnos for each partition
+	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
+	 * cases are handled above.
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *mapped_wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		/*
+		 * In case of INSERT on partitioned tables, there is only one plan.
+		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
+		 * partition.  We make a copy of the WCO qual for each partition; note
+		 * that, if there are SubPlans in there, they all end up attached to
+		 * the one parent Plan node.
+		 */
+		Assert(mtstate->operation == CMD_INSERT &&
+			   list_length(node->withCheckOptionLists) == 1 &&
+			   mtstate->mt_nplans == 1);
+		wcoList = linitial(node->withCheckOptionLists);
+		mapped_wcoList = map_partition_varattnos(wcoList,
+												 resultRTindex,
+												 partrel, rootrel, NULL);
+		foreach(ll, mapped_wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		proute->partitions[partidx]->ri_WithCheckOptions = mapped_wcoList;
+		proute->partitions[partidx]->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build a projection for each leaf partition rel.  Note that we
+	 * didn't build the returningList for each partition within the
+	 * planner, but simple translation of the varattnos for each partition
+	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
+	 * are handled above.
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		List	   *rlist;
+
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Initialize result tuple slot and assign its rowtype using the first
+		 * RETURNING list.  We assume the rest will look the same.
+		 */
+		tupDesc = ExecTypeFromTL(returningList, false);
+
+		/* Set up a slot for the output of the RETURNING projection(s) */
+		ExecInitResultTupleSlot(estate, &mtstate->ps);
+		ExecAssignResultType(&mtstate->ps, tupDesc);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+
+		/* Need an econtext too */
+		if (mtstate->ps.ps_ExprContext == NULL)
+			ExecAssignExprContext(estate, &mtstate->ps);
+		econtext = mtstate->ps.ps_ExprContext;
+
+		rlist = map_partition_varattnos(returningList,
+										resultRTindex,
+										partrel, rootrel, NULL);
+		proute->partitions[partidx]->ri_projectReturning =
+				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+										part_tupdesc);
+	}
+
+	/* Note that the entries in this list appear in no predetermined order. */
+	estate->es_leaf_result_relations =
+								lappend(estate->es_leaf_result_relations,
+										proute->partitions[partidx]);
+}
+
+/*
  * RelationGetPartitionDispatchInfo
  *		Returns information necessary to route tuples down a partition tree
  *
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 0bd47ef3ab..fd1390a508 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -302,6 +302,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		Assert(resultRelInfo != NULL);
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -1512,23 +1513,14 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 		mtstate->mt_transition_tupconv_maps = (TupleConversionMap **)
 			palloc0(sizeof(TupleConversionMap *) * numResultRelInfos);
 
-		/* Choose the right set of partitions */
+		/*
+		 * If partition tuple-routing is active, we can't have partition
+		 * ResultRelInfo's just yet, so return in that case.  Instead,
+		 * the conversion map will be initialized in
+		 * ExecInitPartitionResultRelInfo() if and when needed.
+		 */
 		if (proute != NULL)
-		{
-			/*
-			 * For tuple routing among partitions, we need TupleDescs based on
-			 * the partition routing table.
-			 */
-			ResultRelInfo **resultRelInfos = proute->partitions;
-
-			for (i = 0; i < numResultRelInfos; ++i)
-			{
-				mtstate->mt_transition_tupconv_maps[i] =
-					convert_tuples_by_name(RelationGetDescr(resultRelInfos[i]->ri_RelationDesc),
-										   RelationGetDescr(targetRelInfo->ri_RelationDesc),
-										   gettext_noop("could not convert row type"));
-			}
-		}
+			return;
 		else
 		{
 			/* Otherwise we need the ResultRelInfo for each subplan. */
@@ -1830,8 +1822,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ListCell   *l;
 	int			i;
 	Relation	rel;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -1932,11 +1922,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	rel = mtstate->resultRelInfo->ri_RelationDesc;
 	if (operation == CMD_INSERT &&
 		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate, rel);
-		num_partitions = proute->num_partitions;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -1972,65 +1959,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
-	 * cases are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *wcoList;
-		PlanState  *plan;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition.  We make a copy of the WCO qual for each partition; note
-		 * that, if there are SubPlans in there, they all end up attached to
-		 * the one parent Plan node.
-		 */
-		Assert(operation == CMD_INSERT &&
-			   list_length(node->withCheckOptionLists) == 1 &&
-			   mtstate->mt_nplans == 1);
-		wcoList = linitial(node->withCheckOptionLists);
-		plan = mtstate->mt_plans[0];
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/* varno = node->nominalRelation */
-			mapped_wcoList = map_partition_varattnos(wcoList,
-													 node->nominalRelation,
-													 partrel, rel, NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   plan);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *returningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2061,31 +1995,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
-		 * are handled above.
-		 */
-		returningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/* varno = node->nominalRelation */
-			rlist = map_partition_varattnos(returningList,
-											node->nominalRelation,
-											partrel, rel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index b3517e2ee0..cd256551bb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -79,6 +79,7 @@ typedef struct PartitionTupleRouting
 	int			num_dispatch;
 	ResultRelInfo **partitions;
 	int			num_partitions;
+	Oid		   *partition_oids;
 	TupleConversionMap **partition_tupconv_maps;
 	TupleTableSlot *partition_tuple_slot;
 } PartitionTupleRouting;
-- 
2.11.0

#5Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#4)
Re: non-bulk inserts and tuple routing

On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

I rebased the patches, since they started conflicting with a recently
committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#5)
3 attachment(s)
Re: non-bulk inserts and tuple routing

On 2018/01/20 7:07, Robert Haas wrote:

On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

I rebased the patches, since they started conflicting with a recently
committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

I rebased it. Here are the performance numbers again.

* Uses following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...
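
The elided partition definitions follow the usual one-partition-per-remainder hash pattern. As a sketch (the partition names `t1_R` are an assumption, not taken from the thread), the DDL for an N-way setup can be generated like this:

```python
# Hypothetical helper: emit the CREATE TABLE statements for an N-way
# hash-partitioned table shaped like the benchmark's t1.  The parent
# gets one partition per remainder 0..modulus-1.
def hash_partition_ddl(parent="t1", modulus=8):
    stmts = [f"create table {parent} (a int, b int) partition by hash (a);"]
    for r in range(modulus):
        stmts.append(
            f"create table {parent}_{r} partition of {parent} "
            f"for values with (modulus {modulus}, remainder {r});"
        )
    return stmts

for stmt in hash_partition_ddl(modulus=8):
    print(stmt)
```

Feeding the output to psql reproduces an 8-partition setup; rerun with modulus 16, 32, ... for the other rows of the tables below.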

* Non-bulk insert uses the following code (insert 100,000 rows one-by-one):

do $$
begin
for i in 1..100000 loop
insert into t1 values (i, i+1);
end loop;
end; $$;

Times in milliseconds:

#parts HEAD Patched
8 6148.313 4938.775
16 8882.420 6203.911
32 14251.072 8595.068
64 24465.691 13718.161
128 45099.435 23898.026
256 87307.332 44428.126
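
The speedup implied by these numbers can be checked with simple arithmetic on the table as quoted (HEAD time divided by patched time):

```python
# Speedup (HEAD ms / patched ms) for the non-bulk INSERT timings
# quoted in the table above.
timings = {
    8:   (6148.313, 4938.775),
    16:  (8882.420, 6203.911),
    32:  (14251.072, 8595.068),
    64:  (24465.691, 13718.161),
    128: (45099.435, 23898.026),
    256: (87307.332, 44428.126),
}
for nparts, (head_ms, patched_ms) in sorted(timings.items()):
    print(f"{nparts:4d} partitions: {head_ms / patched_ms:.2f}x faster")
```

The ratio grows with the partition count and approaches 2x at 256 partitions, consistent with the "as much as 2x faster" observation in the first message.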

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

Times in milliseconds:

#parts HEAD Patched

8 466.170 446.865
16 445.341 444.990
32 443.544 487.713
64 460.579 435.412
128 469.953 422.403
256 463.592 431.118

Thanks,
Amit

Attachments:

v3-0001-Teach-CopyFrom-to-use-ModifyTableState-for-tuple-.patch
From 4b6cf902fa10819d29d021c36fec1200f72db11d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 10:43:45 +0900
Subject: [PATCH v3 1/3] Teach CopyFrom to use ModifyTableState for
 tuple-routing

This removes all fields of CopyStateData that were meant for
tuple routing and instead uses ModifyTableState that has all those
fields, including transition_tupconv_maps.  In COPY's case,
transition_tupconv_maps is only required if tuple routing is being
used, so reusing the ModifyTableState field for it is safe.
---
 src/backend/commands/copy.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 04a24c6082..251676b321 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -166,9 +166,6 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 
-	/* Tuple-routing support info */
-	PartitionTupleRouting *partition_tuple_routing;
-
 	TransitionCaptureState *transition_capture;
 
 	/*
@@ -2285,6 +2282,7 @@ CopyFrom(CopyState cstate)
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *saved_resultRelInfo = NULL;
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
+	ModifyTableState *mtstate = makeNode(ModifyTableState);
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
@@ -2303,6 +2301,8 @@ CopyFrom(CopyState cstate)
 	Size		bufferedTuplesSize = 0;
 	int			firstBufferedLineNo = 0;
 
+	PartitionTupleRouting *proute = NULL;
+
 	Assert(cstate->rel);
 
 	/*
@@ -2468,10 +2468,15 @@ CopyFrom(CopyState cstate)
 	 */
 	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
-		PartitionTupleRouting *proute;
+		ModifyTable *node = makeNode(ModifyTable);
 
-		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+		/* Just need to make this field appear valid. */
+		node->nominalRelation = 1;
+		mtstate->ps.plan = (Plan *) node;
+		mtstate->ps.state = estate;
+		mtstate->resultRelInfo = resultRelInfo;
+		proute = mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2496,7 +2501,7 @@ CopyFrom(CopyState cstate)
 	if ((resultRelInfo->ri_TrigDesc != NULL &&
 		 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
 		  resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) ||
-		cstate->partition_tuple_routing != NULL ||
+		mtstate->mt_partition_tuple_routing != NULL ||
 		cstate->volatile_defexprs)
 	{
 		useHeapMultiInsert = false;
@@ -2571,10 +2576,9 @@ CopyFrom(CopyState cstate)
 		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 
 		/* Determine the partition to heap_insert the tuple into */
-		if (cstate->partition_tuple_routing)
+		if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			int			leaf_part_index;
-			PartitionTupleRouting *proute = cstate->partition_tuple_routing;
 
 			/*
 			 * Away we go ... If we end up not finding a partition after all,
@@ -2806,8 +2810,8 @@ CopyFrom(CopyState cstate)
 	ExecCloseIndices(resultRelInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
-	if (cstate->partition_tuple_routing)
-		ExecCleanupTupleRouting(cstate->partition_tuple_routing);
+	if (proute)
+		ExecCleanupTupleRouting(proute);
 
 	/* Close any trigger target relations */
 	ExecCleanUpTriggerState(estate);
-- 
2.11.0

v3-0002-ExecFindPartition-refactoring.patch
From 58a24b5e0ef06e2d4d855afd2beca60a0c171ae8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 13:56:25 +0900
Subject: [PATCH v3 2/3] ExecFindPartition refactoring

---
 src/backend/commands/copy.c            |  5 +----
 src/backend/executor/execPartition.c   | 17 +++++++++--------
 src/backend/executor/nodeModifyTable.c |  5 +----
 src/include/executor/execPartition.h   |  5 +----
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 251676b321..2096a52cea 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2588,10 +2588,7 @@ CopyFrom(CopyState cstate)
 			 * will get us the ResultRelInfo and TupleConversionMap for the
 			 * partition, respectively.
 			 */
-			leaf_part_index = ExecFindPartition(resultRelInfo,
-												proute->partition_dispatch_info,
-												slot,
-												estate);
+			leaf_part_index = ExecFindPartition(mtstate, slot);
 			Assert(leaf_part_index >= 0 &&
 				   leaf_part_index < proute->num_partitions);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 89b7bb4c60..918cd62cb0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -226,11 +226,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 }
 
 /*
- * ExecFindPartition -- Find a leaf partition in the partition tree rooted
- * at parent, for the heap tuple contained in *slot
- *
- * estate must be non-NULL; we'll need it to compute any expressions in the
- * partition key(s)
+ * ExecFindPartition -- Find a leaf partition for the tuple contained in *slot
  *
  * If no leaf partition is found, this routine errors out with the appropriate
  * error message, else it returns the leaf partition sequence number
@@ -238,14 +234,19 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
  * the partition tree.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
-				  TupleTableSlot *slot, EState *estate)
+ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 {
+	EState	   *estate = mtstate->ps.state;
 	int			result;
 	Datum		values[PARTITION_MAX_KEYS];
 	bool		isnull[PARTITION_MAX_KEYS];
 	Relation	rel;
-	PartitionDispatch parent;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	PartitionDispatch  *pd = proute->partition_dispatch_info,
+						parent;
+	ResultRelInfo *resultRelInfo = (mtstate->rootResultRelInfo != NULL)
+										? mtstate->rootResultRelInfo
+										: mtstate->resultRelInfo;
 	ExprContext *ecxt = GetPerTupleExprContext(estate);
 	TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6c2f8d4ec0..bd88b41ff6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -297,10 +297,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * get us the ResultRelInfo and TupleConversionMap for the partition,
 		 * respectively.
 		 */
-		leaf_part_index = ExecFindPartition(resultRelInfo,
-											proute->partition_dispatch_info,
-											slot,
-											estate);
+		leaf_part_index = ExecFindPartition(mtstate, slot);
 		Assert(leaf_part_index >= 0 &&
 			   leaf_part_index < proute->num_partitions);
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 18e08129f8..87c4c3249e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -103,10 +103,6 @@ typedef struct PartitionTupleRouting
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 							   Relation rel, Index resultRTindex,
 							   EState *estate);
-extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-				  PartitionDispatch *pd,
-				  TupleTableSlot *slot,
-				  EState *estate);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
@@ -115,5 +111,6 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 						  TupleTableSlot *new_slot,
 						  TupleTableSlot **p_my_slot);
 extern void ExecCleanupTupleRouting(PartitionTupleRouting *proute);
+extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
 #endif							/* EXECPARTITION_H */
-- 
2.11.0

v3-0003-During-tuple-routing-initialize-per-partition-obj.patch
From f69147bb78e2b89941ad6b1d6af0aed4cbdf17f1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v3 3/3] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |   6 +-
 src/backend/executor/execPartition.c   | 504 +++++++++++++++++++++++----------
 src/backend/executor/nodeModifyTable.c | 152 +---------
 src/include/executor/execPartition.h   |  10 +-
 4 files changed, 375 insertions(+), 297 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2096a52cea..1000bb4461 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2476,7 +2476,7 @@ CopyFrom(CopyState cstate)
 		mtstate->ps.state = estate;
 		mtstate->resultRelInfo = resultRelInfo;
 		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2609,6 +2609,7 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			Assert(resultRelInfo != NULL);
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
@@ -2638,8 +2639,7 @@ CopyFrom(CopyState cstate)
 					 */
 					cstate->transition_capture->tcs_original_insert_tuple = NULL;
 					cstate->transition_capture->tcs_map =
-						TupConvMapForLeaf(proute, saved_resultRelInfo,
-										  leaf_part_index);
+						TupConvMapForLeaf(mtstate, leaf_part_index);
 				}
 				else
 				{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 918cd62cb0..4e8fb01424 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -23,6 +23,8 @@
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
 
+static void ExecInitPartitionResultRelInfo(ModifyTableState *mtstate,
+					int partidx);
 static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
 								 int *num_parted, List **leaf_part_oids);
 static void get_partition_dispatch_recurse(Relation rel, Relation parent,
@@ -44,22 +46,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While the arrays of pointers to the various per-partition objects are
+ * allocated for all partitions here, the objects themselves are lazily
+ * allocated and initialized only when a tuple is actually routed to a
+ * given partition; see ExecInitPartitionResultRelInfo.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-	TupleDesc	tupDesc = RelationGetDescr(rel);
+	PartitionTupleRouting *proute;
 	List	   *leaf_parts;
 	ListCell   *cell;
-	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
-	int			num_update_rri = 0,
-				update_rri_index = 0;
-	bool		is_update = false;
-	PartitionTupleRouting *proute;
+	int			leaf_index,
+				update_rri_index,
+				num_update_rri;
+	bool		is_update;
+	ResultRelInfo *update_rri;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -71,20 +74,24 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
 	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	/*
+	 * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+	 * ExecInitPartitionResultRelInfo().
+	 */
+	proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+													sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc0(proute->num_partitions *
+											 sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		is_update = true;
 		update_rri = mtstate->resultRelInfo;
-		num_update_rri = list_length(node->plans);
+		num_update_rri = mtstate->mt_nplans;
 		proute->subplan_partition_offsets =
 			palloc(num_update_rri * sizeof(int));
 
@@ -94,15 +101,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
+
+	leaf_index = update_rri_index = 0;
+	foreach (cell, leaf_parts)
 	{
+		Oid		leaf_oid = lfirst_oid(cell);
+
+		proute->partition_oids[leaf_index] = leaf_oid;
+
 		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
+		 * The per-subplan resultrels and the resultrels of the leaf
+		 * partitions are both in the same canonical order.  So while going
+		 * through the leaf partition oids, we need to keep track of the
+		 * next per-subplan result rel to be looked for in the leaf
+		 * partition resultrels.
 		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
+		if (is_update && update_rri_index < num_update_rri &&
+			RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
+		{
+			proute->subplan_partition_offsets[update_rri_index] = leaf_index;
+			update_rri_index++;
+		}
+
+		leaf_index++;
 	}
 
 	/*
@@ -113,109 +134,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot();
 
-	i = 0;
-	foreach(cell, leaf_parts)
-	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
-		Oid			leaf_oid = lfirst_oid(cell);
-
-		if (is_update)
-		{
-			/*
-			 * If the leaf partition is already present in the per-subplan
-			 * result rels, we re-use that rather than initialize a new result
-			 * rel. The per-subplan resultrels and the resultrels of the leaf
-			 * partitions are both in the same canonical order. So while going
-			 * through the leaf partition oids, we need to keep track of the
-			 * next per-subplan result rel to be looked for in the leaf
-			 * partition resultrels.
-			 */
-			if (update_rri_index < num_update_rri &&
-				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
-			{
-				leaf_part_rri = &update_rri[update_rri_index];
-				partrel = leaf_part_rri->ri_RelationDesc;
-
-				/*
-				 * This is required in order to we convert the partition's
-				 * tuple to be compatible with the root partitioned table's
-				 * tuple descriptor.  When generating the per-subplan result
-				 * rels, this was not set.
-				 */
-				leaf_part_rri->ri_PartitionRoot = rel;
-
-				/* Remember the subplan offset for this ResultRelInfo */
-				proute->subplan_partition_offsets[update_rri_index] = i;
-
-				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
-
-		estate->es_leaf_result_relations =
-			lappend(estate->es_leaf_result_relations, leaf_part_rri);
-
-		proute->partitions[i] = leaf_part_rri;
-		i++;
-	}
-
 	/*
 	 * For UPDATE, we should have found all the per-subplan resultrels in the
 	 * leaf partitions.
@@ -341,10 +259,251 @@ ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 				 val_desc ? errdetail("Partition key of the failing row contains %s.", val_desc) : 0));
 	}
 
+	/* Initialize the partition result rel, if not done already. */
+	ExecInitPartitionResultRelInfo(mtstate, result);
 	ecxt->ecxt_scantuple = ecxt_scantuple_old;
 	return result;
 }
 
+static int
+leafpart_index_cmp(const void *arg1, const void *arg2)
+{
+	int		leafidx1 = *(const int *) arg1;
+	int		leafidx2 = *(const int *) arg2;
+
+	if (leafidx1 > leafidx2)
+		return 1;
+	else if (leafidx1 < leafidx2)
+		return -1;
+	return 0;
+}
+
+/*
+ * ExecInitPartitionResultRelInfo
+ *		Initialize ResultRelInfo for a partition if not done already
+ */
+static void
+ExecInitPartitionResultRelInfo(ModifyTableState *mtstate, int partidx)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	Relation	partrel,
+				rootrel;
+	ResultRelInfo *leaf_part_rri;
+	int			firstVarno;
+	Relation	firstResultRel;
+
+	/* Nothing to do if already set. */
+	if (proute->partitions[partidx])
+		return;
+
+	leaf_part_rri = NULL;
+	rootrel = (mtstate->rootResultRelInfo != NULL)
+						? mtstate->rootResultRelInfo->ri_RelationDesc
+						: mtstate->resultRelInfo->ri_RelationDesc;
+
+	/*
+	 * If we are doing tuple routing for update, try to reuse the
+	 * per-subplan resultrel for this partition that ExecInitModifyTable()
+	 * might already have created.
+	 */
+	if (mtstate && mtstate->operation == CMD_UPDATE)
+	{
+		ResultRelInfo   *update_rri;
+		int   *partidx_entry;
+
+		update_rri = mtstate->resultRelInfo;
+
+		/*
+		 * If the partition has a subplan, we will find its index in
+		 * proute->subplan_partition_offsets.
+		 */
+		partidx_entry = (int *) bsearch(&partidx,
+										proute->subplan_partition_offsets,
+										mtstate->mt_nplans, sizeof(int),
+										leafpart_index_cmp);
+		if (partidx_entry)
+		{
+			int		update_rri_index =
+							partidx_entry - proute->subplan_partition_offsets;
+
+			Assert(update_rri_index < mtstate->mt_nplans);
+			leaf_part_rri = &update_rri[update_rri_index];
+			partrel = leaf_part_rri->ri_RelationDesc;
+
+			/*
+			 * This is required so that we can convert the partition's
+			 * tuple to be compatible with the root partitioned table's
+			 * tuple descriptor.  When generating the per-subplan result
+			 * rels, this was not set.
+			 */
+			leaf_part_rri->ri_PartitionRoot = rootrel;
+		}
+	}
+
+	/*
+	 * Create a new result rel, either because we are inserting the tuple
+	 * or because we did not find one to reuse above in the update
+	 * tuple-routing case.
+	 */
+	if (leaf_part_rri == NULL)
+	{
+		EState	   *estate = mtstate->ps.state;
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+		Index		resultRTindex = node->nominalRelation;
+
+		leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+
+		/*
+		 * We locked all the partitions in ExecSetupPartitionTupleRouting
+		 * including the leaf partitions.
+		 */
+		partrel = heap_open(proute->partition_oids[partidx], NoLock);
+		InitResultRelInfo(leaf_part_rri,
+						  partrel,
+						  resultRTindex,
+						  rootrel,
+						  estate->es_instrument);
+
+		/*
+		 * These are required as reference objects for mapping partition
+		 * attno's in expressions in WithCheckOptions and RETURNING.
+		 */
+		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * Build WITH CHECK OPTION constraints for this partition rel.  Note
+		 * that we didn't build the withCheckOptionList for each partition
+		 * within the planner, but simple translation of the varattnos will
+		 * suffice.  This only occurs for the INSERT case or in the case of
+		 * UPDATE for which we didn't find a result rel above to reuse.
+		 */
+		if (node && node->withCheckOptionLists != NIL)
+		{
+			List	   *wcoList;
+			List	   *mapped_wcoList;
+			List	   *wcoExprs = NIL;
+			ListCell   *ll;
+
+			/*
+			 * In the case of INSERT on partitioned tables, there is only one
+			 * plan.  Likewise, there is only one WCO list, not one per
+			 * partition.  For UPDATE, there would be as many WCO lists as
+			 * there are plans, but we use the first one as reference.  Note
+			 * that if there are SubPlans in there, they all end up attached
+			 * to the one parent Plan node.
+			 */
+			Assert((mtstate->operation == CMD_INSERT &&
+					list_length(node->withCheckOptionLists) == 1 &&
+					mtstate->mt_nplans == 1) ||
+				   (mtstate->operation == CMD_UPDATE &&
+					list_length(node->withCheckOptionLists) ==
+														mtstate->mt_nplans));
+			wcoList = linitial(node->withCheckOptionLists);
+
+			mapped_wcoList = map_partition_varattnos(wcoList,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+			foreach(ll, mapped_wcoList)
+			{
+				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+												   mtstate->mt_plans[0]);
+				wcoExprs = lappend(wcoExprs, wcoExpr);
+			}
+
+			leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+			leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+		}
+
+		/*
+		 * Build the RETURNING projection if any for the partition.  Note that
+		 * we didn't build the returningList for each partition within the
+		 * planner, but simple translation of the varattnos will suffice.
+		 * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+		 * ExecInitModifyTable() would've initialized this.
+		 */
+		if (node && node->returningLists != NIL)
+		{
+			TupleTableSlot *slot;
+			ExprContext *econtext;
+			List	   *returningList;
+			List	   *rlist;
+			TupleDesc	tupDesc;
+
+			/* See the comment written above for WCO lists. */
+			Assert((mtstate->operation == CMD_INSERT &&
+					list_length(node->returningLists) == 1 &&
+					mtstate->mt_nplans == 1) ||
+				   (mtstate->operation == CMD_UPDATE &&
+					list_length(node->returningLists) ==
+														mtstate->mt_nplans));
+			returningList = linitial(node->returningLists);
+
+			/*
+			 * Initialize result tuple slot and assign its rowtype using the first
+			 * RETURNING list.  We assume the rest will look the same.
+			 */
+			tupDesc = ExecTypeFromTL(returningList, false);
+
+			/* Set up a slot for the output of the RETURNING projection(s) */
+			ExecInitResultTupleSlot(estate, &mtstate->ps);
+			ExecAssignResultType(&mtstate->ps, tupDesc);
+			slot = mtstate->ps.ps_ResultTupleSlot;
+
+			/* Need an econtext too */
+			if (mtstate->ps.ps_ExprContext == NULL)
+				ExecAssignExprContext(estate, &mtstate->ps);
+			econtext = mtstate->ps.ps_ExprContext;
+
+			rlist = map_partition_varattnos(returningList,
+											firstVarno,
+											partrel, firstResultRel, NULL);
+			leaf_part_rri->ri_projectReturning =
+				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+										RelationGetDescr(partrel));
+		}
+
+		/*
+		 * Note that the entries in this list appear in no predetermined
+		 * order, as a result of initializing partition result rels as and when
+		 * they're needed.
+		 */
+		estate->es_leaf_result_relations =
+									lappend(estate->es_leaf_result_relations,
+											leaf_part_rri);
+
+		/*
+		 * Open partition indices.  The user may have asked to check for
+		 * conflicts within this leaf partition and do "nothing" instead of
+		 * throwing an error.  Be prepared in that case by initializing the
+		 * index information needed by ExecInsert() to perform speculative
+		 * insertions.
+		 */
+		if (partrel->rd_rel->relhasindex &&
+			leaf_part_rri->ri_IndexRelationDescs == NULL)
+			ExecOpenIndices(leaf_part_rri,
+							mtstate->mt_onconflict != ONCONFLICT_NONE);
+	}
+
+	/*
+	 * Verify result relation is a valid target for INSERT.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+												   gettext_noop("could not convert row type"));
+}
+
 /*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
@@ -373,40 +532,90 @@ ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute)
 }
 
 /*
- * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
- * index.
+ * ChildParentTupConvMap -- Return tuple conversion map to convert tuples of
+ *							'partrel' into those of 'rootrel'
+ *
+ * If the function was previously called for this partition, we will have
+ * either already created the map and stored it in
+ * proute->child_parent_tupconv_maps[index], or found that no such map
+ * is needed and set proute->child_parent_map_not_required[index].
  */
-TupleConversionMap *
-TupConvMapForLeaf(PartitionTupleRouting *proute,
-				  ResultRelInfo *rootRelInfo, int leaf_index)
+static TupleConversionMap *
+ChildParentTupConvMap(Relation partrel, Relation rootrel,
+					  PartitionTupleRouting *proute, int index)
 {
-	ResultRelInfo **resultRelInfos = proute->partitions;
-	TupleConversionMap **map;
-	TupleDesc	tupdesc;
+	TupleConversionMap *map;
 
 	/* Don't call this if we're not supposed to be using this type of map. */
 	Assert(proute->child_parent_tupconv_maps != NULL);
 
 	/* If it's already known that we don't need a map, return NULL. */
-	if (proute->child_parent_map_not_required[leaf_index])
+	if (proute->child_parent_map_not_required[index])
 		return NULL;
 
 	/* If we've already got a map, return it. */
-	map = &proute->child_parent_tupconv_maps[leaf_index];
-	if (*map != NULL)
-		return *map;
+	map = proute->child_parent_tupconv_maps[index];
+	if (map != NULL)
+		return map;
 
 	/* No map yet; try to create one. */
-	tupdesc = RelationGetDescr(resultRelInfos[leaf_index]->ri_RelationDesc);
-	*map =
-		convert_tuples_by_name(tupdesc,
-							   RelationGetDescr(rootRelInfo->ri_RelationDesc),
-							   gettext_noop("could not convert row type"));
+	map = convert_tuples_by_name(RelationGetDescr(partrel),
+								 RelationGetDescr(rootrel),
+								 gettext_noop("could not convert row type"));
 
 	/* If it turns out no map is needed, remember for next time. */
-	proute->child_parent_map_not_required[leaf_index] = (*map == NULL);
+	proute->child_parent_map_not_required[index] = (map == NULL);
+
+	return map;
+}
+
+/*
+ * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
+ * index.
+ *
+ * Call this only if it's known that the partition at leaf_index has been
+ * initialized with ExecInitPartitionResultRelInfo().
+ */
+TupleConversionMap *
+TupConvMapForLeaf(ModifyTableState *mtstate, int leaf_index)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	ResultRelInfo **resultrels = proute->partitions,
+				  *rootRelInfo = (mtstate->rootResultRelInfo != NULL)
+									? mtstate->rootResultRelInfo
+									: mtstate->resultRelInfo;
+
+	Assert(resultrels[leaf_index] != NULL);
 
-	return *map;
+	return ChildParentTupConvMap(resultrels[leaf_index]->ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc,
+								 proute, leaf_index);
+}
+
+/*
+ * TupConvMapForSubplan -- Get the tuple conversion map for a partition given
+ * its subplan index.
+ *
+ * Call this if it's unclear whether the partition's ResultRelInfo has been
+ * initialized in mtstate->mt_partition_tuple_routing.
+ */
+TupleConversionMap *
+TupConvMapForSubplan(ModifyTableState *mtstate, int subplan_index)
+{
+	int		leaf_index;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	ResultRelInfo *resultrels = mtstate->resultRelInfo,
+				  *rootRelInfo = (mtstate->rootResultRelInfo != NULL)
+									? mtstate->rootResultRelInfo
+									: mtstate->resultRelInfo;
+
+	Assert(proute != NULL && proute->subplan_partition_offsets != NULL);
+	Assert(subplan_index >= 0 && subplan_index < mtstate->mt_nplans);
+	leaf_index = proute->subplan_partition_offsets[subplan_index];
+
+	return ChildParentTupConvMap(resultrels[subplan_index].ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc,
+								 proute, leaf_index);
 }
 
 /*
@@ -488,8 +697,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 			continue;
 		}
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptors, if any */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bd88b41ff6..afd9fbc853 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -307,6 +307,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		Assert(resultRelInfo != NULL);
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -335,8 +336,7 @@ ExecInsert(ModifyTableState *mtstate,
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
 
 				mtstate->mt_transition_capture->tcs_map =
-					TupConvMapForLeaf(proute, saved_resultRelInfo,
-									  leaf_part_index);
+					TupConvMapForLeaf(mtstate, leaf_part_index);
 			}
 			else
 			{
@@ -351,8 +351,7 @@ ExecInsert(ModifyTableState *mtstate,
 		if (mtstate->mt_oc_transition_capture != NULL)
 		{
 			mtstate->mt_oc_transition_capture->tcs_map =
-				TupConvMapForLeaf(proute, saved_resultRelInfo,
-								  leaf_part_index);
+				TupConvMapForLeaf(mtstate, leaf_part_index);
 		}
 
 		/*
@@ -1801,26 +1800,10 @@ tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
 	 * array *only* if partition-indexed array is not required.
 	 */
 	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-	{
-		int			leaf_index;
-		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+		return TupConvMapForSubplan(mtstate, whichplan);
 
-		/*
-		 * If subplan-indexed array is NULL, things should have been arranged
-		 * to convert the subplan index to partition index.
-		 */
-		Assert(proute && proute->subplan_partition_offsets != NULL);
-
-		leaf_index = proute->subplan_partition_offsets[whichplan];
-
-		return TupConvMapForLeaf(proute, getTargetResultRelInfo(mtstate),
-								 leaf_index);
-	}
-	else
-	{
-		Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-		return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-	}
+	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
+	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
 }
 
 /* ----------------------------------------------------------------
@@ -2094,14 +2077,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2224,20 +2203,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2284,77 +2251,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2385,44 +2287,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 87c4c3249e..d5881c312b 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -96,16 +96,18 @@ typedef struct PartitionTupleRouting
 	TupleConversionMap **child_parent_tupconv_maps;
 	bool	   *child_parent_map_not_required;
 	int		   *subplan_partition_offsets;
+	Oid		   *partition_oids;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
-extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
-				  ResultRelInfo *rootRelInfo, int leaf_index);
+extern TupleConversionMap *TupConvMapForLeaf(ModifyTableState *mtstate,
+				  int leaf_index);
+extern TupleConversionMap *TupConvMapForSubplan(ModifyTableState *mtstate,
+				  int subplan_index);
 extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 						  HeapTuple tuple,
 						  TupleTableSlot *new_slot,
-- 
2.11.0

#7Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#6)
3 attachment(s)
Re: non-bulk inserts and tuple routing

On 2018/01/24 17:25, Amit Langote wrote:

On 2018/01/20 7:07, Robert Haas wrote:

On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote wrote:

I rebased the patches, since they started conflicting with a recently
committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

I rebased it. Here are the performance numbers again.

* Uses following hash-partitioned table:

create table t1 (a int, b int) partition by hash (a);
create table t1_x partition of t1 for values with (modulus M, remainder R)
...

* Non-bulk insert uses the following code (insert 100,000 rows one-by-one):

do $$
begin
for i in 1..100000 loop
insert into t1 values (i, i+1);
end loop;
end; $$;

Times in milliseconds:

#parts HEAD Patched
8 6148.313 4938.775
16 8882.420 6203.911
32 14251.072 8595.068
64 24465.691 13718.161
128 45099.435 23898.026
256 87307.332 44428.126

* Bulk-inserting 100,000 rows using COPY:

copy t1 from '/tmp/t1.csv' csv;

Times in milliseconds:

#parts HEAD Patched

8 466.170 446.865
16 445.341 444.990
32 443.544 487.713
64 460.579 435.412
128 469.953 422.403
256 463.592 431.118
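The win for non-bulk inserts comes from the lazy-initialization pattern the patch introduces: ExecSetupPartitionTupleRouting now records only the partition OIDs up front, and ExecInitPartitionResultRelInfo builds each per-partition object the first time a tuple is actually routed there. The following is a minimal, self-contained sketch of that pattern with hypothetical names (Route, PartState, etc.); it is not the actual executor API, just an illustration of the array-of-NULLs approach under those assumptions:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a per-partition object such as a ResultRelInfo. */
typedef struct PartState
{
	int			oid;
	int			initialized;
} PartState;

typedef struct Route
{
	int			nparts;
	int		   *part_oids;	/* known up front, cheap to collect */
	PartState **parts;		/* built lazily, one slot per partition */
} Route;

/* Setup records only the OIDs; every parts[] slot starts out NULL. */
static Route *
setup_routing(const int *oids, int nparts)
{
	Route	   *r = malloc(sizeof(Route));

	r->nparts = nparts;
	r->part_oids = malloc(nparts * sizeof(int));
	r->parts = calloc(nparts, sizeof(PartState *));
	for (int i = 0; i < nparts; i++)
		r->part_oids[i] = oids[i];
	return r;
}

/*
 * Counterpart of ExecInitPartitionResultRelInfo: a no-op if the slot is
 * already filled, otherwise build the object on first use.
 */
static PartState *
get_partition(Route *r, int idx)
{
	if (r->parts[idx] == NULL)
	{
		PartState  *p = malloc(sizeof(PartState));

		p->oid = r->part_oids[idx];
		p->initialized = 1;
		r->parts[idx] = p;
	}
	return r->parts[idx];
}
```

With N partitions but only a handful receiving tuples, setup cost stays O(N) pointer slots instead of N full initializations; partitions never inserted into keep a NULL slot, which is why ExecCleanupTupleRouting in the patch must now check for NULL before closing each result rel.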

Rebased again.

Thanks,
Amit

Attachments:

v4-0001-Teach-CopyFrom-to-use-ModifyTableState-for-tuple-.patchtext/plain; charset=UTF-8; name=v4-0001-Teach-CopyFrom-to-use-ModifyTableState-for-tuple-.patchDownload
From 250fe77e304a0f0841f2d7978405c2a32fdeb7a2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 10:43:45 +0900
Subject: [PATCH v4 1/3] Teach CopyFrom to use ModifyTableState for
 tuple-routing

This removes all fields of CopyStateData that were meant for
tuple routing and instead uses ModifyTableState, which has all those
fields, including transition_tupconv_maps.  In COPY's case,
transition_tupconv_maps is only required if tuple routing is being
used, so it's safe.
---
 src/backend/commands/copy.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 04a24c6082..251676b321 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -166,9 +166,6 @@ typedef struct CopyStateData
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
 
-	/* Tuple-routing support info */
-	PartitionTupleRouting *partition_tuple_routing;
-
 	TransitionCaptureState *transition_capture;
 
 	/*
@@ -2285,6 +2282,7 @@ CopyFrom(CopyState cstate)
 	ResultRelInfo *resultRelInfo;
 	ResultRelInfo *saved_resultRelInfo = NULL;
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
+	ModifyTableState *mtstate = makeNode(ModifyTableState);
 	ExprContext *econtext;
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
@@ -2303,6 +2301,8 @@ CopyFrom(CopyState cstate)
 	Size		bufferedTuplesSize = 0;
 	int			firstBufferedLineNo = 0;
 
+	PartitionTupleRouting *proute = NULL;
+
 	Assert(cstate->rel);
 
 	/*
@@ -2468,10 +2468,15 @@ CopyFrom(CopyState cstate)
 	 */
 	if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
-		PartitionTupleRouting *proute;
+		ModifyTable *node = makeNode(ModifyTable);
 
-		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+		/* Just need to make this field appear valid. */
+		node->nominalRelation = 1;
+		mtstate->ps.plan = (Plan *) node;
+		mtstate->ps.state = estate;
+		mtstate->resultRelInfo = resultRelInfo;
+		proute = mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2496,7 +2501,7 @@ CopyFrom(CopyState cstate)
 	if ((resultRelInfo->ri_TrigDesc != NULL &&
 		 (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
 		  resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) ||
-		cstate->partition_tuple_routing != NULL ||
+		mtstate->mt_partition_tuple_routing != NULL ||
 		cstate->volatile_defexprs)
 	{
 		useHeapMultiInsert = false;
@@ -2571,10 +2576,9 @@ CopyFrom(CopyState cstate)
 		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 
 		/* Determine the partition to heap_insert the tuple into */
-		if (cstate->partition_tuple_routing)
+		if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			int			leaf_part_index;
-			PartitionTupleRouting *proute = cstate->partition_tuple_routing;
 
 			/*
 			 * Away we go ... If we end up not finding a partition after all,
@@ -2806,8 +2810,8 @@ CopyFrom(CopyState cstate)
 	ExecCloseIndices(resultRelInfo);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
-	if (cstate->partition_tuple_routing)
-		ExecCleanupTupleRouting(cstate->partition_tuple_routing);
+	if (proute)
+		ExecCleanupTupleRouting(proute);
 
 	/* Close any trigger target relations */
 	ExecCleanUpTriggerState(estate);
-- 
2.11.0

v4-0002-ExecFindPartition-refactoring.patchtext/plain; charset=UTF-8; name=v4-0002-ExecFindPartition-refactoring.patchDownload
From 96f0b1fed760e8b921f2d1e5be72ad864f9dd3f6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 13:56:25 +0900
Subject: [PATCH v4 2/3] ExecFindPartition refactoring

---
 src/backend/commands/copy.c            |  5 +----
 src/backend/executor/execPartition.c   | 17 +++++++++--------
 src/backend/executor/nodeModifyTable.c |  5 +----
 src/include/executor/execPartition.h   |  5 +----
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 251676b321..2096a52cea 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2588,10 +2588,7 @@ CopyFrom(CopyState cstate)
 			 * will get us the ResultRelInfo and TupleConversionMap for the
 			 * partition, respectively.
 			 */
-			leaf_part_index = ExecFindPartition(resultRelInfo,
-												proute->partition_dispatch_info,
-												slot,
-												estate);
+			leaf_part_index = ExecFindPartition(mtstate, slot);
 			Assert(leaf_part_index >= 0 &&
 				   leaf_part_index < proute->num_partitions);
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 106a96d910..947adf3032 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -227,11 +227,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 }
 
 /*
- * ExecFindPartition -- Find a leaf partition in the partition tree rooted
- * at parent, for the heap tuple contained in *slot
- *
- * estate must be non-NULL; we'll need it to compute any expressions in the
- * partition key(s)
+ * ExecFindPartition -- Find a leaf partition for tuple contained in slot
  *
  * If no leaf partition is found, this routine errors out with the appropriate
  * error message, else it returns the leaf partition sequence number
@@ -239,14 +235,19 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
  * the partition tree.
  */
 int
-ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
-				  TupleTableSlot *slot, EState *estate)
+ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 {
+	EState	   *estate = mtstate->ps.state;
 	int			result;
 	Datum		values[PARTITION_MAX_KEYS];
 	bool		isnull[PARTITION_MAX_KEYS];
 	Relation	rel;
-	PartitionDispatch parent;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	PartitionDispatch  *pd = proute->partition_dispatch_info,
+						parent;
+	ResultRelInfo *resultRelInfo = (mtstate->rootResultRelInfo != NULL)
+										? mtstate->rootResultRelInfo
+										: mtstate->resultRelInfo;
 	ExprContext *ecxt = GetPerTupleExprContext(estate);
 	TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 828e1b0015..d2df9e94cf 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -297,10 +297,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * get us the ResultRelInfo and TupleConversionMap for the partition,
 		 * respectively.
 		 */
-		leaf_part_index = ExecFindPartition(resultRelInfo,
-											proute->partition_dispatch_info,
-											slot,
-											estate);
+		leaf_part_index = ExecFindPartition(mtstate, slot);
 		Assert(leaf_part_index >= 0 &&
 			   leaf_part_index < proute->num_partitions);
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..7373f60ffb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -105,10 +105,6 @@ typedef struct PartitionTupleRouting
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 							   Relation rel, Index resultRTindex,
 							   EState *estate);
-extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
-				  PartitionDispatch *pd,
-				  TupleTableSlot *slot,
-				  EState *estate);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
@@ -117,5 +113,6 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 						  TupleTableSlot *new_slot,
 						  TupleTableSlot **p_my_slot);
 extern void ExecCleanupTupleRouting(PartitionTupleRouting *proute);
+extern int ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot);
 
 #endif							/* EXECPARTITION_H */
-- 
2.11.0

Attachment: v4-0003-During-tuple-routing-initialize-per-partition-obj.patch (text/plain)
From c821f1e30e3c33424fec52a7b1e1a7c28cbacba9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v4 3/3] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |   6 +-
 src/backend/executor/execPartition.c   | 506 +++++++++++++++++++++++----------
 src/backend/executor/nodeModifyTable.c | 153 +---------
 src/include/executor/execPartition.h   |  10 +-
 4 files changed, 377 insertions(+), 298 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2096a52cea..1000bb4461 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2476,7 +2476,7 @@ CopyFrom(CopyState cstate)
 		mtstate->ps.state = estate;
 		mtstate->resultRelInfo = resultRelInfo;
 		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(mtstate, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2609,6 +2609,7 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			Assert(resultRelInfo != NULL);
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
@@ -2638,8 +2639,7 @@ CopyFrom(CopyState cstate)
 					 */
 					cstate->transition_capture->tcs_original_insert_tuple = NULL;
 					cstate->transition_capture->tcs_map =
-						TupConvMapForLeaf(proute, saved_resultRelInfo,
-										  leaf_part_index);
+						TupConvMapForLeaf(mtstate, leaf_part_index);
 				}
 				else
 				{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 947adf3032..088f1fafd9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -23,6 +23,8 @@
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
 
+static void ExecInitPartitionResultRelInfo(ModifyTableState *mtstate,
+					int partidx);
 static PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel,
 								 int *num_parted, List **leaf_part_oids);
 static void get_partition_dispatch_recurse(Relation rel, Relation parent,
@@ -44,22 +46,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of various objects for all
+ * partitions here, the objects themselves are lazily allocated and
+ * initialized for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionResultRelInfo.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-	TupleDesc	tupDesc = RelationGetDescr(rel);
+	PartitionTupleRouting *proute;
 	List	   *leaf_parts;
 	ListCell   *cell;
-	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
-	int			num_update_rri = 0,
-				update_rri_index = 0;
-	bool		is_update = false;
-	PartitionTupleRouting *proute;
+	int			leaf_index,
+				update_rri_index,
+				num_update_rri;
+	bool		is_update;
+	ResultRelInfo *update_rri;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -71,20 +74,24 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
 	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	/*
+	 * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+	 * ExecInitPartitionResultRelInfo().
+	 */
+	proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+													sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc0(proute->num_partitions *
+											 sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		is_update = true;
 		update_rri = mtstate->resultRelInfo;
-		num_update_rri = list_length(node->plans);
+		num_update_rri = mtstate->mt_nplans;
 		proute->subplan_partition_offsets =
 			palloc(num_update_rri * sizeof(int));
 		proute->num_subplan_partition_offsets = num_update_rri;
@@ -95,15 +102,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
+
+	leaf_index = update_rri_index = 0;
+	foreach (cell, leaf_parts)
 	{
+		Oid		leaf_oid = lfirst_oid(cell);
+
+		proute->partition_oids[leaf_index] = leaf_oid;
+
 		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
+		 * The per-subplan resultrels and the resultrels of the leaf
+		 * partitions are both in the same canonical order.  So while going
+		 * through the leaf partition oids, we need to keep track of the
+		 * next per-subplan result rel to be looked for in the leaf
+		 * partition resultrels.
 		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
+		if (is_update && update_rri_index < num_update_rri &&
+			RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
+		{
+			proute->subplan_partition_offsets[update_rri_index] = leaf_index;
+			update_rri_index++;
+		}
+
+		leaf_index++;
 	}
 
 	/*
@@ -114,109 +135,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot();
 
-	i = 0;
-	foreach(cell, leaf_parts)
-	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
-		Oid			leaf_oid = lfirst_oid(cell);
-
-		if (is_update)
-		{
-			/*
-			 * If the leaf partition is already present in the per-subplan
-			 * result rels, we re-use that rather than initialize a new result
-			 * rel. The per-subplan resultrels and the resultrels of the leaf
-			 * partitions are both in the same canonical order. So while going
-			 * through the leaf partition oids, we need to keep track of the
-			 * next per-subplan result rel to be looked for in the leaf
-			 * partition resultrels.
-			 */
-			if (update_rri_index < num_update_rri &&
-				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
-			{
-				leaf_part_rri = &update_rri[update_rri_index];
-				partrel = leaf_part_rri->ri_RelationDesc;
-
-				/*
-				 * This is required in order to we convert the partition's
-				 * tuple to be compatible with the root partitioned table's
-				 * tuple descriptor.  When generating the per-subplan result
-				 * rels, this was not set.
-				 */
-				leaf_part_rri->ri_PartitionRoot = rel;
-
-				/* Remember the subplan offset for this ResultRelInfo */
-				proute->subplan_partition_offsets[update_rri_index] = i;
-
-				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
-
-		estate->es_leaf_result_relations =
-			lappend(estate->es_leaf_result_relations, leaf_part_rri);
-
-		proute->partitions[i] = leaf_part_rri;
-		i++;
-	}
-
 	/*
 	 * For UPDATE, we should have found all the per-subplan resultrels in the
 	 * leaf partitions.
@@ -342,10 +260,251 @@ ExecFindPartition(ModifyTableState *mtstate, TupleTableSlot *slot)
 				 val_desc ? errdetail("Partition key of the failing row contains %s.", val_desc) : 0));
 	}
 
+	/* Initialize the partition result rel, if not done already. */
+	ExecInitPartitionResultRelInfo(mtstate, result);
 	ecxt->ecxt_scantuple = ecxt_scantuple_old;
 	return result;
 }
 
+static int
+leafpart_index_cmp(const void *arg1, const void *arg2)
+{
+	int		leafidx1 = *(const int *) arg1;
+	int		leafidx2 = *(const int *) arg2;
+
+	if (leafidx1 > leafidx2)
+		return 1;
+	else if (leafidx1 < leafidx2)
+		return -1;
+	return 0;
+}
+
+/*
+ * ExecInitPartitionResultRelInfo
+ *		Initialize ResultRelInfo for a partition if not done already
+ */
+static void
+ExecInitPartitionResultRelInfo(ModifyTableState *mtstate, int partidx)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	Relation	partrel,
+				rootrel;
+	ResultRelInfo *leaf_part_rri;
+	int			firstVarno;
+	Relation	firstResultRel;
+
+	/* Nothing to do if already set. */
+	if (proute->partitions[partidx])
+		return;
+
+	leaf_part_rri = NULL;
+	rootrel = (mtstate->rootResultRelInfo != NULL)
+						? mtstate->rootResultRelInfo->ri_RelationDesc
+						: mtstate->resultRelInfo->ri_RelationDesc;
+
+	/*
+	 * If we are doing tuple routing for update, try to reuse the
+	 * per-subplan resultrel for this partition that ExecInitModifyTable()
+	 * might already have created.
+	 */
+	if (mtstate && mtstate->operation == CMD_UPDATE)
+	{
+		ResultRelInfo   *update_rri;
+		int   *partidx_entry;
+
+		update_rri = mtstate->resultRelInfo;
+
+		/*
+		 * If the partition got a subplan, we'd be able to find its index
+		 * in proute->subplan_partition_offsets.
+		 */
+		partidx_entry = (int *) bsearch(&partidx,
+										proute->subplan_partition_offsets,
+										mtstate->mt_nplans, sizeof(int),
+										leafpart_index_cmp);
+		if (partidx_entry)
+		{
+			int		update_rri_index =
+							partidx_entry - proute->subplan_partition_offsets;
+
+			Assert (update_rri_index < mtstate->mt_nplans);
+			leaf_part_rri = &update_rri[update_rri_index];
+			partrel = leaf_part_rri->ri_RelationDesc;
+
+			/*
+			 * This is required so that we can convert the partition's
+			 * tuple to be compatible with the root partitioned table's
+			 * tuple descriptor.  When generating the per-subplan result
+			 * rels, this was not set.
+			 */
+			leaf_part_rri->ri_PartitionRoot = rootrel;
+		}
+	}
+
+	/*
+	 * Create new result rel, either if we are *inserting* the new tuple, or
+	 * if we didn't find the result rel above for the update tuple routing
+	 * case.
+	 */
+	if (leaf_part_rri == NULL)
+	{
+		EState	   *estate = mtstate->ps.state;
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+		Index		resultRTindex = node->nominalRelation;
+
+		leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+
+		/*
+		 * We locked all the partitions in ExecSetupPartitionTupleRouting
+		 * including the leaf partitions.
+		 */
+		partrel = heap_open(proute->partition_oids[partidx], NoLock);
+		InitResultRelInfo(leaf_part_rri,
+						  partrel,
+						  resultRTindex,
+						  rootrel,
+						  estate->es_instrument);
+
+		/*
+		 * These are required as reference objects for mapping partition
+		 * attno's in expressions in WithCheckOptions and RETURNING.
+		 */
+		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * Build WITH CHECK OPTION constraints for this partition rel.  Note
+		 * that we didn't build the withCheckOptionList for each partition
+		 * within the planner, but simple translation of the varattnos will
+		 * suffice.  This only occurs for the INSERT case or in the case of
+		 * UPDATE for which we didn't find a result rel above to reuse.
+		 */
+		if (node && node->withCheckOptionLists != NIL)
+		{
+			List	   *wcoList;
+			List	   *mapped_wcoList;
+			List	   *wcoExprs = NIL;
+			ListCell   *ll;
+
+			/*
+			 * In the case of INSERT on partitioned tables, there is only one
+			 * plan.  Likewise, there is only one WCO list, not one per
+			 * partition.  For UPDATE, there would be as many WCO lists as
+			 * there are plans, but we use the first one as reference.  Note
+			 * that if there are SubPlans in there, they all end up attached
+			 * to the one parent Plan node.
+			 */
+			Assert((mtstate->operation == CMD_INSERT &&
+					list_length(node->withCheckOptionLists) == 1 &&
+					mtstate->mt_nplans == 1) ||
+				   (mtstate->operation == CMD_UPDATE &&
+					list_length(node->withCheckOptionLists) ==
+														mtstate->mt_nplans));
+			wcoList = linitial(node->withCheckOptionLists);
+
+			mapped_wcoList = map_partition_varattnos(wcoList,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+			foreach(ll, mapped_wcoList)
+			{
+				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+												   mtstate->mt_plans[0]);
+				wcoExprs = lappend(wcoExprs, wcoExpr);
+			}
+
+			leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+			leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+		}
+
+		/*
+		 * Build the RETURNING projection if any for the partition.  Note that
+		 * we didn't build the returningList for each partition within the
+		 * planner, but simple translation of the varattnos will suffice.
+		 * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+		 * ExecInitModifyTable() would've initialized this.
+		 */
+		if (node && node->returningLists != NIL)
+		{
+			TupleTableSlot *slot;
+			ExprContext *econtext;
+			List	   *returningList;
+			List	   *rlist;
+			TupleDesc	tupDesc;
+
+			/* See the comment written above for WCO lists. */
+			Assert((mtstate->operation == CMD_INSERT &&
+					list_length(node->returningLists) == 1 &&
+					mtstate->mt_nplans == 1) ||
+				   (mtstate->operation == CMD_UPDATE &&
+					list_length(node->returningLists) ==
+														mtstate->mt_nplans));
+			returningList = linitial(node->returningLists);
+
+			/*
+			 * Initialize result tuple slot and assign its rowtype using the first
+			 * RETURNING list.  We assume the rest will look the same.
+			 */
+			tupDesc = ExecTypeFromTL(returningList, false);
+
+			/* Set up a slot for the output of the RETURNING projection(s) */
+			ExecInitResultTupleSlot(estate, &mtstate->ps);
+			ExecAssignResultType(&mtstate->ps, tupDesc);
+			slot = mtstate->ps.ps_ResultTupleSlot;
+
+			/* Need an econtext too */
+			if (mtstate->ps.ps_ExprContext == NULL)
+				ExecAssignExprContext(estate, &mtstate->ps);
+			econtext = mtstate->ps.ps_ExprContext;
+
+			rlist = map_partition_varattnos(returningList,
+											firstVarno,
+											partrel, firstResultRel, NULL);
+			leaf_part_rri->ri_projectReturning =
+				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+										RelationGetDescr(partrel));
+		}
+
+		/*
+		 * Note that the entries in this list appear in no predetermined
+		 * order, as a result of initializing partition result rels as and
+		 * when they're needed.
+		 */
+		estate->es_leaf_result_relations =
+									lappend(estate->es_leaf_result_relations,
+											leaf_part_rri);
+
+		/*
+		 * Open partition indices.  The user may have asked to check for
+		 * conflicts within this leaf partition and do "nothing" instead of
+		 * throwing an error.  Be prepared in that case by initializing the
+		 * index information needed by ExecInsert() to perform speculative
+		 * insertions.
+		 */
+		if (partrel->rd_rel->relhasindex &&
+			leaf_part_rri->ri_IndexRelationDescs == NULL)
+			ExecOpenIndices(leaf_part_rri,
+							mtstate->mt_onconflict != ONCONFLICT_NONE);
+	}
+
+	/*
+	 * Verify result relation is a valid target for INSERT.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+}
+
 /*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
@@ -374,40 +533,92 @@ ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute)
 }
 
 /*
- * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
- * index.
+ * ChildParentTupConvMap -- Return tuple conversion map to convert tuples of
+ *							'partrel' into those of 'rootrel'
+ *
+ * If the function was previously called for this partition, we will have
+ * either already created the map and stored it in
+ * proute->child_parent_tupconv_maps[index], or found out that such a map
+ * is not needed and thus set proute->child_parent_map_not_required[index].
  */
-TupleConversionMap *
-TupConvMapForLeaf(PartitionTupleRouting *proute,
-				  ResultRelInfo *rootRelInfo, int leaf_index)
+static TupleConversionMap *
+ChildParentTupConvMap(Relation partrel, Relation rootrel,
+					  PartitionTupleRouting *proute, int index)
 {
-	ResultRelInfo **resultRelInfos = proute->partitions;
-	TupleConversionMap **map;
-	TupleDesc	tupdesc;
+	TupleConversionMap *map;
 
 	/* Don't call this if we're not supposed to be using this type of map. */
 	Assert(proute->child_parent_tupconv_maps != NULL);
 
 	/* If it's already known that we don't need a map, return NULL. */
-	if (proute->child_parent_map_not_required[leaf_index])
+	if (proute->child_parent_map_not_required[index])
 		return NULL;
 
 	/* If we've already got a map, return it. */
-	map = &proute->child_parent_tupconv_maps[leaf_index];
-	if (*map != NULL)
-		return *map;
+	map = proute->child_parent_tupconv_maps[index];
+	if (map != NULL)
+		return map;
 
 	/* No map yet; try to create one. */
-	tupdesc = RelationGetDescr(resultRelInfos[leaf_index]->ri_RelationDesc);
-	*map =
-		convert_tuples_by_name(tupdesc,
-							   RelationGetDescr(rootRelInfo->ri_RelationDesc),
-							   gettext_noop("could not convert row type"));
+	map = convert_tuples_by_name(RelationGetDescr(partrel),
+								 RelationGetDescr(rootrel),
+								 gettext_noop("could not convert row type"));
 
 	/* If it turns out no map is needed, remember for next time. */
-	proute->child_parent_map_not_required[leaf_index] = (*map == NULL);
+	proute->child_parent_map_not_required[index] = (map == NULL);
+
+	return map;
+}
+
+/*
+ * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
+ * index.
+ *
+ * Call this only if it's known that the partition at leaf_index has been
+ * initialized with ExecInitPartitionResultRelInfo().
+ */
+TupleConversionMap *
+TupConvMapForLeaf(ModifyTableState *mtstate, int leaf_index)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	ResultRelInfo **resultrels = proute->partitions,
+				  *rootRelInfo = (mtstate->rootResultRelInfo != NULL)
+									? mtstate->rootResultRelInfo
+									: mtstate->resultRelInfo;
+
+	Assert(resultrels[leaf_index] != NULL);
 
-	return *map;
+	return ChildParentTupConvMap(resultrels[leaf_index]->ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc,
+								 proute, leaf_index);
+}
+
+/*
+ * TupConvMapForSubplan -- Get the tuple conversion map for a partition given
+ * its subplan index.
+ *
+ * Call this if it's unclear whether the partition's ResultRelInfo has been
+ * initialized in mtstate->mt_partition_tuple_routing.
+ */
+TupleConversionMap *
+TupConvMapForSubplan(ModifyTableState *mtstate, int subplan_index)
+{
+	int		leaf_index;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	ResultRelInfo *resultrels = mtstate->resultRelInfo,
+				  *rootRelInfo = (mtstate->rootResultRelInfo != NULL)
+									? mtstate->rootResultRelInfo
+									: mtstate->resultRelInfo;
+
+	Assert(proute != NULL &&
+		   proute->subplan_partition_offsets != NULL &&
+		   subplan_index < proute->num_subplan_partition_offsets);
+	leaf_index = proute->subplan_partition_offsets[subplan_index];
+
+	Assert(subplan_index >= 0 && subplan_index < mtstate->mt_nplans);
+	return ChildParentTupConvMap(resultrels[subplan_index].ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc,
+								 proute, leaf_index);
 }
 
 /*
@@ -490,8 +701,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 			continue;
 		}
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptors, if any */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d2df9e94cf..afd9fbc853 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -307,6 +307,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		Assert(resultRelInfo != NULL);
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -335,8 +336,7 @@ ExecInsert(ModifyTableState *mtstate,
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
 
 				mtstate->mt_transition_capture->tcs_map =
-					TupConvMapForLeaf(proute, saved_resultRelInfo,
-									  leaf_part_index);
+					TupConvMapForLeaf(mtstate, leaf_part_index);
 			}
 			else
 			{
@@ -351,8 +351,7 @@ ExecInsert(ModifyTableState *mtstate,
 		if (mtstate->mt_oc_transition_capture != NULL)
 		{
 			mtstate->mt_oc_transition_capture->tcs_map =
-				TupConvMapForLeaf(proute, saved_resultRelInfo,
-								  leaf_part_index);
+				TupConvMapForLeaf(mtstate, leaf_part_index);
 		}
 
 		/*
@@ -1801,27 +1800,10 @@ tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
 	 * array *only* if partition-indexed array is not required.
 	 */
 	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-	{
-		int			leaf_index;
-		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+		return TupConvMapForSubplan(mtstate, whichplan);
 
-		/*
-		 * If subplan-indexed array is NULL, things should have been arranged
-		 * to convert the subplan index to partition index.
-		 */
-		Assert(proute && proute->subplan_partition_offsets != NULL &&
-			   whichplan < proute->num_subplan_partition_offsets);
-
-		leaf_index = proute->subplan_partition_offsets[whichplan];
-
-		return TupConvMapForLeaf(proute, getTargetResultRelInfo(mtstate),
-								 leaf_index);
-	}
-	else
-	{
-		Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-		return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-	}
+	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
+	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
 }
 
 /* ----------------------------------------------------------------
@@ -2095,14 +2077,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2225,20 +2203,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2285,77 +2251,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2386,44 +2287,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 7373f60ffb..b1c08f6581 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,11 +104,12 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
-extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
-				  ResultRelInfo *rootRelInfo, int leaf_index);
+extern TupleConversionMap *TupConvMapForLeaf(ModifyTableState *mtstate,
+				  int leaf_index);
+extern TupleConversionMap *TupConvMapForSubplan(ModifyTableState *mtstate,
+				  int subplan_index);
 extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 						  HeapTuple tuple,
 						  TupleTableSlot *new_slot,
-- 
2.11.0

#8Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#7)
Re: non-bulk inserts and tuple routing

(2018/01/25 11:11), Amit Langote wrote:

Rebased again.

Thanks for the rebased patch!

The patches apply cleanly and compile successfully, but make check fails
in an assert-enabled build.

Best regards,
Etsuro Fujita

#9Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#8)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thanks for the review.

On 2018/01/25 18:30, Etsuro Fujita wrote:

(2018/01/25 11:11), Amit Langote wrote:

Rebased again.

Thanks for the rebased patch!

The patches apply cleanly and compile successfully, but make check fails
in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with v4 patches I posted
earlier today.

=======================
All 186 tests passed.
=======================

Can you please post the errors you're seeing?

Thanks,
Amit

#10Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: non-bulk inserts and tuple routing

(2018/01/25 18:52), Amit Langote wrote:

On 2018/01/25 18:30, Etsuro Fujita wrote:

The patches apply cleanly and compile successfully, but make check fails
in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with v4 patches I posted
earlier today.

=======================
All 186 tests passed.
=======================

Can you please post the errors you're seeing?

OK, will do.

Best regards,
Etsuro Fujita

#11Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: non-bulk inserts and tuple routing

(2018/01/25 18:52), Amit Langote wrote:

On 2018/01/25 18:30, Etsuro Fujita wrote:

The patches apply cleanly and compile successfully, but make check fails
in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with v4 patches I posted
earlier today.

Can you please post the errors you're seeing?

A quick debug showed that the failure was due to a segmentation fault
caused by this change to ExecSetupPartitionTupleRouting (in patch
v4-0003-During-tuple-routing-initialize-per-partition-obj):

- bool is_update = false;

+ bool is_update;

I modified that patch to initialize is_update to false as before.
With the modified version, make check passed successfully.

I'll review the patch in more detail!

Best regards,
Etsuro Fujita

#12Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#11)
Re: non-bulk inserts and tuple routing

Fujita-san,

On 2018/01/30 18:22, Etsuro Fujita wrote:

(2018/01/25 18:52), Amit Langote wrote:

On 2018/01/25 18:30, Etsuro Fujita wrote:

The patches apply cleanly and compile successfully, but make check fails
in an assert-enabled build.

Hmm, I can't seem to reproduce the failure with v4 patches I posted
earlier today.

Can you please post the errors you're seeing?

A quick debug showed that the failure was due to a segmentation fault
caused by this change to ExecSetupPartitionTupleRouting (in patch
v4-0003-During-tuple-routing-initialize-per-partition-obj):

-    bool        is_update = false;

+    bool        is_update;

I modified that patch to initialize the is_update to false as before. With
the modified version, make check passed successfully.

Oops, my bad.

I'll review the patch in more detail!

Thank you. Will wait for your comments before sending a new version then.

Regards,
Amit

#13Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#12)
Re: non-bulk inserts and tuple routing

(2018/01/30 18:39), Amit Langote wrote:

Will wait for your comments before sending a new version then.

Ok, I'll post my comments as soon as possible.

Best regards,
Etsuro Fujita

#14Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#13)
Re: non-bulk inserts and tuple routing

(2018/01/30 18:52), Etsuro Fujita wrote:

(2018/01/30 18:39), Amit Langote wrote:

Will wait for your comments before sending a new version then.

Ok, I'll post my comments as soon as possible.

* ExecInitPartitionResultRelInfo is called from ExecFindPartition, but
we could call that another way; in ExecInsert/CopyFrom we call that
after ExecFindPartition if the partition chosen by ExecFindPartition has
not been initialized yet. Maybe either would be OK, but I like that
because I think that would not only better divide that labor but better
fit into the existing code in ExecInsert/CopyFrom IMO.

* In ExecInitPartitionResultRelInfo:
+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+           lappend(estate->es_leaf_result_relations,
+                                           leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

* In the same function:
+   /*
+    * Verify result relation is a valid target for INSERT.
+    */
+   CheckValidResultRel(leaf_part_rri, CMD_INSERT);

I think it would be better to leave the previous comments as-is here:

        /*
         * Verify result relation is a valid target for an INSERT.  An UPDATE
         * of a partition-key becomes a DELETE+INSERT operation, so this check
         * is still required when the operation is CMD_UPDATE.
         */

* ExecInitPartitionResultRelInfo does the work other than the
initialization of ResultRelInfo for the chosen partition (eg, create a
tuple conversion map to convert a tuple routed to the partition from the
parent's type to the partition's). So I'd propose to rename that
function to eg, ExecInitPartition.

* CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting
and ExecFindPartition with a mostly-dummy ModifyTableState node. I'm
not sure that is a good idea. My concern is that it might become
a headache in future development.

* The patch 0001 and 0002 are pretty small but can't be reviewed without
the patch 0003. The total size of the three patches aren't that large,
so I think it would be better to put those patches together into a
single patch.

That's all for now. I'll continue to review the patches!

Best regards,
Etsuro Fujita

#15Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#14)
2 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thank you for the review.

On 2018/02/02 19:56, Etsuro Fujita wrote:

(2018/01/30 18:52), Etsuro Fujita wrote:

(2018/01/30 18:39), Amit Langote wrote:

Will wait for your comments before sending a new version then.

Ok, I'll post my comments as soon as possible.

* ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
could call that another way; in ExecInsert/CopyFrom we call that after
ExecFindPartition if the partition chosen by ExecFindPartition has not
been initialized yet.  Maybe either would be OK, but I like that because I
think that would not only better divide that labor but better fit into the
existing code in ExecInsert/CopyFrom IMO.

I see no problem with that, so done that way.

* In ExecInitPartitionResultRelInfo:
+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+ lappend(estate->es_leaf_result_relations,
+                                           leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

Good catch. I moved it outside the block. I was under the impression
that leaf result relations that were reused from the
mtstate->resultRelInfo array would have already been added to the list,
but it seems they are not.

* In the same function:
+   /*
+    * Verify result relation is a valid target for INSERT.
+    */
+   CheckValidResultRel(leaf_part_rri, CMD_INSERT);

I think it would be better to leave the previous comments as-is here:

        /*
         * Verify result relation is a valid target for an INSERT.  An UPDATE
         * of a partition-key becomes a DELETE+INSERT operation, so this
check
         * is still required when the operation is CMD_UPDATE.
         */

Oops, my bad. Didn't notice that I had ended up removing the part about
UPDATE.

* ExecInitPartitionResultRelInfo does the work other than the
initialization of ResultRelInfo for the chosen partition (eg, create a
tuple conversion map to convert a tuple routed to the partition from the
parent's type to the partition's).  So I'd propose to rename that function
to eg, ExecInitPartition.

I went with ExecInitPartitionInfo.

* CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting and
ExecFindPartition with a mostly-dummy ModifyTableState node.  I'm not sure
that is a good idea.  My concern is that it might become a headache
in future development.

OK, I removed those changes.

* The patch 0001 and 0002 are pretty small but can't be reviewed without
the patch 0003.  The total size of the three patches aren't that large, so
I think it would be better to put those patches together into a single patch.

As I said above, I got rid of 0001. Then, I merged the
ExecFindPartition() refactoring patch 0002 into 0003.

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized. Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption. I
carved that out into a separate patch which can be applied and tested
before the main patch.

That's all for now.  I'll continue to review the patches!

Here is the updated version that contains two patches as described above.

Thanks,
Amit

Attachments:

v24-0001-Refactor-partition-tuple-conversion-maps-handlin.patch (text/plain)
From 03ec63385251f31bdf95006258617d10eda94709 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 5 Feb 2018 13:38:05 +0900
Subject: [PATCH v24 1/2] Refactor partition tuple conversion maps handling
 code

tupconv_map_for_subplan() currently assumes that it gets to use
the Relation pointer for *all* leaf partitions.  That is, both those
that exist in mtstate->resultRelInfo array and those that don't and
hence would be initialized by ExecSetupPartitionTupleRouting().
However, an upcoming patch will change ExecSetupPartitionTupleRouting
such that leaf partitions' ResultRelInfo are no longer initialized
there.  So make it stop relying on the latter.
---
 src/backend/executor/execPartition.c   | 87 ++++++++++++++++++++++++++--------
 src/backend/executor/nodeModifyTable.c | 23 ++-------
 src/include/executor/execPartition.h   |  2 +
 3 files changed, 73 insertions(+), 39 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 106a96d910..f2a920f4c3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -373,40 +373,89 @@ ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute)
 }
 
 /*
- * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
- * index.
+ * ChildParentTupConvMap -- Return tuple conversion map to convert tuples of
+ *							'partrel' into those of 'rootrel'
+ *
+ * If the function was previously called for this partition, we would've
+ * either already created the map and stored the same at
+ * proute->child_parent_tupconv_maps[index] or found out that such a map
+ * is not needed and thus set proute->child_parent_map_not_required[index].
  */
-TupleConversionMap *
-TupConvMapForLeaf(PartitionTupleRouting *proute,
-				  ResultRelInfo *rootRelInfo, int leaf_index)
+static TupleConversionMap *
+ChildParentTupConvMap(Relation partrel, Relation rootrel,
+					  PartitionTupleRouting *proute, int index)
 {
-	ResultRelInfo **resultRelInfos = proute->partitions;
-	TupleConversionMap **map;
-	TupleDesc	tupdesc;
+	TupleConversionMap *map;
 
 	/* Don't call this if we're not supposed to be using this type of map. */
 	Assert(proute->child_parent_tupconv_maps != NULL);
 
 	/* If it's already known that we don't need a map, return NULL. */
-	if (proute->child_parent_map_not_required[leaf_index])
+	if (proute->child_parent_map_not_required[index])
 		return NULL;
 
 	/* If we've already got a map, return it. */
-	map = &proute->child_parent_tupconv_maps[leaf_index];
-	if (*map != NULL)
-		return *map;
+	map = proute->child_parent_tupconv_maps[index];
+	if (map != NULL)
+		return map;
 
 	/* No map yet; try to create one. */
-	tupdesc = RelationGetDescr(resultRelInfos[leaf_index]->ri_RelationDesc);
-	*map =
-		convert_tuples_by_name(tupdesc,
-							   RelationGetDescr(rootRelInfo->ri_RelationDesc),
-							   gettext_noop("could not convert row type"));
+	map = convert_tuples_by_name(RelationGetDescr(partrel),
+								 RelationGetDescr(rootrel),
+								 gettext_noop("could not convert row type"));
 
 	/* If it turns out no map is needed, remember for next time. */
-	proute->child_parent_map_not_required[leaf_index] = (*map == NULL);
+	proute->child_parent_map_not_required[index] = (map == NULL);
+
+	return map;
+}
+
+/*
+ * TupConvMapForLeaf -- Get the tuple conversion map for a given leaf partition
+ * index.
+ *
+ * Call this only if it's known that the partition at leaf_index has been
+ * initialized.
+ */
+TupleConversionMap *
+TupConvMapForLeaf(PartitionTupleRouting *proute,
+				  ResultRelInfo *rootRelInfo, int leaf_index)
+{
+	ResultRelInfo **resultrels = proute->partitions;
+
+	Assert(resultrels[leaf_index] != NULL);
 
-	return *map;
+	return ChildParentTupConvMap(resultrels[leaf_index]->ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc, proute,
+								 leaf_index);
+}
+
+/*
+ * TupConvMapForSubplan -- Get the tuple conversion map for a partition given
+ * its subplan index.
+ *
+ * Call this if it's unclear whether the partition's ResultRelInfo has been
+ * initialized in mtstate->mt_partition_tuple_routing.
+ */
+TupleConversionMap *
+TupConvMapForSubplan(ModifyTableState *mtstate, int subplan_index)
+{
+	int		leaf_index;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	ResultRelInfo *resultrels = mtstate->resultRelInfo,
+				  *rootRelInfo = (mtstate->rootResultRelInfo != NULL)
+									? mtstate->rootResultRelInfo
+									: mtstate->resultRelInfo;
+
+	Assert(proute != NULL &&
+		   proute->subplan_partition_offsets != NULL &&
+		   subplan_index < proute->num_subplan_partition_offsets);
+	leaf_index = proute->subplan_partition_offsets[subplan_index];
+
+	Assert(subplan_index >= 0 && subplan_index < mtstate->mt_nplans);
+	return ChildParentTupConvMap(resultrels[subplan_index].ri_RelationDesc,
+								 rootRelInfo->ri_RelationDesc,
+								 proute, leaf_index);
 }
 
 /*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..d054da5330 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1804,27 +1804,10 @@ tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
 	 * array *only* if partition-indexed array is not required.
 	 */
 	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-	{
-		int			leaf_index;
-		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-
-		/*
-		 * If subplan-indexed array is NULL, things should have been arranged
-		 * to convert the subplan index to partition index.
-		 */
-		Assert(proute && proute->subplan_partition_offsets != NULL &&
-			   whichplan < proute->num_subplan_partition_offsets);
+		return TupConvMapForSubplan(mtstate, whichplan);
 
-		leaf_index = proute->subplan_partition_offsets[whichplan];
-
-		return TupConvMapForLeaf(proute, getTargetResultRelInfo(mtstate),
-								 leaf_index);
-	}
-	else
-	{
-		Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-		return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-	}
+	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
+	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..a75a37060a 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -112,6 +112,8 @@ extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
+extern TupleConversionMap *TupConvMapForSubplan(ModifyTableState *mtstate,
+				int subplan_index);
 extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 						  HeapTuple tuple,
 						  TupleTableSlot *new_slot,
-- 
2.11.0

v24-0002-During-tuple-routing-initialize-per-partition-ob.patch (text/plain)
From 316c54eb48d9973a2859a31a993d065efa791299 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v24 2/2] During tuple-routing, initialize per-partition
 objects lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |   7 +-
 src/backend/executor/execPartition.c   | 409 +++++++++++++++++++++++----------
 src/backend/executor/nodeModifyTable.c | 128 +----------
 src/include/executor/execPartition.h   |   8 +-
 4 files changed, 302 insertions(+), 250 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..d69d88c8a8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2590,6 +2590,10 @@ CopyFrom(CopyState cstate)
 			Assert(leaf_part_index >= 0 &&
 				   leaf_part_index < proute->num_partitions);
 
+			/* Initialize partition info, if not done already. */
+			ExecInitPartitionInfo(NULL, resultRelInfo, proute, estate,
+								  leaf_part_index);
+
 			/*
 			 * If this tuple is mapped to a partition that is not same as the
 			 * previous one, we'd better make the bulk insert mechanism gets a
@@ -2607,6 +2611,7 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			Assert(resultRelInfo != NULL);
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f2a920f4c3..6386dea5fb 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,22 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of various objects for all
+ * partitions here, the objects themselves are lazily allocated and
+ * initialized for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
-	TupleDesc	tupDesc = RelationGetDescr(rel);
+	PartitionTupleRouting *proute;
 	List	   *leaf_parts;
 	ListCell   *cell;
-	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
-	int			num_update_rri = 0,
-				update_rri_index = 0;
+	int			leaf_index,
+				update_rri_index,
+				num_update_rri;
 	bool		is_update = false;
-	PartitionTupleRouting *proute;
+	ResultRelInfo *update_rri;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -71,20 +72,24 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
 	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	/*
+	 * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+	 * ExecInitResultRelInfo().
+	 */
+	proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+													sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc0(proute->num_partitions *
+											 sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		is_update = true;
 		update_rri = mtstate->resultRelInfo;
-		num_update_rri = list_length(node->plans);
+		num_update_rri = mtstate->mt_nplans;
 		proute->subplan_partition_offsets =
 			palloc(num_update_rri * sizeof(int));
 		proute->num_subplan_partition_offsets = num_update_rri;
@@ -95,15 +100,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
+
+	leaf_index = update_rri_index = 0;
+	foreach (cell, leaf_parts)
 	{
+		Oid		leaf_oid = lfirst_oid(cell);
+
+		proute->partition_oids[leaf_index] = leaf_oid;
+
 		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
+		 * The per-subplan resultrels and the resultrels of the leaf
+		 * partitions are both in the same canonical order.  So while going
+		 * through the leaf partition oids, we need to keep track of the
+		 * next per-subplan result rel to be looked for in the leaf
+		 * partition resultrels.
 		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
+		if (is_update && update_rri_index < num_update_rri &&
+			RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
+		{
+			proute->subplan_partition_offsets[update_rri_index] = leaf_index;
+			update_rri_index++;
+		}
+
+		leaf_index++;
 	}
 
 	/*
@@ -114,109 +133,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot();
 
-	i = 0;
-	foreach(cell, leaf_parts)
-	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
-		Oid			leaf_oid = lfirst_oid(cell);
-
-		if (is_update)
-		{
-			/*
-			 * If the leaf partition is already present in the per-subplan
-			 * result rels, we re-use that rather than initialize a new result
-			 * rel. The per-subplan resultrels and the resultrels of the leaf
-			 * partitions are both in the same canonical order. So while going
-			 * through the leaf partition oids, we need to keep track of the
-			 * next per-subplan result rel to be looked for in the leaf
-			 * partition resultrels.
-			 */
-			if (update_rri_index < num_update_rri &&
-				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
-			{
-				leaf_part_rri = &update_rri[update_rri_index];
-				partrel = leaf_part_rri->ri_RelationDesc;
-
-				/*
-				 * This is required in order to we convert the partition's
-				 * tuple to be compatible with the root partitioned table's
-				 * tuple descriptor.  When generating the per-subplan result
-				 * rels, this was not set.
-				 */
-				leaf_part_rri->ri_PartitionRoot = rel;
-
-				/* Remember the subplan offset for this ResultRelInfo */
-				proute->subplan_partition_offsets[update_rri_index] = i;
-
-				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
-
-		estate->es_leaf_result_relations =
-			lappend(estate->es_leaf_result_relations, leaf_part_rri);
-
-		proute->partitions[i] = leaf_part_rri;
-		i++;
-	}
-
 	/*
 	 * For UPDATE, we should have found all the per-subplan resultrels in the
 	 * leaf partitions.
@@ -345,6 +261,244 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 	return result;
 }
 
+static int
+leafpart_index_cmp(const void *arg1, const void *arg2)
+{
+	int		leafidx1 = *(const int *) arg1;
+	int		leafidx2 = *(const int *) arg2;
+
+	if (leafidx1 > leafidx2)
+		return 1;
+	else if (leafidx1 < leafidx2)
+		return -1;
+	return 0;
+}
+
+/*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ */
+void
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	partrel,
+				rootrel;
+	ResultRelInfo *leaf_part_rri;
+
+	/* Nothing to do if already set. */
+	if (proute->partitions[partidx])
+		return;
+
+	leaf_part_rri = NULL;
+	rootrel = resultRelInfo->ri_RelationDesc;
+
+	/*
+	 * If we are doing tuple routing for update, try to reuse the
+	 * per-subplan resultrel for this partition that ExecInitModifyTable()
+	 * might already have created.
+	 */
+	if (mtstate && mtstate->operation == CMD_UPDATE)
+	{
+		int   *partidx_entry;
+
+		/*
+		 * If the partition's ResultRelInfo has already been created, we'd
+		 * find its index in proute->subplan_partition_offsets.
+		 */
+		partidx_entry = (int *) bsearch(&partidx,
+										proute->subplan_partition_offsets,
+										mtstate->mt_nplans, sizeof(int),
+										leafpart_index_cmp);
+		if (partidx_entry)
+		{
+			int		update_rri_index =
+							partidx_entry - proute->subplan_partition_offsets;
+
+			Assert (update_rri_index < mtstate->mt_nplans);
+			leaf_part_rri = &mtstate->resultRelInfo[update_rri_index];
+			partrel = leaf_part_rri->ri_RelationDesc;
+
+			/*
+			 * This is required in order to we convert the partition's
+			 * tuple to be compatible with the root partitioned table's
+			 * tuple descriptor.  When generating the per-subplan result
+			 * rels, this was not set.
+			 */
+			leaf_part_rri->ri_PartitionRoot = rootrel;
+		}
+	}
+
+	/*
+	 * Create new result rel, either if we are *inserting* the new tuple, or
+	 * if we didn't find the result rel above for the update tuple routing
+	 * case.
+	 */
+	if (leaf_part_rri == NULL)
+	{
+		int			firstVarno;
+		Relation	firstResultRel;
+		ModifyTable *node = NULL;
+
+		if (mtstate)
+		{
+			node = (ModifyTable *) mtstate->ps.plan;
+			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+			firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		}
+
+		leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+
+		/*
+		 * We locked all the partitions in ExecSetupPartitionTupleRouting
+		 * including the leaf partitions.
+		 */
+		partrel = heap_open(proute->partition_oids[partidx], NoLock);
+		InitResultRelInfo(leaf_part_rri,
+						  partrel,
+						  node ? node->nominalRelation : 1,
+						  rootrel,
+						  estate->es_instrument);
+
+		/*
+		 * Build WITH CHECK OPTION constraints for this partition rel.  Note
+		 * that we didn't build the withCheckOptionList for each partition
+		 * within the planner, but simple translation of the varattnos will
+		 * suffice.  This only occurs for the INSERT case or in the case of
+		 * UPDATE for which we didn't find a result rel above to reuse.
+		 */
+		if (node && node->withCheckOptionLists != NIL)
+		{
+			List	   *wcoList;
+			List	   *mapped_wcoList;
+			List	   *wcoExprs = NIL;
+			ListCell   *ll;
+
+			/*
+			 * In the case of INSERT on partitioned tables, there is only one
+			 * plan.  Likewise, there is only one WCO list, not one per
+			 * partition.  For UPDATE, there would be as many WCO lists as
+			 * there are plans, but we use the first one as reference.  Note
+			 * that if there are SubPlans in there, they all end up attached
+			 * to the one parent Plan node.
+			 */
+			Assert((node->operation == CMD_INSERT &&
+					list_length(node->withCheckOptionLists) == 1 &&
+					list_length(node->plans) == 1) ||
+				   (node->operation == CMD_UPDATE &&
+					list_length(node->withCheckOptionLists) ==
+					list_length(node->plans)));
+			wcoList = linitial(node->withCheckOptionLists);
+
+			mapped_wcoList = map_partition_varattnos(wcoList,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+			foreach(ll, mapped_wcoList)
+			{
+				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+												   mtstate->mt_plans[0]);
+				wcoExprs = lappend(wcoExprs, wcoExpr);
+			}
+
+			leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+			leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+		}
+
+		/*
+		 * Build the RETURNING projection if any for the partition.  Note that
+		 * we didn't build the returningList for each partition within the
+		 * planner, but simple translation of the varattnos will suffice.
+		 * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+		 * ExecInitModifyTable() would've initialized this.
+		 */
+		if (node && node->returningLists != NIL)
+		{
+			TupleTableSlot *slot;
+			ExprContext *econtext;
+			List	   *returningList;
+			List	   *rlist;
+			TupleDesc	tupDesc;
+
+			/* See the comment written above for WCO lists. */
+			Assert((node->operation == CMD_INSERT &&
+					list_length(node->returningLists) == 1 &&
+					list_length(node->plans) == 1) ||
+				   (node->operation == CMD_UPDATE &&
+					list_length(node->returningLists) ==
+					list_length(node->plans)));
+			returningList = linitial(node->returningLists);
+
+			/*
+			 * Initialize result tuple slot and assign its rowtype using the first
+			 * RETURNING list.  We assume the rest will look the same.
+			 */
+			tupDesc = ExecTypeFromTL(returningList, false);
+
+			/* Set up a slot for the output of the RETURNING projection(s) */
+			ExecInitResultTupleSlot(estate, &mtstate->ps);
+			ExecAssignResultType(&mtstate->ps, tupDesc);
+			slot = mtstate->ps.ps_ResultTupleSlot;
+
+			/* Need an econtext too */
+			if (mtstate->ps.ps_ExprContext == NULL)
+				ExecAssignExprContext(estate, &mtstate->ps);
+			econtext = mtstate->ps.ps_ExprContext;
+
+			rlist = map_partition_varattnos(returningList,
+											firstVarno,
+											partrel, firstResultRel, NULL);
+			leaf_part_rri->ri_projectReturning =
+				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+										RelationGetDescr(partrel));
+		}
+
+		/*
+		 * Open partition indices.  The user may have asked to check for
+		 * conflicts within this leaf partition and do "nothing" instead of
+		 * throwing an error.  Be prepared in that case by initializing the
+		 * index information needed by ExecInsert() to perform speculative
+		 * insertions.
+		 */
+		if (partrel->rd_rel->relhasindex &&
+			leaf_part_rri->ri_IndexRelationDescs == NULL)
+			ExecOpenIndices(leaf_part_rri,
+							mtstate != NULL &&
+							mtstate->mt_onconflict != ONCONFLICT_NONE);
+	}
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Note that the entries in this list appear in no predetermined order,
+	 * because partition result rels are initialized as and when they're
+	 * needed.
+	 */
+	estate->es_leaf_result_relations =
+								lappend(estate->es_leaf_result_relations,
+										leaf_part_rri);
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+}
+
 /*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
@@ -538,8 +692,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 			continue;
 		}
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptors, if any */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d054da5330..15ca3281c9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -304,12 +304,17 @@ ExecInsert(ModifyTableState *mtstate,
 		Assert(leaf_part_index >= 0 &&
 			   leaf_part_index < proute->num_partitions);
 
+		/* Initialize partition info, if not done already. */
+		ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+							  leaf_part_index);
+
 		/*
 		 * Save the old ResultRelInfo and switch to the one corresponding to
 		 * the selected partition.
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		Assert(resultRelInfo != NULL);
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2081,14 +2086,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2211,20 +2212,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2271,77 +2260,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2372,44 +2296,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index a75a37060a..4a62b9f1ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +104,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern void ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#16Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#15)
Re: non-bulk inserts and tuple routing

(2018/02/05 14:34), Amit Langote wrote:

On 2018/02/02 19:56, Etsuro Fujita wrote:

* ExecInitPartitionResultRelInfo is called from ExecFindPartition, but we
could call that another way; in ExecInsert/CopyFrom we call that after
ExecFindPartition if the partition chosen by ExecFindPartition has not
been initialized yet. Maybe either would be OK, but I prefer the latter
because I think it would not only divide that labor better but also fit
better into the existing code in ExecInsert/CopyFrom, IMO.

I see no problem with that, so done that way.

Thanks.

* In ExecInitPartitionResultRelInfo:
+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+ lappend(estate->es_leaf_result_relations,
+                                           leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

Good catch. I moved it outside the block. I was under the impression
that leaf result relations that were reused from the
mtstate->resultRelInfo array would have already been added to the list,
but it seems they are not.

I commented this because the update-tuple-routing patch has added to the
list ResultRelInfos reused from the mtstate->resultRelInfo array, but
on reflection I noticed this would cause oddity in reporting execution
stats for partitions' triggers in EXPLAIN ANALYZE. Here is an example
using the head:

postgres=# create table trigger_test (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table trigger_test1 partition of trigger_test for
values in (1);
CREATE TABLE
postgres=# create table trigger_test2 partition of trigger_test for
values in (2);
CREATE TABLE
postgres=# create trigger before_upd_row_trig before update on
trigger_test1 for
each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# create trigger before_del_row_trig before delete on
trigger_test1 for
each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# insert into trigger_test values (1, 'foo');
INSERT 0 1
postgres=# explain analyze update trigger_test set a = 2 where a = 1;
NOTICE: before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
NOTICE: OLD: (1,foo),NEW: (2,foo)
NOTICE: before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
NOTICE: OLD: (1,foo)
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
   Update on trigger_test1
   ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (a = 1)
 Planning time: 0.186 ms
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Execution time: 2.396 ms
(10 rows)

Both trigger stats for the on-update and on-delete triggers are doubly
shown in the above output. The reason would be that
ExplainPrintTriggers called report_triggers twice for trigger_test1's
ResultRelInfo: once for it from queryDesc->estate->es_result_relations
and once for it from queryDesc->estate->es_leaf_result_relations. I
don't think this is intended behavior, so I think we should fix this. I
think we could probably address this by modifying ExecInitPartitionInfo
in your patch so that it doesn't add to the es_leaf_result_relations
list ResultRelInfos reused from the mtstate->resultRelInfo array, as
your previous version of the patch. (ExecGetTriggerResultRel looks at
the list too, but it would probably work well for this change.) It
might be better to address this in another patch, though.

* In the same function:
+   /*
+    * Verify result relation is a valid target for INSERT.
+    */
+   CheckValidResultRel(leaf_part_rri, CMD_INSERT);

I think it would be better to leave the previous comments as-is here:

/*
* Verify result relation is a valid target for an INSERT. An UPDATE
* of a partition-key becomes a DELETE+INSERT operation, so this
check
* is still required when the operation is CMD_UPDATE.
*/

Oops, my bad. Didn't notice that I had ended up removing the part about
UPDATE.

OK.

* ExecInitPartitionResultRelInfo does the work other than the
initialization of ResultRelInfo for the chosen partition (eg, create a
tuple conversion map to convert a tuple routed to the partition from the
parent's type to the partition's). So I'd propose to rename that function
to eg, ExecInitPartition.

I went with ExecInitPartitionInfo.

Fine with me.

* CopyFrom is modified so that it calls ExecSetupPartitionTupleRouting and
ExecFindPartition with a mostly-dummy ModifyTableState node. I'm not sure
that is a good idea. My concern about that is that might be something
like a headache in future development.

OK, I removed those changes.

Thanks.

* The patch 0001 and 0002 are pretty small but can't be reviewed without
the patch 0003. The total size of the three patches aren't that large, so
I think it would be better to put those patches together into a single patch.

As I said above, I got rid of 0001. Then, I merged the
ExecFindPartition() refactoring patch 0002 into 0003.

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized. Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption. I
carved that out into a separate patch which can be applied and tested
before the main patch.

OK, will review that patch separately.

Here is the updated version that contains two patches as described above.

Thanks for updating the patches! I'll post my next comments in a few days.

Best regards,
Etsuro Fujita

#17Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#16)
Re: non-bulk inserts and tuple routing

On 2018/02/05 19:43, Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

On 2018/02/02 19:56, Etsuro Fujita wrote:

* In ExecInitPartitionResultRelInfo:
+       /*
+        * Note that the entries in this list appear in no predetermined
+        * order as result of initializing partition result rels as and
when
+        * they're needed.
+        */
+       estate->es_leaf_result_relations =
+ lappend(estate->es_leaf_result_relations,
+                                           leaf_part_rri);

Is it OK to put this within the "if (leaf_part_rri == NULL)" block?

Good catch.  I moved it outside the block.  I was under the impression
that leaf result relations that were reused from the
mtstate->resultRelInfo array would have already been added to the list,
but it seems they are not.

I commented this because the update-tuple-routing patch has added to the
list ResultRelInfos reused from the mtstate->resultRelInfo array, but on
reflection I noticed this would cause oddity in reporting execution stats
for partitions' triggers in EXPLAIN ANALYZE.  Here is an example using the
head:

postgres=# create table trigger_test (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table trigger_test1 partition of trigger_test for values
in (1);
CREATE TABLE
postgres=# create table trigger_test2 partition of trigger_test for values
in (2);
CREATE TABLE
postgres=# create trigger before_upd_row_trig before update on
trigger_test1 for
 each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# create trigger before_del_row_trig before delete on
trigger_test1 for
 each row execute procedure trigger_data (23, 'skidoo');
CREATE TRIGGER
postgres=# insert into trigger_test values (1, 'foo');
INSERT 0 1
postgres=# explain analyze update trigger_test set a = 2 where a = 1;
NOTICE:  before_upd_row_trig(23, skidoo) BEFORE ROW UPDATE ON trigger_test1
NOTICE:  OLD: (1,foo),NEW: (2,foo)
NOTICE:  before_del_row_trig(23, skidoo) BEFORE ROW DELETE ON trigger_test1
NOTICE:  OLD: (1,foo)
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on trigger_test  (cost=0.00..25.88 rows=6 width=42) (actual time=2.337..2.337 rows=0 loops=1)
   Update on trigger_test1
   ->  Seq Scan on trigger_test1  (cost=0.00..25.88 rows=6 width=42) (actual time=0.009..0.011 rows=1 loops=1)
         Filter: (a = 1)
 Planning time: 0.186 ms
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Trigger before_del_row_trig on trigger_test1: time=0.495 calls=1
 Trigger before_upd_row_trig on trigger_test1: time=0.870 calls=1
 Execution time: 2.396 ms
(10 rows)

Both trigger stats for the on-update and on-delete triggers are doubly
shown in the above output.  The reason would be that ExplainPrintTriggers
called report_triggers twice for trigger_test1's ResultRelInfo: once for
it from queryDesc->estate->es_result_relations and once for it from
queryDesc->estate->es_leaf_result_relations.  I don't think this is
intended behavior, so I think we should fix this.  I think we could
probably address this by modifying ExecInitPartitionInfo in your patch so
that it doesn't add to the es_leaf_result_relations list ResultRelInfos
reused from the mtstate->resultRelInfo array, as your previous version of
the patch.  (ExecGetTriggerResultRel looks at the list too, but it would
probably work well for this change.)  It might be better to address this
in another patch, though.

I see. Thanks for the analysis and the explanation.

Seeing as this bug exists in HEAD, as you also seem to be saying, we'd
need to fix it independently of the patches on this thread. I've posted a
patch in another thread titled "update tuple routing and triggers".

Thanks,
Amit

#18Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#16)
Re: non-bulk inserts and tuple routing

(2018/02/05 19:43), Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized.

On reflection I noticed this analysis is not 100% correct; I think what
that function actually assumes is that all subplans' partitions have
already been initialized, not all leaf partitions.

Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption.

I don't think we really need this refactoring: since, as in another
patch you posted, we could initialize all subplans' partitions in
ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be
called without any changes to that function, because of what I said above.

Here is the updated version that contains two patches as described above.

Thanks for updating the patches! I'll post my next comments in a few days.

Here are comments for the other patch (patch
v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; no ExecInitResultRelInfo in the
patch.

-   proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-                                                  sizeof(ResultRelInfo *));
+   /*
+    * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+    * ExecInitResultRelInfo().
+    */
+   proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+                                                   sizeof(ResultRelInfo *));

* The patch removes this from the initialization step for a subplan's
partition, but I think it would be better to keep this here because I
think it's a good thing to put the initialization stuff together into
one place.

- /*
- * This is required in order to we convert the partition's
- * tuple to be compatible with the root partitioned table's
- * tuple descriptor. When generating the per-subplan result
- * rels, this was not set.
- */
- leaf_part_rri->ri_PartitionRoot = rel;

* I think it would be better to keep this comment here.

- /* Remember the subplan offset for this ResultRelInfo */

* Why is this removed from that initialization?

- proute->partitions[i] = leaf_part_rri;

o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if
that step can be removed by keeping the above setup of
proute->partitions for the subplan's partition in
ExecSetupPartitionTupleRouting.

+   /*
+    * If we are doing tuple routing for update, try to reuse the
+    * per-subplan resultrel for this partition that ExecInitModifyTable()
+    * might already have created.
+    */
+   if (mtstate && mtstate->operation == CMD_UPDATE)

That's all I have for now.

Best regards,
Etsuro Fujita

#19Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#18)
Re: non-bulk inserts and tuple routing

(2018/02/07 19:36), Etsuro Fujita wrote:

(2018/02/05 19:43), Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

Here is the updated version that contains two patches as described
above.

Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1)
change ExecInitPartitionInfo so that it creates and returns a
newly-initialized ResultRelInfo for an initialized partition, and (2)
rewrite this bit:

+       /* Initialize partition info, if not done already. */
+       ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                             leaf_part_index);
+
         /*
          * Save the old ResultRelInfo and switch to the one 
corresponding to
          * the selected partition.
          */
         saved_resultRelInfo = resultRelInfo;
         resultRelInfo = proute->partitions[leaf_part_index];
+       Assert(resultRelInfo != NULL);

to something like this (I would say the same thing to the copy.c changes):

/*
* Save the old ResultRelInfo and switch to the one corresponding to
* the selected partition.
*/
saved_resultRelInfo = resultRelInfo;
resultRelInfo = proute->partitions[leaf_part_index];
if (resultRelInfo == NULL);
{
/* Initialize partition info. */
resultRelInfo = ExecInitPartitionInfo(mtstate,
saved_resultRelInfo,
proute,
estate,
leaf_part_index);
}

This would make ExecInitPartitionInfo more simple because it can assume
that the given partition has not been initialized yet.

o On changes to execPartition.h

* Please add a brief decsription about partition_oids to the comments
for this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
{
PartitionDispatch *partition_dispatch_info;
int num_dispatch;
+ Oid *partition_oids;

That's it.

Best regards,
Etsuro Fujita

#20Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#19)
Re: non-bulk inserts and tuple routing

On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

if (resultRelInfo == NULL);
{
/* Initialize partition info. */
resultRelInfo = ExecInitPartitionInfo(mtstate,
saved_resultRelInfo,
proute,
estate,
leaf_part_index);
}

I'm pretty sure that code has one semicolon too many.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#19)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thanks a lot for the review.

I had mistakenly tagged these patches v24, but they were actually supposed
to be v5. So the attached updated patch is tagged v6.

On 2018/02/07 19:36, Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized.

On reflection I noticed this analysis is not 100% correct; I think what
that function actually assumes is that all subplans' partitions have
already been initialized, not all leaf partitions.

Yes, you're right.

Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption.

I don't think we really need this refactoring: since, as in another
patch you posted, we could initialize all subplans' partitions in
ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be
called without any changes to that function, because of what I said above.

What my previous approach failed to consider is that in the update case,
we'd already have ResultRelInfo's for some of the leaf partitions
initialized, which could be saved into proute->partitions array right away.

Updated patch does things that way, so all the changes I had proposed to
tupconv_map_for_subplan are rendered unnecessary.

Here are comments for the other patch (patch
v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; no ExecInitResultRelInfo in the
patch.

-   proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-                                                  sizeof(ResultRelInfo *));
+   /*
+    * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+    * ExecInitResultRelInfo().
+    */
+   proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+                                                   sizeof(ResultRelInfo *));

I removed the comment altogether, as the comments elsewhere make the point
clear.

* The patch removes this from the initialization step for a subplan's
partition, but I think it would be better to keep this here because I
think it's a good thing to put the initialization stuff together into one
place.

- /*
- * This is required in order to we convert the partition's
- * tuple to be compatible with the root partitioned table's
- * tuple descriptor. When generating the per-subplan result
- * rels, this was not set.
- */
- leaf_part_rri->ri_PartitionRoot = rel;

It wasn't needed here with the previous approach, because we didn't touch
any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it
back in the updated patch.

* I think it would be better to keep this comment here.

- /* Remember the subplan offset for this ResultRelInfo */

Fixed.

* Why is this removed from that initialization?

- proute->partitions[i] = leaf_part_rri;

Because of the old approach. Now it's back in.

o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if
that step can be removed by keeping the above setup of proute->partitions
for the subplan's partition in ExecSetupPartitionTupleRouting.

+   /*
+    * If we are doing tuple routing for update, try to reuse the
+    * per-subplan resultrel for this partition that ExecInitModifyTable()
+    * might already have created.
+    */
+   if (mtstate && mtstate->operation == CMD_UPDATE)

Done, as mentioned above.

On 2018/02/08 19:16, Etsuro Fujita wrote:

Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1)
change ExecInitPartitionInfo so that it creates and returns a
newly-initialized ResultRelInfo for an initialized partition, and (2)
rewrite this bit:

+       /* Initialize partition info, if not done already. */
+       ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                             leaf_part_index);
+
        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
+       Assert(resultRelInfo != NULL);

to something like this (I would say the same thing to the copy.c changes):

    /*
     * Save the old ResultRelInfo and switch to the one corresponding to
     * the selected partition.
     */
    saved_resultRelInfo = resultRelInfo;
    resultRelInfo = proute->partitions[leaf_part_index];
    if (resultRelInfo == NULL);
    {
        /* Initialize partition info. */
        resultRelInfo = ExecInitPartitionInfo(mtstate,
                                              saved_resultRelInfo,
                                              proute,
                                              estate,
                                              leaf_part_index);
    }

This would make ExecInitPartitionInfo more simple because it can assume
that the given partition has not been initialized yet.

Agree that it's much better to do it this way. Done.

o On changes to execPartition.h

* Please add a brief description about partition_oids to the comments for
this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
    PartitionDispatch *partition_dispatch_info;
    int         num_dispatch;
+   Oid        *partition_oids;

Done.

Attached v6.

Thanks,
Amit

Attachments:

v6-0001-During-tuple-routing-initialize-per-partition-obj.patch (text/plain)
From 8e1e08ce34e52c22c40cc03aeae23e38f1b8e3f1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v6] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 303 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 131 ++------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 248 insertions(+), 205 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..1c16a2b5aa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,12 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Relation	partrel = NULL;
 		TupleDesc	part_tupdesc;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -151,73 +149,34 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
 
-		part_tupdesc = RelationGetDescr(partrel);
+				part_tupdesc = RelationGetDescr(partrel);
 
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
 
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
 
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
+				/*
+				 * Open partition indices.  We wouldn't need speculative
+				 * insertions though.
+				 */
+				if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
+					leaf_part_rri->ri_IndexRelationDescs == NULL)
+					ExecOpenIndices(leaf_part_rri, false);
+			}
+		}
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +311,185 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	int			firstVarno;
+	Relation	firstResultRel;
+	ModifyTable *node = NULL;
+
+	if (mtstate)
+	{
+		node = (ModifyTable *) mtstate->ps.plan;
+		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+	}
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for this partition rel.  Note
+	 * that we didn't build the withCheckOptionList for each partition
+	 * within the planner, but simple translation of the varattnos will
+	 * suffice.  This only occurs for the INSERT case or in the case of
+	 * UPDATE for which we didn't find a result rel above to reuse.
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *mapped_wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans, but we use the first one as reference.  Note
+		 * that if there are SubPlans in there, they all end up attached
+		 * to the one parent Plan node.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+		wcoList = linitial(node->withCheckOptionLists);
+
+		mapped_wcoList = map_partition_varattnos(wcoList, firstVarno,
+												 partrel, firstResultRel,
+												 NULL);
+		foreach(ll, mapped_wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection if any for the partition.  Note that
+	 * we didn't build the returningList for each partition within the
+	 * planner, but simple translation of the varattnos will suffice.
+	 * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+	 * ExecInitModifyTable() would've initialized this.
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		List	   *rlist;
+		TupleDesc	tupDesc;
+
+		/* See the comment written above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Initialize result tuple slot and assign its rowtype using the first
+		 * RETURNING list.  We assume the rest will look the same.
+		 */
+		tupDesc = ExecTypeFromTL(returningList, false);
+
+		/* Set up a slot for the output of the RETURNING projection(s) */
+		ExecInitResultTupleSlot(estate, &mtstate->ps);
+		ExecAssignResultType(&mtstate->ps, tupDesc);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+
+		/* Need an econtext too */
+		if (mtstate->ps.ps_ExprContext == NULL)
+			ExecAssignExprContext(estate, &mtstate->ps);
+		econtext = mtstate->ps.ps_ExprContext;
+
+		rlist = map_partition_varattnos(returningList, firstVarno,
+										partrel, firstResultRel, NULL);
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+									RelationGetDescr(partrel));
+	}
+
+	Assert (proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -495,8 +633,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 			continue;
 		}
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptors, if any */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..36e2041755 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partitions OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#22Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#20)
Re: non-bulk inserts and tuple routing

(2018/02/08 23:21), Robert Haas wrote:

On Thu, Feb 8, 2018 at 5:16 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

    if (resultRelInfo == NULL);
    {
        /* Initialize partition info. */
        resultRelInfo = ExecInitPartitionInfo(mtstate,
                                              saved_resultRelInfo,
                                              proute,
                                              estate,
                                              leaf_part_index);
    }

I'm pretty sure that code has one semicolon too many.

Good catch!

Best regards,
Etsuro Fujita

#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#21)
Re: non-bulk inserts and tuple routing

(2018/02/09 14:32), Amit Langote wrote:

I had mistakenly tagged these patches v24, but they were actually supposed
to be v5. So the attached updated patch is tagged v6.

OK.

On 2018/02/07 19:36, Etsuro Fujita wrote:

(2018/02/05 14:34), Amit Langote wrote:

The code in tupconv_map_for_subplan() currently assumes that it can rely
on all leaf partitions having been initialized.

On reflection I noticed this analysis is not 100% correct; I think what
that function actually assumes is that all sublplans' partitions have
already been initialized, not all leaf partitions.

Yes, you're right.

Since we're breaking that
assumption with this proposal, that needed to be changed. So the patch
contained some refactoring to make it not rely on that assumption.

I don't think we really need this refactoring because since that as in
another patch you posted, we could initialize all subplans' partitions in
ExecSetupPartitionTupleRouting, I think tupconv_map_for_subplan could be
called without any changes to that function because of what I said above.

What my previous approach failed to consider is that in the update case,
we'd already have ResultRelInfo's for some of the leaf partitions
initialized, which could be saved into proute->partitions array right away.

Updated patch does things that way, so all the changes I had proposed to
tupconv_map_for_subplan are rendered unnecessary.

OK, thanks for the updated patch!

Here are comments for the other patch (patch
v24-0002-During-tuple-routing-initialize-per-partition-ob.patch):

o On changes to ExecSetupPartitionTupleRouting:

* The comment below wouldn't be correct; no ExecInitResultRelInfo in the
patch.

-   proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-                                                  sizeof(ResultRelInfo *));
+   /*
+    * Actual ResultRelInfo's and TupleConversionMap's are allocated in
+    * ExecInitResultRelInfo().
+    */
+   proute->partitions = (ResultRelInfo **) palloc0(proute->num_partitions *
+                                                   sizeof(ResultRelInfo *));

I removed the comment altogether, as the comments elsewhere make the point
clear.

* The patch removes this from the initialization step for a subplan's
partition, but I think it would be better to keep this here because I
think it's a good thing to put the initialization stuff together into one
place.

- /*
- * This is required in order to we convert the partition's
- * tuple to be compatible with the root partitioned table's
- * tuple descriptor. When generating the per-subplan result
- * rels, this was not set.
- */
- leaf_part_rri->ri_PartitionRoot = rel;

It wasn't needed here with the previous approach, because we didn't touch
any ResultRelInfo's in ExecSetupPartitionTupleRouting, but I've added it
back in the updated patch.

* I think it would be better to keep this comment here.

- /* Remember the subplan offset for this ResultRelInfo */

Fixed.

* Why is this removed from that initialization?

- proute->partitions[i] = leaf_part_rri;

Because of the old approach. Now it's back in.

o On changes to ExecInitPartitionInfo:

* I don't understand the step starting from this, but I'm wondering if
that step can be removed by keeping the above setup of proute->partitions
for the subplan's partition in ExecSetupPartitionTupleRouting.

+   /*
+    * If we are doing tuple routing for update, try to reuse the
+    * per-subplan resultrel for this partition that ExecInitModifyTable()
+    * might already have created.
+    */
+   if (mtstate && mtstate->operation == CMD_UPDATE)

Done, as mentioned above.

On 2018/02/08 19:16, Etsuro Fujita wrote:

Here are some minor comments:

o On changes to ExecInsert

* This might be just my taste, but I think it would be better to (1)
change ExecInitPartitionInfo so that it creates and returns a
newly-initialized ResultRelInfo for an initialized partition, and (2)
rewrite this bit:

+       /* Initialize partition info, if not done already. */
+       ExecInitPartitionInfo(mtstate, resultRelInfo, proute, estate,
+                             leaf_part_index);
+
        /*
         * Save the old ResultRelInfo and switch to the one corresponding to
         * the selected partition.
         */
        saved_resultRelInfo = resultRelInfo;
        resultRelInfo = proute->partitions[leaf_part_index];
+       Assert(resultRelInfo != NULL);

to something like this (I would say the same thing to the copy.c changes):

    /*
     * Save the old ResultRelInfo and switch to the one corresponding to
     * the selected partition.
     */
    saved_resultRelInfo = resultRelInfo;
    resultRelInfo = proute->partitions[leaf_part_index];
    if (resultRelInfo == NULL);
    {
        /* Initialize partition info. */
        resultRelInfo = ExecInitPartitionInfo(mtstate,
                                              saved_resultRelInfo,
                                              proute,
                                              estate,
                                              leaf_part_index);
    }

This would make ExecInitPartitionInfo more simple because it can assume
that the given partition has not been initialized yet.

Agree that it's much better to do it this way. Done.

Thanks for all those changes!

o On changes to execPartition.h

* Please add a brief description about partition_oids to the comments for
this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
 {
    PartitionDispatch *partition_dispatch_info;
    int         num_dispatch;
+   Oid        *partition_oids;

Done.

Thanks, but one thing I'm wondering is: do we really need this array? I
think we could store into PartitionTupleRouting the list of oids
returned by RelationGetPartitionDispatchInfo in
ExecSetupPartitionTupleRouting, instead. Sorry, I should have commented
this in a previous email, but what do you think about that?

Here are other comments:

o On changes to ExecSetupPartitionTupleRouting:

* This is nitpicking, but it would be better to define partrel and
part_tupdesc within the if (update_rri_index < num_update_rri &&
RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) ==
leaf_oid) block.

-               ResultRelInfo *leaf_part_rri;
+               ResultRelInfo *leaf_part_rri = NULL;
                 Relation        partrel = NULL;
                 TupleDesc       part_tupdesc;
                 Oid                     leaf_oid = lfirst_oid(cell);

* Do we need this? For a leaf partition that is already present in the
subplan resultrels, the partition's indices (if any) would have already
been opened.

+                               /*
+                                * Open partition indices.  We wouldn't need speculative
+                                * insertions though.
+                                */
+                               if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
+                                   leaf_part_rri->ri_IndexRelationDescs == NULL)
+                                       ExecOpenIndices(leaf_part_rri, false);

I'll look at the patch a bit more early next week, but other than that,
the patch looks fairly in good shape to me.

Best regards,
Etsuro Fujita

#24Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#23)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thanks for the review.

On 2018/02/09 21:20, Etsuro Fujita wrote:

* Please add a brief description about partition_oids to the comments for
this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
  {
     PartitionDispatch *partition_dispatch_info;
     int         num_dispatch;
+   Oid        *partition_oids;

Done.

Thanks, but one thing I'm wondering is: do we really need this array?  I
think we could store into PartitionTupleRouting the list of oids returned
by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting,
instead.  Sorry, I should have commented this in a previous email, but
what do you think about that?

ExecInitPartitionInfo() will have to iterate the list to get the OID of
the partition to be initialized. Isn't it much cheaper with the array?

Here are other comments:

o On changes to ExecSetupPartitionTupleRouting:

* This is nitpicking, but it would be better to define partrel and
part_tupdesc within the if (update_rri_index < num_update_rri &&
RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) ==
leaf_oid) block.

-               ResultRelInfo *leaf_part_rri;
+               ResultRelInfo *leaf_part_rri = NULL;
                Relation        partrel = NULL;
                TupleDesc       part_tupdesc;
                Oid                     leaf_oid = lfirst_oid(cell);

Sure, done.

* Do we need this?  For a leaf partition that is already present in the
subplan resultrels, the partition's indices (if any) would have already
been opened.

+                               /*
+                                * Open partition indices.  We wouldn't need speculative
+                                * insertions though.
+                                */
+                               if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
+                                       leaf_part_rri->ri_IndexRelationDescs == NULL)
+                                       ExecOpenIndices(leaf_part_rri, false);

You're right. Removed the call.

Updated patch is attached.

Thanks,
Amit

Attachments:

v7-0001-During-tuple-routing-initialize-per-partition-obj.patch (text/plain, charset=UTF-8)
From 93a0fdedf326a930a7113489250ab2ddd1f478e3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v7] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 302 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 131 ++------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 244 insertions(+), 208 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..9e3910a4a6 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel = NULL;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
 
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
+				part_tupdesc = RelationGetDescr(partrel);
 
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
 
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+			}
+		}
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,185 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	int			firstVarno;
+	Relation	firstResultRel;
+	ModifyTable *node = NULL;
+
+	if (mtstate)
+	{
+		node = (ModifyTable *) mtstate->ps.plan;
+		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+	}
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for this partition rel.  Note
+	 * that we didn't build the withCheckOptionList for each partition
+	 * within the planner, but simple translation of the varattnos will
+	 * suffice.  This only occurs for the INSERT case or in the case of
+	 * UPDATE for which we didn't find a result rel above to reuse.
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *mapped_wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans, but we use the first one as reference.  Note
+		 * that if there are SubPlans in there, they all end up attached
+		 * to the one parent Plan node.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+		wcoList = linitial(node->withCheckOptionLists);
+
+		mapped_wcoList = map_partition_varattnos(wcoList, firstVarno,
+												 partrel, firstResultRel,
+												 NULL);
+		foreach(ll, mapped_wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection if any for the partition.  Note that
+	 * we didn't build the returningList for each partition within the
+	 * planner, but simple translation of the varattnos will suffice.
+	 * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+	 * ExecInitModifyTable() would've initialized this.
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		List	   *rlist;
+		TupleDesc	tupDesc;
+
+		/* See the comment written above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Initialize result tuple slot and assign its rowtype using the first
+		 * RETURNING list.  We assume the rest will look the same.
+		 */
+		tupDesc = ExecTypeFromTL(returningList, false);
+
+		/* Set up a slot for the output of the RETURNING projection(s) */
+		ExecInitResultTupleSlot(estate, &mtstate->ps);
+		ExecAssignResultType(&mtstate->ps, tupDesc);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+
+		/* Need an econtext too */
+		if (mtstate->ps.ps_ExprContext == NULL)
+			ExecAssignExprContext(estate, &mtstate->ps);
+		econtext = mtstate->ps.ps_ExprContext;
+
+		rlist = map_partition_varattnos(returningList, firstVarno,
+										partrel, firstResultRel, NULL);
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+									RelationGetDescr(partrel));
+	}
+
+	Assert (proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -495,8 +626,11 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 			continue;
 		}
 
-		ExecCloseIndices(resultRelInfo);
-		heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		if (resultRelInfo)
+		{
+			ExecCloseIndices(resultRelInfo);
+			heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+		}
 	}
 
 	/* Release the standalone partition tuple descriptors, if any */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..36e2041755 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partition OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#25Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#24)
Re: non-bulk inserts and tuple routing

(2018/02/13 10:12), Amit Langote wrote:

On 2018/02/09 21:20, Etsuro Fujita wrote:

* Please add a brief description about partition_oids to the comments for
this struct.

@@ -91,6 +91,7 @@ typedef struct PartitionTupleRouting
{
PartitionDispatch *partition_dispatch_info;
int num_dispatch;
+ Oid *partition_oids;

Done.

Thanks, but one thing I'm wondering is: do we really need this array? I
think we could store into PartitionTupleRouting the list of oids returned
by RelationGetPartitionDispatchInfo in ExecSetupPartitionTupleRouting,
instead. Sorry, I should have commented this in a previous email, but
what do you think about that?

ExecInitPartitionInfo() will have to iterate the list to get the OID of
the partition to be initialized. Isn't it much cheaper with the array?

Good point! So I agree with adding that array.

Here are other comments:

o On changes to ExecSetupPartitionTupleRouting:

* This is nitpicking, but it would be better to define partrel and
part_tupdesc within the if (update_rri_index < num_update_rri &&
RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) ==
leaf_oid) block.

-               ResultRelInfo *leaf_part_rri;
+               ResultRelInfo *leaf_part_rri = NULL;
                Relation        partrel = NULL;
                TupleDesc       part_tupdesc;
                Oid                     leaf_oid = lfirst_oid(cell);

Sure, done.

* Do we need this? For a leaf partition that is already present in the
subplan resultrels, the partition's indices (if any) would have already
been opened.

+                               /*
+                                * Open partition indices.  We wouldn't need speculative
+                                * insertions though.
+                                */
+                               if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
+                                       leaf_part_rri->ri_IndexRelationDescs == NULL)
+                                       ExecOpenIndices(leaf_part_rri, false);

You're right. Removed the call.

Thanks for the above changes!

Updated patch is attached.

Thanks, here are some minor comments:

o On changes to ExecCleanupTupleRouting:

-       ExecCloseIndices(resultRelInfo);
-       heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       if (resultRelInfo)
+       {
+           ExecCloseIndices(resultRelInfo);
+           heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       }

You might check at the top of the loop whether resultRelInfo is NULL and
if so do continue. I think that would save cycles a bit.

o On ExecInitPartitionInfo:

+ int firstVarno;
+ Relation firstResultRel;

My old compiler got "variable may be used uninitialized" warnings.

+   /*
+    * Build the RETURNING projection if any for the partition.  Note that
+    * we didn't build the returningList for each partition within the
+    * planner, but simple translation of the varattnos will suffice.
+    * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+    * ExecInitModifyTable() would've initialized this.
+    */

I think the last comment should be the same as for WCO lists: "This only
occurs for the INSERT case or in the case of UPDATE for which we didn't
find a result rel above to reuse."

+       /*
+        * Initialize result tuple slot and assign its rowtype using the first
+        * RETURNING list.  We assume the rest will look the same.
+        */
+       tupDesc = ExecTypeFromTL(returningList, false);
+
+       /* Set up a slot for the output of the RETURNING projection(s) */
+       ExecInitResultTupleSlot(estate, &mtstate->ps);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;
+
+       /* Need an econtext too */
+       if (mtstate->ps.ps_ExprContext == NULL)
+           ExecAssignExprContext(estate, &mtstate->ps);
+       econtext = mtstate->ps.ps_ExprContext;

Do we need this initialization? I think we would already have the slot
and econtext initialized when we get here.

Other than that, the patch looks good to me.

Sorry for the delay.

Best regards,
Etsuro Fujita

#26Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#25)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thanks for the review.

On 2018/02/15 21:10, Etsuro Fujita wrote:

(2018/02/13 10:12), Amit Langote wrote:

Updated patch is attached.

Thanks, here are some minor comments:

o On changes to ExecCleanupTupleRouting:

-       ExecCloseIndices(resultRelInfo);
-       heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       if (resultRelInfo)
+       {
+           ExecCloseIndices(resultRelInfo);
+           heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       }

You might check at the top of the loop whether resultRelInfo is NULL and
if so do continue.  I think that would save cycles a bit.

Good point, done.

o On ExecInitPartitionInfo:

+   int         firstVarno;
+   Relation    firstResultRel;

My old compiler got "variable may be used uninitialized" warnings.

Fixed. Actually, I moved those declarations from out here into the blocks
where they're actually needed.

+   /*
+    * Build the RETURNING projection if any for the partition.  Note that
+    * we didn't build the returningList for each partition within the
+    * planner, but simple translation of the varattnos will suffice.
+    * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+    * ExecInitModifyTable() would've initialized this.
+    */

I think the last comment should be the same as for WCO lists: "This only
occurs for the INSERT case or in the case of UPDATE for which we didn't
find a result rel above to reuse."

Fixed. The "above" is no longer needed, because there is no code left in
ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is
now in ExecSetupPartitionTupleRouting().

+       /*
+        * Initialize result tuple slot and assign its rowtype using the first
+        * RETURNING list.  We assume the rest will look the same.
+        */
+       tupDesc = ExecTypeFromTL(returningList, false);
+
+       /* Set up a slot for the output of the RETURNING projection(s) */
+       ExecInitResultTupleSlot(estate, &mtstate->ps);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;
+
+       /* Need an econtext too */
+       if (mtstate->ps.ps_ExprContext == NULL)
+           ExecAssignExprContext(estate, &mtstate->ps);
+       econtext = mtstate->ps.ps_ExprContext;

Do we need this initialization?  I think we would already have the slot
and econtext initialized when we get here.

I think you're right. If node->returningLists is non-NULL at all,
ExecInitModifyTable() would've initialized the needed slot and expression
context. I added Assert()s to that effect.

Other than that, the patch looks good to me.

Sorry for the delay.

No problem! Thanks again.

Attached updated patch.

Thanks,
Amit

Attachments:

v8-0001-During-tuple-routing-initialize-per-partition-obj.patch (text/plain, charset=UTF-8)
From d4266a3072640b64f758e3de1f896fffbd9332ae Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v8] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 294 +++++++++++++++++++++++----------
 src/backend/executor/nodeModifyTable.c | 131 ++-------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 237 insertions(+), 207 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..efcb7f134b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
+
+				part_tupdesc = RelationGetDescr(partrel);
+
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
+
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
 			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
 		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,178 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *mapped_wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans, but we use the first one as reference.  Note
+		 * that if there are SubPlans in there, they all end up attached
+		 * to the one parent Plan node.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+		wcoList = linitial(node->withCheckOptionLists);
+
+		mapped_wcoList = map_partition_varattnos(wcoList, firstVarno,
+												 partrel, firstResultRel,
+												 NULL);
+		foreach(ll, mapped_wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = mapped_wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection if any for the partition.  Note that
+	 * we didn't build the returningList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		List	   *rlist;
+		TupleDesc	tupDesc;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment written above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Use the slot that would have been set up in ExecInitModifyTable()
+		 * for the output of the RETURNING projection(s).  Just make sure to
+		 * assign its rowtype using the RETURNING list.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		tupDesc = ExecTypeFromTL(returningList, false);
+		ExecAssignResultType(&mtstate->ps, tupDesc);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+
+		/* An expression context must have been set up too */
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+
+		rlist = map_partition_varattnos(returningList, firstVarno,
+										partrel, firstResultRel, NULL);
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+									RelationGetDescr(partrel));
+	}
+
+	Assert (proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +601,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..36e2041755 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partitions OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#27Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#26)
Re: non-bulk inserts and tuple routing

(2018/02/16 10:49), Amit Langote wrote:

On 2018/02/15 21:10, Etsuro Fujita wrote:

here are some minor comments:

o On changes to ExecCleanupTupleRouting:

-       ExecCloseIndices(resultRelInfo);
-       heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       if (resultRelInfo)
+       {
+           ExecCloseIndices(resultRelInfo);
+           heap_close(resultRelInfo->ri_RelationDesc, NoLock);
+       }

You might check at the top of the loop whether resultRelInfo is NULL and
if so do continue. I think that would save cycles a bit.

Good point, done.

Thanks.

o On ExecInitPartitionInfo:

+ int firstVarno;
+ Relation firstResultRel;

My old compiler got "variable may be used uninitialized" warnings.

Fixed. Actually, I moved those declarations from out here into the blocks
where they're actually needed.

OK, my compiler gets no warnings now.

+   /*
+    * Build the RETURNING projection if any for the partition.  Note that
+    * we didn't build the returningList for each partition within the
+    * planner, but simple translation of the varattnos will suffice.
+    * This only occurs for the INSERT case; in the UPDATE/DELETE cases,
+    * ExecInitModifyTable() would've initialized this.
+    */

I think the last comment should be the same as for WCO lists: "This only
occurs for the INSERT case or in the case of UPDATE for which we didn't
find a result rel above to reuse."

Fixed. The "above" is no longer needed, because there is no code left in
ExecInitPartitionInfo() to find UPDATE result rels to reuse. That code is
now in ExecSetupPartitionTupleRouting().

OK, that makes sense.

+       /*
+        * Initialize result tuple slot and assign its rowtype using the
first
+        * RETURNING list.  We assume the rest will look the same.
+        */
+       tupDesc = ExecTypeFromTL(returningList, false);
+
+       /* Set up a slot for the output of the RETURNING projection(s) */
+       ExecInitResultTupleSlot(estate, &mtstate->ps);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;
+
+       /* Need an econtext too */
+       if (mtstate->ps.ps_ExprContext == NULL)
+           ExecAssignExprContext(estate, &mtstate->ps);
+       econtext = mtstate->ps.ps_ExprContext;

Do we need this initialization? I think we would already have the slot
and econtext initialized when we get here.

I think you're right. If node->returningLists is non-NULL at all,
ExecInitModifyTable() would've initialized the needed slot and expression
context.  I added Assert()s to that effect.

OK, but one thing I'd like to ask is:

+       /*
+        * Use the slot that would have been set up in ExecInitModifyTable()
+        * for the output of the RETURNING projection(s).  Just make sure to
+        * assign its rowtype using the RETURNING list.
+        */
+       Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+       tupDesc = ExecTypeFromTL(returningList, false);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;

Do we need that assignment here?

Attached updated patch.

Thanks for updating the patch!

Best regards,
Etsuro Fujita

#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#27)
1 attachment(s)
Re: non-bulk inserts and tuple routing

On 2018/02/16 12:41, Etsuro Fujita wrote:

(2018/02/16 10:49), Amit Langote wrote:

I think you're right.  If node->returningLists is non-NULL at all,
ExecInitModifyTable() would've initialized the needed slot and expression
context.  I added Assert()s to that affect.

OK, but one thing I'd like to ask is:

+       /*
+        * Use the slot that would have been set up in ExecInitModifyTable()
+        * for the output of the RETURNING projection(s).  Just make sure to
+        * assign its rowtype using the RETURNING list.
+        */
+       Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+       tupDesc = ExecTypeFromTL(returningList, false);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;

Do we need that assignment here?

I guess you mean the assignment of the rowtype, that is, the
ExecAssignResultType() line. On looking at this some more, it looks like
we don't need ExecAssignResultType() here, as you seem to be suspecting,
because we want the RETURNING projection output to use the rowtype of the
first of the returningLists, and that's what mtstate->ps.ps_ResultTupleSlot has
been set to use in the first place. So, removed the ExecAssignResultType().

Attached v9. Thanks a lot for the review!

Regards,
Amit

Attachments:

v9-0001-During-tuple-routing-initialize-per-partition-obj.patchtext/plain; charset=UTF-8; name=v9-0001-During-tuple-routing-initialize-per-partition-obj.patchDownload
From e74ce73c574bc1892fa957b105adccf1eada1c5f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v9] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 305 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 131 ++------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 248 insertions(+), 207 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..5fa6e2de09 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
+
+				part_tupdesc = RelationGetDescr(partrel);
+
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
+
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
 			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
 		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,189 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans, but we use the first one as reference.  Note
+		 * that if there are SubPlans in there, they all end up attached
+		 * to the one parent Plan node.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+		wcoList = linitial(node->withCheckOptionLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+		 * reference to calculate attno's for the returning expression of
+		 * this partition.  In the INSERT case, that refers to the root
+		 * partitioned table, whereas in the UPDATE tuple routing case the
+		 * first partition in the mtstate->resultRelInfo array.  In any case,
+		 * both that relation and this partition should have the same columns,
+		 * so we should be able to map attributes successfully.
+		 */
+		wcoList = map_partition_varattnos(wcoList, firstVarno,
+										  partrel, firstResultRel, NULL);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection for the partition.  Note that we didn't
+	 * build the returningList for partitions within the planner, but simple
+	 * translation of varattnos will suffice.  This only occurs for the INSERT
+	 * case or in the case of UPDATE tuple routing where we didn't find a
+	 * result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment written above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 * Use the returning expression of the first resultRelInfo as a
+		 * reference to calculate attno's for the returning expression of
+		 * each of the partitions.  See the comment above for WCO list for
+		 * more details on why this is okay.
+		 */
+		returningList = map_partition_varattnos(returningList, firstVarno,
+												partrel, firstResultRel,
+												NULL);
+
+		/*
+		 * Initialize the projection itself.
+		 *
+		 * Use the slot and the expression context that would have been set up
+		 * in ExecInitModifyTable() for projection's output.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(returningList, econtext, slot,
+									&mtstate->ps, RelationGetDescr(partrel));
+	}
+
+	Assert (proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +612,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..36e2041755 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partitions OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#29Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#28)
Re: non-bulk inserts and tuple routing

(2018/02/16 13:42), Amit Langote wrote:

On 2018/02/16 12:41, Etsuro Fujita wrote:

(2018/02/16 10:49), Amit Langote wrote:

I think you're right. If node->returningLists is non-NULL at all,
ExecInitModifyTable() would've initialized the needed slot and expression
context. I added Assert()s to that affect.

OK, but one thing I'd like to ask is:

+       /*
+        * Use the slot that would have been set up in ExecInitModifyTable()
+        * for the output of the RETURNING projection(s).  Just make sure to
+        * assign its rowtype using the RETURNING list.
+        */
+       Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+       tupDesc = ExecTypeFromTL(returningList, false);
+       ExecAssignResultType(&mtstate->ps, tupDesc);
+       slot = mtstate->ps.ps_ResultTupleSlot;

Do we need that assignment here?

I guess you mean the assignment of rowtype, that is, the
ExecAssignResultType() line.

That's right.

On looking at this some more, it looks like
we don't need to ExecAssignResultType here, as you seem to be suspecting,
because we want the RETURNING projection output to use the rowtype of the
first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has
been set to use in the first place.

Yeah, I think so, too.

So, removed the ExecAssignResultType().

OK, thanks.

Attached v9. Thanks for the review!

Thanks for the updated patch! In the patch you added the comments:

+       wcoList = linitial(node->withCheckOptionLists);
+
+       /*
+        * Convert Vars in it to contain this partition's attribute numbers.
+        * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+        * reference to calculate attno's for the returning expression of
+        * this partition.  In the INSERT case, that refers to the root
+        * partitioned table, whereas in the UPDATE tuple routing case the
+        * first partition in the mtstate->resultRelInfo array.  In any case,
+        * both that relation and this partition should have the same columns,
+        * so we should be able to map attributes successfully.
+        */
+       wcoList = map_partition_varattnos(wcoList, firstVarno,
+                                         partrel, firstResultRel, NULL);

This would be just nitpicking, but I think it would be better to arrange
these comments, maybe by dividing these to something like this:

/*
* Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
* reference to calculate attno's for the returning expression of
* this partition. In the INSERT case, that refers to the root
* partitioned table, whereas in the UPDATE tuple routing case the
* first partition in the mtstate->resultRelInfo array. In any case,
* both that relation and this partition should have the same columns,
* so we should be able to map attributes successfully.
*/
wcoList = linitial(node->withCheckOptionLists);

/*
* Convert Vars in it to contain this partition's attribute numbers.
*/
wcoList = map_partition_varattnos(wcoList, firstVarno,
partrel, firstResultRel, NULL);

I'd say the same thing to the comments you added for RETURNING.

Another thing I noticed about comments in the patch is:

+        * In the case of INSERT on partitioned tables, there is only one
+        * plan.  Likewise, there is only one WCO list, not one per
+        * partition.  For UPDATE, there would be as many WCO lists as
+        * there are plans, but we use the first one as reference.  Note
+        * that if there are SubPlans in there, they all end up attached
+        * to the one parent Plan node.

The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
WCO constraints, which would change the place to attach SubPlans in WCO
constraints from the parent PlanState to the subplan's PlanState, which
would make the last comment obsolete. Since this change would be more
consistent with PG10, I don't think we need to update the comment as
such; I would vote for just removing that comment from here.

Best regards,
Etsuro Fujita

#30Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#29)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

Thanks for the review.

On 2018/02/16 18:12, Etsuro Fujita wrote:

(2018/02/16 13:42), Amit Langote wrote:

Attached v9.  Thanks for the review!

Thanks for the updated patch!  In the patch you added the comments:

+       wcoList = linitial(node->withCheckOptionLists);
+
+       /*
+        * Convert Vars in it to contain this partition's attribute numbers.
+        * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+        * reference to calculate attno's for the returning expression of
+        * this partition.  In the INSERT case, that refers to the root
+        * partitioned table, whereas in the UPDATE tuple routing case the
+        * first partition in the mtstate->resultRelInfo array.  In any case,
+        * both that relation and this partition should have the same columns,
+        * so we should be able to map attributes successfully.
+        */
+       wcoList = map_partition_varattnos(wcoList, firstVarno,
+                                         partrel, firstResultRel, NULL);

This would be just nitpicking, but I think it would be better to arrange
these comments, maybe by dividing these to something like this:

       /*
        * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
        * reference to calculate attno's for the returning expression of
        * this partition.  In the INSERT case, that refers to the root
        * partitioned table, whereas in the UPDATE tuple routing case the
        * first partition in the mtstate->resultRelInfo array.  In any case,
        * both that relation and this partition should have the same columns,
        * so we should be able to map attributes successfully.
        */
       wcoList = linitial(node->withCheckOptionLists);

       /*
        * Convert Vars in it to contain this partition's attribute numbers.
        */
       wcoList = map_partition_varattnos(wcoList, firstVarno,
                                         partrel, firstResultRel, NULL);

I'd say the same thing to the comments you added for RETURNING.

Good idea. Done.

Another thing I noticed about comments in the patch is:

+        * In the case of INSERT on partitioned tables, there is only one
+        * plan.  Likewise, there is only one WCO list, not one per
+        * partition.  For UPDATE, there would be as many WCO lists as
+        * there are plans, but we use the first one as reference.  Note
+        * that if there are SubPlans in there, they all end up attached
+        * to the one parent Plan node.

The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
WCO constraints, which would change the place to attach SubPlans in WCO
constraints from the parent PlanState to the subplan's PlanState, which
would make the last comment obsolete.  Since this change would be more
consistent with PG10, I don't think we need to update the comment as such;
I would vote for just removing that comment from here.

I have thought about removing it too, so done.

Updated patch attached.

Thanks,
Amit

Attachments:

v10-0001-During-tuple-routing-initialize-per-partition-ob.patch (text/plain; charset=UTF-8)
From 03c0f265537fc618d5f0de2c7b7e487b89af014d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v10] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 309 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 131 ++------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 252 insertions(+), 207 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..dfad50fb11 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
+
+				part_tupdesc = RelationGetDescr(partrel);
+
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
+
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
 			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
 		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,193 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans, but we use the first one as reference.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+		 * reference to calculate attno's for the returning expression of
+		 * this partition.  In the INSERT case, that refers to the root
+		 * partitioned table, whereas in the UPDATE tuple routing case the
+		 * first partition in the mtstate->resultRelInfo array.  In any case,
+		 * both that relation and this partition should have the same columns,
+		 * so we should be able to map attributes successfully.
+		 */
+		wcoList = linitial(node->withCheckOptionLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		wcoList = map_partition_varattnos(wcoList, firstVarno,
+										  partrel, firstResultRel, NULL);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection for the partition.  Note that we didn't
+	 * build the returningList for partitions within the planner, but simple
+	 * translation of varattnos will suffice.  This only occurs for the INSERT
+	 * case or in the case of UPDATE tuple routing where we didn't find a
+	 * result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment written above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the returning expression of the first resultRelInfo as a
+		 * reference to calculate attno's for the returning expression of
+		 * each of the partitions.  See the comment above for WCO list for
+		 * more details on why this is okay.
+		 */
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Converti Vars in it to contain this partition's attribute numbers.
+		 */
+		returningList = map_partition_varattnos(returningList, firstVarno,
+												partrel, firstResultRel,
+												NULL);
+
+		/*
+		 * Initialize the projection itself.
+		 *
+		 * Use the slot and the expression context that would have been set up
+		 * in ExecInitModifyTable() for projection's output.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(returningList, econtext, slot,
+									&mtstate->ps, RelationGetDescr(partrel));
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +616,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd830..36e2041755 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partition OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#31Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#30)
1 attachment(s)
Re: non-bulk inserts and tuple routing

(2018/02/16 18:23), Amit Langote wrote:

On 2018/02/16 18:12, Etsuro Fujita wrote:

In the patch you added the comments:

+       wcoList = linitial(node->withCheckOptionLists);
+
+       /*
+        * Convert Vars in it to contain this partition's attribute numbers.
+        * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+        * reference to calculate attno's for the returning expression of
+        * this partition.  In the INSERT case, that refers to the root
+        * partitioned table, whereas in the UPDATE tuple routing case the
+        * first partition in the mtstate->resultRelInfo array.  In any case,
+        * both that relation and this partition should have the same
columns,
+        * so we should be able to map attributes successfully.
+        */
+       wcoList = map_partition_varattnos(wcoList, firstVarno,
+                                         partrel, firstResultRel, NULL);

This is just nitpicking, but I think it would be better to rearrange
these comments, maybe by dividing them into something like this:

/*
* Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
* reference to calculate attno's for the returning expression of
* this partition. In the INSERT case, that refers to the root
* partitioned table, whereas in the UPDATE tuple routing case the
* first partition in the mtstate->resultRelInfo array. In any case,
* both that relation and this partition should have the same columns,
* so we should be able to map attributes successfully.
*/
wcoList = linitial(node->withCheckOptionLists);

/*
* Convert Vars in it to contain this partition's attribute numbers.
*/
wcoList = map_partition_varattnos(wcoList, firstVarno,
partrel, firstResultRel, NULL);

I'd say the same about the comments you added for RETURNING.

Good idea. Done.

Thanks. I fixed a typo (s/Converti/Convert/) and adjusted these
comments a bit further to match the preceding code/comments. Attached
is the updated version.

Another thing I noticed about comments in the patch is:

+        * In the case of INSERT on partitioned tables, there is only one
+        * plan.  Likewise, there is only one WCO list, not one per
+        * partition.  For UPDATE, there would be as many WCO lists as
+        * there are plans, but we use the first one as reference.  Note
+        * that if there are SubPlans in there, they all end up attached
+        * to the one parent Plan node.

The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
WCO constraints, which would change the place to attach SubPlans in WCO
constraints from the parent PlanState to the subplan's PlanState, which
would make the last comment obsolete. Since this change would be more
consistent with PG10, I don't think we need to update the comment as such;
I would vote for just removing that comment from here.

I have thought about removing it too, so done.

OK.

Updated patch attached.

Thanks, I think the patch is in good shape, so I'll mark this as ready
for committer.

Best regards,
Etsuro Fujita

Attachments:

v10-0001-During-tuple-routing-initialize-per-partition-ob-efujita.patch (text/x-diff)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df..118452b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3e..06a7607 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot();
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
 
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
+				part_tupdesc = RelationGetDescr(partrel);
 
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
 
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+			}
+		}
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,193 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the WCO list of the first plan as a reference to calculate
+		 * attno's for the WCO list of this partition.  In the INSERT case,
+		 * that refers to the root partitioned table, whereas in the UPDATE
+		 * tuple routing case, that refers to the first partition in the
+		 * mtstate->resultRelInfo array.  In any case, both that relation and
+		 * this partition should have the same columns, so we should be able
+		 * to map attributes successfully.
+		 */
+		wcoList = linitial(node->withCheckOptionLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		wcoList = map_partition_varattnos(wcoList, firstVarno,
+										  partrel, firstResultRel, NULL);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection for the partition.  Note that we didn't
+	 * build the returningList for partitions within the planner, but simple
+	 * translation of varattnos will suffice.  This only occurs for the INSERT
+	 * case or in the case of UPDATE tuple routing where we didn't find a
+	 * result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the RETURNING list of the first plan as a reference to
+		 * calculate attno's for the RETURNING list of this partition.  See
+		 * the comment above for WCO lists for more details on why this is
+		 * okay.
+		 */
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		returningList = map_partition_varattnos(returningList, firstVarno,
+												partrel, firstResultRel,
+												NULL);
+
+		/*
+		 * Initialize the projection itself.
+		 *
+		 * Use the slot and the expression context that would have been set up
+		 * in ExecInitModifyTable() for projection's output.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(returningList, econtext, slot,
+									&mtstate->ps, RelationGetDescr(partrel));
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +616,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a8ecbd..36e2041 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2389,44 +2316,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c49..e947186 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partition OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
#32 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#31)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Fujita-san,

On 2018/02/16 19:50, Etsuro Fujita wrote:
> (2018/02/16 18:23), Amit Langote wrote:
>> Good idea. Done.
>
> Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these comments
> a bit further to match the preceding code/comments.  Attached is the
> updated version.

Thank you for updating the patch.

>> Updated patch attached.
>
> Thanks, I think the patch is in good shape, so I'll mark this as ready for
> committer.

Thanks a lot for reviewing!

Attached rebased patch.

Regards,
Amit

Attachments:

v11-0001-During-tuple-routing-initialize-per-partition-ob.patch (text/plain; charset=UTF-8)
From 20852fe6cc2f1b8bd57d8bae528083c9f8092dab Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 19 Feb 2018 13:15:44 +0900
Subject: [PATCH v11] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c            |  10 +-
 src/backend/executor/execPartition.c   | 309 ++++++++++++++++++++++++---------
 src/backend/executor/nodeModifyTable.c | 131 ++------------
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 252 insertions(+), 207 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d5883c98d1..4562a5121d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2469,7 +2469,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2606,6 +2606,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 00523ce250..04f76b123a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers to ResultRelInfo and
+ * TupleConversionMap for all partitions here, the objects themselves are
+ * lazily allocated for a given partition only when a tuple is actually
+ * routed to it; see ExecInitPartitionInfo.  However, if the function is
+ * invoked for update tuple routing, the caller will already have initialized
+ * ResultRelInfos for some of the partitions; those are reused and assigned
+ * to their respective slots in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot(NULL);
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
+
+				part_tupdesc = RelationGetDescr(partrel);
+
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
+
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
 			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
 		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
-
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
-
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
-
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,193 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the WCO list of the first plan as a reference to calculate
+		 * attno's for the WCO list of this partition.  In the INSERT case,
+		 * that refers to the root partitioned table, whereas in the UPDATE
+		 * tuple routing case, that refers to the first partition in the
+		 * mtstate->resultRelInfo array.  In any case, both that relation and
+		 * this partition should have the same columns, so we should be able
+		 * to map attributes successfully.
+		 */
+		wcoList = linitial(node->withCheckOptionLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		wcoList = map_partition_varattnos(wcoList, firstVarno,
+										  partrel, firstResultRel, NULL);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection for the partition.  Note that we didn't
+	 * build the returningList for partitions within the planner, but simple
+	 * translation of varattnos will suffice.  This only occurs for the INSERT
+	 * case or in the case of UPDATE tuple routing where we didn't find a
+	 * result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the RETURNING list of the first plan as a reference to
+		 * calculate attno's for the RETURNING list of this partition.  See
+		 * the comment above for WCO lists for more details on why this is
+		 * okay.
+		 */
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		returningList = map_partition_varattnos(returningList, firstVarno,
+												partrel, firstResultRel,
+												NULL);
+
+		/*
+		 * Initialize the projection itself.
+		 *
+		 * Use the slot and the expression context that would have been set up
+		 * in ExecInitModifyTable() for projection's output.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(returningList, econtext, slot,
+									&mtstate->ps, RelationGetDescr(partrel));
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +616,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 93c03cfb07..87a4a92072 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2388,44 +2315,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..e94718608f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partitions OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
-- 
2.11.0

#33Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#32)
1 attachment(s)
Re: non-bulk inserts and tuple routing

(2018/02/19 13:19), Amit Langote wrote:

Attached rebased patch.

Thanks for the rebased patch!

One thing I noticed while updating the
tuple-routing-for-foreign-partitions patch on top of this is: we should
switch into the per-query memory context in ExecInitPartitionInfo.
Attached is an updated version for that.

Best regards,
Etsuro Fujita

Attachments:

v11-0001-During-tuple-routing-initialize-per-partition-ob-efujita.patch (text/x-diff)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d5883c9..4562a51 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2469,7 +2469,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2606,6 +2606,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+				resultRelInfo = ExecInitPartitionInfo(NULL,
+													  saved_resultRelInfo,
+													  proute, estate,
+													  leaf_part_index);
+				Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 00523ce..d35dac1 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already have initialized ResultRelInfo's for
+ * some of the partitions, which are reused and assigned to their respective
+ * slot in the aforementioned array.
  */
 PartitionTupleRouting *
-ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate)
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 {
 	TupleDesc	tupDesc = RelationGetDescr(rel);
 	List	   *leaf_parts;
 	ListCell   *cell;
 	int			i;
-	ResultRelInfo *leaf_part_arr = NULL,
-			   *update_rri = NULL;
+	ResultRelInfo *update_rri = NULL;
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	bool		is_update = false;
@@ -76,6 +81,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	proute->parent_child_tupconv_maps =
 		(TupleConversionMap **) palloc0(proute->num_partitions *
 										sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
+											sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -95,16 +102,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 		 */
 		proute->root_tuple_slot = MakeTupleTableSlot(NULL);
 	}
-	else
-	{
-		/*
-		 * Since we are inserting tuples, we need to create all new result
-		 * rels. Avoid repeated pallocs by allocating memory for all the
-		 * result rels in bulk.
-		 */
-		leaf_part_arr = (ResultRelInfo *) palloc0(proute->num_partitions *
-												  sizeof(ResultRelInfo));
-	}
 
 	/*
 	 * Initialize an empty slot that will be used to manipulate tuples of any
@@ -117,11 +114,10 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
-		ResultRelInfo *leaf_part_rri;
-		Relation	partrel = NULL;
-		TupleDesc	part_tupdesc;
+		ResultRelInfo *leaf_part_rri = NULL;
 		Oid			leaf_oid = lfirst_oid(cell);
 
+		proute->partition_oids[i] = leaf_oid;
 		if (is_update)
 		{
 			/*
@@ -136,6 +132,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 			if (update_rri_index < num_update_rri &&
 				RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 			{
+				Relation	partrel;
+				TupleDesc	part_tupdesc;
+
 				leaf_part_rri = &update_rri[update_rri_index];
 				partrel = leaf_part_rri->ri_RelationDesc;
 
@@ -151,73 +150,26 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
 				proute->subplan_partition_offsets[update_rri_index] = i;
 
 				update_rri_index++;
-			}
-			else
-				leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
-		}
-		else
-		{
-			/* For INSERTs, we already have an array of result rels allocated */
-			leaf_part_rri = &leaf_part_arr[i];
-		}
 
-		/*
-		 * If we didn't open the partition rel, it means we haven't
-		 * initialized the result rel either.
-		 */
-		if (!partrel)
-		{
-			/*
-			 * We locked all the partitions above including the leaf
-			 * partitions. Note that each of the newly opened relations in
-			 * proute->partitions are eventually closed by the caller.
-			 */
-			partrel = heap_open(leaf_oid, NoLock);
-			InitResultRelInfo(leaf_part_rri,
-							  partrel,
-							  resultRTindex,
-							  rel,
-							  estate->es_instrument);
-
-			/*
-			 * Since we've just initialized this ResultRelInfo, it's not in
-			 * any list attached to the estate as yet.  Add it, so that it can
-			 * be found later.
-			 */
-			estate->es_tuple_routing_result_relations =
-						lappend(estate->es_tuple_routing_result_relations,
-								leaf_part_rri);
-		}
-
-		part_tupdesc = RelationGetDescr(partrel);
-
-		/*
-		 * Save a tuple conversion map to convert a tuple routed to this
-		 * partition from the parent's type to the partition's.
-		 */
-		proute->parent_child_tupconv_maps[i] =
-			convert_tuples_by_name(tupDesc, part_tupdesc,
-								   gettext_noop("could not convert row type"));
+				part_tupdesc = RelationGetDescr(partrel);
 
-		/*
-		 * Verify result relation is a valid target for an INSERT.  An UPDATE
-		 * of a partition-key becomes a DELETE+INSERT operation, so this check
-		 * is still required when the operation is CMD_UPDATE.
-		 */
-		CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+				/*
+				 * Save a tuple conversion map to convert a tuple routed to
+				 * this partition from the parent's type to the partition's.
+				 */
+				proute->parent_child_tupconv_maps[i] =
+					convert_tuples_by_name(tupDesc, part_tupdesc,
+							   gettext_noop("could not convert row type"));
 
-		/*
-		 * Open partition indices.  The user may have asked to check for
-		 * conflicts within this leaf partition and do "nothing" instead of
-		 * throwing an error.  Be prepared in that case by initializing the
-		 * index information needed by ExecInsert() to perform speculative
-		 * insertions.
-		 */
-		if (leaf_part_rri->ri_RelationDesc->rd_rel->relhasindex &&
-			leaf_part_rri->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(leaf_part_rri,
-							mtstate != NULL &&
-							mtstate->mt_onconflict != ONCONFLICT_NONE);
+				/*
+				 * Verify result relation is a valid target for an INSERT.  An
+				 * UPDATE of a partition-key becomes a DELETE+INSERT operation,
+				 * so this check is required even when the operation is
+				 * CMD_UPDATE.
+				 */
+				CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+			}
+		}
 
 		proute->partitions[i] = leaf_part_rri;
 		i++;
@@ -352,6 +304,204 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 }
 
 /*
+ * ExecInitPartitionInfo
+ *		Initialize ResultRelInfo and other information for a partition if not
+ *		already done
+ *
+ * Returns the ResultRelInfo
+ */
+ResultRelInfo *
+ExecInitPartitionInfo(ModifyTableState *mtstate,
+					  ResultRelInfo *resultRelInfo,
+					  PartitionTupleRouting *proute,
+					  EState *estate, int partidx)
+{
+	Relation	rootrel = resultRelInfo->ri_RelationDesc,
+				partrel;
+	ResultRelInfo *leaf_part_rri;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
+	MemoryContext oldContext;
+
+	/*
+	 * We locked all the partitions in ExecSetupPartitionTupleRouting
+	 * including the leaf partitions.
+	 */
+	partrel = heap_open(proute->partition_oids[partidx], NoLock);
+
+	/*
+	 * Keep ResultRelInfo and other information for this partition in the
+	 * per-query memory context so they'll survive throughout the query.
+	 */
+	oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+	leaf_part_rri = (ResultRelInfo *) palloc0(sizeof(ResultRelInfo));
+	InitResultRelInfo(leaf_part_rri,
+					  partrel,
+					  node ? node->nominalRelation : 1,
+					  rootrel,
+					  estate->es_instrument);
+
+	/*
+	 * Verify result relation is a valid target for an INSERT.  An UPDATE
+	 * of a partition-key becomes a DELETE+INSERT operation, so this check
+	 * is still required when the operation is CMD_UPDATE.
+	 */
+	CheckValidResultRel(leaf_part_rri, CMD_INSERT);
+
+	/*
+	 * Since we've just initialized this ResultRelInfo, it's not in
+	 * any list attached to the estate as yet.  Add it, so that it can
+	 * be found later.
+	 *
+	 * Note that the entries in this list appear in no predetermined
+	 * order, because partition result rels are initialized as and when
+	 * they're needed.
+	 */
+	estate->es_tuple_routing_result_relations =
+					lappend(estate->es_tuple_routing_result_relations,
+							leaf_part_rri);
+
+	/*
+	 * Open partition indices.  The user may have asked to check for
+	 * conflicts within this leaf partition and do "nothing" instead of
+	 * throwing an error.  Be prepared in that case by initializing the
+	 * index information needed by ExecInsert() to perform speculative
+	 * insertions.
+	 */
+	if (partrel->rd_rel->relhasindex &&
+		leaf_part_rri->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(leaf_part_rri,
+						(mtstate != NULL &&
+						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+
+	/*
+	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
+	 * didn't build the withCheckOptionList for partitions within the planner,
+	 * but simple translation of varattnos will suffice.  This only occurs for
+	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
+	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->withCheckOptionLists != NIL)
+	{
+		List	   *wcoList;
+		List	   *wcoExprs = NIL;
+		ListCell   *ll;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/*
+		 * In the case of INSERT on partitioned tables, there is only one
+		 * plan.  Likewise, there is only one WCO list, not one per
+		 * partition.  For UPDATE, there would be as many WCO lists as
+		 * there are plans.
+		 */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->withCheckOptionLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->withCheckOptionLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the WCO list of the first plan as a reference to calculate
+		 * attno's for the WCO list of this partition.  In the INSERT case,
+		 * that refers to the root partitioned table, whereas in the UPDATE
+		 * tuple routing case, that refers to the first partition in the
+		 * mtstate->resultRelInfo array.  In any case, both that relation and
+		 * this partition should have the same columns, so we should be able
+		 * to map attributes successfully.
+		 */
+		wcoList = linitial(node->withCheckOptionLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		wcoList = map_partition_varattnos(wcoList, firstVarno,
+										  partrel, firstResultRel, NULL);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
+			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
+											   mtstate->mt_plans[0]);
+
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		leaf_part_rri->ri_WithCheckOptions = wcoList;
+		leaf_part_rri->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * Build the RETURNING projection for the partition.  Note that we didn't
+	 * build the returningList for partitions within the planner, but simple
+	 * translation of varattnos will suffice.  This only occurs for the INSERT
+	 * case or in the case of UPDATE tuple routing where we didn't find a
+	 * result rel to reuse in ExecSetupPartitionTupleRouting().
+	 */
+	if (node && node->returningLists != NIL)
+	{
+		TupleTableSlot *slot;
+		ExprContext *econtext;
+		List	   *returningList;
+		int		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+
+		/* See the comment above for WCO lists. */
+		Assert((node->operation == CMD_INSERT &&
+				list_length(node->returningLists) == 1 &&
+				list_length(node->plans) == 1) ||
+			   (node->operation == CMD_UPDATE &&
+				list_length(node->returningLists) ==
+				list_length(node->plans)));
+
+		/*
+		 * Use the RETURNING list of the first plan as a reference to
+		 * calculate attno's for the RETURNING list of this partition.  See
+		 * the comment above for WCO lists for more details on why this is
+		 * okay.
+		 */
+		returningList = linitial(node->returningLists);
+
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		returningList = map_partition_varattnos(returningList, firstVarno,
+												partrel, firstResultRel,
+												NULL);
+
+		/*
+		 * Initialize the projection itself.
+		 *
+		 * Use the slot and the expression context that would have been set up
+		 * in ExecInitModifyTable() for projection's output.
+		 */
+		Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+		slot = mtstate->ps.ps_ResultTupleSlot;
+		Assert(mtstate->ps.ps_ExprContext != NULL);
+		econtext = mtstate->ps.ps_ExprContext;
+		leaf_part_rri->ri_projectReturning =
+			ExecBuildProjectionInfo(returningList, econtext, slot,
+									&mtstate->ps, RelationGetDescr(partrel));
+	}
+
+	Assert (proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
+	/*
+	 * Save a tuple conversion map to convert a tuple routed to this
+	 * partition from the parent's type to the partition's.
+	 */
+	proute->parent_child_tupconv_maps[partidx] =
+							convert_tuples_by_name(RelationGetDescr(rootrel),
+												   RelationGetDescr(partrel),
+											   gettext_noop("could not convert row type"));
+
+	MemoryContextSwitchTo(oldContext);
+
+	return leaf_part_rri;
+}
+
+/*
  * ExecSetupChildParentMapForLeaf -- Initialize the per-leaf-partition
  * child-to-root tuple conversion map array.
  *
@@ -477,6 +627,10 @@ ExecCleanupTupleRouting(PartitionTupleRouting *proute)
 	{
 		ResultRelInfo *resultRelInfo = proute->partitions[i];
 
+		/* skip further processing for uninitialized partitions */
+		if (resultRelInfo == NULL)
+			continue;
+
 		/*
 		 * If this result rel is one of the UPDATE subplan result rels, let
 		 * ExecEndPlan() close it. For INSERT or COPY,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 93c03cf..87a4a92 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -310,6 +310,14 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		saved_resultRelInfo = resultRelInfo;
 		resultRelInfo = proute->partitions[leaf_part_index];
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecInitPartitionInfo(mtstate,
+												  saved_resultRelInfo,
+												  proute, estate,
+												  leaf_part_index);
+			Assert(resultRelInfo != NULL);
+		}
 
 		/* We do not yet have a way to insert into a foreign partition */
 		if (resultRelInfo->ri_FdwRoutine)
@@ -2098,14 +2106,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	int			firstVarno = 0;
-	Relation	firstResultRel = NULL;
 	ListCell   *l;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
-	PartitionTupleRouting *proute = NULL;
-	int			num_partitions = 0;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2228,20 +2232,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 */
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
-	{
-		proute = mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(mtstate,
-										   rel, node->nominalRelation,
-										   estate);
-		num_partitions = proute->num_partitions;
-
-		/*
-		 * Below are required as reference objects for mapping partition
-		 * attno's in expressions such as WithCheckOptions and RETURNING.
-		 */
-		firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
-		firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
-	}
+		mtstate->mt_partition_tuple_routing =
+						ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2288,77 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
-	 * that we didn't build the withCheckOptionList for each partition within
-	 * the planner, but simple translation of the varattnos for each partition
-	 * will suffice.  This only occurs for the INSERT case or for UPDATE row
-	 * movement. DELETEs and local UPDATEs are handled above.
-	 */
-	if (node->withCheckOptionLists != NIL && num_partitions > 0)
-	{
-		List	   *first_wcoList;
-
-		/*
-		 * In case of INSERT on partitioned tables, there is only one plan.
-		 * Likewise, there is only one WITH CHECK OPTIONS list, not one per
-		 * partition. Whereas for UPDATE, there are as many WCOs as there are
-		 * plans. So in either case, use the WCO expression of the first
-		 * resultRelInfo as a reference to calculate attno's for the WCO
-		 * expression of each of the partitions. We make a copy of the WCO
-		 * qual for each partition. Note that, if there are SubPlans in there,
-		 * they all end up attached to the one parent Plan node.
-		 */
-		Assert(update_tuple_routing_needed ||
-			   (operation == CMD_INSERT &&
-				list_length(node->withCheckOptionLists) == 1 &&
-				mtstate->mt_nplans == 1));
-
-		first_wcoList = linitial(node->withCheckOptionLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *mapped_wcoList;
-			List	   *wcoExprs = NIL;
-			ListCell   *ll;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have
-			 * WithCheckOptions initialized.
-			 */
-			if (resultRelInfo->ri_WithCheckOptions)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			mapped_wcoList = map_partition_varattnos(first_wcoList,
-													 firstVarno,
-													 partrel, firstResultRel,
-													 NULL);
-			foreach(ll, mapped_wcoList)
-			{
-				WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
-				ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-												   &mtstate->ps);
-
-				wcoExprs = lappend(wcoExprs, wcoExpr);
-			}
-
-			resultRelInfo->ri_WithCheckOptions = mapped_wcoList;
-			resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		}
-	}
-
-	/*
 	 * Initialize RETURNING projections if needed.
 	 */
 	if (node->returningLists)
 	{
 		TupleTableSlot *slot;
 		ExprContext *econtext;
-		List	   *firstReturningList;
 
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
@@ -2388,44 +2315,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 										resultRelInfo->ri_RelationDesc->rd_att);
 			resultRelInfo++;
 		}
-
-		/*
-		 * Build a projection for each leaf partition rel.  Note that we
-		 * didn't build the returningList for each partition within the
-		 * planner, but simple translation of the varattnos for each partition
-		 * will suffice.  This only occurs for the INSERT case or for UPDATE
-		 * row movement. DELETEs and local UPDATEs are handled above.
-		 */
-		firstReturningList = linitial(node->returningLists);
-		for (i = 0; i < num_partitions; i++)
-		{
-			Relation	partrel;
-			List	   *rlist;
-
-			resultRelInfo = proute->partitions[i];
-
-			/*
-			 * If we are referring to a resultRelInfo from one of the update
-			 * result rels, that result rel would already have a returningList
-			 * built.
-			 */
-			if (resultRelInfo->ri_projectReturning)
-				continue;
-
-			partrel = resultRelInfo->ri_RelationDesc;
-
-			/*
-			 * Use the returning expression of the first resultRelInfo as a
-			 * reference to calculate attno's for the returning expression of
-			 * each of the partitions.
-			 */
-			rlist = map_partition_varattnos(firstReturningList,
-											firstVarno,
-											partrel, firstResultRel, NULL);
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-		}
 	}
 	else
 	{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c49..e947186 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
+ * partition_oids				Array of leaf partitions OIDs
  * partitions					Array of ResultRelInfo* objects with one entry
  *								for every leaf partition in the partition tree.
  * num_partitions				Number of leaf partitions in the partition tree
@@ -91,6 +92,7 @@ typedef struct PartitionTupleRouting
 {
 	PartitionDispatch *partition_dispatch_info;
 	int			num_dispatch;
+	Oid		   *partition_oids;
 	ResultRelInfo **partitions;
 	int			num_partitions;
 	TupleConversionMap **parent_child_tupconv_maps;
@@ -103,12 +105,15 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
-							   Relation rel, Index resultRTindex,
-							   EState *estate);
+							   Relation rel);
 extern int ExecFindPartition(ResultRelInfo *resultRelInfo,
 				  PartitionDispatch *pd,
 				  TupleTableSlot *slot,
 				  EState *estate);
+extern ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
+					ResultRelInfo *resultRelInfo,
+					PartitionTupleRouting *proute,
+					EState *estate, int partidx);
 extern void ExecSetupChildParentMapForLeaf(PartitionTupleRouting *proute);
 extern TupleConversionMap *TupConvMapForLeaf(PartitionTupleRouting *proute,
 				  ResultRelInfo *rootRelInfo, int leaf_index);
#34Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#33)
Re: non-bulk inserts and tuple routing

Fujita-san,

On 2018/02/20 19:40, Etsuro Fujita wrote:

(2018/02/19 13:19), Amit Langote wrote:

Attached rebased patch.

Thanks for the rebased patch!

One thing I noticed while updating the
tuple-routing-for-foreign-partitions patch on top of this is: we should
switch into the per-query memory context in ExecInitPartitionInfo.

Good catch!

Attached is an updated version for that.

Thanks for updating the patch.

Thanks,
Amit

#35Robert Haas
robertmhaas@gmail.com
In reply to: Amit Langote (#34)
Re: non-bulk inserts and tuple routing

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Haas (#35)
1 attachment(s)
Re: non-bulk inserts and tuple routing

Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes.

I propose to tweak a few comments to PartitionTupleRouting, as attached.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

comments.patch (text/plain; charset=us-ascii)
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index e94718608f..08a994bce1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,11 +58,15 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								partition tree.
  * num_dispatch					number of partitioned tables in the partition
  *								tree (= length of partition_dispatch_info[])
- * partition_oids				Array of leaf partitions OIDs
+ * partition_oids				Array of leaf partitions OIDs with one entry
+ *								for every leaf partition in the partition tree,
+ *								initialized in full by
+ *								ExecSetupPartitionTupleRouting.
  * partitions					Array of ResultRelInfo* objects with one entry
- *								for every leaf partition in the partition tree.
+ *								for every leaf partition in the partition tree,
+ *								initialized lazily.
  * num_partitions				Number of leaf partitions in the partition tree
- *								(= 'partitions' array length)
+ *								(= 'partition_oids'/'partitions' array length)
  * parent_child_tupconv_maps	Array of TupleConversionMap objects with one
  *								entry for every leaf partition (required to
  *								convert tuple from the root table's rowtype to
#37Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#36)
Re: non-bulk inserts and tuple routing

On Thu, Feb 22, 2018 at 11:53 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:

Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes.

I propose to tweak a few comments to PartitionTupleRouting, as attached.

Sure, please go ahead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#38Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#36)
Re: non-bulk inserts and tuple routing

On 2018/02/23 1:53, Alvaro Herrera wrote:

Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes.

I propose to tweak a few comments to PartitionTupleRouting, as attached.

I'd missed those. Thank you!

Regards,
Amit

#39Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#35)
Re: non-bulk inserts and tuple routing

On 2018/02/23 1:10, Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

Thank you!

Regards,
Amit

#40Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#35)
Re: non-bulk inserts and tuple routing

Hi,

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

I noticed that this patch broke my JIT patch when I force JITing to be
used everywhere (obviously pointless for perf reasons, but good for
testing). Turns out to be a single line.

ExecInitPartitionInfo has the following block:
    foreach(ll, wcoList)
    {
        WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
        ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                           mtstate->mt_plans[0]);

        wcoExprs = lappend(wcoExprs, wcoExpr);
    }

note how it is passing mtstate->mt_plans[0] as the parent node for the
expression. I don't quite know why mtstate->mt_plans[0] was chosen
here, it doesn't seem right. The WCO will not be executed in that
context. Note that the ExecBuildProjectionInfo() call a few lines below
also uses a different context.

For JITing that fails because the compiled deform will assume the tuples
look like mt_plans[0]'s scantuples. But we're not dealing with those,
we're dealing with tuples returned by the relevant subplan.

Also note that the code before this commit used
    ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                       &mtstate->ps);
i.e. the ModifyTableState node, as I'd expect.

Greetings,

Andres Freund

#41Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#35)
Re: non-bulk inserts and tuple routing

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

Btw, are there cases where this could change EXPLAIN output? If there are
subplan references or such in any of the RETURNING / WCO expressions,
they'd not get added at explain time. It's probably fine because we also
add the expressions "statically" in ExecInitModifyTable()?

Greetings,

Andres Freund

#42Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#40)
1 attachment(s)
Re: non-bulk inserts and tuple routing

On 2018/03/03 13:38, Andres Freund wrote:

Hi,

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

I noticed that this patch broke my JIT patch when I force JITing to be
used everywhere (obviously pointless for perf reasons, but good for
testing). Turns out to be a single line.

ExecInitPartitionInfo has the following block:
    foreach(ll, wcoList)
    {
        WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
        ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                           mtstate->mt_plans[0]);

        wcoExprs = lappend(wcoExprs, wcoExpr);
    }

note how it is passing mtstate->mt_plans[0] as the parent node for the
expression. I don't quite know why mtstate->mt_plans[0] was chosen
here, it doesn't seem right. The WCO will not be executed in that
context. Note that the ExecBuildProjectionInfo() call a few lines below
also uses a different context.

For JITing that fails because the compiled deform will assume the tuples
look like mt_plans[0]'s scantuples. But we're not dealing with those,
we're dealing with tuples returned by the relevant subplan.

Also note that the code before this commit used
    ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
                                       &mtstate->ps);
i.e. the ModifyTableState node, as I'd expect.

I guess it was an oversight in my patch. Please find attached a patch
to fix that.

Thanks,
Amit

Attachments:

partition-WCO-qual-parent-fix.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 54efc9e545..f6fe7cd61d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -413,7 +413,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		{
 			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
 			ExprState  *wcoExpr = ExecInitQual(castNode(List, wco->qual),
-											   mtstate->mt_plans[0]);
+											   &mtstate->ps);
 
 			wcoExprs = lappend(wcoExprs, wcoExpr);
 		}
#43Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#41)
Re: non-bulk inserts and tuple routing

On 2018/03/03 13:48, Andres Freund wrote:

On 2018-02-22 11:10:57 -0500, Robert Haas wrote:

On Tue, Feb 20, 2018 at 8:06 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is an updated version for that.

Thanks for updating the patch.

Committed with a few changes. The big one was that I got rid of the
local variable is_update in ExecSetupPartitionTupleRouting. That
saved a level of indentation on a substantial chunk of code, and it
turns out that test was redundant anyway.

Btw, are there cases where this could change EXPLAIN output? If there are
subplan references or such in any of the RETURNING / WCO expressions,
they'd not get added at explain time. It's probably fine because we also
add the expressions "statically" in ExecInitModifyTable()?

Yes, I think so.

AFAICS, explain.c only looks at the information that is "statically" added
to ModifyTableState by ExecInitModifyTable. It considers information
added by the tuple routing code only when printing information about
invoked triggers, and that too only in the case of EXPLAIN ANALYZE.

Thanks,
Amit