ON CONFLICT DO UPDATE for partitioned tables

Started by Alvaro Herrera almost 8 years ago, 53 messages
#1 Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
Following the lead of edd44738bc88 ("Be lazier about partition tuple
routing.") this incarnation only does the necessary push-ups for the
specific partition that needs it, at execution time. As far as I can
tell, it works as intended.

I chose to refuse the case where the DO UPDATE clause causes the tuple
to move to another partition (i.e. you're updating the partition key of
the tuple). While it's probably possible to implement that, it doesn't
seem a very productive use of time.
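
To make the refused case concrete, here is a minimal standalone sketch, not code from the patch. It models list partitions as plain arrays and checks whether an updated key would still route to the partition that holds the existing row. All `toy_` names are hypothetical; the partition layout is borrowed from the regression test in the attachment.

```c
#include <assert.h>
#include <stddef.h>

/* A toy list partition: the set of key values it accepts. */
typedef struct ToyListPartition
{
	const char *name;
	const int  *values;
	size_t		nvalues;
} ToyListPartition;

/* Return the index of the partition accepting key, or -1 if none does. */
static int
toy_route(const ToyListPartition *parts, size_t nparts, int key)
{
	for (size_t p = 0; p < nparts; p++)
	{
		for (size_t v = 0; v < parts[p].nvalues; v++)
		{
			if (parts[p].values[v] == key)
				return (int) p;
		}
	}
	return -1;
}

/* Nonzero if changing the key from oldkey to newkey keeps the row in place. */
static int
toy_update_stays_put(const ToyListPartition *parts, size_t nparts,
					 int oldkey, int newkey)
{
	int			oldpart = toy_route(parts, nparts, oldkey);

	assert(oldpart >= 0);		/* the existing row must live somewhere */
	return toy_route(parts, nparts, newkey) == oldpart;
}

/* Hypothetical layout: one partition for (1, 2), another for (3, 4). */
static const int toy_vals_1[] = {1, 2};
static const int toy_vals_2[] = {3, 4};
static const ToyListPartition toy_parts[] = {
	{"parted_conflict_test_1", toy_vals_1, 2},
	{"parted_conflict_test_2", toy_vals_2, 2},
};
```

With this layout, `toy_update_stays_put(toy_parts, 2, 1, 2)` is nonzero (the row stays in the first partition), while changing the key from 2 to 4 would move the row and is the kind of update the patch rejects.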

However, there is a shortcoming in the design: it fails if there are
multiple levels of partitioning, because there is no easy (efficient)
way to map the index OID more than one level. I had already mentioned
this shortcoming to Amit's patch. So this case (which I added in the
regression tests) fails unexpectedly:

-- multiple-layered partitioning
create table parted_conflict_test (a int primary key, b text) partition by range (a);
create table parted_conflict_test_1 partition of parted_conflict_test
for values from (0) to (10000) partition by range (a);
create table parted_conflict_test_1_1 partition of parted_conflict_test_1
for values from (0) to (100);
insert into parted_conflict_test values ('10', 'ten');
insert into parted_conflict_test values ('10', 'ten two')
on conflict (a) do update set b = excluded.b;
ERROR: invalid ON CONFLICT DO UPDATE specification
DETAIL: An inferred index was not found in partition "parted_conflict_test_1_1".

So the problem here is that my MapPartitionIndexList() implementation is
too stupid. I think it would be smarter to use the ResultRelInfo
instead of bare Relation, for one. But that still doesn't answer how to
find a "path" from root to leaf partition, which is what I'd need to
verify that there are valid pg_inherits relationships for the partition
indexes. I'm probably missing something about the partitionDesc or
maybe the partitioned_rels lists that helps me do it efficiently, but I
hope to figure it out soon.
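
One way to picture the multi-level mapping problem is the following standalone sketch. It is a toy model, not PostgreSQL code: pg_inherits is reduced to an in-memory array, and every `toy_` name and OID value is made up for illustration. It relies on the assumption that a partition index has at most one parent, so walking upward from a leaf index never branches, whereas walking downward from the root would have to explore every child.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* One row of a toy pg_inherits: a child and its direct parent. */
typedef struct ToyInhRow
{
	Oid			inhrelid;
	Oid			inhparent;
} ToyInhRow;

/*
 * Hypothetical contents for a two-level hierarchy:
 *   index 100 (root) <- index 200 (mid-level) <- index 300 (leaf)
 * Index 301 is a leaf index with no parent at all.
 */
static const ToyInhRow toy_inherits[] = {
	{200, 100},
	{300, 200},
};
static const size_t toy_ninherits =
	sizeof(toy_inherits) / sizeof(toy_inherits[0]);

/* Return the direct parent of relid, or InvalidOid if it has none. */
static Oid
toy_get_parent(Oid relid)
{
	for (size_t i = 0; i < toy_ninherits; i++)
	{
		if (toy_inherits[i].inhrelid == relid)
			return toy_inherits[i].inhparent;
	}
	return InvalidOid;
}

/* Is 'index' the same as 'ancestor', or a descendant at any depth? */
static int
toy_index_descends_from(Oid index, Oid ancestor)
{
	int			depth = 0;

	while (index != InvalidOid)
	{
		if (index == ancestor)
			return 1;
		index = toy_get_parent(index);
		depth++;
		assert(depth < 64);		/* guard against a cyclic toy catalog */
	}
	return 0;
}

/* Pick the leaf index that descends from rootIndex, or InvalidOid. */
static Oid
toy_map_root_index(Oid rootIndex, const Oid *leafIndexes, size_t nleaf)
{
	for (size_t i = 0; i < nleaf; i++)
	{
		if (toy_index_descends_from(leafIndexes[i], rootIndex))
			return leafIndexes[i];
	}
	return InvalidOid;
}
```

In this model a leaf whose index list is `{301, 300}` maps root index 100 to 300 even though 300 is two levels down. A direct-children-only lookup on 100 would see only 200 and report no match, which is the shape of the failure shown above.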

One idea I was toying with is to add RelationData->rd_children as a list
of OIDs of children relations. So it should be possible to walk the
list from the root to the desired descendant, without having to scan
pg_inherits repeatedly ... although that would probably require doing
relation_open() for every index, which sounds undesirable.

(ISTM that having RelationData->rd_children might be a good idea in
general anyway -- I mean to speed up some code that currently scans
pg_inherits via find_inheritance_children. However, since the partition
descriptor itself is already in relcache, maybe this doesn't matter too
much.)

Another idea is to abandon the notion that we need to find a path from
parent index to descendant index, and just run the inference algorithm
again on the partition. I'm not sure how much I like this idea, yet.

Anyway, while this is still WIP, I think it works correctly for the case
where there is a single partition level.

[1]: /messages/by-id/c1651d5b-7bd6-b7e7-e1cc-16ecfe2c0da5@lab.ntt.co.jp

--
Álvaro Herrera

Attachments:

v1-0001-fix-ON-CONFLICT-DO-UPDATE-for-partitioned-tables.patch (text/plain; charset=us-ascii)
From a2517bd315034de7f6c5a4728f66729136918e88 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Tue, 27 Feb 2018 20:52:56 -0300
Subject: [PATCH v1] fix ON CONFLICT DO UPDATE for partitioned tables

---
 src/backend/catalog/pg_inherits.c             |  72 ++++++++++++++++
 src/backend/executor/execPartition.c          |  29 +++++++
 src/backend/executor/nodeModifyTable.c        |  33 +++++++-
 src/backend/optimizer/util/plancat.c          |   3 +-
 src/backend/parser/analyze.c                  |   7 --
 src/include/catalog/pg_inherits_fn.h          |   3 +
 src/test/regress/expected/insert_conflict.out | 113 ++++++++++++++++++++++++--
 src/test/regress/sql/insert_conflict.sql      |  75 +++++++++++++++--
 8 files changed, 311 insertions(+), 24 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 5a5beb9273..e1a46bcd2b 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -407,6 +407,78 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
 }
 
 /*
+ * Given a list of index OIDs in the rootrel, return a list of OIDs of the
+ * corresponding indexes in the partrel.  If any index in the rootrel does not
+ * correspond to any index in the child, an error is raised.
+ *
+ * This processes the index list for INSERT ON CONFLICT DO UPDATE at execution
+ * time.  This fact is hardcoded in the error messages.
+ *
+ * XXX this implementation fails if the partition is not a direct child of
+ * rootrel.
+ */
+List *
+MapPartitionIndexList(Relation rootrel, Relation partrel, List *indexlist)
+{
+	List	   *result = NIL;
+	List	   *partIdxs;
+	Relation	inhRel;
+	ScanKeyData	key;
+	ListCell   *cell;
+
+	partIdxs = RelationGetIndexList(partrel);
+	/* quick exit if partition has no indexes */
+	if (partIdxs == NIL)
+		return NIL;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	foreach(cell, indexlist)
+	{
+		Oid			parentIdx = lfirst_oid(cell);
+		SysScanDesc	scan;
+		HeapTuple	tuple;
+		bool		found = false;
+
+		ScanKeyInit(&key,
+					Anum_pg_inherits_inhparent,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(parentIdx));
+
+		scan = systable_beginscan(inhRel, InheritsParentIndexId, true,
+								  NULL, 1, &key);
+		while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+		{
+			Oid indexoid = ((Form_pg_inherits) GETSTRUCT(tuple))->inhrelid;
+
+			if (list_member_oid(partIdxs, indexoid))
+			{
+				result = lappend_oid(result, indexoid);
+				found = true;
+				break;
+			}
+		}
+		systable_endscan(scan);
+
+		/*
+		 * Indexes can only be used as inference targets if they exist in the
+		 * partition that receives the tuple; bail out if we cannot find it.
+		 */
+		if (!found)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("invalid ON CONFLICT DO UPDATE specification"),
+					 errdetail("An inferred index was not found in partition \"%s\".",
+							   RelationGetRelationName(partrel))));
+	}
+
+	relation_close(inhRel, AccessShareLock);
+	list_free(partIdxs);
+
+	return result;
+}
+
+/*
  * Create a single pg_inherits row with the given data
  */
 void
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 54efc9e545..95a814e975 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -475,6 +475,35 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
+	/*
+	 * If needed, initialize projection and qual for ON CONFLICT DO UPDATE for
+	 * this partition.
+	 */
+	if (node && node->onConflictAction == ONCONFLICT_UPDATE)
+	{
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		List	   *leaf_oc_set;
+
+		leaf_oc_set = map_partition_varattnos(node->onConflictSet,
+											  node->nominalRelation,
+											  partrel, rootrel, NULL);
+		leaf_part_rri->ri_onConflictSetProj =
+			ExecBuildProjectionInfo(leaf_oc_set, econtext,
+									mtstate->mt_conflproj, &mtstate->ps,
+									RelationGetDescr(partrel));
+		if (node->onConflictWhere)
+		{
+			List	   *leaf_oc_where;
+
+			leaf_oc_where =
+				map_partition_varattnos((List *) node->onConflictWhere,
+										node->nominalRelation,
+										partrel, rootrel, NULL);
+			leaf_part_rri->ri_onConflictSetWhere =
+				ExecInitQual(leaf_oc_where, &mtstate->ps);
+		}
+	}
+
 	Assert(proute->partitions[partidx] == NULL);
 	proute->partitions[partidx] = leaf_part_rri;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c32928d9bd..9748f80ddc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -39,6 +39,7 @@
 
 #include "access/htup_details.h"
 #include "access/xact.h"
+#include "catalog/pg_inherits_fn.h"
 #include "commands/trigger.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
@@ -510,6 +511,20 @@ ExecInsert(ModifyTableState *mtstate,
 			uint32		specToken;
 			ItemPointerData conflictTid;
 			bool		specConflict;
+			List	   *mappedArbiterIndexes;
+
+			/*
+			 * Map the arbiter index list to the OIDs in the corresponding
+			 * partition.
+			 */
+			if (saved_resultRelInfo &&
+				resultRelInfo->ri_RelationDesc->rd_rel->relispartition)
+				mappedArbiterIndexes =
+					MapPartitionIndexList(saved_resultRelInfo->ri_RelationDesc,
+										  resultRelInfo->ri_RelationDesc,
+										  arbiterIndexes);
+			else
+				mappedArbiterIndexes = arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -526,7 +541,7 @@ ExecInsert(ModifyTableState *mtstate,
 	vlock:
 			specConflict = false;
 			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+										   mappedArbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -581,7 +596,7 @@ ExecInsert(ModifyTableState *mtstate,
 			/* insert index entries for tuple */
 			recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 												   estate, true, &specConflict,
-												   arbiterIndexes);
+												   mappedArbiterIndexes);
 
 			/* adjust the tuple's state accordingly */
 			if (!specConflict)
@@ -1146,6 +1161,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -2329,7 +2356,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 
 	/*
-	 * If needed, Initialize target list, projection and qual for ON CONFLICT
+	 * If needed, initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
 	resultRelInfo = mtstate->resultRelInfo;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..db7c0030ca 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -558,7 +558,8 @@ get_relation_foreign_keys(PlannerInfo *root, RelOptInfo *rel,
 
 /*
  * infer_arbiter_indexes -
- *	  Determine the unique indexes used to arbitrate speculative insertion.
+ *	  Determine the unique indexes used to arbitrate speculative insertion,
+ *	  and return them as a list of OIDs.
  *
  * Uses user-supplied inference clause expressions and predicate to match a
  * unique index from those defined and ready on the heap relation (target).
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617f67..92696f0607 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1025,13 +1025,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index eebee977a5..20fb96db51 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -16,6 +16,7 @@
 
 #include "nodes/pg_list.h"
 #include "storage/lock.h"
+#include "utils/relcache.h"
 
 extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
 extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
@@ -23,6 +24,8 @@ extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
 extern bool has_subclass(Oid relationId);
 extern bool has_superclass(Oid relationId);
 extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
+extern List *MapPartitionIndexList(Relation rootrel, Relation partrel,
+					  List *indexlist);
 extern void StoreSingleInheritance(Oid relationId, Oid parentOid,
 					   int32 seqNumber);
 extern bool DeleteInheritsTuple(Oid inhrelid, Oid inhparent);
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..da8fe11120 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,115 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
+--
+-- INSERT ON CONFLICT and partitioned tables
+--
+-- DO NOTHING works
 -- insert into partitioned_table on conflict do nothing
 create table parted_conflict_test (a int, b char) partition by list (a);
 create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+drop table parted_conflict_test;
+-- simple DO UPDATE works, as long as the tuple remains in the same partition
+create table parted_conflict_test (a int primary key, b text) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values in (1, 2);
+create table parted_conflict_test_2 partition of parted_conflict_test for values in (3, 4);
+insert into parted_conflict_test values (1, 'first');
+insert into parted_conflict_test values (1, 'second')
+  on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'third')
+  on conflict (a) do update set b = format('%s (was %s)', excluded.b, parted_conflict_test.b);
+select * from parted_conflict_test;
+ a |         b         
+---+-------------------
+ 1 | third (was first)
+(1 row)
+
+insert into parted_conflict_test values (1, 'b')
+  on conflict (a) do update set b = 'fourth'
+  where parted_conflict_test.b = 'third (was first)';
+select * from parted_conflict_test;
+ a |   b    
+---+--------
+ 1 | fourth
+(1 row)
+
+insert into parted_conflict_test values (1, 'c')
+  on conflict (a) do update set b = 'fourth'
+  where parted_conflict_test.b = 'b';
+select * from parted_conflict_test;
+ a |   b    
+---+--------
+ 1 | fourth
+(1 row)
+
+insert into parted_conflict_test values (1, 'fifth')
+  on conflict (a) do update set a = parted_conflict_test.a * 2,
+  b = format('%s (was %s)', excluded.b, parted_conflict_test.b);
+select * from parted_conflict_test;
+ a |         b          
+---+--------------------
+ 2 | fifth (was fourth)
+(1 row)
+
+-- targetting the partition directly also works
+insert into parted_conflict_test_1 values (2, 'sixth') on conflict (a) do
+  update set b = format('%s (was %s)', excluded.b, parted_conflict_test_1.b);
+select * from parted_conflict_test;
+ a |               b                
+---+--------------------------------
+ 2 | sixth (was fifth (was fourth))
+(1 row)
+
+drop table parted_conflict_test;
+-- moving tuple to another partition in the UPDATE clause is not supported
+create table parted_conflict_test (a int, b text) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values in (1);
+create table parted_conflict_test_2 partition of parted_conflict_test for values in (2);
+insert into parted_conflict_test values (1, 'one');
+insert into parted_conflict_test values (1, 'one two')
+  on conflict (a) do update set a = excluded.a * 2;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+drop table parted_conflict_test;
+-- multiple-layered partitioning
+create table parted_conflict_test (a int primary key, b text) partition by range (a);
+create table parted_conflict_test_1 partition of parted_conflict_test
+  for values from (0) to (10000) partition by range (a);
+create table parted_conflict_test_1_1 partition of parted_conflict_test_1
+  for values from (0) to (100);
+insert into parted_conflict_test values ('10', 'ten');
+insert into parted_conflict_test values ('10', 'ten two')
+  on conflict (a) do update set b = excluded.b;
+ERROR:  invalid ON CONFLICT DO UPDATE specification
+DETAIL:  An inferred index was not found in partition "parted_conflict_test_1_1".
+select * from parted_conflict_test;
+ a  |  b  
+----+-----
+ 10 | ten
+(1 row)
+
+insert into parted_conflict_test_1 values ('10', 'ten three')
+  on conflict (a) do update set b = excluded.b;
+select * from parted_conflict_test;
+ a  |     b     
+----+-----------
+ 10 | ten three
+(1 row)
+
+drop table parted_conflict_test;
+-- a partitioned table with an index and no corresponding index on the
+-- partition; should raise an error
+create table parted_conflict_test (a int, b text) partition by range (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values from (0) to (10000);
+alter table only parted_conflict_test add primary key (a);
+insert into parted_conflict_test values (100, 'hundred');
+insert into parted_conflict_test values (100, 'hundred (two)') on conflict (a) do update set b = excluded.b;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+select * from parted_conflict_test;
+  a  |    b    
+-----+---------
+ 100 | hundred
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..61758d2ea9 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,78 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
+--
+-- INSERT ON CONFLICT and partitioned tables
+--
+
+-- DO NOTHING works
 -- insert into partitioned_table on conflict do nothing
 create table parted_conflict_test (a int, b char) partition by list (a);
 create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+drop table parted_conflict_test;
+
+-- simple DO UPDATE works, as long as the tuple remains in the same partition
+create table parted_conflict_test (a int primary key, b text) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values in (1, 2);
+create table parted_conflict_test_2 partition of parted_conflict_test for values in (3, 4);
+insert into parted_conflict_test values (1, 'first');
+insert into parted_conflict_test values (1, 'second')
+  on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'third')
+  on conflict (a) do update set b = format('%s (was %s)', excluded.b, parted_conflict_test.b);
+select * from parted_conflict_test;
+insert into parted_conflict_test values (1, 'b')
+  on conflict (a) do update set b = 'fourth'
+  where parted_conflict_test.b = 'third (was first)';
+select * from parted_conflict_test;
+insert into parted_conflict_test values (1, 'c')
+  on conflict (a) do update set b = 'fourth'
+  where parted_conflict_test.b = 'b';
+select * from parted_conflict_test;
+insert into parted_conflict_test values (1, 'fifth')
+  on conflict (a) do update set a = parted_conflict_test.a * 2,
+  b = format('%s (was %s)', excluded.b, parted_conflict_test.b);
+select * from parted_conflict_test;
+
+-- targetting the partition directly also works
+insert into parted_conflict_test_1 values (2, 'sixth') on conflict (a) do
+  update set b = format('%s (was %s)', excluded.b, parted_conflict_test_1.b);
+select * from parted_conflict_test;
+drop table parted_conflict_test;
+
+-- moving tuple to another partition in the UPDATE clause is not supported
+create table parted_conflict_test (a int, b text) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values in (1);
+create table parted_conflict_test_2 partition of parted_conflict_test for values in (2);
+insert into parted_conflict_test values (1, 'one');
+insert into parted_conflict_test values (1, 'one two')
+  on conflict (a) do update set a = excluded.a * 2;
+drop table parted_conflict_test;
+
+-- multiple-layered partitioning
+create table parted_conflict_test (a int primary key, b text) partition by range (a);
+create table parted_conflict_test_1 partition of parted_conflict_test
+  for values from (0) to (10000) partition by range (a);
+create table parted_conflict_test_1_1 partition of parted_conflict_test_1
+  for values from (0) to (100);
+insert into parted_conflict_test values ('10', 'ten');
+insert into parted_conflict_test values ('10', 'ten two')
+  on conflict (a) do update set b = excluded.b;
+select * from parted_conflict_test;
+
+insert into parted_conflict_test_1 values ('10', 'ten three')
+  on conflict (a) do update set b = excluded.b;
+select * from parted_conflict_test;
+drop table parted_conflict_test;
+
+-- a partitioned table with an index and no corresponding index on the
+-- partition; should raise an error
+create table parted_conflict_test (a int, b text) partition by range (a);
+create table parted_conflict_test_1 partition of parted_conflict_test for values from (0) to (10000);
+alter table only parted_conflict_test add primary key (a);
+insert into parted_conflict_test values (100, 'hundred');
+insert into parted_conflict_test values (100, 'hundred (two)') on conflict (a) do update set b = excluded.b;
+select * from parted_conflict_test;
 drop table parted_conflict_test;
-- 
2.11.0

#2 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/02/28 9:46, Alvaro Herrera wrote:

> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time. As far as I can
> tell, it works as intended.

Thanks. I too have been meaning to send an updated (nay, significantly
rewritten) version of the patch, but you beat me to it.

> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple). While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

Probably a good idea to save it for later.

> However, there is a shortcoming in the design: it fails if there are
> multiple levels of partitioning, because there is no easy (efficient)
> way to map the index OID more than one level. I had already mentioned
> this shortcoming to Amit's patch. So this case (which I added in the
> regression tests) fails unexpectedly:
>
> -- multiple-layered partitioning
> create table parted_conflict_test (a int primary key, b text) partition by range (a);
> create table parted_conflict_test_1 partition of parted_conflict_test
> for values from (0) to (10000) partition by range (a);
> create table parted_conflict_test_1_1 partition of parted_conflict_test_1
> for values from (0) to (100);
> insert into parted_conflict_test values ('10', 'ten');
> insert into parted_conflict_test values ('10', 'ten two')
> on conflict (a) do update set b = excluded.b;
> ERROR: invalid ON CONFLICT DO UPDATE specification
> DETAIL: An inferred index was not found in partition "parted_conflict_test_1_1".
>
> So the problem here is that my MapPartitionIndexList() implementation is
> too stupid. I think it would be smarter to use the ResultRelInfo
> instead of bare Relation, for one. But that still doesn't answer how to
> find a "path" from root to leaf partition, which is what I'd need to
> verify that there are valid pg_inherits relationships for the partition
> indexes. I'm probably missing something about the partitionDesc or
> maybe the partitioned_rels lists that helps me do it efficiently, but I
> hope to figure it out soon.
>
> One idea I was toying with is to add RelationData->rd_children as a list
> of OIDs of children relations. So it should be possible to walk the
> list from the root to the desired descendant, without having to scan
> pg_inherits repeatedly ... although that would probably require doing
> relation_open() for every index, which sounds undesirable.
>
> (ISTM that having RelationData->rd_children might be a good idea in
> general anyway -- I mean to speed up some code that currently scans
> pg_inherits via find_inheritance_children. However, since the partition
> descriptor itself is already in relcache, maybe this doesn't matter too
> much.)
>
> Another idea is to abandon the notion that we need to find a path from
> parent index to descendant index, and just run the inference algorithm
> again on the partition. I'm not sure how much I like this idea, yet.
>
> Anyway, while this is still WIP, I think it works correctly for the case
> where there is a single partition level.

Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is, the table mentioned in the command). So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list. After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list). In fact,
that also has to take dropped columns into account; we may have dropped
columns in the parent, in a partition, or in both, at the same or
different attnum positions. That means a simple map_partition_varattnos()
translation doesn't help in this case.

For example, with your patch (sorry, I know you said it's a WIP patch), I
see either a crash or errors when dealing with such differing attribute
numbers:

drop table p;
create table p (a int, b text) partition by list (a);
create table p12 (b text, a int);
alter table p attach partition p12 for values in (1, 2);
alter table p drop b, add b text;
create table p4 partition of p for values in (4);
create unique index on p (a);

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b;

insert into p values (1, 'b') on conflict (a) do update set b = excluded.b;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

insert into p values (4, 'a') on conflict (a) do update set b = excluded.b;

postgres=# insert into p values (4, 'b') on conflict (a) do update set b =
excluded.b;
ERROR: attribute number 3 exceeds number of columns 2
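
The attribute-number mismatch behind these failures can be illustrated with a standalone sketch. It is a toy model under stated assumptions, not code from either patch: columns are matched by name, dropped columns are skipped on both sides, and the `toy_` layouts are my reading of what the DDL above produces (dropping and re-adding b on p recurses to the already-attached p12, while p4 is created afterwards with no dropped column).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy attribute: its name and whether it has been dropped. */
typedef struct ToyAttr
{
	const char *name;
	int			dropped;
} ToyAttr;

/*
 * Fill map[i] with the 1-based child attribute number that matches parent
 * attribute i + 1 by name, or 0 if the parent attribute is dropped or has
 * no live counterpart in the child.
 */
static void
toy_build_attrmap(const ToyAttr *parent, size_t nparent,
				  const ToyAttr *child, size_t nchild, int *map)
{
	assert(map != NULL);

	for (size_t i = 0; i < nparent; i++)
	{
		map[i] = 0;
		if (parent[i].dropped)
			continue;
		for (size_t j = 0; j < nchild; j++)
		{
			if (!child[j].dropped &&
				strcmp(parent[i].name, child[j].name) == 0)
			{
				map[i] = (int) (j + 1);
				break;
			}
		}
	}
}

/* Assumed layouts after the DDL above (hypothetical, for illustration). */
static const ToyAttr toy_p[] = {{"a", 0}, {"b", 1}, {"b", 0}};
static const ToyAttr toy_p12[] = {{"b", 1}, {"a", 0}, {"b", 0}};
static const ToyAttr toy_p4[] = {{"a", 0}, {"b", 0}};
```

Under these assumptions the parent's column b sits at attribute number 3, but in p4 it sits at 2. An expression that still refers to attribute 3 when evaluated against p4's two-column descriptor would run past its end, which matches the "attribute number 3 exceeds number of columns 2" error.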

I attach my patch here for your reference, which I polished this morning
after seeing your email and the patch. It works for most of the cases, as
you can see in the updated tests in insert_conflict.sql. Since I agree
with you that we should, for now, error out if DO UPDATE causes row
movement, I adopted the code from your patch for that.

Thanks,
Amit

[1]: /messages/by-id/20171227225031.osh6vunpuhsath25@alvherre.pgsql

Attachments:

v1-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From fab3ed1f695667b64cc037e0faa7dc7909c58915 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 28 Feb 2018 17:58:00 +0900
Subject: [PATCH v1] Fix ON CONFLICT to work with partitioned tables

---
 doc/src/sgml/ddl.sgml                         |  15 ---
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  36 ++++--
 src/backend/commands/tablecmds.c              |  15 ++-
 src/backend/executor/execPartition.c          | 170 +++++++++++++++++++++++++-
 src/backend/executor/nodeModifyTable.c        |  30 +++++
 src/backend/optimizer/prep/preptlist.c        |   4 +-
 src/backend/optimizer/prep/prepunion.c        |  31 +++--
 src/backend/parser/analyze.c                  |   7 --
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  17 +++
 src/include/optimizer/prep.h                  |  11 ++
 src/test/regress/expected/insert_conflict.out |  73 +++++++++--
 src/test/regress/sql/insert_conflict.sql      |  64 ++++++++--
 14 files changed, 405 insertions(+), 72 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2b879ead4b..b2b3485b83 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3325,21 +3325,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cf36ce4add..6d49c41217 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1777,7 +1777,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8c9a11493..68e4f171ec 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1392,14 +1393,25 @@ check_default_allows_bound(Relation parent, Relation default_rel,
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Oid		parentOid = get_partition_parent_recurse(relid, getroot);
+
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	return parentOid;
+}
+
+static Oid
+get_partition_parent_recurse(Oid relid, bool getroot)
 {
 	Form_pg_inherits form;
 	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
+	Oid			result = InvalidOid;
 
 	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
 
@@ -1416,15 +1428,25 @@ get_partition_parent(Oid relid)
 							  NULL, 2, key);
 
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+	if (HeapTupleIsValid(tuple))
+	{
+		form = (Form_pg_inherits) GETSTRUCT(tuple);
+		result = form->inhparent;
 
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
+		if (getroot)
+			result = get_partition_parent_recurse(result, getroot);
+	}
 
 	systable_endscan(scan);
 	heap_close(catalogRelation, AccessShareLock);
 
+	/*
+	 * If we recursed and got InvalidOid as parent, that means we reached the
+	 * root of this partition tree in the form of 'relid' itself.
+	 */
+	if (getroot && !OidIsValid(result))
+		return relid;
+
 	return result;
 }
 
@@ -2508,7 +2530,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020bffc..6e103f26ca 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14346,7 +14347,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14475,7 +14476,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14708,8 +14709,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 54efc9e545..3f7b61dc37 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "optimizer/prep.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -109,6 +110,23 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (mtstate && mtstate->mt_onconflict != ONCONFLICT_NONE)
+	{
+		proute->partition_arbiter_indexes = (List **)
+											palloc(proute->num_partitions *
+												   sizeof(List *));
+		proute->partition_conflproj_slots = (TupleTableSlot **)
+											palloc(proute->num_partitions *
+												   sizeof(TupleTableSlot *));
+		proute->partition_existing_slots = (TupleTableSlot **)
+											palloc(proute->num_partitions *
+												   sizeof(TupleTableSlot *));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +493,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +502,155 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		TupleTableSlot *part_conflproj_slot,
+					   *part_existing_slot;
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		/*
+		 * If the root parent and partition have the same tuple
+		 * descriptor, just reuse the original projection and WHERE
+		 * clause expressions for partition.
+		 */
+		if (map == NULL)
+		{
+			/* Use the existing slot. */
+			part_existing_slot = mtstate->mt_existing;
+			part_conflproj_slot = mtstate->mt_conflproj;
+			leaf_part_rri->ri_onConflictSetProj =
+									resultRelInfo->ri_onConflictSetProj;
+			leaf_part_rri->ri_onConflictSetWhere =
+									resultRelInfo->ri_onConflictSetWhere;
+		}
+		else
+		{
+			/* Convert expressions to contain the partition's attnos. */
+			List *conv_setproj;
+			AppendRelInfo appinfo;
+			TupleDesc	tupDesc;
+
+			/* Need our own slot. */
+			part_existing_slot =
+					ExecInitExtraTupleSlot(mtstate->ps.state, partrelDesc);
+
+			/* First convert references to EXCLUDED pseudo-relation. */
+			conv_setproj = map_partition_varattnos((List *)
+												   node->onConflictSet,
+												   INNER_VAR,
+												   partrel,
+												   firstResultRel, NULL);
+			/* Then convert references to main target relation. */
+			conv_setproj = map_partition_varattnos((List *)
+												   conv_setproj,
+												   firstVarno,
+												   partrel,
+												   firstResultRel, NULL);
+
+			/*
+			 * Need to fix the target entries' resnos too by using
+			 * inheritance translation.
+			 */
+			appinfo.type = T_AppendRelInfo;
+			appinfo.parent_relid = firstVarno;
+			appinfo.parent_reltype = firstResultRel->rd_rel->reltype;
+			appinfo.child_relid = partrel->rd_id;
+			appinfo.child_reltype = partrel->rd_rel->reltype;
+			appinfo.parent_reloid = firstResultRel->rd_id;
+			make_inh_translation_list(firstResultRel, partrel,
+									  1, /* dummy */
+									  &appinfo.translated_vars);
+			conv_setproj = adjust_inherited_tlist((List *) conv_setproj,
+												  &appinfo);
+
+			/*
+			 * Add any attributes that are missing in the source list, such
+			 * as dropped columns in the partition.
+			 */
+			conv_setproj = expand_targetlist(conv_setproj, CMD_UPDATE,
+											 firstVarno, partrel);
+
+			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
+			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
+														 tupDesc);
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(conv_setproj, econtext,
+											part_conflproj_slot,
+											&mtstate->ps, partrelDesc);
+
+			/* For WHERE quals, map_partition_varattnos() suffices. */
+			if (node->onConflictWhere)
+			{
+				List *conv_where;
+				ExprState  *qualexpr;
+
+				/* First convert references to EXCLUDED pseudo-relation. */
+				conv_where = map_partition_varattnos((List *)
+													 node->onConflictWhere,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Then convert references to main target relation. */
+				conv_where = map_partition_varattnos((List *)
+													 conv_where,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+				qualexpr = ExecInitQual(conv_where, &mtstate->ps);
+				leaf_part_rri->ri_onConflictSetWhere = qualexpr;
+			}
+		}
+
+		/*
+		 * Save away for use later.  Set mtstate->mt_existing and
+		 * mtstate->mt_conflproj, respectively, to these values before
+		 * calling ExecOnConflictUpdate().
+		 */
+		proute->partition_existing_slots[partidx] = part_existing_slot;
+		proute->partition_conflproj_slots[partidx] = part_conflproj_slot;
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, mtstate->mt_arbiterindexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c32928d9bd..0f9ca6586e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -374,6 +374,24 @@ ExecInsert(ModifyTableState *mtstate,
 										  tuple,
 										  proute->partition_tuple_slot,
 										  &slot);
+
+		/* Switch to partition's ON CONFLICT information. */
+		if (arbiterIndexes)
+		{
+			Assert(onconflict != ONCONFLICT_NONE);
+			arbiterIndexes = proute->partition_arbiter_indexes[leaf_part_index];
+
+			/* Use correct existing and projection slots for DO UPDATE */
+			if (onconflict == ONCONFLICT_UPDATE)
+			{
+				Assert(proute->partition_existing_slots[leaf_part_index]);
+				mtstate->mt_existing =
+						proute->partition_existing_slots[leaf_part_index];
+				Assert(proute->partition_conflproj_slots[leaf_part_index]);
+				mtstate->mt_conflproj =
+						proute->partition_conflproj_slots[leaf_part_index];
+			}
+		}
 	}
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -1146,6 +1164,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8603feef2b..f5ba93db4a 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -53,8 +53,6 @@
 #include "utils/rel.h"
 
 
-static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel);
 
 
 /*
@@ -251,7 +249,7 @@ preprocess_targetlist(PlannerInfo *root)
  *	  add targetlist entries for any missing attributes, and ensure the
  *	  non-junk attributes appear in proper field order.
  */
-static List *
+List *
 expand_targetlist(List *tlist, int command_type,
 				  Index result_relation, Relation rel)
 {
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..4bd72026f0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,18 +113,12 @@ static void expand_single_inheritance_child(PlannerInfo *root,
 								PlanRowMark *top_parentrc, Relation childrel,
 								List **appinfos, RangeTblEntry **childrte_p,
 								Index *childRTindex_p);
-static void make_inh_translation_list(Relation oldrelation,
-						  Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
 					List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
 							   adjust_appendrel_attrs_context *context);
 static Relids adjust_child_relids(Relids relids, int nappinfos,
 					AppendRelInfo **appinfos);
-static List *adjust_inherited_tlist(List *tlist,
-					   AppendRelInfo *context);
 
 
 /*
@@ -1793,7 +1787,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
  *
  * For paranoia's sake, we match type/collation as well as attribute name.
  */
-static void
+void
 make_inh_translation_list(Relation oldrelation, Relation newrelation,
 						  Index newvarno,
 						  List **translated_vars)
@@ -2353,7 +2347,7 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
  *
  * Note that this is not needed for INSERT because INSERT isn't inheritable.
  */
-static List *
+List *
 adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 {
 	bool		changed_it = false;
@@ -2374,6 +2368,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (tle->resjunk)
 			continue;			/* ignore junk items */
 
+		/*
+		 * ignore dummy tlist entry added by expand_targetlist() for
+		 * dropped columns in the parent table.
+		 */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		/* Look up the translation of this column: it must be a Var */
 		if (tle->resno <= 0 ||
 			tle->resno > list_length(context->translated_vars))
@@ -2412,6 +2413,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 			if (tle->resjunk)
 				continue;		/* ignore junk items */
 
+			/*
+			 * ignore dummy tlist entry added by expand_targetlist() for
+			 * dropped columns in the parent table.
+			 */
+			if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+				continue;
+
 			if (tle->resno == attrno)
 				new_tlist = lappend(new_tlist, tle);
 			else if (tle->resno > attrno)
@@ -2426,6 +2434,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (!tle->resjunk)
 			continue;			/* here, ignore non-junk items */
 
+		/*
+		 * ignore dummy tlist entry added by expand_targetlist() for
+		 * dropped columns in the parent table.
+		 */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		tle->resno = attrno;
 		new_tlist = lappend(new_tlist, tle);
 		attrno++;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617f67..92696f0607 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1025,13 +1025,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..9fc1ab6711 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,20 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
+ * partition_conflproj_slots	Array of TupleTableSlots to hold tuples of
+ *								ON CONFLICT DO UPDATE SET projections;
+ *								contains NULL for partitions whose tuple
+ *								descriptor exactly matches the root parent's
+ *								(including dropped columns)
+ * partition_existing_slots		Array of TupleTableSlots to hold existing
+ *								tuple during ON CONFLICT DO UPDATE handling;
+ *								contains NULL for partitions whose tuple
+ *								descriptor exactly matches the root parent's
+ *								(including dropped columns)
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	List	  **partition_arbiter_indexes;
+	TupleTableSlot **partition_conflproj_slots;
+	TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 89b7ef337c..d380b419d7 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -42,6 +42,10 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
+typedef struct RelationData *Relation;
+extern List *expand_targetlist(List *tlist, int command_type,
+				  Index result_relation, Relation rel);
+
 /*
  * prototypes for prepunion.c
  */
@@ -65,4 +69,11 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
+extern void make_inh_translation_list(Relation oldrelation,
+						  Relation newrelation,
+						  Index newvarno,
+						  List **translated_vars);
+extern List *adjust_inherited_tlist(List *tlist,
+					   AppendRelInfo *context);
+
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
-- 
2.11.0

#3Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#1)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
Following the lead of edd44738bc88 ("Be lazier about partition tuple
routing.") this incarnation only does the necessary push-ups for the
specific partition that needs it, at execution time. As far as I can
tell, it works as intended.

I chose to refuse the case where the DO UPDATE clause causes the tuple
to move to another partition (i.e. you're updating the partition key of
the tuple). While it's probably possible to implement that, it doesn't
seem a very productive use of time.

I would have thought that to be the only case we could support with
the current infrastructure. Doesn't a correct implementation for any
other case require a global index?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Robert Haas (#3)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/01 1:03, Robert Haas wrote:

On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
Following the lead of edd44738bc88 ("Be lazier about partition tuple
routing.") this incarnation only does the necessary push-ups for the
specific partition that needs it, at execution time. As far as I can
tell, it works as intended.

I chose to refuse the case where the DO UPDATE clause causes the tuple
to move to another partition (i.e. you're updating the partition key of
the tuple). While it's probably possible to implement that, it doesn't
seem a very productive use of time.

I would have thought that to be the only case we could support with
the current infrastructure. Doesn't a correct implementation for any
other case require a global index?

I'm thinking that Alvaro is talking here about the DO UPDATE action part,
not the conflict checking part. The latter will definitely require global
indexes if conflicts are to be checked on columns that do not include the
partition key.
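
To make the point about conflict checking concrete: a per-partition unique
index can only stand in for a global one when its key includes every
partition key column, because two rows with equal index keys are then
guaranteed to be routed to the same partition. The following is a minimal
sketch of that coverage test, not code from the patch. The function name
index_covers_partition_key() is hypothetical, and it assumes that both keys
consist of plain columns (no expressions) identified by attribute number.

```c
#include <stdbool.h>
#include <stddef.h>

typedef short AttrNumber;

/*
 * Hypothetical helper: return true if every partition key column also
 * appears among the index key columns.  Only plain column references are
 * modelled; expression keys are outside the scope of this sketch.
 */
bool
index_covers_partition_key(const AttrNumber *index_keys, size_t nindex,
						   const AttrNumber *part_keys, size_t npart)
{
	for (size_t p = 0; p < npart; p++)
	{
		bool		found = false;

		for (size_t i = 0; i < nindex; i++)
		{
			if (index_keys[i] == part_keys[p])
			{
				found = true;
				break;
			}
		}

		/* A partition key column is missing from the index key. */
		if (!found)
			return false;
	}

	return true;
}
```

With a table partitioned on column 1 (a), an index on (a) or on (a, b)
passes this test, while an index on (b) alone does not; the latter is the
case that would need a global index.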

The case Alvaro mentions arises after checking the conflict, presumably
using an inherited unique index whose keys must include the partition
keys. If the conflict action is DO UPDATE and its SET clause changes
partition key columns such that the row has to move to a different
partition, then the current patch will result in an error. I think that's
because making update row movement work in this case will require some
adjustments to what 2f178441044 (Allow UPDATE to move rows between
partitions) implemented. We wouldn't have things set up in the
ModifyTableState that the current row-movement code depends on; for
example, there wouldn't be per-subplan ResultRelInfos in the ON CONFLICT
DO UPDATE case. The following Assert in ExecUpdate() would fail as a result:

map_index = resultRelInfo - mtstate->resultRelInfo;
Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
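
To illustrate why that Assert trips, here is a minimal standalone sketch.
ToyResultRelInfo and subplan_index() are hypothetical stand-ins, not
executor code. The real Assert uses pointer subtraction, which is only
meaningful when resultRelInfo points into the mtstate->resultRelInfo
array; the sketch uses equality comparisons instead, so that it stays
well defined for a pointer from outside the array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the executor's ResultRelInfo. */
typedef struct ToyResultRelInfo
{
	unsigned int relid;
} ToyResultRelInfo;

/*
 * Return the position of 'rri' within the per-subplan array 'base', or -1
 * if it does not point at any of its 'nplans' elements.
 */
int
subplan_index(const ToyResultRelInfo *base, int nplans,
			  const ToyResultRelInfo *rri)
{
	assert(base != NULL && nplans >= 0);

	for (int i = 0; i < nplans; i++)
	{
		if (&base[i] == rri)
			return i;
	}
	return -1;
}
```

For an element of the array the function returns its index. For a
ResultRelInfo allocated separately, which as far as I understand is the
situation of a partition set up by tuple routing, it returns -1, and that
is the condition the Assert rejects.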

Thanks,
Amit

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#2)
3 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is, the table mentioned in the command). So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list. After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.

Oh, your solution for this seems simple enough. Silly me, I was trying
to implement it in a quite roundabout way. Thanks. (I do wonder if we
should save the "root" reloid in the relcache).
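
The mapping Amit describes can be sketched in standalone C as follows.
This is only an illustration under stated assumptions: ToyInherits is a
hypothetical in-memory stand-in for pg_inherits, the toy_* functions are
not PostgreSQL APIs, and the inheritance data is assumed to be acyclic.
For every arbiter index chosen for the root table, it collects the leaf
partition's indexes whose topmost ancestor is that index:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-in for pg_inherits: one (child, parent) pair per row. */
typedef struct ToyInherits
{
	Oid			child;
	Oid			parent;
} ToyInherits;

/* Return the direct parent of relid, or InvalidOid if it has none. */
Oid
toy_parent(const ToyInherits *inh, size_t ninh, Oid relid)
{
	for (size_t i = 0; i < ninh; i++)
	{
		if (inh[i].child == relid)
			return inh[i].parent;
	}
	return InvalidOid;
}

/* Walk up until no parent is found; a root is its own root. */
Oid
toy_root(const ToyInherits *inh, size_t ninh, Oid relid)
{
	Oid			parent;

	while ((parent = toy_parent(inh, ninh, relid)) != InvalidOid)
		relid = parent;
	return relid;
}

/*
 * For each root-level arbiter index, append to 'out' every index of the
 * leaf partition whose topmost ancestor is that arbiter index.  Returns
 * the number of OIDs written.
 */
size_t
toy_map_arbiter_indexes(const ToyInherits *inh, size_t ninh,
						const Oid *root_arbiters, size_t nroot,
						const Oid *leaf_indexes, size_t nleaf,
						Oid *out, size_t outcap)
{
	size_t		nout = 0;

	for (size_t r = 0; r < nroot; r++)
	{
		for (size_t l = 0; l < nleaf; l++)
		{
			if (toy_root(inh, ninh, leaf_indexes[l]) == root_arbiters[r])
			{
				assert(nout < outcap);
				out[nout++] = leaf_indexes[l];
			}
		}
	}
	return nout;
}
```

Because the walk goes all the way to the root, the number of
partitioning levels between the root and the leaf doesn't matter. The
cost is one upward walk per (arbiter, leaf index) pair, which is where
caching the root OID somewhere would help.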

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list). In fact,
that also has to take dropped columns into account; we may have dropped
columns in the parent, in a partition, or in both, at the same or
different attnum positions. That means a simple map_partition_varattnos()
translation isn't enough in this case.

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet. Thanks for all your work on
this.
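
To make the dropped-column problem concrete, here is a small standalone
sketch of a name-based attribute translation between a parent and a
partition. ToyAttr and toy_build_attr_map() are hypothetical names used
only for illustration. The sketch matches columns by name alone and
leaves out the type and collation checks that
make_inh_translation_list() also performs:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct ToyAttr
{
	const char *name;			/* column name; not compared when dropped */
	bool		dropped;		/* true for a dropped column's placeholder */
} ToyAttr;

/*
 * Fill map[i] with the 1-based attribute number in 'child' that matches
 * parent attribute i + 1 by name, or 0 if that parent attribute is
 * dropped.  Returns false if a live parent column has no counterpart in
 * the child.
 */
bool
toy_build_attr_map(const ToyAttr *parent, int nparent,
				   const ToyAttr *child, int nchild,
				   int *map)
{
	assert(parent != NULL && child != NULL && map != NULL);

	for (int i = 0; i < nparent; i++)
	{
		map[i] = 0;
		if (parent[i].dropped)
			continue;			/* nothing to translate */

		for (int j = 0; j < nchild; j++)
		{
			if (!child[j].dropped &&
				strcmp(parent[i].name, child[j].name) == 0)
			{
				map[i] = j + 1;
				break;
			}
		}
		if (map[i] == 0)
			return false;		/* live parent column missing in child */
	}
	return true;
}
```

With a parent laid out as (a, dropped, b) and a partition laid out as
(b, dropped, a), the resulting map is {3, 0, 1}: the live columns swap
positions and the parent's dropped column maps to nothing. A target list
built for the partition then still needs an entry for the partition's
own dropped column, which I believe is the gap the expand_targetlist()
call is meant to fill.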

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy. These functions have weird APIs because
they're static so we don't really care, but once we export them we should
strive to be more careful. I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c. In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own. (0002 is not bit-for-bit identical to yours; I think I had to
fix some merge conflict with 0001, but it should be pretty much the same.)

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery. For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it. And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about. I think there was a reason for all these contortions at some
point (visible by checking the git history of this code), but it all
seems useless now.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result? I'm obviously confused about which
expand_targetlist() call this comment is talking about. Anyway, I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday. I'll get back to this
soon, but in the meantime, here's what I have.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-Make-some-static-functions-work-on-TupleDesc-rath.patch (text/plain; charset=us-ascii)
From d5b3b7c252da2c2a25bd9b8de40a03f2d5f30081 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 1 Mar 2018 19:58:50 -0300
Subject: [PATCH v3 1/3] Make some static functions work on TupleDesc rather
 than Relation

---
 src/backend/optimizer/prep/preptlist.c | 21 +++++++++++----------
 src/backend/optimizer/prep/prepunion.c | 26 ++++++++++++++++----------
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8603feef2b..8c94dd4f59 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -116,7 +116,8 @@ preprocess_targetlist(PlannerInfo *root)
 	tlist = parse->targetList;
 	if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
 		tlist = expand_targetlist(tlist, command_type,
-								  result_relation, target_relation);
+								  result_relation,
+								  RelationGetDescr(target_relation));
 
 	/*
 	 * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -230,7 +231,7 @@ preprocess_targetlist(PlannerInfo *root)
 			expand_targetlist(parse->onConflict->onConflictSet,
 							  CMD_UPDATE,
 							  result_relation,
-							  target_relation);
+							  RelationGetDescr(target_relation));
 
 	if (target_relation)
 		heap_close(target_relation, NoLock);
@@ -247,13 +248,13 @@ preprocess_targetlist(PlannerInfo *root)
 
 /*
  * expand_targetlist
- *	  Given a target list as generated by the parser and a result relation,
- *	  add targetlist entries for any missing attributes, and ensure the
- *	  non-junk attributes appear in proper field order.
+ *	  Given a target list as generated by the parser and a result relation's
+ *	  tuple descriptor, add targetlist entries for any missing attributes, and
+ *	  ensure the non-junk attributes appear in proper field order.
  */
 static List *
 expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel)
+				  Index result_relation, TupleDesc tupdesc)
 {
 	List	   *new_tlist = NIL;
 	ListCell   *tlist_item;
@@ -266,14 +267,14 @@ expand_targetlist(List *tlist, int command_type,
 	 * The rewriter should have already ensured that the TLEs are in correct
 	 * order; but we have to insert TLEs for any missing attributes.
 	 *
-	 * Scan the tuple description in the relation's relcache entry to make
-	 * sure we have all the user attributes in the right order.
+	 * Scan the tuple description to make sure we have all the user attributes
+	 * in the right order.
 	 */
-	numattrs = RelationGetNumberOfAttributes(rel);
+	numattrs = tupdesc->natts;
 
 	for (attrno = 1; attrno <= numattrs; attrno++)
 	{
-		Form_pg_attribute att_tup = TupleDescAttr(rel->rd_att, attrno - 1);
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
 		TargetEntry *new_tle = NULL;
 
 		if (tlist_item != NULL)
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..7949829b31 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,8 +113,9 @@ static void expand_single_inheritance_child(PlannerInfo *root,
 								PlanRowMark *top_parentrc, Relation childrel,
 								List **appinfos, RangeTblEntry **childrte_p,
 								Index *childRTindex_p);
-static void make_inh_translation_list(Relation oldrelation,
-						  Relation newrelation,
+static void make_inh_translation_list(TupleDesc old_tupdesc,
+						  TupleDesc new_tupdesc,
+						  char *new_rel_name,
 						  Index newvarno,
 						  List **translated_vars);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
@@ -1730,7 +1731,10 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 		appinfo->child_relid = childRTindex;
 		appinfo->parent_reltype = parentrel->rd_rel->reltype;
 		appinfo->child_reltype = childrel->rd_rel->reltype;
-		make_inh_translation_list(parentrel, childrel, childRTindex,
+		make_inh_translation_list(RelationGetDescr(parentrel),
+								  RelationGetDescr(childrel),
+								  RelationGetRelationName(childrel),
+								  childRTindex,
 								  &appinfo->translated_vars);
 		appinfo->parent_reloid = parentOID;
 		*appinfos = lappend(*appinfos, appinfo);
@@ -1794,16 +1798,18 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
  * For paranoia's sake, we match type/collation as well as attribute name.
  */
 static void
-make_inh_translation_list(Relation oldrelation, Relation newrelation,
+make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc,
+						  char *new_rel_name,
 						  Index newvarno,
 						  List **translated_vars)
 {
 	List	   *vars = NIL;
-	TupleDesc	old_tupdesc = RelationGetDescr(oldrelation);
-	TupleDesc	new_tupdesc = RelationGetDescr(newrelation);
 	int			oldnatts = old_tupdesc->natts;
 	int			newnatts = new_tupdesc->natts;
 	int			old_attno;
+	bool		equal_tupdescs;
+
+	equal_tupdescs = equalTupleDescs(old_tupdesc, new_tupdesc);
 
 	for (old_attno = 0; old_attno < oldnatts; old_attno++)
 	{
@@ -1830,7 +1836,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 		 * When we are generating the "translation list" for the parent table
 		 * of an inheritance set, no need to search for matches.
 		 */
-		if (oldrelation == newrelation)
+		if (equal_tupdescs)
 		{
 			vars = lappend(vars, makeVar(newvarno,
 										 (AttrNumber) (old_attno + 1),
@@ -1867,16 +1873,16 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 			}
 			if (new_attno >= newnatts)
 				elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
-					 attname, RelationGetRelationName(newrelation));
+					 attname, new_rel_name);
 		}
 
 		/* Found it, check type and collation match */
 		if (atttypid != att->atttypid || atttypmod != att->atttypmod)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 		if (attcollation != att->attcollation)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's collation",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 
 		vars = lappend(vars, makeVar(newvarno,
 									 (AttrNumber) (new_attno + 1),
-- 
2.11.0

v3-0002-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=us-ascii)
From 84b1310b3abdcd91a20a82399fa59a566f62b7ac Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 28 Feb 2018 17:58:00 +0900
Subject: [PATCH v3 2/3] Fix ON CONFLICT to work with partitioned tables

---
 doc/src/sgml/ddl.sgml                         |  15 ---
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  36 ++++--
 src/backend/commands/tablecmds.c              |  15 ++-
 src/backend/executor/execPartition.c          | 170 +++++++++++++++++++++++++-
 src/backend/executor/nodeModifyTable.c        |  30 +++++
 src/backend/optimizer/prep/preptlist.c        |   4 +-
 src/backend/optimizer/prep/prepunion.c        |  25 +++-
 src/backend/parser/analyze.c                  |   7 --
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  17 +++
 src/include/optimizer/prep.h                  |  11 ++
 src/test/regress/expected/insert_conflict.out |  73 +++++++++--
 src/test/regress/sql/insert_conflict.sql      |  64 ++++++++--
 14 files changed, 404 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2b879ead4b..b2b3485b83 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3325,21 +3325,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cf36ce4add..6d49c41217 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1777,7 +1777,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fcf7655553..9d1ad09595 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1392,14 +1393,25 @@ check_default_allows_bound(Relation parent, Relation default_rel,
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Oid		parentOid = get_partition_parent_recurse(relid, getroot);
+
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	return parentOid;
+}
+
+static Oid
+get_partition_parent_recurse(Oid relid, bool getroot)
 {
 	Form_pg_inherits form;
 	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
+	Oid			result = InvalidOid;
 
 	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
 
@@ -1416,15 +1428,25 @@ get_partition_parent(Oid relid)
 							  NULL, 2, key);
 
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+	if (HeapTupleIsValid(tuple))
+	{
+		form = (Form_pg_inherits) GETSTRUCT(tuple);
+		result = form->inhparent;
 
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
+		if (getroot)
+			result = get_partition_parent_recurse(result, getroot);
+	}
 
 	systable_endscan(scan);
 	heap_close(catalogRelation, AccessShareLock);
 
+	/*
+	 * If we recursed and got InvalidOid as parent, that means we reached the
+	 * root of this partition tree in the form of 'relid' itself.
+	 */
+	if (getroot && !OidIsValid(result))
+		return relid;
+
 	return result;
 }
 
@@ -2505,7 +2527,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020bffc..6e103f26ca 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14346,7 +14347,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14475,7 +14476,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14708,8 +14709,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 54efc9e545..3f7b61dc37 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "optimizer/prep.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -109,6 +110,23 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (mtstate && mtstate->mt_onconflict != ONCONFLICT_NONE)
+	{
+		proute->partition_arbiter_indexes = (List **)
+											palloc(proute->num_partitions *
+												   sizeof(List *));
+		proute->partition_conflproj_slots = (TupleTableSlot **)
+											palloc(proute->num_partitions *
+												   sizeof(TupleTableSlot *));
+		proute->partition_existing_slots = (TupleTableSlot **)
+											palloc(proute->num_partitions *
+												   sizeof(TupleTableSlot *));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +493,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +502,155 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		TupleTableSlot *part_conflproj_slot,
+					   *part_existing_slot;
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		/*
+		 * If the root parent and partition have the same tuple
+		 * descriptor, just reuse the original projection and WHERE
+		 * clause expressions for partition.
+		 */
+		if (map == NULL)
+		{
+			/* Use the existing slot. */
+			part_existing_slot = mtstate->mt_existing;
+			part_conflproj_slot = mtstate->mt_conflproj;
+			leaf_part_rri->ri_onConflictSetProj =
+									resultRelInfo->ri_onConflictSetProj;
+			leaf_part_rri->ri_onConflictSetWhere =
+									resultRelInfo->ri_onConflictSetWhere;
+		}
+		else
+		{
+			/* Convert expressions to contain the partition's attnos. */
+			List *conv_setproj;
+			AppendRelInfo appinfo;
+			TupleDesc	tupDesc;
+
+			/* Need our own slot. */
+			part_existing_slot =
+					ExecInitExtraTupleSlot(mtstate->ps.state, partrelDesc);
+
+			/* First convert references to EXCLUDED pseudo-relation. */
+			conv_setproj = map_partition_varattnos((List *)
+												   node->onConflictSet,
+												   INNER_VAR,
+												   partrel,
+												   firstResultRel, NULL);
+			/* Then convert references to main target relation. */
+			conv_setproj = map_partition_varattnos((List *)
+												   conv_setproj,
+												   firstVarno,
+												   partrel,
+												   firstResultRel, NULL);
+
+			/*
+			 * Need to fix the target entries' resnos too by using
+			 * inheritance translation.
+			 */
+			appinfo.type = T_AppendRelInfo;
+			appinfo.parent_relid = firstVarno;
+			appinfo.parent_reltype = firstResultRel->rd_rel->reltype;
+			appinfo.child_relid = partrel->rd_id;
+			appinfo.child_reltype = partrel->rd_rel->reltype;
+			appinfo.parent_reloid = firstResultRel->rd_id;
+			make_inh_translation_list(firstResultRel, partrel,
+									  1, /* dummy */
+									  &appinfo.translated_vars);
+			conv_setproj = adjust_inherited_tlist((List *) conv_setproj,
+												  &appinfo);
+
+			/*
+			 * Add any attributes that are missing in the source list, such
+			 * as, dropped columns in the partition.
+			 */
+			conv_setproj = expand_targetlist(conv_setproj, CMD_UPDATE,
+											 firstVarno, partrel);
+
+			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
+			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
+														 tupDesc);
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(conv_setproj, econtext,
+											part_conflproj_slot,
+											&mtstate->ps, partrelDesc);
+
+			/* For WHERE quals, map_partition_varattnos() suffices. */
+			if (node->onConflictWhere)
+			{
+				List *conv_where;
+				ExprState  *qualexpr;
+
+				/* First convert references to EXCLUDED pseudo-relation. */
+				conv_where = map_partition_varattnos((List *)
+													 node->onConflictWhere,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Then convert references to main target relation. */
+				conv_where = map_partition_varattnos((List *)
+													 conv_where,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+				qualexpr = ExecInitQual(conv_where, &mtstate->ps);
+				leaf_part_rri->ri_onConflictSetWhere = qualexpr;
+			}
+		}
+
+		/*
+		 * Save away for use later.  Set mtstate->mt_existing and
+		 * mtstate->mt_conflproj, respectively, to these values before
+		 * calling ExecOnConflictUpdate().
+		 */
+		proute->partition_existing_slots[partidx] = part_existing_slot;
+		proute->partition_conflproj_slots[partidx] = part_conflproj_slot;
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, mtstate->mt_arbiterindexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c32928d9bd..0f9ca6586e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -374,6 +374,24 @@ ExecInsert(ModifyTableState *mtstate,
 										  tuple,
 										  proute->partition_tuple_slot,
 										  &slot);
+
+		/* Switch to partition's ON CONFLICT information. */
+		if (arbiterIndexes)
+		{
+			Assert(onconflict != ONCONFLICT_NONE);
+			arbiterIndexes = proute->partition_arbiter_indexes[leaf_part_index];
+
+			/* Use correct existing and projection slots for DO UPDATE */
+			if (onconflict == ONCONFLICT_UPDATE)
+			{
+				Assert(proute->partition_existing_slots[leaf_part_index]);
+				mtstate->mt_existing =
+						proute->partition_existing_slots[leaf_part_index];
+				Assert(proute->partition_conflproj_slots[leaf_part_index]);
+				mtstate->mt_conflproj =
+						proute->partition_conflproj_slots[leaf_part_index];
+			}
+		}
 	}
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -1146,6 +1164,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8c94dd4f59..e2995e6592 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -53,8 +53,6 @@
 #include "utils/rel.h"
 
 
-static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel);
 
 
 /*
@@ -252,7 +250,7 @@ preprocess_targetlist(PlannerInfo *root)
  *	  tuple descriptor, add targetlist entries for any missing attributes, and
  *	  ensure the non-junk attributes appear in proper field order.
  */
-static List *
+List *
 expand_targetlist(List *tlist, int command_type,
 				  Index result_relation, TupleDesc tupdesc)
 {
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7949829b31..4153891f29 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -124,8 +124,6 @@ static Node *adjust_appendrel_attrs_mutator(Node *node,
 							   adjust_appendrel_attrs_context *context);
 static Relids adjust_child_relids(Relids relids, int nappinfos,
 					AppendRelInfo **appinfos);
-static List *adjust_inherited_tlist(List *tlist,
-					   AppendRelInfo *context);
 
 
 /*
@@ -2359,7 +2357,7 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
  *
  * Note that this is not needed for INSERT because INSERT isn't inheritable.
  */
-static List *
+List *
 adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 {
 	bool		changed_it = false;
@@ -2380,6 +2378,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (tle->resjunk)
 			continue;			/* ignore junk items */
 
+		/*
+		 * ignore dummy tlist entry added by exapnd_targetlist() for
+		 * dropped columns in the parent table.
+		 */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		/* Look up the translation of this column: it must be a Var */
 		if (tle->resno <= 0 ||
 			tle->resno > list_length(context->translated_vars))
@@ -2418,6 +2423,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 			if (tle->resjunk)
 				continue;		/* ignore junk items */
 
+			/*
+			 * ignore dummy tlist entry added by exapnd_targetlist() for
+			 * dropped columns in the parent table.
+			 */
+			if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+				continue;
+
 			if (tle->resno == attrno)
 				new_tlist = lappend(new_tlist, tle);
 			else if (tle->resno > attrno)
@@ -2432,6 +2444,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (!tle->resjunk)
 			continue;			/* here, ignore non-junk items */
 
+		/*
+		 * ignore dummy tlist entry added by exapnd_targetlist() for
+		 * dropped columns in the parent table.
+		 */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		tle->resno = attrno;
 		new_tlist = lappend(new_tlist, tle);
 		attrno++;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617f67..92696f0607 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1025,13 +1025,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..9fc1ab6711 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,20 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
+ * partition_conflproj_slots	Array of TupleTableSlots to hold tuples of
+ *								ON CONFLICT DO UPDATE SET projections;
+ *								contains NULL for partitions whose tuple
+ *								descriptor exactly matches the root parent's
+ *								(including dropped columns)
+ * partition_existing_slots		Array of TupleTableSlots to hold existing
+ *								tuple during ON CONFLICT DO UPDATE handling;
+ *								contains NULL for partitions whose tuple
+ *								descriptor exactly matches the root parent's
+ *								(including dropped columns)
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	List	  **partition_arbiter_indexes;
+	TupleTableSlot **partition_conflproj_slots;
+	TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 89b7ef337c..d380b419d7 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -42,6 +42,10 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
+typedef struct RelationData *Relation;
+extern List *expand_targetlist(List *tlist, int command_type,
+				  Index result_relation, Relation rel);
+
 /*
  * prototypes for prepunion.c
  */
@@ -65,4 +69,11 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
+extern void make_inh_translation_list(Relation oldrelation,
+						  Relation newrelation,
+						  Index newvarno,
+						  List **translated_vars);
+extern List *adjust_inherited_tlist(List *tlist,
+					   AppendRelInfo *context);
+
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
-- 
2.11.0

v3-0003-fixups.patch (text/plain; charset=us-ascii)
From eb11d8fbe02530dc6f4fbfa162c59baae46df80c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 28 Feb 2018 20:20:08 -0300
Subject: [PATCH v3 3/3] fixups

---
 src/backend/catalog/partition.c        | 52 ++++++++++++----------
 src/backend/executor/execPartition.c   | 81 +++++++++++++---------------------
 src/backend/optimizer/prep/prepunion.c | 59 ++++++++++++++++++++-----
 src/include/optimizer/prep.h           | 15 +++----
 4 files changed, 115 insertions(+), 92 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 9d1ad09595..ef2ef3aa80 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,7 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
-static Oid get_partition_parent_recurse(Oid relid, bool getroot);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1385,8 +1385,10 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
@@ -1395,26 +1397,32 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 Oid
 get_partition_parent(Oid relid, bool getroot)
 {
-	Oid		parentOid = get_partition_parent_recurse(relid, getroot);
+	Relation	inhRel;
+	Oid		parentOid;
 
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
 	if (parentOid == InvalidOid)
 		elog(ERROR, "could not find parent of relation %u", relid);
 
+	heap_close(inhRel, AccessShareLock);
+
 	return parentOid;
 }
 
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
 static Oid
-get_partition_parent_recurse(Oid relid, bool getroot)
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
 	Oid			result = InvalidOid;
 
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
-
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
 				BTEqualStrategyNumber, F_OIDEQ,
@@ -1424,28 +1432,26 @@ get_partition_parent_recurse(Oid relid, bool getroot)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
 	if (HeapTupleIsValid(tuple))
-	{
-		form = (Form_pg_inherits) GETSTRUCT(tuple);
-		result = form->inhparent;
-
-		if (getroot)
-			result = get_partition_parent_recurse(result, getroot);
-	}
-
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
 
 	/*
-	 * If we recursed and got InvalidOid as parent, that means we reached the
-	 * root of this partition tree in the form of 'relid' itself.
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
 	 */
-	if (getroot && !OidIsValid(result))
-		return relid;
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3f7b61dc37..7ea0295d3c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -65,6 +65,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -75,14 +76,12 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -116,15 +115,12 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	if (mtstate && mtstate->mt_onconflict != ONCONFLICT_NONE)
 	{
-		proute->partition_arbiter_indexes = (List **)
-											palloc(proute->num_partitions *
-												   sizeof(List *));
-		proute->partition_conflproj_slots = (TupleTableSlot **)
-											palloc(proute->num_partitions *
-												   sizeof(TupleTableSlot *));
-		proute->partition_existing_slots = (TupleTableSlot **)
-											palloc(proute->num_partitions *
-												   sizeof(TupleTableSlot *));
+		proute->partition_arbiter_indexes =
+			(List **) palloc(nparts * sizeof(List *));
+		proute->partition_conflproj_slots =
+			(TupleTableSlot **) palloc(nparts * sizeof(TupleTableSlot *));
+		proute->partition_existing_slots =
+			(TupleTableSlot **) palloc(nparts * sizeof(TupleTableSlot *));
 	}
 
 	i = 0;
@@ -537,48 +533,33 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		{
 			/* Convert expressions contain partition's attnos. */
 			List *conv_setproj;
-			AppendRelInfo appinfo;
 			TupleDesc	tupDesc;
 
 			/* Need our own slot. */
 			part_existing_slot =
 					ExecInitExtraTupleSlot(mtstate->ps.state, partrelDesc);
 
-			/* First convert references to EXCLUDED pseudo-relation. */
-			conv_setproj = map_partition_varattnos((List *)
-												   node->onConflictSet,
-												   INNER_VAR,
-												   partrel,
-												   firstResultRel, NULL);
+			/*
+			 * First convert references to the EXCLUDED pseudo-relation, which
+			 * was set to INNER_VAR by set_plan_references.
+			 */
+			conv_setproj =
+				map_partition_varattnos((List *) node->onConflictSet,
+										INNER_VAR, partrel,
+										firstResultRel, NULL);
+
 			/* Then convert references to main target relation. */
-			conv_setproj = map_partition_varattnos((List *)
-												   conv_setproj,
-												   firstVarno,
-												   partrel,
-												   firstResultRel, NULL);
+			conv_setproj =
+				map_partition_varattnos((List *) conv_setproj,
+										firstVarno, partrel,
+										firstResultRel, NULL);
 
-			/*
-			 * Need to fix the target entries' resnos too by using
-			 * inheritance translation.
-			 */
-			appinfo.type = T_AppendRelInfo;
-			appinfo.parent_relid = firstVarno;
-			appinfo.parent_reltype = firstResultRel->rd_rel->reltype;
-			appinfo.child_relid = partrel->rd_id;
-			appinfo.child_reltype = partrel->rd_rel->reltype;
-			appinfo.parent_reloid = firstResultRel->rd_id;
-			make_inh_translation_list(firstResultRel, partrel,
-									  1, /* dummy */
-									  &appinfo.translated_vars);
-			conv_setproj = adjust_inherited_tlist((List *) conv_setproj,
-												  &appinfo);
-
-			/*
-			 * Add any attributes that are missing in the source list, such
-			 * as, dropped columns in the partition.
-			 */
-			conv_setproj = expand_targetlist(conv_setproj, CMD_UPDATE,
-											 firstVarno, partrel);
+			conv_setproj =
+				adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+												  RelationGetDescr(partrel),
+												  RelationGetRelationName(partrel),
+												  firstVarno,
+												  conv_setproj);
 
 			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
 			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 4153891f29..c11f6c20ab 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -124,6 +124,8 @@ static Node *adjust_appendrel_attrs_mutator(Node *node,
 							   adjust_appendrel_attrs_context *context);
 static Relids adjust_child_relids(Relids relids, int nappinfos,
 					AppendRelInfo **appinfos);
+static List *adjust_inherited_tlist(List *tlist,
+					   AppendRelInfo *context);
 
 
 /*
@@ -2357,7 +2359,7 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
  *
  * Note that this is not needed for INSERT because INSERT isn't inheritable.
  */
-List *
+static List *
 adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 {
 	bool		changed_it = false;
@@ -2379,8 +2381,10 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 			continue;			/* ignore junk items */
 
 		/*
-		 * ignore dummy tlist entry added by exapnd_targetlist() for
-		 * dropped columns in the parent table.
+		 * XXX ugly hack: must ignore dummy tlist entry added by
+		 * expand_targetlist() for dropped columns in the parent table or we
+		 * fail because there is no translation.  Must find a better way to
+		 * deal with this case, though.
 		 */
 		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
 			continue;
@@ -2423,10 +2427,7 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 			if (tle->resjunk)
 				continue;		/* ignore junk items */
 
-			/*
-			 * ignore dummy tlist entry added by exapnd_targetlist() for
-			 * dropped columns in the parent table.
-			 */
+			/* XXX ugly hack; see above */
 			if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
 				continue;
 
@@ -2444,10 +2445,7 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (!tle->resjunk)
 			continue;			/* here, ignore non-junk items */
 
-		/*
-		 * ignore dummy tlist entry added by exapnd_targetlist() for
-		 * dropped columns in the parent table.
-		 */
+		/* XXX ugly hack; see above */
 		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
 			continue;
 
@@ -2460,6 +2458,45 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 }
 
 /*
+ * Given a targetlist for the parentRel of the given varno, adjust it to be in
+ * the correct order and to contain all the needed elements for the given
+ * partition.
+ */
+List *
+adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  List *targetlist)
+{
+	AppendRelInfo appinfo;
+	List *result_tl;
+
+	/*
+	 * Fist, fix the target entries' resnos, by using inheritance translation.
+	 */
+	appinfo.type = T_AppendRelInfo;
+	appinfo.parent_relid = parentVarno;
+	appinfo.parent_reltype = InvalidOid; // parentRel->rd_rel->reltype;
+	appinfo.child_relid = -1;
+	appinfo.child_reltype = InvalidOid; // partrel->rd_rel->reltype;
+	appinfo.parent_reloid = 1; // dummy  parentRel->rd_id;
+	make_inh_translation_list(parentDesc, partitionDesc, partitionRelname,
+							  1, /* dummy */
+							  &appinfo.translated_vars);
+	result_tl = adjust_inherited_tlist((List *) targetlist, &appinfo);
+
+	/*
+	 * Add any attributes that are missing in the source list, such
+	 * as dropped columns in the partition.
+	 */
+	result_tl = expand_targetlist(result_tl, CMD_UPDATE,
+								  parentVarno, partitionDesc);
+
+	return result_tl;
+}
+
+/*
  * adjust_appendrel_attrs_multilevel
  *	  Apply Var translations from a toplevel appendrel parent down to a child.
  *
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index d380b419d7..c5263f65dc 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -14,6 +14,7 @@
 #ifndef PREP_H
 #define PREP_H
 
+#include "access/tupdesc.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 
@@ -42,9 +43,8 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
-typedef struct RelationData *Relation;
 extern List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel);
+				  Index result_relation, TupleDesc tupdesc);
 
 /*
  * prototypes for prepunion.c
@@ -69,11 +69,10 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
-extern void make_inh_translation_list(Relation oldrelation,
-						  Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars);
-extern List *adjust_inherited_tlist(List *tlist,
-					   AppendRelInfo *context);
+extern List *adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  List *targetlist);
 
 #endif							/* PREP_H */
-- 
2.11.0

#7Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#6)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/03 0:36, Alvaro Herrera wrote:

Amit Langote wrote:

Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is the table mentioned in command). So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list. After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.
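
The mapping described above can be sketched outside the server as a toy model. This is only an illustration under stated assumptions, not the code from the patch: `InhEntry`, `toy_direct_parent`, `toy_root_ancestor` and `toy_map_arbiter_indexes` are hypothetical names, and a plain array stands in for the pg_inherits catalog. For each index of a leaf partition, the sketch walks up to the topmost ancestor index and keeps the leaf index only if that ancestor is in the root table's arbiter list.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for one pg_inherits row: child -> direct parent. */
typedef struct InhEntry
{
	Oid			child;
	Oid			parent;
} InhEntry;

/* Return the direct parent of relid, or InvalidOid if it has none. */
static Oid
toy_direct_parent(const InhEntry *inh, size_t ninh, Oid relid)
{
	for (size_t i = 0; i < ninh; i++)
	{
		if (inh[i].child == relid)
			return inh[i].parent;
	}
	return InvalidOid;
}

/*
 * Walk up the hierarchy until there is no further parent.  If relid has no
 * parent at all, it is its own topmost ancestor.  Assumes the toy data has
 * no cycles.
 */
static Oid
toy_root_ancestor(const InhEntry *inh, size_t ninh, Oid relid)
{
	Oid			parent;

	assert(relid != InvalidOid);
	while ((parent = toy_direct_parent(inh, ninh, relid)) != InvalidOid)
		relid = parent;
	return relid;
}

/*
 * Copy into 'out' those leaf indexes whose topmost ancestor appears in the
 * root table's arbiter list.  'out' must have room for nleaf entries.
 * Returns the number of entries written.
 */
static size_t
toy_map_arbiter_indexes(const InhEntry *inh, size_t ninh,
						const Oid *root_arbiters, size_t nroot,
						const Oid *leaf_indexes, size_t nleaf,
						Oid *out)
{
	size_t		nout = 0;

	for (size_t i = 0; i < nleaf; i++)
	{
		Oid			root = toy_root_ancestor(inh, ninh, leaf_indexes[i]);

		for (size_t j = 0; j < nroot; j++)
		{
			if (root_arbiters[j] == root)
			{
				out[nout++] = leaf_indexes[i];
				break;
			}
		}
	}
	return nout;
}
```

Because the walk goes all the way to the top, the number of partitioning levels between the root and the leaf does not matter in this model.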

Oh, your solution for this seems simple enough. Silly me, I was trying
to implement it in a quite roundabout way. Thanks. (I do wonder if we
should save the "root" reloid in the relcache).

Do you mean store the root reloid for "any" partition (table or index)?

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list). In fact,
that has to take into account also the dropped columns; we may have
dropped columns either in parent or in a partition or in both at same or
different attnum positions. That means, simple map_partition_varattnos()
translation doesn't help in this case.
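
To make the dropped-column point concrete, here is a minimal standalone sketch of matching parent attributes to partition attributes by name. It is an assumption-laden toy, not PostgreSQL code: `ToyAttr` and `toy_build_attrmap` are hypothetical, and a simple flag stands in for a dropped attribute. It shows why attribute numbers on the two sides can diverge when either side has dropped columns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one attribute of a tuple descriptor. */
typedef struct ToyAttr
{
	const char *name;
	int			dropped;		/* nonzero if the column was dropped */
} ToyAttr;

/*
 * For each parent attribute (1-based attnum i + 1), store in map[i] the
 * 1-based attnum of the live partition attribute with the same name, or 0
 * if the parent attribute itself is dropped.
 *
 * Returns 0 on success, or -1 if a live parent attribute has no live match
 * in the partition.
 */
static int
toy_build_attrmap(const ToyAttr *parent, size_t nparent,
				  const ToyAttr *part, size_t npart,
				  int *map)
{
	assert(map != NULL);

	for (size_t i = 0; i < nparent; i++)
	{
		map[i] = 0;
		if (parent[i].dropped)
			continue;			/* nothing to translate */

		for (size_t j = 0; j < npart; j++)
		{
			if (!part[j].dropped &&
				strcmp(parent[i].name, part[j].name) == 0)
			{
				map[i] = (int) (j + 1);
				break;
			}
		}
		if (map[i] == 0)
			return -1;
	}
	return 0;
}
```

With a parent laid out as (a, dropped column, b) and a partition laid out as (a, b), the map in this model comes out as {1, 0, 2}: the parent's third attribute lands in the partition's second position, and the dropped slot has no counterpart.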

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet. Thanks for all your work on
this.

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy. These functions have weird APIs because
they're static so we don't really care, but once we export them we should
strive to be more careful. I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c. In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own. (0002 is not bit-by-bit identical to yours; I think I had to
fix some merge conflict with 0001, but should be pretty much the same).

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery. For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it. And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about. I think there was a reason for all these contortions at some
point (visible by checking git history of this code), but it all seems
useless now.

Yeah, I think it might as well work to fix up these interfaces the way
you've done.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result? I'm obviously confused about what
expand_targetlist call this comment is talking about. Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday. I'll get back to this
soon, but in the meantime, here's what I have.

Hmm. I can imagine how the newly added comments in
adjust_inherited_tlist() may be confusing. With this patch, we're now
calling adjust_inherited_tlist() from the executor, whereas it was
designed only to be run in the planner prep phase. What we're passing to
it from the executor is the DO UPDATE SET's targetlist that has undergone
the expand_targetlist() treatment by the planner. Maybe, we need to
update the adjust_inherited_tlist() comments to reflect the expansion of
its scope due to this patch.
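
The situation described above can be illustrated with a small standalone model. It is a sketch under assumptions, not the planner's actual data structures: `ToyTle` and `toy_adjust_tlist` are hypothetical, and an explicit `is_placeholder` flag replaces the patch's test for a NULL Const. The input target list has already been expanded against the parent, so it carries a placeholder entry for each dropped parent column. Those placeholders have no translation and are skipped, while real entries get the partition's attribute number and are kept in partition column order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a target list entry. */
typedef struct ToyTle
{
	int			resno;			/* attnum this entry assigns to */
	int			is_placeholder; /* nonzero for a dropped-column filler */
} ToyTle;

/*
 * translated[i] is the partition attnum for parent attnum i + 1, or 0 if
 * the parent attribute is dropped.
 *
 * Writes the surviving entries to 'out' (room for ntlist entries), ordered
 * by partition attnum.  Returns the number written, or -1 if a real entry
 * has no translation.
 */
static int
toy_adjust_tlist(const ToyTle *tlist, size_t ntlist,
				 const int *translated, size_t nparent,
				 ToyTle *out)
{
	int			nout = 0;

	for (size_t i = 0; i < ntlist; i++)
	{
		int			newno;
		int			pos;

		if (tlist[i].is_placeholder)
			continue;			/* filler for a dropped parent column */

		assert(tlist[i].resno >= 1 && (size_t) tlist[i].resno <= nparent);
		newno = translated[tlist[i].resno - 1];
		if (newno == 0)
			return -1;			/* real entry without a translation */

		/* Insert in order of the new attribute number. */
		pos = nout;
		while (pos > 0 && out[pos - 1].resno > newno)
		{
			out[pos] = out[pos - 1];
			pos--;
		}
		out[pos].resno = newno;
		out[pos].is_placeholder = 0;
		nout++;
	}
	return nout;
}
```

In this model a later step, corresponding to the second expand_targetlist() call against the partition's descriptor, would add fillers for any columns dropped in the partition itself.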

Some comments on 0003:

+ * Fist, fix the target entries' resnos, by using inheritance translation.

First

+ appinfo.parent_reltype = InvalidOid; // parentRel->rd_rel->reltype;

I guess you won't retain that comment. :)

+ result_tl = expand_targetlist(result_tl, CMD_UPDATE,

Should we add a CmdType argument to adjust_and_expand_partition_tlist()
and use it instead of hard-coding CMD_UPDATE here?

Thanks,
Amit

#8Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Alvaro Herrera (#6)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?

If I understand correctly, the planner must have called expand_targetlist()
once for the parent relation's descriptor and added any dropped columns
from the parent relation. So we may not find mapped attributes for those
dropped columns in the parent. I haven't actually tested this case, though.
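
To make the dropped-column concern concrete, here is a toy sketch in plain C.
It is not PostgreSQL code: ToyAttr and toy_map_attrs() are made-up names, and
matching by name only is a simplification (the real
make_inh_translation_list() also checks type and collation). It maps each
parent column to the same-named child column and leaves dropped parent
columns unmapped, which is the case a caller has to tolerate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a pg_attribute row; NOT the real catalog struct. */
typedef struct ToyAttr
{
    const char *name;
    bool        dropped;
} ToyAttr;

/*
 * For each parent column, store in map[] the 1-based number of the
 * same-named live child column, or 0 when the parent column is dropped
 * or has no match.  Returns how many parent columns were mapped.
 */
static int
toy_map_attrs(const ToyAttr *parent, int nparent,
              const ToyAttr *child, int nchild, int *map)
{
    int         nmapped = 0;

    assert(parent != NULL && child != NULL && map != NULL);

    for (int i = 0; i < nparent; i++)
    {
        map[i] = 0;
        if (parent[i].dropped)
            continue;           /* nothing to translate for a dropped column */

        for (int j = 0; j < nchild; j++)
        {
            if (!child[j].dropped &&
                strcmp(parent[i].name, child[j].name) == 0)
            {
                map[i] = j + 1;
                nmapped++;
                break;
            }
        }
    }
    return nmapped;
}
```

With a parent of (a, dropped b, c) and a child of (c, a), the map comes out
as {2, 0, 1}: the zero entry is the slot that has no counterpart in the child.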

I wonder if it makes sense to avoid expanding the ON CONFLICT SET
targetlist when the target is a partitioned table, and deal with it during
execution, like we are doing now.

I'm obviously confused about what
expand_targetlist call this comment is talking about. Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday. I'll get back to this
soon, but in the meantime, here's what I have.

+           conv_setproj =
+               adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+                                                 RelationGetDescr(partrel),
+                                                 RelationGetRelationName(partrel),
+                                                 firstVarno,
+                                                 conv_setproj);

Aren't we adding back Vars referring to the parent relation, after
having converted the existing ones to use the partition relation's varno?
Maybe that works because missing attributes are already added during planning
and expand_targetlist() here only adds dropped columns, which are just NULL
constants.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Alvaro Herrera (#6)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
    int         num_subplan_partition_offsets;
    TupleTableSlot *partition_tuple_slot;
    TupleTableSlot *root_tuple_slot;
+   List      **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;

I am curious why you decided to add these members to the PartitionTupleRouting
structure. Wouldn't ResultRelInfo be a better place to track these, or
is there some rule that we follow?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Pavan Deolasee (#9)
Re: ON CONFLICT DO UPDATE for partitioned tables

(2018/03/16 19:43), Pavan Deolasee wrote:

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
int         num_subplan_partition_offsets;
TupleTableSlot *partition_tuple_slot;
TupleTableSlot *root_tuple_slot;
+   List      **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
} PartitionTupleRouting;

I am curious why you decided to add these members to the
PartitionTupleRouting structure. Wouldn't ResultRelInfo be a better
place to track these, or is there some rule that we follow?

I just started reviewing the patch, so maybe I'm missing something, but
I think it would be a good idea to have these in that structure, not in
ResultRelInfo, because these would be required only for partitions
chosen via tuple routing.

Best regards,
Etsuro Fujita

#11Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Etsuro Fujita (#10)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Fri, Mar 16, 2018 at 5:13 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

(2018/03/16 19:43), Pavan Deolasee wrote:

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting

int         num_subplan_partition_offsets;
TupleTableSlot *partition_tuple_slot;
TupleTableSlot *root_tuple_slot;
+   List      **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
} PartitionTupleRouting;

I am curious why you decided to add these members to the
PartitionTupleRouting structure. Wouldn't ResultRelInfo be a better
place to track these, or is there some rule that we follow?

I just started reviewing the patch, so maybe I'm missing something, but I
think it would be a good idea to have these in that structure, not in
ResultRelInfo, because these would be required only for partitions chosen
via tuple routing.

Hmm, yeah, probably you're right. The reason I got confused is that the
patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
converted projection info/where clause for a given partition in its
ResultRelInfo. So I was wondering if we can
move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
then just use those per-partition structures too.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Pavan Deolasee (#11)
Re: ON CONFLICT DO UPDATE for partitioned tables

Pavan Deolasee wrote:

Hmm, yeah, probably you're right. The reason I got confused is that the
patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
converted projection info/where clause for a given partition in its
ResultRelInfo. So I was wondering if we can
move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
then just use those per-partition structures too.

I wonder if the proposed structure is good actually.

Some notes as I go along.

1. ModifyTableState->mt_arbiterindexes is just a copy of
ModifyTable->arbiterIndexes. So why do we need it? For an
unpartitioned relation we can just use
ModifyTableState.ps->arbiterIndexes. Now, for each partition we need to
map these indexes onto the partition indexes. Not sure where to put
these; I'm tempted to say ResultRelInfo is the place. Maybe the list
should always be in ResultRelInfo instead of the state/plan node? Not
sure.

2. We don't need mt_existing to be per-relation; a single tuple slot can
do for all partitions; we just need to call ExecSetSlotDescriptor() with the
partition's descriptor whenever the slot is going to be used. (This
means the slot no longer has a fixed tupdesc. That seems OK).

3. ModifyTableState->mt_conflproj is more or less the same thing; the
same TTS can be reused by all the different projections, as long as we
set the descriptor before projecting. So we don't
need PartitionTupleRouting->partition_conflproj_slots, but we do need a
descriptor to be used when needed. Let's say
PartitionTupleRouting->partition_confl_tupdesc

I'll experiment with these ideas and see where that leads me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#12)
Re: ON CONFLICT DO UPDATE for partitioned tables

So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState. I wonder why it does that; wouldn't it
be simpler to extract those members from the node?

With the patch proposed upthread, we receive arbiterIndexes as a
parameter and if the table is a partition we summarily ignore those and
use the list as extracted from the PartitionRoutingInfo. This is
confusing and pointless. It seems to me that the logic ought to be "if
partition then use the list in PartitionRoutingInfo; if not partition
use it from ModifyTableState". This requires changing as per above,
i.e. make the arbiter index list not part of the ExecInsert's API.
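
A toy sketch of that rule, in plain C with made-up types (ToyModifyState,
ToyRouting and toy_arbiters_for() are hypothetical and only model the idea;
the real executor structs look different): the caller no longer passes an
index list, and the lookup picks the per-partition list when the tuple was
routed and the plan's list otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real executor structs. */
typedef struct ToyIndexList
{
    int         nindexes;
} ToyIndexList;

typedef struct ToyRouting
{
    int         num_partitions;
    const ToyIndexList **partition_arbiters;    /* one entry per partition */
} ToyRouting;

typedef struct ToyModifyState
{
    const ToyIndexList *plan_arbiters;  /* list taken from the plan node */
    const ToyRouting *routing;  /* NULL when no tuple routing is set up */
} ToyModifyState;

/*
 * partidx >= 0: the tuple was routed to that partition, so use the list
 * mapped for it.  partidx < 0: plain target relation, use the plan's list.
 */
static const ToyIndexList *
toy_arbiters_for(const ToyModifyState *mt, int partidx)
{
    assert(mt != NULL);

    if (partidx >= 0)
    {
        assert(mt->routing != NULL);
        assert(partidx < mt->routing->num_partitions);
        return mt->routing->partition_arbiters[partidx];
    }
    return mt->plan_arbiters;
}
```

The point of the shape is that the insert routine needs only the state node
(plus which partition was chosen) to find the right list.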

The OnConflictAction doesn't matter much; not passing it is merely a
matter of cleanliness.

Or is there another reason to pass the index list?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14Peter Geoghegan
In reply to: Alvaro Herrera (#13)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:

So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState. I wonder why it does that; wouldn't it
be simpler to extract those members from the node?

Or is there another reason to pass the index list?

It works that way pretty much by accident, as far as I can tell.
Removing the two extra arguments sounds like a good idea.

--
Peter Geoghegan

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Geoghegan (#14)
Re: ON CONFLICT DO UPDATE for partitioned tables

Peter Geoghegan wrote:

On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:

So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState. I wonder why it does that; wouldn't it
be simpler to extract those members from the node?

Or is there another reason to pass the index list?

It works that way pretty much by accident, as far as I can tell.
Removing the two extra arguments sounds like a good idea.

Great, thanks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#12)
Re: ON CONFLICT DO UPDATE for partitioned tables

Another thing I noticed is that the split of the ON CONFLICT slot and
its corresponding projection is pretty weird. The projection is in
ResultRelInfo, but the slot itself is in ModifyTableState. You can't
make the projection work without a corresponding slot initialized with
the correct descriptor, so splitting it this way doesn't make a lot of
sense to me.

(Now, TBH the split between resultRelInfo->ri_projectReturning and
ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
returning project uses, doesn't make a lot of sense to me either; so
maybe there is some reason that I'm just not seeing. But I digress.)

So I want to propose that we move the slot to be together with the
projection node that it serves, ie. we put the slot in ResultRelInfo:

typedef struct ResultRelInfo
{
...
/* for computing ON CONFLICT DO UPDATE SET */
TupleTableSlot *ri_onConflictProjSlot;
ProjectionInfo *ri_onConflictSetProj;

and with this the structure makes more sense. So ExecInitModifyTable
does this

/* create target slot for UPDATE SET projection */
tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
relationDesc->tdhasoid);
resultRelInfo->ri_onConflictProjSlot =
ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);

/* build UPDATE SET projection state */
resultRelInfo->ri_onConflictSetProj =
ExecBuildProjectionInfo(node->onConflictSet, econtext,
resultRelInfo->ri_onConflictProjSlot,
&mtstate->ps, relationDesc);

and then ExecOnConflictUpdate can simply do this:

/* Project the new tuple version */
ExecProject(resultRelInfo->ri_onConflictSetProj);

/* Execute UPDATE with projection */
*returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
resultRelInfo->ri_onConflictProjSlot, planSlot,
&mtstate->mt_epqstate, mtstate->ps.state,
canSetTag);

Now, maybe there is some reason I'm missing for the on conflict slot for
the projection to be in ModifyTableState rather than resultRelInfo. But
this code passes all current tests, so I don't know what that reason
would be.

Overall, the resulting code looks simpler to me than the previous
arrangements.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#16)
Re: ON CONFLICT DO UPDATE for partitioned tables

Hi,

On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:

Another thing I noticed is that the split of the ON CONFLICT slot and
its corresponding projection is pretty weird. The projection is in
ResultRelInfo, but the slot itself is in ModifyTableState. You can't
make the projection work without a corresponding slot initialized with
the correct descriptor, so splitting it this way doesn't make a lot of
sense to me.

(Now, TBH the split between resultRelInfo->ri_projectReturning and
ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
returning project uses, doesn't make a lot of sense to me either; so
maybe there is some reason that I'm just not seeing. But I digress.)

The projections for different child tables / child plans can look
different, therefore it can't be stored in ModifyTableState itself. No?

The slot's descriptor is changed to be "appropriate" when necessary:
if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));

So I want to propose that we move the slot to be together with the
projection node that it serves, ie. we put the slot in ResultRelInfo:

That'll mean we have a good number of additional slots in some cases. I
don't think the overhead of that is going to break the bank, but it's
worth considering.

Greetings,

Andres Freund

#18Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andres Freund (#17)
Re: ON CONFLICT DO UPDATE for partitioned tables

Andres Freund wrote:

Hi,

On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:

Another thing I noticed is that the split of the ON CONFLICT slot and
its corresponding projection is pretty weird. The projection is in
ResultRelInfo, but the slot itself is in ModifyTableState. You can't
make the projection work without a corresponding slot initialized with
the correct descriptor, so splitting it this way doesn't make a lot of
sense to me.

(Now, TBH the split between resultRelInfo->ri_projectReturning and
ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
returning project uses, doesn't make a lot of sense to me either; so
maybe there is some reason that I'm just not seeing. But I digress.)

The projections for different child tables / child plans can look
different, therefore it can't be stored in ModifyTableState itself. No?

The slot's descriptor is changed to be "appropriate" when necessary:
if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));

Grumble. This stuff looks full of cheats to me, but I won't touch
it anyway.

So I want to propose that we move the slot to be together with the
projection node that it serves, ie. we put the slot in ResultRelInfo:

That'll mean we have a good number of additional slots in some cases. I
don't think the overhead of that is going to break the bank, but it's
worth considering.

Good point.

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.
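
Roughly, as a toy model in plain C (ToySlot, ToyTupleDesc and
toy_slot_use_desc() are invented for illustration and are not the executor's
TupleTableSlot API): one shared slot is repointed at the current partition's
descriptor only when it differs, in the same spirit as the
ExecSetSlotDescriptor() check quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; NOT the PostgreSQL structs with similar names. */
typedef struct ToyTupleDesc
{
    int         natts;
} ToyTupleDesc;

typedef struct ToySlot
{
    const ToyTupleDesc *desc;   /* descriptor the slot currently uses */
    int         switches;       /* how many times the descriptor changed */
} ToySlot;

/*
 * Point the shared slot at 'desc' before projecting into it.  The switch
 * is skipped when the slot already uses that descriptor, so consecutive
 * rows routed to the same partition cost nothing extra.
 */
static void
toy_slot_use_desc(ToySlot *slot, const ToyTupleDesc *desc)
{
    assert(slot != NULL && desc != NULL);

    if (slot->desc != desc)
    {
        slot->desc = desc;
        slot->switches++;
    }
}
```

The trade-off against one slot per partition is memory versus a pointer
comparison per row; with many partitions the single-slot variant keeps the
slot count constant.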

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#18)
3 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera wrote:

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.

Yes, this works, though it's ugly. Not any uglier than what's already
there, though, so I think it's okay.

The only thing I remain unhappy about in this patch is the whole
adjust_and_expand_partition_tlist() thing. I fear we may be doing
redundant and/or misplaced work. I'll look into it next week.

0001 should be pretty much ready to push -- adjustments to ExecInsert
and ModifyTableState I already mentioned.

0002 is stuff I would like to get rid of completely -- changes to
planner code so that it better supports functionality we need for
0003.

0003 is the main patch. Compared to the previous version, this one
reuses slots by switching them to different tupdescs as needed.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v4-0001-Simplify-ExecInsert-API-re.-ON-CONFLICT-data.patch (text/plain; charset=us-ascii)
From 9f9d78e02a474402ee37ebcbed8390f4f3470743 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 16 Mar 2018 14:29:28 -0300
Subject: [PATCH v4 1/3] Simplify ExecInsert API re. ON CONFLICT data

Instead of passing the ON CONFLICT-related members of ModifyTableState
into ExecInsert(), we can have that routine obtain them from the node,
since that is already an argument into the function.

While at it, remove arbiterIndexes from ModifyTableState, since that's
just a copy of the list already in the ModifyTable node, to which the
state node already has access.
---
 src/backend/executor/nodeModifyTable.c | 18 +++++++++---------
 src/include/nodes/execnodes.h          |  2 --
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3332ae4bf3..a9a48e914f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -258,8 +258,6 @@ static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
-		   List *arbiterIndexes,
-		   OnConflictAction onconflict,
 		   EState *estate,
 		   bool canSetTag)
 {
@@ -271,6 +269,7 @@ ExecInsert(ModifyTableState *mtstate,
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
+	OnConflictAction onconflict = mtstate->mt_onconflict;
 
 	/*
 	 * get the heap tuple out of the tuple table slot, making sure we have a
@@ -455,6 +454,7 @@ ExecInsert(ModifyTableState *mtstate,
 	else
 	{
 		WCOKind		wco_kind;
+		bool		check_partition_constr;
 
 		/*
 		 * We always check the partition constraint, including when the tuple
@@ -463,8 +463,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * trigger might modify the tuple such that the partition constraint
 		 * is no longer satisfied, so we need to check in that case.
 		 */
-		bool		check_partition_constr =
-		(resultRelInfo->ri_PartitionCheck != NIL);
+		check_partition_constr = (resultRelInfo->ri_PartitionCheck != NIL);
 
 		/*
 		 * Constraints might reference the tableoid column, so initialize
@@ -510,6 +509,9 @@ ExecInsert(ModifyTableState *mtstate,
 			uint32		specToken;
 			ItemPointerData conflictTid;
 			bool		specConflict;
+			List	   *arbiterIndexes;
+
+			arbiterIndexes = ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -627,7 +629,7 @@ ExecInsert(ModifyTableState *mtstate,
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 													   estate, false, NULL,
-													   arbiterIndexes);
+													   NIL);
 		}
 	}
 
@@ -1217,8 +1219,8 @@ lreplace:;
 			Assert(mtstate->rootResultRelInfo != NULL);
 			estate->es_result_relation_info = mtstate->rootResultRelInfo;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot, NULL,
-								  ONCONFLICT_NONE, estate, canSetTag);
+			ret_slot = ExecInsert(mtstate, slot, planSlot,
+								  estate, canSetTag);
 
 			/*
 			 * Revert back the active result relation and the active
@@ -2052,7 +2054,6 @@ ExecModifyTable(PlanState *pstate)
 		{
 			case CMD_INSERT:
 				slot = ExecInsert(node, slot, planSlot,
-								  node->mt_arbiterindexes, node->mt_onconflict,
 								  estate, node->canSetTag);
 				break;
 			case CMD_UPDATE:
@@ -2137,7 +2138,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
 	mtstate->mt_onconflict = node->onConflictAction;
-	mtstate->mt_arbiterindexes = node->arbiterIndexes;
 
 	/* set up epqstate with dummy subplan data for the moment */
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a953820f43..1bee5ccbeb 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -990,8 +990,6 @@ typedef struct ModifyTableState
 	EPQState	mt_epqstate;	/* for evaluating EvalPlanQual rechecks */
 	bool		fireBSTriggers; /* do we need to fire stmt triggers? */
 	OnConflictAction mt_onconflict; /* ON CONFLICT type */
-	List	   *mt_arbiterindexes;	/* unique index OIDs to arbitrate taking
-									 * alt path */
 	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
-- 
2.11.0

v4-0002-Make-some-static-functions-work-on-TupleDesc-rath.patch (text/plain; charset=us-ascii)
From a317e436e8073fde92d87b7a56a6f38504c3fb0b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 1 Mar 2018 19:58:50 -0300
Subject: [PATCH v4 2/3] Make some static functions work on TupleDesc rather
 than Relation

---
 src/backend/optimizer/prep/preptlist.c | 23 ++++++++---------
 src/backend/optimizer/prep/prepunion.c | 45 ++++++++++++++++++----------------
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8603feef2b..b6e658fe81 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -54,7 +54,7 @@
 
 
 static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel);
+				  Index result_relation, TupleDesc tupdesc);
 
 
 /*
@@ -116,7 +116,8 @@ preprocess_targetlist(PlannerInfo *root)
 	tlist = parse->targetList;
 	if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
 		tlist = expand_targetlist(tlist, command_type,
-								  result_relation, target_relation);
+								  result_relation,
+								  RelationGetDescr(target_relation));
 
 	/*
 	 * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -230,7 +231,7 @@ preprocess_targetlist(PlannerInfo *root)
 			expand_targetlist(parse->onConflict->onConflictSet,
 							  CMD_UPDATE,
 							  result_relation,
-							  target_relation);
+							  RelationGetDescr(target_relation));
 
 	if (target_relation)
 		heap_close(target_relation, NoLock);
@@ -247,13 +248,13 @@ preprocess_targetlist(PlannerInfo *root)
 
 /*
  * expand_targetlist
- *	  Given a target list as generated by the parser and a result relation,
- *	  add targetlist entries for any missing attributes, and ensure the
- *	  non-junk attributes appear in proper field order.
+ *	  Given a target list as generated by the parser and a result relation's
+ *	  tuple descriptor, add targetlist entries for any missing attributes, and
+ *	  ensure the non-junk attributes appear in proper field order.
  */
 static List *
 expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel)
+				  Index result_relation, TupleDesc tupdesc)
 {
 	List	   *new_tlist = NIL;
 	ListCell   *tlist_item;
@@ -266,14 +267,14 @@ expand_targetlist(List *tlist, int command_type,
 	 * The rewriter should have already ensured that the TLEs are in correct
 	 * order; but we have to insert TLEs for any missing attributes.
 	 *
-	 * Scan the tuple description in the relation's relcache entry to make
-	 * sure we have all the user attributes in the right order.
+	 * Scan the tuple description to make sure we have all the user attributes
+	 * in the right order.
 	 */
-	numattrs = RelationGetNumberOfAttributes(rel);
+	numattrs = tupdesc->natts;
 
 	for (attrno = 1; attrno <= numattrs; attrno++)
 	{
-		Form_pg_attribute att_tup = TupleDescAttr(rel->rd_att, attrno - 1);
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
 		TargetEntry *new_tle = NULL;
 
 		if (tlist_item != NULL)
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..d0d9812da6 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,10 +113,10 @@ static void expand_single_inheritance_child(PlannerInfo *root,
 								PlanRowMark *top_parentrc, Relation childrel,
 								List **appinfos, RangeTblEntry **childrte_p,
 								Index *childRTindex_p);
-static void make_inh_translation_list(Relation oldrelation,
-						  Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars);
+static List *make_inh_translation_list(TupleDesc old_tupdesc,
+						  TupleDesc new_tupdesc,
+						  char *new_rel_name,
+						  Index newvarno);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
 					List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
@@ -1730,8 +1730,11 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 		appinfo->child_relid = childRTindex;
 		appinfo->parent_reltype = parentrel->rd_rel->reltype;
 		appinfo->child_reltype = childrel->rd_rel->reltype;
-		make_inh_translation_list(parentrel, childrel, childRTindex,
-								  &appinfo->translated_vars);
+		appinfo->translated_vars =
+			make_inh_translation_list(RelationGetDescr(parentrel),
+									  RelationGetDescr(childrel),
+									  RelationGetRelationName(childrel),
+									  childRTindex);
 		appinfo->parent_reloid = parentOID;
 		*appinfos = lappend(*appinfos, appinfo);
 
@@ -1788,22 +1791,23 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 
 /*
  * make_inh_translation_list
- *	  Build the list of translations from parent Vars to child Vars for
- *	  an inheritance child.
+ *	  Build the list of translations from parent Vars ("old" rel) to child
+ *	  Vars ("new" rel) for an inheritance child.
  *
  * For paranoia's sake, we match type/collation as well as attribute name.
  */
-static void
-make_inh_translation_list(Relation oldrelation, Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars)
+static List *
+make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc,
+						  char *new_rel_name,
+						  Index newvarno)
 {
 	List	   *vars = NIL;
-	TupleDesc	old_tupdesc = RelationGetDescr(oldrelation);
-	TupleDesc	new_tupdesc = RelationGetDescr(newrelation);
 	int			oldnatts = old_tupdesc->natts;
 	int			newnatts = new_tupdesc->natts;
 	int			old_attno;
+	bool		equal_tupdescs;
+
+	equal_tupdescs = equalTupleDescs(old_tupdesc, new_tupdesc);
 
 	for (old_attno = 0; old_attno < oldnatts; old_attno++)
 	{
@@ -1827,10 +1831,9 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 		attcollation = att->attcollation;
 
 		/*
-		 * When we are generating the "translation list" for the parent table
-		 * of an inheritance set, no need to search for matches.
+		 * When the tupledescs are identical, no need to search for matches.
 		 */
-		if (oldrelation == newrelation)
+		if (equal_tupdescs)
 		{
 			vars = lappend(vars, makeVar(newvarno,
 										 (AttrNumber) (old_attno + 1),
@@ -1867,16 +1870,16 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 			}
 			if (new_attno >= newnatts)
 				elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
-					 attname, RelationGetRelationName(newrelation));
+					 attname, new_rel_name);
 		}
 
 		/* Found it, check type and collation match */
 		if (atttypid != att->atttypid || atttypmod != att->atttypmod)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 		if (attcollation != att->attcollation)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's collation",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 
 		vars = lappend(vars, makeVar(newvarno,
 									 (AttrNumber) (new_attno + 1),
@@ -1886,7 +1889,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 									 0));
 	}
 
-	*translated_vars = vars;
+	return vars;
 }
 
 /*
-- 
2.11.0

v4-0003-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=us-ascii)
From b61c6a375ef46ac72127f96eb8f0ca50b4230a4d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 28 Feb 2018 17:58:00 +0900
Subject: [PATCH v4 3/3] Fix ON CONFLICT to work with partitioned tables

---
 doc/src/sgml/ddl.sgml                         |  15 ---
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  62 ++++++++---
 src/backend/commands/tablecmds.c              |  15 +--
 src/backend/executor/execPartition.c          | 149 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  58 ++++++++--
 src/backend/optimizer/prep/preptlist.c        |   6 +-
 src/backend/optimizer/prep/prepunion.c        |  56 ++++++++++
 src/backend/parser/analyze.c                  |   7 --
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  16 +++
 src/include/nodes/execnodes.h                 |   1 +
 src/include/optimizer/prep.h                  |  10 ++
 src/test/regress/expected/insert_conflict.out |  73 +++++++++++--
 src/test/regress/sql/insert_conflict.sql      |  64 +++++++++--
 15 files changed, 443 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3d80ff9e5b..13489162df 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..8dc73ae092 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1384,24 +1385,43 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1412,18 +1432,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
@@ -2505,7 +2533,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 218224a156..6003afdd03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14360,7 +14361,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14489,7 +14490,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14722,8 +14723,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6fe7cd61d..8876d91a44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "optimizer/prep.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -64,6 +65,7 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,14 +76,12 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
 	if (mtstate && mtstate->operation == CMD_UPDATE)
@@ -109,6 +109,18 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (mtstate && mtstate->mt_onconflict != ONCONFLICT_NONE)
+	{
+		proute->partition_arbiter_indexes =
+			(List **) palloc(nparts * sizeof(List *));
+		proute->partition_onconfl_tdescs =
+			(TupleDesc *) palloc(nparts * sizeof(TupleDesc));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +487,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +496,126 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		/*
+		 * If the root parent and partition have the same tuple
+		 * descriptor, just reuse the original projection and WHERE
+		 * clause expressions for the partition.
+		 */
+		if (map == NULL)
+		{
+			/* Reuse the parent's projection, WHERE clause and descriptor. */
+			leaf_part_rri->ri_onConflictSetProj =
+				resultRelInfo->ri_onConflictSetProj;
+			leaf_part_rri->ri_onConflictSetWhere =
+				resultRelInfo->ri_onConflictSetWhere;
+			proute->partition_onconfl_tdescs[partidx] =
+				mtstate->mt_conflproj_tupdesc;
+			PinTupleDesc(proute->partition_onconfl_tdescs[partidx]);
+		}
+		else
+		{
+			/* Convert expressions to contain partition's attnos. */
+			List *conv_setproj;
+
+			/*
+			 * First convert references to the EXCLUDED pseudo-relation, which
+			 * was set to INNER_VAR by set_plan_references.
+			 */
+			conv_setproj =
+				map_partition_varattnos((List *) node->onConflictSet,
+										INNER_VAR, partrel,
+										firstResultRel, NULL);
+
+			/* Then convert references to main target relation. */
+			conv_setproj =
+				map_partition_varattnos((List *) conv_setproj,
+										firstVarno, partrel,
+										firstResultRel, NULL);
+
+			conv_setproj =
+				adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+												  RelationGetDescr(partrel),
+												  RelationGetRelationName(partrel),
+												  firstVarno,
+												  conv_setproj);
+			proute->partition_onconfl_tdescs[partidx] =
+				ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
+			PinTupleDesc(proute->partition_onconfl_tdescs[partidx]);
+
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(conv_setproj, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+
+			/* For WHERE quals, map_partition_varattnos() suffices. */
+			if (node->onConflictWhere)
+			{
+				List *conv_where;
+				ExprState  *qualexpr;
+
+				/* First convert references to EXCLUDED pseudo-relation. */
+				conv_where = map_partition_varattnos((List *)
+													 node->onConflictWhere,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Then convert references to main target relation. */
+				conv_where = map_partition_varattnos((List *)
+													 conv_where,
+													 firstVarno,
+													 partrel, firstResultRel,
+													 NULL);
+				qualexpr = ExecInitQual(conv_where, &mtstate->ps);
+				leaf_part_rri->ri_onConflictSetWhere = qualexpr;
+			}
+		}
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a9a48e914f..11780c0801 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,6 +56,7 @@
 
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -269,7 +270,9 @@ ExecInsert(ModifyTableState *mtstate,
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
+	TupleDesc	partConflTupdesc = NULL;
 	OnConflictAction onconflict = mtstate->mt_onconflict;
+	List	   *arbiterIndexes = NIL;
 
 	/*
 	 * get the heap tuple out of the tuple table slot, making sure we have a
@@ -285,8 +288,8 @@ ExecInsert(ModifyTableState *mtstate,
 	/* Determine the partition to heap_insert the tuple into */
 	if (mtstate->mt_partition_tuple_routing)
 	{
-		int			leaf_part_index;
 		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+		int			leaf_part_index;
 
 		/*
 		 * Away we go ... If we end up not finding a partition after all,
@@ -373,6 +376,13 @@ ExecInsert(ModifyTableState *mtstate,
 										  tuple,
 										  proute->partition_tuple_slot,
 										  &slot);
+
+		/* determine this partition's ON CONFLICT information */
+		if (onconflict != ONCONFLICT_NONE)
+		{
+			arbiterIndexes = proute->partition_arbiter_indexes[leaf_part_index];
+			partConflTupdesc = proute->partition_onconfl_tdescs[leaf_part_index];
+		}
 	}
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -509,9 +519,18 @@ ExecInsert(ModifyTableState *mtstate,
 			uint32		specToken;
 			ItemPointerData conflictTid;
 			bool		specConflict;
-			List	   *arbiterIndexes;
+			TupleDesc	onconfl_tupdesc;
 
-			arbiterIndexes = ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes;
+			if (!mtstate->mt_partition_tuple_routing)
+			{
+				/* in the partition case, this was already done */
+				arbiterIndexes =
+					((ModifyTable *) mtstate->ps.plan)->arbiterIndexes;
+				onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;
+			}
+
+			if (mtstate->mt_partition_tuple_routing)
+				onconfl_tupdesc = partConflTupdesc;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -542,6 +561,7 @@ ExecInsert(ModifyTableState *mtstate,
 					TupleTableSlot *returning = NULL;
 
 					if (ExecOnConflictUpdate(mtstate, resultRelInfo,
+											 onconfl_tupdesc,
 											 &conflictTid, planSlot, slot,
 											 estate, canSetTag, &returning))
 					{
@@ -1148,6 +1168,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1398,6 +1430,7 @@ lreplace:;
 static bool
 ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -1513,6 +1546,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
+	ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
 	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
 
 	/*
@@ -1556,6 +1590,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
+	ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
 	ExecProject(resultRelInfo->ri_onConflictSetProj);
 
 	/*
@@ -2160,8 +2195,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		subplan = (Plan *) lfirst(l);
 
 		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
+		resultRelInfo->ri_usesFdwDirectModify =
+			bms_is_member(i, node->fdwDirectModifyPlans);
 
 		/*
 		 * Verify result relation is a valid target for the current operation
@@ -2233,7 +2268,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
 		mtstate->mt_partition_tuple_routing =
-						ExecSetupPartitionTupleRouting(mtstate, rel);
+			ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2349,9 +2384,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * initialize slot for the existing tuple.
+		 */
+		mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2359,8 +2395,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
+		PinTupleDesc(tupDesc);
+		mtstate->mt_conflproj_tupdesc = tupDesc;
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflictSetProj =
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index b6e658fe81..804e30c500 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -53,10 +53,6 @@
 #include "utils/rel.h"
 
 
-static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, TupleDesc tupdesc);
-
-
 /*
  * preprocess_targetlist
  *	  Driver for preprocessing the parse tree targetlist.
@@ -252,7 +248,7 @@ preprocess_targetlist(PlannerInfo *root)
  *	  tuple descriptor, add targetlist entries for any missing attributes, and
  *	  ensure the non-junk attributes appear in proper field order.
  */
-static List *
+List *
 expand_targetlist(List *tlist, int command_type,
 				  Index result_relation, TupleDesc tupdesc)
 {
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d0d9812da6..6f78b6a75a 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -2377,6 +2377,15 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (tle->resjunk)
 			continue;			/* ignore junk items */
 
+		/*
+		 * XXX ugly hack: must ignore dummy tlist entry added by
+		 * expand_targetlist() for dropped columns in the parent table or we
+		 * fail because there is no translation.  Must find a better way to
+		 * deal with this case, though.
+		 */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		/* Look up the translation of this column: it must be a Var */
 		if (tle->resno <= 0 ||
 			tle->resno > list_length(context->translated_vars))
@@ -2415,6 +2424,10 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 			if (tle->resjunk)
 				continue;		/* ignore junk items */
 
+			/* XXX ugly hack; see above */
+			if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+				continue;
+
 			if (tle->resno == attrno)
 				new_tlist = lappend(new_tlist, tle);
 			else if (tle->resno > attrno)
@@ -2429,6 +2442,10 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 		if (!tle->resjunk)
 			continue;			/* here, ignore non-junk items */
 
+		/* XXX ugly hack; see above */
+		if (IsA(tle->expr, Const) && ((Const *) tle->expr)->constisnull)
+			continue;
+
 		tle->resno = attrno;
 		new_tlist = lappend(new_tlist, tle);
 		attrno++;
@@ -2438,6 +2455,45 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 }
 
 /*
+ * Given a targetlist for the parentRel of the given varno, adjust it to be in
+ * the correct order and to contain all the needed elements for the given
+ * partition.
+ */
+List *
+adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  List *targetlist)
+{
+	AppendRelInfo appinfo;
+	List *result_tl;
+
+	/*
+	 * First, fix the target entries' resnos, by using inheritance translation.
+	 */
+	appinfo.type = T_AppendRelInfo;
+	appinfo.parent_relid = parentVarno;
+	appinfo.parent_reltype = InvalidOid;	/* would be parentRel->rd_rel->reltype */
+	appinfo.child_relid = -1;
+	appinfo.child_reltype = InvalidOid;		/* would be partrel->rd_rel->reltype */
+	appinfo.parent_reloid = 1;				/* dummy; would be parentRel->rd_id */
+	appinfo.translated_vars =
+		make_inh_translation_list(parentDesc, partitionDesc,
+								  partitionRelname, 1);
+	result_tl = adjust_inherited_tlist((List *) targetlist, &appinfo);
+
+	/*
+	 * Add any attributes that are missing in the source list, such
+	 * as dropped columns in the partition.
+	 */
+	result_tl = expand_targetlist(result_tl, CMD_UPDATE,
+								  parentVarno, partitionDesc);
+
+	return result_tl;
+}
+
+/*
  * adjust_appendrel_attrs_multilevel
  *	  Apply Var translations from a toplevel appendrel parent down to a child.
  *
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617f67..92696f0607 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1025,13 +1025,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..9bad06a8e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,20 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
+ * partition_onconfl_tdescs	Array of TupleDescs, one per leaf partition,
+ *								describing the result of that partition's
+ *								ON CONFLICT DO UPDATE SET projection; for
+ *								partitions whose tuple descriptor exactly
+ *								matches the root parent's (including
+ *								dropped columns), this is the parent's own
+ *								projection descriptor.  Allocated only when
+ *								the query has an ON CONFLICT clause, and
+ *								entries are filled in lazily as each
+ *								partition is initialized.
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +120,8 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	TupleDesc *partition_onconfl_tdescs;
+	List	  **partition_arbiter_indexes;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1bee5ccbeb..cc2f55ee12 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -993,6 +993,7 @@ typedef struct ModifyTableState
 	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
+	TupleDesc	mt_conflproj_tupdesc; /* tuple descriptor for it */
 	struct PartitionTupleRouting *mt_partition_tuple_routing;
 	/* Tuple-routing support info */
 	struct TransitionCaptureState *mt_transition_capture;
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..7074bae79a 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -14,6 +14,7 @@
 #ifndef PREP_H
 #define PREP_H
 
+#include "access/tupdesc.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 
@@ -42,6 +43,9 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
+extern List *expand_targetlist(List *tlist, int command_type,
+				  Index result_relation, TupleDesc tupdesc);
+
 /*
  * prototypes for prepunion.c
  */
@@ -65,4 +69,10 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
+extern List *adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  List *targetlist);
+
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
-- 
2.11.0

#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Pavan Deolasee (#8)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/05 18:04, Pavan Deolasee wrote:

On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?

If I understand correctly, the planner must have called expand_targetlist()
once for the parent relation's descriptor and added any dropped columns
from the parent relation. So we may not find mapped attributes for those
dropped columns in the parent. I haven't actually tested this case though.

I wonder if it makes sense to avoid expanding the on-conflict-set
targetlist when the target is a partitioned table and deal with it during
execution, like we are doing now.

I'm obviously confused about what
expand_targetlist call this comment is talking about. Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday. I'll get back to this
soon, but in the meantime, here's what I have.

+           conv_setproj =
+
adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+                                                 RelationGetDescr(partrel),
+
RelationGetRelationName(partrel),
+                                                 firstVarno,
+                                                 conv_setproj);

Aren't we adding back Vars referring to the parent relation, after
having converted the existing ones to use the partition relation's varno? Maybe
that works because missing attributes are already added during planning
and expand_targetlist() here only adds dropped columns, which are just NULL
constants.

I think this suggestion to defer onConflictSet target list expansion to
execution time is a good one. So, in preprocess_targetlist(), we'd only
perform expand_targetlist() on the onConflictSet list if the table is not a
partitioned table. For partitioned tables, we don't know which
partition(s) will be affected, so it's useless to do the expansion there.
Instead it's better to expand in ExecInitPartitionInfo(), possibly after
changing original TargetEntry nodes to have correct resnos due to
attribute number differences (calling adjust_inherited_tlist(), etc.).

And then since we're not expanding the target list in the planner anymore,
we don't need to install those hacks in adjust_inherited_tlist() like the
patch currently does to ignore entries for dropped columns in the parent
that the plan-time expansion adds.
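
To make the proposal concrete, here is a minimal standalone sketch of what per-partition expansion at execution time amounts to: translate each SET entry's resno from the parent's attribute numbering to the partition's (matching by column name), then fill in every other partition column. It is not PostgreSQL code. ToyTupleDesc, ToyTargetEntry and the toy_* functions are hypothetical stand-ins, and the "<keep>" and "NULL" strings merely label what expand_targetlist() would emit for an UPDATE (a Var for an unassigned live column, a NULL constant for a dropped one).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ATTS 8

/* Hypothetical stand-in for a tuple descriptor: just column names. */
typedef struct ToyTupleDesc
{
	int			natts;
	const char *attnames[TOY_MAX_ATTS];	/* NULL marks a dropped column */
} ToyTupleDesc;

/* Hypothetical stand-in for a SET-list entry, numbered per the parent. */
typedef struct ToyTargetEntry
{
	int			resno;		/* 1-based attribute number in the parent */
	const char *expr;		/* label for the assigned expression */
} ToyTargetEntry;

/* Return the 1-based attribute number of 'name' in 'desc', or 0 if absent. */
static int
toy_attno_by_name(const ToyTupleDesc *desc, const char *name)
{
	for (int i = 0; i < desc->natts; i++)
	{
		if (desc->attnames[i] != NULL && strcmp(desc->attnames[i], name) == 0)
			return i + 1;
	}
	return 0;
}

/*
 * Build the expanded SET list for one partition.  out[i] receives the
 * expression label for partition attribute i + 1: the assigned expression,
 * "<keep>" for an unassigned live column, or "NULL" for a dropped column.
 * Returns 0 on success, -1 if an entry cannot be mapped to the partition.
 */
static int
toy_expand_for_partition(const ToyTupleDesc *parent,
						 const ToyTupleDesc *part,
						 const ToyTargetEntry *set_list, int nset,
						 const char *out[])
{
	if (part->natts > TOY_MAX_ATTS)
		return -1;

	/* Expansion step: start from the partition's own column layout. */
	for (int i = 0; i < part->natts; i++)
		out[i] = (part->attnames[i] == NULL) ? "NULL" : "<keep>";

	/* Translation step: map each parent resno to the partition's attno. */
	for (int i = 0; i < nset; i++)
	{
		int			resno = set_list[i].resno;
		const char *name;
		int			child_attno;

		if (resno < 1 || resno > parent->natts)
			return -1;
		name = parent->attnames[resno - 1];
		if (name == NULL)
			return -1;		/* cannot assign to a dropped parent column */
		child_attno = toy_attno_by_name(part, name);
		if (child_attno == 0)
			return -1;
		out[child_attno - 1] = set_list[i].expr;
	}
	return 0;
}
```

With a parent laid out as (a, dropped, b) and a partition laid out as (b, a), a SET entry with resno 3 lands on partition attribute 1, and the parent's dropped column never appears in the result, which is why the planner-side hack would become unnecessary under this scheme.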

Thanks,
Amit

#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#19)
3 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

Thanks for the updated patches.

On 2018/03/18 13:17, Alvaro Herrera wrote:

Alvaro Herrera wrote:

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.

Yes, this works, though it's ugly. Not any uglier than what's already
there, though, so I think it's okay.

The only thing I remain unhappy about in this patch is the whole
adjust_and_expand_partition_tlist() thing. I fear we may be doing
redundant and/or misplaced work. I'll look into it next week.

0001 should be pretty much ready to push -- adjustments to ExecInsert
and ModifyTableState I already mentioned.

This seems like good cleanup.

While at it, why not also get rid of mt_onconflict in favor of always just
using its counterpart in ModifyTable -- onConflictAction?

0002 is stuff I would like to get rid of completely -- changes to
planner code so that it better supports functionality we need for
0003.

Hmm. I'm not sure if we can completely get rid of this, because we do
need the adjust_inherited_tlist() facility to translate TargetEntry resnos
in any case.

But as I just said in reply to Pavan's email suggesting deferring
onConflictSet expansion to execution time, we don't need the hack in
adjust_inherited_tlist() if we go with the suggestion.

0003 is the main patch. Compared to the previous version, this one
reuses slots by switching them to different tupdescs as needed.

Your proposed change to use just one slot (the existing mt_conflproj slot)
sounds good. Instead, it seems now we have an array to hold tupleDescs
for the onConflictSet target lists for each partition.
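
As an illustration of the single-slot scheme, the following standalone sketch keeps one slot plus an array of per-partition descriptors and installs the right descriptor just before each projection. It is only a toy model under stated assumptions: ToySlot, ToyTupleDesc and the toy_* functions are hypothetical names, and toy_set_slot_descriptor() plays the role that ExecSetSlotDescriptor() plays in the executor.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTS 8

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct ToyTupleDesc
{
	int			natts;
} ToyTupleDesc;

/* Hypothetical stand-in for a tuple table slot. */
typedef struct ToySlot
{
	const ToyTupleDesc *desc;	/* descriptor currently installed */
	int			nvalid;			/* number of valid values in the slot */
	int			values[TOY_MAX_ATTS];
} ToySlot;

/* Install a new descriptor; any previous contents become invalid. */
static void
toy_set_slot_descriptor(ToySlot *slot, const ToyTupleDesc *desc)
{
	slot->desc = desc;
	slot->nvalid = 0;
}

/*
 * Project a row for the given partition into the shared slot.  The slot is
 * switched to that partition's descriptor first, so one slot serves every
 * partition.  Returns the number of columns written, or -1 if the partition
 * has no descriptor set up.
 */
static int
toy_project(ToySlot *slot, const ToyTupleDesc *const *part_descs,
			int npart, int part_index, int seed)
{
	const ToyTupleDesc *desc;

	if (part_index < 0 || part_index >= npart)
		return -1;
	desc = part_descs[part_index];
	if (desc == NULL || desc->natts > TOY_MAX_ATTS)
		return -1;

	toy_set_slot_descriptor(slot, desc);
	for (int i = 0; i < desc->natts; i++)
		slot->values[i] = seed + i;	/* placeholder for real expressions */
	slot->nvalid = desc->natts;
	return slot->nvalid;
}
```

The trade-off is the one discussed above: switching descriptors on a shared slot avoids allocating one slot per partition, at the cost of a descriptor swap before each projection.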

Some comments:

1. I noticed a bug that crashes a test in insert_conflict.sql that uses DO
NOTHING instead of DO UPDATE SET. It's illegal for ExecInitPartitionInfo
to expect mt_conflproj_tupdesc to be valid in the DO NOTHING case, because
ExecInitModifyTable would only set it if (onConflictAction == DO_UPDATE).

2. It seems better to name the new array field in PartitionTupleRouting
partition_conflproj_tupdescs rather than partition_onconfl_tupdescs to be
consistent with the new field in ModifyTableState.

3. I think it was an oversight in my original patch, but it seems we
should allocate the partition_onconfl_tdescs array only if DO UPDATE
action is used. Also, ri_onConflictSetProj, ri_onConflictSetWhere should
be only set in that case. OTOH, we always need to set
partition_arbiter_indexes, that is, for both DO NOTHING and DO UPDATE SET
actions.

4. Need to remove the comments for partition_conflproj_slots and
partition_existing_slots, fields of PartitionTupleRouting that no longer
exist. Instead, a comment for partition_conflproj_tupdescs should be added.

5. I know the following is so as not to break the Assert in
adjust_inherited_tlist(), so why not have a parentOid argument for
adjust_and_expand_partition_tlist()?

+ appinfo.parent_reloid = 1; // dummy parentRel->rd_id;

6. There is a sentence in the comment above adjust_inherited_tlist():

Note that this is not needed for INSERT because INSERT isn't
inheritable.

Maybe we need to delete that and mention that we do need it in the case
of INSERT ON CONFLICT DO UPDATE on partitioned tables, for translating the
DO UPDATE SET target list.

7. In ExecInsert, it'd be better to have a partArbiterIndexes, just like
partConflTupdesc in the outermost scope and then do:

+			/* Use the appropriate list of arbiter indexes. */
+			if (mtstate->mt_partition_tuple_routing != NULL)
+				arbiterIndexes = partArbiterIndexes;
+			else
+				arbiterIndexes = node->arbiterIndexes;

and

+		/* Use the appropriate tuple descriptor. */
+		if (mtstate->mt_partition_tuple_routing != NULL)
+			onconfl_tupdesc = partConflTupdesc;
+		else
+			onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;

using arbiterIndexes and onconfl_tupdesc declared in the appropriate scopes.
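
Points 1 and 3 above boil down to one rule: arbiter-index state is needed for both DO NOTHING and DO UPDATE, while the projection descriptors are needed only for DO UPDATE. A minimal standalone sketch of that rule follows. It is not the patch's code: ToyRouting, ToyOnConflictAction and the toy_* functions are hypothetical names that only mirror the shape of PartitionTupleRouting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the ON CONFLICT action kinds. */
typedef enum ToyOnConflictAction
{
	TOY_ONCONFLICT_NONE,
	TOY_ONCONFLICT_NOTHING,
	TOY_ONCONFLICT_UPDATE
} ToyOnConflictAction;

/* Hypothetical per-partition routing state. */
typedef struct ToyRouting
{
	int			num_partitions;
	unsigned int *arbiter_indexes;	/* DO NOTHING and DO UPDATE */
	int		   *conflproj_natts;	/* DO UPDATE only, otherwise NULL */
} ToyRouting;

/*
 * Allocate only what the given action needs.  Returns 0 on success and -1
 * on a bad argument or allocation failure.
 */
static int
toy_init_routing(ToyRouting *r, int nparts, ToyOnConflictAction action)
{
	if (nparts <= 0)
		return -1;

	r->num_partitions = nparts;
	r->arbiter_indexes = NULL;
	r->conflproj_natts = NULL;

	if (action == TOY_ONCONFLICT_NONE)
		return 0;

	/* Arbiter indexes are consulted for both DO NOTHING and DO UPDATE. */
	r->arbiter_indexes = calloc((size_t) nparts, sizeof(*r->arbiter_indexes));
	if (r->arbiter_indexes == NULL)
		return -1;

	/* Projection state exists only when there is a SET list to project. */
	if (action == TOY_ONCONFLICT_UPDATE)
	{
		r->conflproj_natts = calloc((size_t) nparts,
									sizeof(*r->conflproj_natts));
		if (r->conflproj_natts == NULL)
		{
			free(r->arbiter_indexes);
			r->arbiter_indexes = NULL;
			return -1;
		}
	}
	return 0;
}

/* Release whatever toy_init_routing() allocated. */
static void
toy_free_routing(ToyRouting *r)
{
	free(r->arbiter_indexes);
	free(r->conflproj_natts);
	r->arbiter_indexes = NULL;
	r->conflproj_natts = NULL;
}
```

Code that later reads the projection array must apply the same test on the action, which is exactly the check the DO NOTHING crash in point 1 was missing.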

I have tried to make these changes and attached are the updated patches
containing those, including the change I suggested for 0001 (that is,
getting rid of mt_onconflict). I also expanded some comments in 0003
while making those changes.

Thanks,
Amit

Attachments:

v5-0001-Simplify-ExecInsert-API-re.-ON-CONFLICT-data.patch (text/plain; charset=UTF-8)
From e208f2fb3c2eaac6f7932f8c15afab789679e5ee Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 16 Mar 2018 14:29:28 -0300
Subject: [PATCH v5 1/3] Simplify ExecInsert API re. ON CONFLICT data

Instead of passing the ON CONFLICT-related members of ModifyTableState
into ExecInsert(), we can have that routine obtain them from the node,
since that is already an argument into the function.

While at it, remove arbiterIndexes from ModifyTableState, since that's
just a copy of the list already in the ModifyTable node, to which the
state node already has access.
---
 src/backend/executor/execPartition.c   |  4 ++--
 src/backend/executor/nodeModifyTable.c | 34 +++++++++++++++++++---------------
 src/include/nodes/execnodes.h          |  3 ---
 3 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6fe7cd61d..ce9a4e16cf 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -363,8 +363,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 	if (partrel->rd_rel->relhasindex &&
 		leaf_part_rri->ri_IndexRelationDescs == NULL)
 		ExecOpenIndices(leaf_part_rri,
-						(mtstate != NULL &&
-						 mtstate->mt_onconflict != ONCONFLICT_NONE));
+						(node != NULL &&
+						 node->onConflictAction != ONCONFLICT_NONE));
 
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3332ae4bf3..745be7ba30 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -258,8 +258,6 @@ static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
-		   List *arbiterIndexes,
-		   OnConflictAction onconflict,
 		   EState *estate,
 		   bool canSetTag)
 {
@@ -271,6 +269,8 @@ ExecInsert(ModifyTableState *mtstate,
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	OnConflictAction onconflict = node->onConflictAction;
 
 	/*
 	 * get the heap tuple out of the tuple table slot, making sure we have a
@@ -455,6 +455,7 @@ ExecInsert(ModifyTableState *mtstate,
 	else
 	{
 		WCOKind		wco_kind;
+		bool		check_partition_constr;
 
 		/*
 		 * We always check the partition constraint, including when the tuple
@@ -463,8 +464,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 * trigger might modify the tuple such that the partition constraint
 		 * is no longer satisfied, so we need to check in that case.
 		 */
-		bool		check_partition_constr =
-		(resultRelInfo->ri_PartitionCheck != NIL);
+		check_partition_constr = (resultRelInfo->ri_PartitionCheck != NIL);
 
 		/*
 		 * Constraints might reference the tableoid column, so initialize
@@ -510,6 +510,9 @@ ExecInsert(ModifyTableState *mtstate,
 			uint32		specToken;
 			ItemPointerData conflictTid;
 			bool		specConflict;
+			List	   *arbiterIndexes;
+
+			arbiterIndexes = node->arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -627,7 +630,7 @@ ExecInsert(ModifyTableState *mtstate,
 			if (resultRelInfo->ri_NumIndices > 0)
 				recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
 													   estate, false, NULL,
-													   arbiterIndexes);
+													   NIL);
 		}
 	}
 
@@ -1217,8 +1220,8 @@ lreplace:;
 			Assert(mtstate->rootResultRelInfo != NULL);
 			estate->es_result_relation_info = mtstate->rootResultRelInfo;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot, NULL,
-								  ONCONFLICT_NONE, estate, canSetTag);
+			ret_slot = ExecInsert(mtstate, slot, planSlot,
+								  estate, canSetTag);
 
 			/*
 			 * Revert back the active result relation and the active
@@ -1582,6 +1585,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 static void
 fireBSTriggers(ModifyTableState *node)
 {
+	ModifyTable *plan = (ModifyTable *) node->ps.plan;
 	ResultRelInfo *resultRelInfo = node->resultRelInfo;
 
 	/*
@@ -1596,7 +1600,7 @@ fireBSTriggers(ModifyTableState *node)
 	{
 		case CMD_INSERT:
 			ExecBSInsertTriggers(node->ps.state, resultRelInfo);
-			if (node->mt_onconflict == ONCONFLICT_UPDATE)
+			if (plan->onConflictAction == ONCONFLICT_UPDATE)
 				ExecBSUpdateTriggers(node->ps.state,
 									 resultRelInfo);
 			break;
@@ -1640,12 +1644,13 @@ getTargetResultRelInfo(ModifyTableState *node)
 static void
 fireASTriggers(ModifyTableState *node)
 {
+	ModifyTable *plan = (ModifyTable *) node->ps.plan;
 	ResultRelInfo *resultRelInfo = getTargetResultRelInfo(node);
 
 	switch (node->operation)
 	{
 		case CMD_INSERT:
-			if (node->mt_onconflict == ONCONFLICT_UPDATE)
+			if (plan->onConflictAction == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
 									 resultRelInfo,
 									 node->mt_oc_transition_capture);
@@ -1673,6 +1678,7 @@ fireASTriggers(ModifyTableState *node)
 static void
 ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 {
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
 	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
 
 	/* Check for transition tables on the directly targeted relation. */
@@ -1680,8 +1686,8 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 		MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 								   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 								   mtstate->operation);
-	if (mtstate->operation == CMD_INSERT &&
-		mtstate->mt_onconflict == ONCONFLICT_UPDATE)
+	if (plan->operation == CMD_INSERT &&
+		plan->onConflictAction == ONCONFLICT_UPDATE)
 		mtstate->mt_oc_transition_capture =
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
@@ -2052,7 +2058,6 @@ ExecModifyTable(PlanState *pstate)
 		{
 			case CMD_INSERT:
 				slot = ExecInsert(node, slot, planSlot,
-								  node->mt_arbiterindexes, node->mt_onconflict,
 								  estate, node->canSetTag);
 				break;
 			case CMD_UPDATE:
@@ -2136,8 +2141,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
-	mtstate->mt_onconflict = node->onConflictAction;
-	mtstate->mt_arbiterindexes = node->arbiterIndexes;
 
 	/* set up epqstate with dummy subplan data for the moment */
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
@@ -2180,7 +2183,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
 			operation != CMD_DELETE &&
 			resultRelInfo->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(resultRelInfo, mtstate->mt_onconflict != ONCONFLICT_NONE);
+			ExecOpenIndices(resultRelInfo,
+							node->onConflictAction != ONCONFLICT_NONE);
 
 		/*
 		 * If this is an UPDATE and a BEFORE UPDATE trigger is present, the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a953820f43..3b926014b6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -989,9 +989,6 @@ typedef struct ModifyTableState
 	List	  **mt_arowmarks;	/* per-subplan ExecAuxRowMark lists */
 	EPQState	mt_epqstate;	/* for evaluating EvalPlanQual rechecks */
 	bool		fireBSTriggers; /* do we need to fire stmt triggers? */
-	OnConflictAction mt_onconflict; /* ON CONFLICT type */
-	List	   *mt_arbiterindexes;	/* unique index OIDs to arbitrate taking
-									 * alt path */
 	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
-- 
2.11.0

v5-0002-Make-some-static-functions-work-on-TupleDesc-rath.patch (text/plain; charset=UTF-8)
From fa2667d9546ccaff21e2221b4766eec1c160d482 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 1 Mar 2018 19:58:50 -0300
Subject: [PATCH v5 2/3] Make some static functions work on TupleDesc rather
 than Relation

---
 src/backend/optimizer/prep/preptlist.c | 23 ++++++++---------
 src/backend/optimizer/prep/prepunion.c | 45 ++++++++++++++++++----------------
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 8603feef2b..b6e658fe81 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -54,7 +54,7 @@
 
 
 static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel);
+				  Index result_relation, TupleDesc tupdesc);
 
 
 /*
@@ -116,7 +116,8 @@ preprocess_targetlist(PlannerInfo *root)
 	tlist = parse->targetList;
 	if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
 		tlist = expand_targetlist(tlist, command_type,
-								  result_relation, target_relation);
+								  result_relation,
+								  RelationGetDescr(target_relation));
 
 	/*
 	 * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -230,7 +231,7 @@ preprocess_targetlist(PlannerInfo *root)
 			expand_targetlist(parse->onConflict->onConflictSet,
 							  CMD_UPDATE,
 							  result_relation,
-							  target_relation);
+							  RelationGetDescr(target_relation));
 
 	if (target_relation)
 		heap_close(target_relation, NoLock);
@@ -247,13 +248,13 @@ preprocess_targetlist(PlannerInfo *root)
 
 /*
  * expand_targetlist
- *	  Given a target list as generated by the parser and a result relation,
- *	  add targetlist entries for any missing attributes, and ensure the
- *	  non-junk attributes appear in proper field order.
+ *	  Given a target list as generated by the parser and a result relation's
+ *	  tuple descriptor, add targetlist entries for any missing attributes, and
+ *	  ensure the non-junk attributes appear in proper field order.
  */
 static List *
 expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, Relation rel)
+				  Index result_relation, TupleDesc tupdesc)
 {
 	List	   *new_tlist = NIL;
 	ListCell   *tlist_item;
@@ -266,14 +267,14 @@ expand_targetlist(List *tlist, int command_type,
 	 * The rewriter should have already ensured that the TLEs are in correct
 	 * order; but we have to insert TLEs for any missing attributes.
 	 *
-	 * Scan the tuple description in the relation's relcache entry to make
-	 * sure we have all the user attributes in the right order.
+	 * Scan the tuple description to make sure we have all the user attributes
+	 * in the right order.
 	 */
-	numattrs = RelationGetNumberOfAttributes(rel);
+	numattrs = tupdesc->natts;
 
 	for (attrno = 1; attrno <= numattrs; attrno++)
 	{
-		Form_pg_attribute att_tup = TupleDescAttr(rel->rd_att, attrno - 1);
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
 		TargetEntry *new_tle = NULL;
 
 		if (tlist_item != NULL)
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..d0d9812da6 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,10 +113,10 @@ static void expand_single_inheritance_child(PlannerInfo *root,
 								PlanRowMark *top_parentrc, Relation childrel,
 								List **appinfos, RangeTblEntry **childrte_p,
 								Index *childRTindex_p);
-static void make_inh_translation_list(Relation oldrelation,
-						  Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars);
+static List *make_inh_translation_list(TupleDesc old_tupdesc,
+						  TupleDesc new_tupdesc,
+						  char *new_rel_name,
+						  Index newvarno);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
 					List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
@@ -1730,8 +1730,11 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 		appinfo->child_relid = childRTindex;
 		appinfo->parent_reltype = parentrel->rd_rel->reltype;
 		appinfo->child_reltype = childrel->rd_rel->reltype;
-		make_inh_translation_list(parentrel, childrel, childRTindex,
-								  &appinfo->translated_vars);
+		appinfo->translated_vars =
+			make_inh_translation_list(RelationGetDescr(parentrel),
+									  RelationGetDescr(childrel),
+									  RelationGetRelationName(childrel),
+									  childRTindex);
 		appinfo->parent_reloid = parentOID;
 		*appinfos = lappend(*appinfos, appinfo);
 
@@ -1788,22 +1791,23 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 
 /*
  * make_inh_translation_list
- *	  Build the list of translations from parent Vars to child Vars for
- *	  an inheritance child.
+ *	  Build the list of translations from parent Vars ("old" rel) to child
+ *	  Vars ("new" rel) for an inheritance child.
  *
  * For paranoia's sake, we match type/collation as well as attribute name.
  */
-static void
-make_inh_translation_list(Relation oldrelation, Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars)
+static List *
+make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc,
+						  char *new_rel_name,
+						  Index newvarno)
 {
 	List	   *vars = NIL;
-	TupleDesc	old_tupdesc = RelationGetDescr(oldrelation);
-	TupleDesc	new_tupdesc = RelationGetDescr(newrelation);
 	int			oldnatts = old_tupdesc->natts;
 	int			newnatts = new_tupdesc->natts;
 	int			old_attno;
+	bool		equal_tupdescs;
+
+	equal_tupdescs = equalTupleDescs(old_tupdesc, new_tupdesc);
 
 	for (old_attno = 0; old_attno < oldnatts; old_attno++)
 	{
@@ -1827,10 +1831,9 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 		attcollation = att->attcollation;
 
 		/*
-		 * When we are generating the "translation list" for the parent table
-		 * of an inheritance set, no need to search for matches.
+		 * When the tupledescs are identical, no need to search for matches.
 		 */
-		if (oldrelation == newrelation)
+		if (equal_tupdescs)
 		{
 			vars = lappend(vars, makeVar(newvarno,
 										 (AttrNumber) (old_attno + 1),
@@ -1867,16 +1870,16 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 			}
 			if (new_attno >= newnatts)
 				elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
-					 attname, RelationGetRelationName(newrelation));
+					 attname, new_rel_name);
 		}
 
 		/* Found it, check type and collation match */
 		if (atttypid != att->atttypid || atttypmod != att->atttypmod)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 		if (attcollation != att->attcollation)
 			elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's collation",
-				 attname, RelationGetRelationName(newrelation));
+				 attname, new_rel_name);
 
 		vars = lappend(vars, makeVar(newvarno,
 									 (AttrNumber) (new_attno + 1),
@@ -1886,7 +1889,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 									 0));
 	}
 
-	*translated_vars = vars;
+	return vars;
 }
 
 /*
-- 
2.11.0

v5-0003-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From 6861c5632f09405dbcc544c97b72e1c5458e9147 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 28 Feb 2018 17:58:00 +0900
Subject: [PATCH v5 3/3] Fix ON CONFLICT to work with partitioned tables

---
 doc/src/sgml/ddl.sgml                         |  15 ---
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  62 ++++++---
 src/backend/commands/tablecmds.c              |  15 ++-
 src/backend/executor/execPartition.c          | 179 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  70 ++++++++--
 src/backend/optimizer/prep/preptlist.c        |  25 ++--
 src/backend/optimizer/prep/prepunion.c        |  45 ++++++-
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  10 ++
 src/include/nodes/execnodes.h                 |   1 +
 src/include/optimizer/prep.h                  |  11 ++
 src/test/regress/expected/insert_conflict.out |  73 +++++++++--
 src/test/regress/sql/insert_conflict.sql      |  64 +++++++--
 15 files changed, 480 insertions(+), 101 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3d80ff9e5b..13489162df 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..8dc73ae092 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1384,24 +1385,43 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1412,18 +1432,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
@@ -2505,7 +2533,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 218224a156..6003afdd03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14360,7 +14361,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14489,7 +14490,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14722,8 +14723,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..4c0812c5d2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -19,6 +19,7 @@
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "optimizer/prep.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -64,6 +65,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +77,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -109,6 +108,21 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		/* Indexes are always needed. */
+		proute->partition_arbiter_indexes =
+			(List **) palloc(nparts * sizeof(List *));
+		/* Only needed for the DO UPDATE action. */
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+			proute->partition_conflproj_tdescs =
+				(TupleDesc *) palloc(nparts * sizeof(TupleDesc));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +489,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +498,148 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		TupleDesc	rootrelDesc = RelationGetDescr(firstResultRel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			List	 *onconflset;
+			TupleDesc tupDesc;
+
+			/*
+			 * Expand the ON CONFLICT DO UPDATE SET target list so that it
+			 * contains any attributes of partition that are missing in the
+			 * original list (including any dropped columns).  We may need to
+			 * adjust it for inheritance translation of attributes if the
+			 * partition's tuple descriptor doesn't match the root parent's,
+			 * so pass it through adjust_and_expand_partition_tlist() instead
+			 * of directly calling expand_targetlist().
+			 */
+			if (map != NULL)
+			{
+				/*
+				 * Convert the Vars to contain partition's attribute numbers
+				 */
+
+				/* First convert references to EXCLUDED pseudo-relation. */
+				onconflset = map_partition_varattnos(node->onConflictSet,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Then convert references to main target relation. */
+				onconflset = map_partition_varattnos(onconflset,
+													 firstVarno,
+													 partrel,
+													 firstResultRel, NULL);
+
+				/*
+				 * We also need to change TargetEntry nodes to have correct
+				 * resnos.
+				 */
+				onconflset =
+					adjust_and_expand_partition_tlist(rootrelDesc,
+													  partrelDesc,
+										  RelationGetRelationName(partrel),
+													  firstVarno,
+										  RelationGetRelid(firstResultRel),
+													  onconflset);
+			}
+			else
+				/* Just expand. */
+				onconflset = expand_targetlist(node->onConflictSet,
+											   CMD_UPDATE,
+											   firstVarno,
+											   partrelDesc);
+
+			/*
+			 * We must set mtstate->mt_conflproj's tuple descriptor to this
+			 * before trying to use it for projection.
+			 */
+			tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+			PinTupleDesc(tupDesc);
+			proute->partition_conflproj_tdescs[partidx] = tupDesc;
+
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+
+			if (node->onConflictWhere)
+			{
+				if (map != NULL)
+				{
+					/*
+					 * Convert the Vars to contain partition's attribute
+					 * numbers
+					 */
+					List *onconflwhere;
+
+					/* First convert references to EXCLUDED pseudo-relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														node->onConflictWhere,
+														INNER_VAR,
+														partrel,
+														firstResultRel, NULL);
+					/* Then convert references to main target relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														onconflwhere,
+														firstVarno,
+														partrel,
+														firstResultRel, NULL);
+					leaf_part_rri->ri_onConflictSetWhere =
+						ExecInitQual(onconflwhere, &mtstate->ps);
+				}
+				else
+					/* Just reuse the original one. */
+					leaf_part_rri->ri_onConflictSetWhere =
+						resultRelInfo->ri_onConflictSetWhere;
+			}
+		}
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 745be7ba30..21466920f1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,6 +56,7 @@
 
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -269,8 +270,10 @@ ExecInsert(ModifyTableState *mtstate,
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
+	TupleDesc	partConflTupdesc = NULL;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	List	   *partArbiterIndexes = NIL;
 
 	/*
 	 * get the heap tuple out of the tuple table slot, making sure we have a
@@ -286,8 +289,8 @@ ExecInsert(ModifyTableState *mtstate,
 	/* Determine the partition to heap_insert the tuple into */
 	if (mtstate->mt_partition_tuple_routing)
 	{
-		int			leaf_part_index;
 		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+		int			leaf_part_index;
 
 		/*
 		 * Away we go ... If we end up not finding a partition after all,
@@ -374,6 +377,16 @@ ExecInsert(ModifyTableState *mtstate,
 										  tuple,
 										  proute->partition_tuple_slot,
 										  &slot);
+
+		/* determine this partition's ON CONFLICT information */
+		if (onconflict != ONCONFLICT_NONE)
+		{
+			partArbiterIndexes =
+						proute->partition_arbiter_indexes[leaf_part_index];
+			if (onconflict == ONCONFLICT_UPDATE)
+				partConflTupdesc =
+						proute->partition_conflproj_tdescs[leaf_part_index];
+		}
 	}
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -512,7 +525,11 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = node->arbiterIndexes;
+			/* Use the appropriate list of arbiter indexes. */
+			if (mtstate->mt_partition_tuple_routing != NULL)
+				arbiterIndexes = partArbiterIndexes;
+			else
+				arbiterIndexes = node->arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -541,8 +558,16 @@ ExecInsert(ModifyTableState *mtstate,
 					 * tuple.
 					 */
 					TupleTableSlot *returning = NULL;
+					TupleDesc	onconfl_tupdesc;
+
+					/* Use the appropriate tuple descriptor. */
+					if (mtstate->mt_partition_tuple_routing != NULL)
+						onconfl_tupdesc = partConflTupdesc;
+					else
+						onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;
 
 					if (ExecOnConflictUpdate(mtstate, resultRelInfo,
+											 onconfl_tupdesc,
 											 &conflictTid, planSlot, slot,
 											 estate, canSetTag, &returning))
 					{
@@ -1149,6 +1174,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1399,6 +1436,7 @@ lreplace:;
 static bool
 ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -1514,6 +1552,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
+	ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
 	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
 
 	/*
@@ -1557,6 +1596,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
+	ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
 	ExecProject(resultRelInfo->ri_onConflictSetProj);
 
 	/*
@@ -2163,8 +2203,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		subplan = (Plan *) lfirst(l);
 
 		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
+		resultRelInfo->ri_usesFdwDirectModify =
+			bms_is_member(i, node->fdwDirectModifyPlans);
 
 		/*
 		 * Verify result relation is a valid target for the current operation
@@ -2237,7 +2277,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
 		mtstate->mt_partition_tuple_routing =
-						ExecSetupPartitionTupleRouting(mtstate, rel);
+			ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2353,9 +2393,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  We determine which
+		 * tupleDesc to use for this after we have determined which relation
+		 * the insert/update will be applied to, possibly after performing
+		 * tuple routing.
+		 */
+		mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2363,8 +2407,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
+		PinTupleDesc(tupDesc);
+		mtstate->mt_conflproj_tupdesc = tupDesc;
+
+		/*
+		 * Just like the "existing tuple" slot, we'll defer deciding which
+		 * tupleDesc to use for this slot to a point where tuple routing has
+		 * been performed.
+		 */
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflictSetProj =
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index b6e658fe81..6eda8be4b1 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -53,10 +53,6 @@
 #include "utils/rel.h"
 
 
-static List *expand_targetlist(List *tlist, int command_type,
-				  Index result_relation, TupleDesc tupdesc);
-
-
 /*
  * preprocess_targetlist
  *	  Driver for preprocessing the parse tree targetlist.
@@ -227,11 +223,20 @@ preprocess_targetlist(PlannerInfo *root)
 	 * while we have the relation open.
 	 */
 	if (parse->onConflict)
-		parse->onConflict->onConflictSet =
-			expand_targetlist(parse->onConflict->onConflictSet,
-							  CMD_UPDATE,
-							  result_relation,
-							  RelationGetDescr(target_relation));
+	{
+		Assert(target_relation != NULL);
+		/*
+		 * For partitioned tables, there is no point in expanding here.
+		 * Instead, we do it once we know which of its partitions is chosen
+		 * for a given tuple, and use its tuple descriptor for expansion.
+		 */
+		if (target_relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+			parse->onConflict->onConflictSet =
+				expand_targetlist(parse->onConflict->onConflictSet,
+								  CMD_UPDATE,
+								  result_relation,
+								  RelationGetDescr(target_relation));
+	}
 
 	if (target_relation)
 		heap_close(target_relation, NoLock);
@@ -252,7 +257,7 @@ preprocess_targetlist(PlannerInfo *root)
  *	  tuple descriptor, add targetlist entries for any missing attributes, and
  *	  ensure the non-junk attributes appear in proper field order.
  */
-static List *
+List *
 expand_targetlist(List *tlist, int command_type,
 				  Index result_relation, TupleDesc tupdesc)
 {
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index d0d9812da6..cb30cfa2d7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -2354,7 +2354,10 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
  * therefore the TargetEntry nodes are fresh copies that it's okay to
  * scribble on.
  *
- * Note that this is not needed for INSERT because INSERT isn't inheritable.
+ * This is also used for INSERT ON CONFLICT DO UPDATE performed on partitioned
+ * tables, to translate the DO UPDATE SET target list from root parent
+ * attribute numbers to the chosen partition's attribute numbers, which means
+ * this function is called from the executor.
  */
 static List *
 adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
@@ -2438,6 +2441,46 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 }
 
 /*
+ * Given a targetlist for the parentRel of the given varno, adjust it to be in
+ * the correct order and to contain all the needed elements for the given
+ * partition.
+ */
+List *
+adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  int parentOid,
+								  List *targetlist)
+{
+	AppendRelInfo appinfo;
+	List *result_tl;
+
+	/*
+	 * First, fix the target entries' resnos by using inheritance translation.
+	 */
+	appinfo.type = T_AppendRelInfo;
+	appinfo.parent_relid = parentVarno;
+	appinfo.parent_reltype = InvalidOid;
+	appinfo.child_relid = 0;
+	appinfo.child_reltype = InvalidOid;
+	appinfo.parent_reloid = parentOid;
+	appinfo.translated_vars =
+		make_inh_translation_list(parentDesc, partitionDesc,
+								  partitionRelname, 1);
+	result_tl = adjust_inherited_tlist((List *) targetlist, &appinfo);
+
+	/*
+	 * Add any attributes that are missing in the source list, such
+	 * as dropped columns in the partition.
+	 */
+	result_tl = expand_targetlist(result_tl, CMD_UPDATE,
+								  parentVarno, partitionDesc);
+
+	return result_tl;
+}
+
+/*
  * adjust_appendrel_attrs_multilevel
  *	  Apply Var translations from a toplevel appendrel parent down to a child.
  *
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c3a9617f67..92696f0607 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1025,13 +1025,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..93f490233e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,14 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_conflproj_tdescs	Array of TupleDescs per partition, each
+ *								describing the record type of the ON CONFLICT
+ *								DO UPDATE SET target list as applied to a
+ *								given partition
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +114,8 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	TupleDesc *partition_conflproj_tdescs;
+	List	  **partition_arbiter_indexes;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3b926014b6..0e394c9dce 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -992,6 +992,7 @@ typedef struct ModifyTableState
 	TupleTableSlot *mt_existing;	/* slot to store existing target tuple in */
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
+	TupleDesc	mt_conflproj_tupdesc; /* tuple descriptor for it */
 	struct PartitionTupleRouting *mt_partition_tuple_routing;
 	/* Tuple-routing support info */
 	struct TransitionCaptureState *mt_transition_capture;
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..4d3e8a9b90 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -14,6 +14,7 @@
 #ifndef PREP_H
 #define PREP_H
 
+#include "access/tupdesc.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 
@@ -42,6 +43,9 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
+extern List *expand_targetlist(List *tlist, int command_type,
+				  Index result_relation, TupleDesc tupdesc);
+
 /*
  * prototypes for prepunion.c
  */
@@ -65,4 +69,11 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
+extern List *adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  int parentOid,
+								  List *targetlist);
+
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
-- 
2.11.0

#22Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#21)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/19 16:45, Amit Langote wrote:

I have tried to make these changes and attached are the updated patches
containing those, including the change I suggested for 0001 (that is,
getting rid of mt_onconflict). I also expanded some comments in 0003
while making those changes.

I realized that there should be a test in which a transition table is
involved for an ON CONFLICT DO UPDATE on a partitioned table, because of a
relevant statement trigger on the table; something like the attached. But
due to a bug being discussed over at [1], we won't get the correct expected
output for the test until the latest patch submitted for that bug [2] is
committed as a bug fix.

Thanks,
Amit

[1]: /messages/by-id/ba19eff9-2120-680e-4671-55a9bea9454f@lab.ntt.co.jp

[2]: /messages/by-id/df921671-32df-45ea-c0e4-9b51ee86ba3b@lab.ntt.co.jp

Attachments:

0001-More-tests-for-ON-CONFLICT-DO-UPDATE-on-partitioned-.patch (text/plain; charset=UTF-8)
From 152c6d5afed21e775caf4862c00e5c6388f7403b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 19 Mar 2018 17:13:08 +0900
Subject: [PATCH] More tests for ON CONFLICT DO UPDATE on partitioned tables

For transition tables of the DO UPDATE action.
---
 src/test/regress/expected/triggers.out | 33 +++++++++++++++++++++++++++++++++
 src/test/regress/sql/triggers.sql      | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 99be9ac6e9..e8b849f773 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+ERROR:  new row for relation "iocdu_tt_parted1" violates partition constraint
+DETAIL:  Failing row contains (2, BBB).
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#19)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

(2018/03/18 13:17), Alvaro Herrera wrote:

Alvaro Herrera wrote:
The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing. I fear we may be doing
redundant and/or misplaced work. I'll look into it next week.

I'm still reviewing the patches, but I really agree on that point. As
Pavan mentioned upthread, the onConflictSet tlist for the root parent,
from which we create a translated onConflictSet tlist for a partition,
would have already been processed by expand_targetlist() to contain all
missing columns as well, so I think we could create the tlist for the
partition by simply re-ordering the expression-converted tlist (ie,
conv_setproj) based on the conversion map for the partition. The
attached patch defines a function for that, which could be called instead
of adjust_and_expand_partition_tlist(). This would allow us to get
rid of the planner changes from the patches. Maybe I'm missing something,
though.
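
For readers following along, the re-ordering idea can be sketched outside
the backend roughly as follows. This is only a toy model under stated
assumptions, not the attached patch: ToyEntry and toy_reorder_tlist are
hypothetical names, the expression tree is reduced to a string, and the map
is assumed to hold, for each child column, the 1-based parent attribute
number (0 for a column dropped in the child).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TargetEntry; only what the sketch needs. */
typedef struct ToyEntry
{
	int			resno;		/* 1-based position in the output row */
	const char *expr;		/* stand-in for the expression; NULL = NULL constant */
} ToyEntry;

/*
 * Re-order parent_tlist (one entry per parent column, in parent attribute
 * order) into child attribute order.
 *
 * attr_map has child_natts elements; attr_map[i] is the 1-based parent
 * attribute number feeding child column i + 1, or 0 if that child column
 * is dropped.  The result is written to child_tlist (child_natts elements).
 */
static void
toy_reorder_tlist(const ToyEntry *parent_tlist, int parent_natts,
				  const int *attr_map, int child_natts,
				  ToyEntry *child_tlist)
{
	for (int attrno = 1; attrno <= child_natts; attrno++)
	{
		int			parent_attno = attr_map[attrno - 1];

		if (parent_attno != 0)
		{
			/* Take the entry that the parent tlist has for this column. */
			assert(parent_attno >= 1 && parent_attno <= parent_natts);
			child_tlist[attrno - 1] = parent_tlist[parent_attno - 1];
		}
		else
		{
			/* Dropped column in the child: emit a NULL placeholder. */
			child_tlist[attrno - 1].expr = NULL;
		}

		/* Either way, the entry's resno must match the child's attnum. */
		child_tlist[attrno - 1].resno = attrno;
	}
}
```

With a two-column parent tlist and a map of {2, 0, 1}, the child tlist gets
the parent's second entry at position 1, a NULL placeholder at position 2,
and the parent's first entry at position 3, each with resno renumbered.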

Best regards,
Etsuro Fujita

Attachments:

simplify-tlist-adjustment.patch (text/x-diff)
*** src/backend/executor/execPartition.c.org	2018-03-19 21:32:52.000000000 +0900
--- src/backend/executor/execPartition.c	2018-03-19 21:44:17.000000000 +0900
***************
*** 15,24 ****
--- 15,26 ----
  #include "postgres.h"
  
  #include "catalog/pg_inherits_fn.h"
+ #include "catalog/pg_type.h"
  #include "executor/execPartition.h"
  #include "executor/executor.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "nodes/makefuncs.h"
  #include "optimizer/prep.h"
  #include "utils/lsyscache.h"
  #include "utils/rls.h"
***************
*** 37,42 ****
--- 39,45 ----
  									 Datum *values,
  									 bool *isnull,
  									 int maxfieldlen);
+ static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map);
  
  /*
   * ExecSetupPartitionTupleRouting - sets up information needed during
***************
*** 554,565 ****
  										firstVarno, partrel,
  										firstResultRel, NULL);
  
! 			conv_setproj =
! 				adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
! 												  RelationGetDescr(partrel),
! 												  RelationGetRelationName(partrel),
! 												  firstVarno,
! 												  conv_setproj);
  
  			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
  			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
--- 557,563 ----
  										firstVarno, partrel,
  										firstResultRel, NULL);
  
! 			conv_setproj = adjust_onconflictset_tlist(conv_setproj, map);
  
  			tupDesc = ExecTypeFromTL(conv_setproj, partrelDesc->tdhasoid);
  			part_conflproj_slot = ExecInitExtraTupleSlot(mtstate->ps.state,
***************
*** 1091,1093 ****
--- 1089,1153 ----
  
  	return buf.data;
  }
+ 
+ /*
+  * Adjust the targetlist entries of a translated ON CONFLICT UPDATE operation
+  *
+  * The expressions have already been fixed, but we need to re-order the tlist
+  * so that the target resnos match the child table.
+  *
+  * The given tlist has already been through expression_tree_mutator;
+  * therefore the TargetEntry nodes are fresh copies that it's okay to
+  * scribble on.
+  */
+ static List *
+ adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map)
+ {
+ 	List	   *new_tlist = NIL;
+ 	TupleDesc	tupdesc = map->outdesc;
+ 	AttrNumber *attrMap = map->attrMap;
+ 	int			numattrs = tupdesc->natts;
+ 	int			attrno;
+ 
+ 	for (attrno = 1; attrno <= numattrs; attrno++)
+ 	{
+ 		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+ 		TargetEntry *tle;
+ 
+ 		if (attrMap[attrno - 1] != 0)
+ 		{
+ 			Assert(!att_tup->attisdropped);
+ 
+ 			/* Get the corresponding tlist entry from the given tlist */
+ 			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+ 
+ 			/* Get the resno right */
+ 			if (tle->resno != attrno)
+ 				tle->resno = attrno;
+ 		}
+ 		else
+ 		{
+ 			Node	   *expr;
+ 
+ 			Assert(att_tup->attisdropped);
+ 
+ 			/* Insert NULL for dropped column */
+ 			expr = (Node *) makeConst(INT4OID,
+ 									  -1,
+ 									  InvalidOid,
+ 									  sizeof(int32),
+ 									  (Datum) 0,
+ 									  true, /* isnull */
+ 									  true /* byval */ );
+ 
+ 			tle = makeTargetEntry((Expr *) expr,
+ 								  attrno,
+ 								  pstrdup(NameStr(att_tup->attname)),
+ 								  false);
+ 		}
+ 
+ 		new_tlist = lappend(new_tlist, tle);
+ 	}
+ 
+ 	return new_tlist;
+ }
#24Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#23)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

Fujita-san,

On 2018/03/19 21:59, Etsuro Fujita wrote:

(2018/03/18 13:17), Alvaro Herrera wrote:

Alvaro Herrera wrote:
The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing. I fear we may be doing
redundant and/or misplaced work. I'll look into it next week.

I'm still reviewing the patches, but I really agree on that point. As
Pavan mentioned upthread, the onConflictSet tlist for the root parent,
from which we create a translated onConflictSet tlist for a partition,
would have already been processed by expand_targetlist() to contain all
missing columns as well, so I think we could create the tlist for the
partition by simply re-ordering the expression-converted tlist (ie,
conv_setproj) based on the conversion map for the partition. The attached
patch defines a function for that, which could be called instead of
adjust_and_expand_partition_tlist(). This would allow us to get rid of
the planner changes from the patches. Maybe I'm missing something, though.

Thanks for the patch. I can confirm that your proposed
adjust_onconflictset_tlist() is enough to replace the combination of
adjust_inherited_tlist() and expand_targetlist() (aka
adjust_and_expand_partition_tlist()), thus rendering the planner changes
in this patch unnecessary. I tested it with a partition tree whose
partitions have varying attribute numbers (dropped columns included), and it
seems to work as expected (as also exercised in the regression tests), as
shown below.

Partitioned table p has partitions p1, p2, p3, p4, and p5 whose attributes
look like this; shown as (colname: attnum, ...).

p: (a: 1, b: 4)
p1: (a: 1, b: 4)
p2: (a: 2, b: 4)
p3: (a: 1, b: 3)
p4: (a: 3, b: 8)
p5: (a: 1, b: 5)

You may notice that all partitions but p1 will have a tuple conversion map
and hence will undergo adjust_onconflictset_tlist() treatment.
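
To make the "has a conversion map or not" distinction concrete, here is a
rough standalone sketch of matching columns by name. It is a simplified
model, not the backend's actual tuple conversion code: toy_build_attr_map
and toy_needs_conversion are hypothetical helpers, a NULL name stands for a
dropped column, and type and OID checks are ignored. That the gaps in the
attribute numbers above are dropped columns is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill attr_map (child_natts elements): for each child column, the 1-based
 * attribute number of the parent column with the same name, or 0 if the
 * child column is dropped.  Column arrays are indexed by attnum - 1 and
 * use NULL for dropped columns.
 */
static void
toy_build_attr_map(const char *const *parent_cols, int parent_natts,
				   const char *const *child_cols, int child_natts,
				   int *attr_map)
{
	for (int c = 0; c < child_natts; c++)
	{
		attr_map[c] = 0;
		if (child_cols[c] == NULL)
			continue;			/* dropped in the child */

		for (int p = 0; p < parent_natts; p++)
		{
			if (parent_cols[p] != NULL &&
				strcmp(parent_cols[p], child_cols[c]) == 0)
			{
				attr_map[c] = p + 1;
				break;
			}
		}
		/* Every live child column must exist in the parent. */
		assert(attr_map[c] != 0);
	}
}

/*
 * A map is needed unless both layouts line up position by position:
 * same number of attributes, and each position is either the same live
 * column or dropped on both sides.
 */
static bool
toy_needs_conversion(const char *const *parent_cols, int parent_natts,
					 const int *attr_map, int child_natts)
{
	if (parent_natts != child_natts)
		return true;

	for (int i = 0; i < child_natts; i++)
	{
		if (attr_map[i] == i + 1)
			continue;			/* same column at the same position */
		if (attr_map[i] == 0 && parent_cols[i] == NULL)
			continue;			/* dropped on both sides */
		return true;
	}
	return false;
}
```

Modelling p as {a, dropped, dropped, b} and p2 as {dropped, a, dropped, b}
gives the map {0, 1, 0, 4} for p2, so it needs conversion, while a layout
identical to p (as assumed for p1) does not.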

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (1, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (1, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (1) on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (2, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (2, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (2, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (5, 'a') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (5, 'b') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 1

insert into p values (5, 'c') on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

insert into p values (5) on conflict (a) do update set b = excluded.b
where excluded.b = 'b';
INSERT 0 0

select tableoid::regclass, * from p;
 tableoid | a | b
----------+---+---
 p1       | 1 | b
 p2       | 2 | b
 p5       | 5 | b
(3 rows)

I have incorporated your patch in the main patch after updating the
comments a bit. Also, now that 6666ee49f49 is in [1], the transition
table related tests I proposed yesterday pass nicely. Instead of posting
as a separate patch, I have merged it with the main patch. So now that
planner refactoring is unnecessary, attached is just one patch.

Thanks,
Amit

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6666ee49f49

Attachments:

v6-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From ac4f82d67720994ff8c632515bcf6760542c0d2f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v6] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml                         |  15 --
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  62 +++++--
 src/backend/commands/tablecmds.c              |  15 +-
 src/backend/executor/execPartition.c          | 237 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  93 ++++++++--
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  10 ++
 src/include/nodes/execnodes.h                 |   1 +
 src/include/optimizer/prep.h                  |   9 +
 src/test/regress/expected/insert_conflict.out |  73 ++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  64 +++++--
 src/test/regress/sql/triggers.sql             |  33 ++++
 15 files changed, 560 insertions(+), 96 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3d80ff9e5b..13489162df 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..8dc73ae092 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1384,24 +1385,43 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1412,18 +1432,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
@@ -2505,7 +2533,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 218224a156..6003afdd03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14360,7 +14361,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14489,7 +14490,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14722,8 +14723,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..579cb3ddb9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -109,6 +110,21 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		/* Indexes are always needed. */
+		proute->partition_arbiter_indexes =
+			(List **) palloc(nparts * sizeof(List *));
+		/* Only needed for the DO UPDATE action. */
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+			proute->partition_conflproj_tdescs =
+				(TupleDesc *) palloc(nparts * sizeof(TupleDesc));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +491,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +500,141 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			List	 *onconflset;
+			TupleDesc tupDesc;
+
+			Assert(node->onConflictSet != NIL);
+
+			/*
+			 * If partition's tuple descriptor differs from the root parent,
+			 * we need to adjust the onConflictSet target list to account for
+			 * differences in attribute numbers.
+			 */
+			if (map != NULL)
+			{
+				/*
+				 * First convert Vars to contain partition's attribute
+				 * numbers.
+				 */
+
+				/* Convert Vars referencing EXCLUDED pseudo-relation. */
+				onconflset = map_partition_varattnos(node->onConflictSet,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Convert Vars referencing main target relation. */
+				onconflset = map_partition_varattnos(onconflset,
+													 firstVarno,
+													 partrel,
+													 firstResultRel, NULL);
+
+				/*
+				 * The original list wouldn't contain entries for the
+				 * partition's dropped attributes, which must be accounted
+				 * for because the targetlist must have all the attributes of the
+				 * underlying table including the dropped ones.  Fix that and
+				 * reorder target list entries if their resnos change as a
+				 * result of the adjustment.
+				 */
+				onconflset = adjust_onconflictset_tlist(onconflset, map);
+			}
+			else
+				/* Just reuse the original one. */
+				onconflset = node->onConflictSet;
+
+			/*
+			 * We must set mtstate->mt_conflproj's tuple descriptor to this
+			 * before trying to use it for projection.
+			 */
+			tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+			PinTupleDesc(tupDesc);
+			proute->partition_conflproj_tdescs[partidx] = tupDesc;
+
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+
+			if (node->onConflictWhere)
+			{
+				if (map != NULL)
+				{
+					/*
+					 * Convert the Vars to contain partition's attribute
+					 * numbers.
+					 */
+					List *onconflwhere;
+
+					/* Convert Vars referencing EXCLUDED pseudo-relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														node->onConflictWhere,
+														INNER_VAR,
+														partrel,
+														firstResultRel, NULL);
+					/* Convert Vars referencing main target relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														onconflwhere,
+														firstVarno,
+														partrel,
+														firstResultRel, NULL);
+					leaf_part_rri->ri_onConflictSetWhere =
+						ExecInitQual(onconflwhere, &mtstate->ps);
+				}
+				else
+					/* Just reuse the original one. */
+					leaf_part_rri->ri_onConflictSetWhere =
+						resultRelInfo->ri_onConflictSetWhere;
+			}
+		}
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1094,66 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *
+ * The expressions have already been fixed, but we have to make sure that the
+ * target resnos match the partition.  In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != 0)
+		{
+			Assert(!att_tup->attisdropped);
+
+			/* Get the corresponding tlist entry from the given tlist */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+
+			/* Get the resno right */
+			if (tle->resno != attrno)
+				tle->resno = attrno;
+		}
+		else
+		{
+			Node	   *expr;
+
+			Assert(att_tup->attisdropped);
+
+			/* Insert NULL for dropped column */
+			expr = (Node *) makeConst(INT4OID,
+									  -1,
+									  InvalidOid,
+									  sizeof(int32),
+									  (Datum) 0,
+									  true, /* isnull */
+									  true /* byval */ );
+
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..29f155e3a5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,6 +56,7 @@
 
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -66,7 +67,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot);
+						TupleTableSlot *slot,
+						int *partition_index);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
@@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
+		   int partition_index,
 		   bool canSetTag)
 {
 	HeapTuple	tuple;
@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
 			ItemPointerData conflictTid;
 			bool		specConflict;
 			List	   *arbiterIndexes;
+			PartitionTupleRouting *proute =
+										mtstate->mt_partition_tuple_routing;
 
-			arbiterIndexes = node->arbiterIndexes;
+			/* Use the appropriate list of arbiter indexes. */
+			if (mtstate->mt_partition_tuple_routing != NULL)
+			{
+				Assert(partition_index >= 0 && proute != NULL);
+				arbiterIndexes =
+						proute->partition_arbiter_indexes[partition_index];
+			}
+			else
+				arbiterIndexes = node->arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -451,8 +464,20 @@ ExecInsert(ModifyTableState *mtstate,
 					 * tuple.
 					 */
 					TupleTableSlot *returning = NULL;
+					TupleDesc	onconfl_tupdesc;
+
+					/* Use the appropriate tuple descriptor. */
+					if (mtstate->mt_partition_tuple_routing != NULL)
+					{
+						Assert(partition_index >= 0 && proute != NULL);
+						onconfl_tupdesc =
+						  proute->partition_conflproj_tdescs[partition_index];
+					}
+					else
+						onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;
 
 					if (ExecOnConflictUpdate(mtstate, resultRelInfo,
+											 onconfl_tupdesc,
 											 &conflictTid, planSlot, slot,
 											 estate, canSetTag, &returning))
 					{
@@ -1052,10 +1077,23 @@ lreplace:;
 			bool		tuple_deleted;
 			TupleTableSlot *ret_slot;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
+			int			map_index,
+						partition_index;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1125,10 +1163,11 @@ lreplace:;
 			 */
 			Assert(mtstate->rootResultRelInfo != NULL);
 			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+										   mtstate->rootResultRelInfo, slot,
+										   &partition_index);
 
 			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+								  estate, partition_index, canSetTag);
 
 			/* Revert ExecPrepareTupleRouting's node change. */
 			estate->es_result_relation_info = resultRelInfo;
@@ -1304,6 +1343,7 @@ lreplace:;
 static bool
 ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
+	ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
 	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
 
 	/*
@@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
+	ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
 	ExecProject(resultRelInfo->ri_onConflictSetProj);
 
 	/*
@@ -1631,13 +1673,16 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * In mtstate, transition capture changes may also need to be reverted.
  *
  * Returns a slot holding the tuple of the partition rowtype.
+ * *partition_index is set to the index of the partition that the input
+ * tuple is routed to.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						int *partition_index)
 {
 	int			partidx;
 	ResultRelInfo *partrel;
@@ -1720,6 +1765,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	*partition_index = partidx;
 	return slot;
 }
 
@@ -2065,16 +2111,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
+			{
+				int		partition_index = -1;
+
 				/* Prepare for tuple routing if needed. */
 				if (proute)
 					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
+												   resultRelInfo, slot,
+												   &partition_index);
 				slot = ExecInsert(node, slot, planSlot,
-								  estate, node->canSetTag);
+								  estate, partition_index, node->canSetTag);
 				/* Revert ExecPrepareTupleRouting's state change. */
 				if (proute)
 					estate->es_result_relation_info = resultRelInfo;
 				break;
+			}
 			case CMD_UPDATE:
 				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
 								  &node->mt_epqstate, estate, node->canSetTag);
@@ -2178,8 +2229,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		subplan = (Plan *) lfirst(l);
 
 		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
+		resultRelInfo->ri_usesFdwDirectModify =
+			bms_is_member(i, node->fdwDirectModifyPlans);
 
 		/*
 		 * Verify result relation is a valid target for the current operation
@@ -2252,7 +2303,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
 		mtstate->mt_partition_tuple_routing =
-						ExecSetupPartitionTupleRouting(mtstate, rel);
+			ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  We determine which
+		 * tupleDesc to use for this after we have determined which relation
+		 * the insert/update will be applied to, possibly after performing
+		 * tuple routing.
+		 */
+		mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
+		PinTupleDesc(tupDesc);
+		mtstate->mt_conflproj_tupdesc = tupDesc;
+
+		/*
+		 * Just like the "existing tuple" slot, we'll defer deciding which
+		 * tupleDesc to use for this slot to a point where tuple routing has
+		 * been performed.
+		 */
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflictSetProj =
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..93f490233e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,14 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_conflproj_tdescs	Array of TupleDescs per partition, each
+ *								describing the record type of the ON CONFLICT
+ *								DO UPDATE SET target list as applied to a
+ *								given partition
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +114,8 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	TupleDesc *partition_conflproj_tdescs;
+	List	  **partition_arbiter_indexes;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d9e591802f..88ef2b71f3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -993,6 +993,7 @@ typedef struct ModifyTableState
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
 
+	TupleDesc	mt_conflproj_tupdesc; /* tuple descriptor for it */
 	/* Tuple-routing support info */
 	struct PartitionTupleRouting *mt_partition_tuple_routing;
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..b6a0dda338 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -14,6 +14,7 @@
 #ifndef PREP_H
 #define PREP_H
 
+#include "access/tupdesc.h"
 #include "nodes/plannodes.h"
 #include "nodes/relation.h"
 
@@ -42,6 +43,7 @@ extern List *preprocess_targetlist(PlannerInfo *root);
 
 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
 
+
 /*
  * prototypes for prepunion.c
  */
@@ -65,4 +67,11 @@ extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 							   Relids child_relids, Relids top_parent_relids);
 
+extern List *adjust_and_expand_partition_tlist(TupleDesc parentDesc,
+								  TupleDesc partitionDesc,
+								  char *partitionRelname,
+								  int parentVarno,
+								  int parentOid,
+								  List *targetlist);
+
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 99be9ac6e9..f53ac6bdf1 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0
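
The resno fix-up done by adjust_onconflictset_tlist() in the patch above can be pictured with a small standalone model. This is only a sketch under assumptions: ToyEntry and toy_adjust_tlist are hypothetical names made up for illustration, not PostgreSQL structs or functions. The one idea taken from the patch is that the conversion map gives, for each partition attribute, the 1-based parent attribute number, with 0 marking a dropped partition column that gets a NULL placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TargetEntry; not the PostgreSQL struct. */
typedef struct
{
	int			resno;		/* 1-based position in the result row */
	const char *expr;		/* text stand-in for the expression; NULL means
							 * "constant NULL placeholder" */
} ToyEntry;

/*
 * Build the partition's target list from the parent's.
 *
 * attr_map[i] is the 1-based parent attribute number matching partition
 * attribute i + 1, or 0 if that partition attribute is dropped.  "out" must
 * have room for child_natts entries.
 */
static void
toy_adjust_tlist(const ToyEntry *parent_tlist, int parent_len,
				 const int *attr_map, int child_natts,
				 ToyEntry *out)
{
	for (int attno = 1; attno <= child_natts; attno++)
	{
		int			parent_attno = attr_map[attno - 1];

		if (parent_attno != 0)
		{
			/* Live column: reuse the parent's entry for this attribute. */
			assert(parent_attno >= 1 && parent_attno <= parent_len);
			out[attno - 1] = parent_tlist[parent_attno - 1];
		}
		else
		{
			/* Dropped column: emit a NULL placeholder. */
			out[attno - 1].expr = NULL;
		}

		/* In both cases the entry is renumbered to the partition's attno. */
		out[attno - 1].resno = attno;
	}
}
```

With a parent list (a, excluded.b) and a map of {2, 0, 1}, the partition list comes out as (excluded.b, NULL, a), numbered 1 through 3.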

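The arbiter-index loop in the patch above keeps a leaf partition's index when its topmost ancestor is one of the root table's arbiter indexes. A toy version of that lookup might look as follows. It assumes an in-memory array of child/parent pairs in place of pg_inherits and an acyclic hierarchy; ToyInherits, toy_direct_parent, toy_root_parent and toy_map_arbiters are hypothetical names, and the real get_partition_parent() may treat a relation without a parent differently.

```c
#include <assert.h>

#define TOY_INVALID 0			/* stand-in for InvalidOid */

/* One pg_inherits-like row: child -> direct parent.  Hypothetical type. */
typedef struct
{
	unsigned	child;
	unsigned	parent;
} ToyInherits;

/* Return the direct parent of relid, or TOY_INVALID if it has none. */
static unsigned
toy_direct_parent(const ToyInherits *inh, int ninh, unsigned relid)
{
	for (int i = 0; i < ninh; i++)
		if (inh[i].child == relid)
			return inh[i].parent;
	return TOY_INVALID;
}

/*
 * Follow parent links up to the topmost ancestor.  Returns TOY_INVALID if
 * relid has no parent at all.  Assumes the hierarchy has no cycles.
 */
static unsigned
toy_root_parent(const ToyInherits *inh, int ninh, unsigned relid)
{
	unsigned	result = TOY_INVALID;
	unsigned	parent;

	while ((parent = toy_direct_parent(inh, ninh, relid)) != TOY_INVALID)
	{
		result = parent;
		relid = parent;
	}
	return result;
}

/*
 * Collect the leaf indexes whose topmost ancestor is one of the root's
 * arbiter indexes.  "out" must have room for nleaf * nroot entries; the
 * number of entries written is returned.
 */
static int
toy_map_arbiters(const ToyInherits *inh, int ninh,
				 const unsigned *root_arbiters, int nroot,
				 const unsigned *leaf_indexes, int nleaf,
				 unsigned *out)
{
	int			n = 0;

	for (int r = 0; r < nroot; r++)
		for (int i = 0; i < nleaf; i++)
			if (toy_root_parent(inh, ninh, leaf_indexes[i]) == root_arbiters[r])
				out[n++] = leaf_indexes[i];
	return n;
}
```

Walking all the way to the root, rather than comparing against the direct parent only, is what lets a leaf two levels down still match the root's arbiter index.
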
#25Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#24)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/20 13:30, Amit Langote wrote:

> I have incorporated your patch in the main patch after updating the
> comments a bit. Also, now that 6666ee49f49 is in [1], the transition
> table related tests I proposed yesterday pass nicely. Instead of posting
> as a separate patch, I have merged it with the main patch. So now that
> planner refactoring is unnecessary, attached is just one patch.

Sorry, I forgot to remove a hunk in the patch affecting
src/include/optimizer/prep.h. Fixed in the attached updated version.

Thanks,
Amit

Attachments:

v7-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From 793b407545b0d24715f0d44a9a546689e9a4282a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v7] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml                         |  15 --
 src/backend/catalog/heap.c                    |   2 +-
 src/backend/catalog/partition.c               |  62 +++++--
 src/backend/commands/tablecmds.c              |  15 +-
 src/backend/executor/execPartition.c          | 237 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  93 ++++++++--
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   2 +-
 src/include/executor/execPartition.h          |  10 ++
 src/include/nodes/execnodes.h                 |   1 +
 src/test/regress/expected/insert_conflict.out |  73 ++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  64 +++++--
 src/test/regress/sql/triggers.sql             |  33 ++++
 14 files changed, 551 insertions(+), 96 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3d80ff9e5b..13489162df 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
 		elog(ERROR, "cache lookup failed for relation %u", relid);
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
-		parentOid = get_partition_parent(relid);
+		parentOid = get_partition_parent(relid, false);
 		LockRelationOid(parentOid, AccessExclusiveLock);
 
 		/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..8dc73ae092 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1384,24 +1385,43 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool getroot)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, getroot);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1412,18 +1432,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
@@ -2505,7 +2533,7 @@ generate_partition_qual(Relation rel)
 		return copyObject(rel->rd_partcheck);
 
 	/* Grab at least an AccessShareLock on the parent table */
-	parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+	parent = heap_open(get_partition_parent(RelationGetRelid(rel), false),
 					   AccessShareLock);
 
 	/* Get pg_class.relpartbound */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 218224a156..6003afdd03 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1292,7 +1292,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	 */
 	if (is_partition && relOid != oldRelOid)
 	{
-		state->partParentOid = get_partition_parent(relOid);
+		state->partParentOid = get_partition_parent(relOid, false);
 		if (OidIsValid(state->partParentOid))
 			LockRelationOid(state->partParentOid, AccessExclusiveLock);
 	}
@@ -5843,7 +5843,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	/* If rel is partition, shouldn't drop NOT NULL if parent has the same */
 	if (rel->rd_rel->relispartition)
 	{
-		Oid			parentId = get_partition_parent(RelationGetRelid(rel));
+		Oid			parentId = get_partition_parent(RelationGetRelid(rel),
+													false);
 		Relation	parent = heap_open(parentId, AccessShareLock);
 		TupleDesc	tupDesc = RelationGetDescr(parent);
 		AttrNumber	parent_attnum;
@@ -14360,7 +14361,7 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 		if (!has_superclass(idxid))
 			continue;
 
-		Assert((IndexGetRelation(get_partition_parent(idxid), false) ==
+		Assert((IndexGetRelation(get_partition_parent(idxid, false), false) ==
 			   RelationGetRelid(rel)));
 
 		idx = index_open(idxid, AccessExclusiveLock);
@@ -14489,7 +14490,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 
 	/* Silently do nothing if already in the right state */
 	currParent = !has_superclass(partIdxId) ? InvalidOid :
-		get_partition_parent(partIdxId);
+		get_partition_parent(partIdxId, false);
 	if (currParent != RelationGetRelid(parentIdx))
 	{
 		IndexInfo  *childInfo;
@@ -14722,8 +14723,10 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		/* make sure we see the validation we just did */
 		CommandCounterIncrement();
 
-		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
-		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
+		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx),
+										   false);
+		parentTblId = get_partition_parent(RelationGetRelid(partedTbl),
+										   false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
 		parentTbl = relation_open(parentTblId, AccessExclusiveLock);
 		Assert(!parentIdx->rd_index->indisvalid);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..579cb3ddb9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -109,6 +110,21 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	 */
 	proute->partition_tuple_slot = MakeTupleTableSlot(NULL);
 
+	/*
+	 * We might need these arrays for conflict checking and handling the
+	 * DO UPDATE action
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		/* Indexes are always needed. */
+		proute->partition_arbiter_indexes =
+			(List **) palloc(nparts * sizeof(List *));
+		/* Only needed for the DO UPDATE action. */
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+			proute->partition_conflproj_tdescs =
+				(TupleDesc *) palloc(nparts * sizeof(TupleDesc));
+	}
+
 	i = 0;
 	foreach(cell, leaf_parts)
 	{
@@ -475,9 +491,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +500,141 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *my_arbiterindexes = NIL;
+
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			List	 *onconflset;
+			TupleDesc tupDesc;
+
+			Assert(node->onConflictSet != NIL);
+
+			/*
+			 * If partition's tuple descriptor differs from the root parent,
+			 * we need to adjust the onConflictSet target list to account for
+			 * differences in attribute numbers.
+			 */
+			if (map != NULL)
+			{
+				/*
+				 * First convert Vars to contain partition's attribute
+				 * numbers.
+				 */
+
+				/* Convert Vars referencing EXCLUDED pseudo-relation. */
+				onconflset = map_partition_varattnos(node->onConflictSet,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Convert Vars referencing main target relation. */
+				onconflset = map_partition_varattnos(onconflset,
+													 firstVarno,
+													 partrel,
+													 firstResultRel, NULL);
+
+				/*
+				 * The original list wouldn't contain entries for the
+				 * partition's dropped attributes, which must be accounted
+				 * for because targetlist must have all the attributes of the
+				 * underlying table including the dropped ones.  Fix that and
+				 * reorder target list entries if their resnos change as a
+				 * result of the adjustment.
+				 */
+				onconflset = adjust_onconflictset_tlist(onconflset, map);
+			}
+			else
+				/* Just reuse the original one. */
+				onconflset = node->onConflictSet;
+
+			/*
+			 * We must set mtstate->mt_conflproj's tuple descriptor to this
+			 * before trying to use it for projection.
+			 */
+			tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+			PinTupleDesc(tupDesc);
+			proute->partition_conflproj_tdescs[partidx] = tupDesc;
+
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+
+			if (node->onConflictWhere)
+			{
+				if (map != NULL)
+				{
+					/*
+					 * Convert the Vars to contain partition's attribute
+					 * numbers
+					 */
+					List *onconflwhere;
+
+					/* Convert Vars referencing EXCLUDED pseudo-relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														node->onConflictWhere,
+														INNER_VAR,
+														partrel,
+														firstResultRel, NULL);
+					/* Convert Vars referencing main target relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														onconflwhere,
+														firstVarno,
+														partrel,
+														firstResultRel, NULL);
+					leaf_part_rri->ri_onConflictSetWhere =
+						ExecInitQual(onconflwhere, &mtstate->ps);
+				}
+				else
+					/* Just reuse the original one. */
+					leaf_part_rri->ri_onConflictSetWhere =
+						resultRelInfo->ri_onConflictSetWhere;
+			}
+		}
+
+		/* Initialize arbiter indexes list, if any. */
+		foreach(lc, ((ModifyTable *) mtstate->ps.plan)->arbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to my_arbiterindexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_parent(indexOid, true))
+					my_arbiterindexes = lappend_oid(my_arbiterindexes,
+													indexOid);
+			}
+		}
+
+		/*
+		 * Use this list instead of the original one containing parent's
+		 * indexes.
+		 */
+		proute->partition_arbiter_indexes[partidx] = my_arbiterindexes;
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1094,66 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *
+ * The expressions have already been fixed, but we have to make sure that the
+ * target resnos match the partition.  In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_onconflictset_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != 0)
+		{
+			Assert(!att_tup->attisdropped);
+
+			/* Get the corresponding tlist entry from the given tlist */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+
+			/* Get the resno right */
+			if (tle->resno != attrno)
+				tle->resno = attrno;
+		}
+		else
+		{
+			Node	   *expr;
+
+			Assert(att_tup->attisdropped);
+
+			/* Insert NULL for dropped column */
+			expr = (Node *) makeConst(INT4OID,
+									  -1,
+									  InvalidOid,
+									  sizeof(int32),
+									  (Datum) 0,
+									  true, /* isnull */
+									  true /* byval */ );
+
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..29f155e3a5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,6 +56,7 @@
 
 static bool ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -66,7 +67,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot);
+						TupleTableSlot *slot,
+						int *partition_index);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
@@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
+		   int partition_index,
 		   bool canSetTag)
 {
 	HeapTuple	tuple;
@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
 			ItemPointerData conflictTid;
 			bool		specConflict;
 			List	   *arbiterIndexes;
+			PartitionTupleRouting *proute =
+										mtstate->mt_partition_tuple_routing;
 
-			arbiterIndexes = node->arbiterIndexes;
+			/* Use the appropriate list of arbiter indexes. */
+			if (mtstate->mt_partition_tuple_routing != NULL)
+			{
+				Assert(partition_index >= 0 && proute != NULL);
+				arbiterIndexes =
+						proute->partition_arbiter_indexes[partition_index];
+			}
+			else
+				arbiterIndexes = node->arbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -451,8 +464,20 @@ ExecInsert(ModifyTableState *mtstate,
 					 * tuple.
 					 */
 					TupleTableSlot *returning = NULL;
+					TupleDesc	onconfl_tupdesc;
+
+					/* Use the appropriate tuple descriptor. */
+					if (mtstate->mt_partition_tuple_routing != NULL)
+					{
+						Assert(partition_index >= 0 && proute != NULL);
+						onconfl_tupdesc =
+						  proute->partition_conflproj_tdescs[partition_index];
+					}
+					else
+						onconfl_tupdesc = mtstate->mt_conflproj_tupdesc;
 
 					if (ExecOnConflictUpdate(mtstate, resultRelInfo,
+											 onconfl_tupdesc,
 											 &conflictTid, planSlot, slot,
 											 estate, canSetTag, &returning))
 					{
@@ -1052,10 +1077,23 @@ lreplace:;
 			bool		tuple_deleted;
 			TupleTableSlot *ret_slot;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
+			int			map_index,
+						partition_index;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1125,10 +1163,11 @@ lreplace:;
 			 */
 			Assert(mtstate->rootResultRelInfo != NULL);
 			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+										   mtstate->rootResultRelInfo, slot,
+										   &partition_index);
 
 			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+								  estate, partition_index, canSetTag);
 
 			/* Revert ExecPrepareTupleRouting's node change. */
 			estate->es_result_relation_info = resultRelInfo;
@@ -1304,6 +1343,7 @@ lreplace:;
 static bool
 ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 ResultRelInfo *resultRelInfo,
+					 TupleDesc onConflictSetTupdesc,
 					 ItemPointer conflictTid,
 					 TupleTableSlot *planSlot,
 					 TupleTableSlot *excludedSlot,
@@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
+	ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
 	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
 
 	/*
@@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
+	ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
 	ExecProject(resultRelInfo->ri_onConflictSetProj);
 
 	/*
@@ -1631,13 +1673,16 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * In mtstate, transition capture changes may also need to be reverted.
  *
  * Returns a slot holding the tuple of the partition rowtype.
+ * *partition_index is set to the index of the partition that the input
+ * tuple is routed to.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						int *partition_index)
 {
 	int			partidx;
 	ResultRelInfo *partrel;
@@ -1720,6 +1765,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	*partition_index = partidx;
 	return slot;
 }
 
@@ -2065,16 +2111,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
+			{
+				int		partition_index = -1;
+
 				/* Prepare for tuple routing if needed. */
 				if (proute)
 					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
+												   resultRelInfo, slot,
+												   &partition_index);
 				slot = ExecInsert(node, slot, planSlot,
-								  estate, node->canSetTag);
+								  estate, partition_index, node->canSetTag);
 				/* Revert ExecPrepareTupleRouting's state change. */
 				if (proute)
 					estate->es_result_relation_info = resultRelInfo;
 				break;
+			}
 			case CMD_UPDATE:
 				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
 								  &node->mt_epqstate, estate, node->canSetTag);
@@ -2178,8 +2229,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		subplan = (Plan *) lfirst(l);
 
 		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
+		resultRelInfo->ri_usesFdwDirectModify =
+			bms_is_member(i, node->fdwDirectModifyPlans);
 
 		/*
 		 * Verify result relation is a valid target for the current operation
@@ -2252,7 +2303,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
 		(operation == CMD_INSERT || update_tuple_routing_needed))
 		mtstate->mt_partition_tuple_routing =
-						ExecSetupPartitionTupleRouting(mtstate, rel);
+			ExecSetupPartitionTupleRouting(mtstate, rel);
 
 	/*
 	 * Build state for collecting transition tuples.  This requires having a
@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  We determine which
+		 * tupleDesc to use for this after we have determined which relation
+		 * the insert/update will be applied to, possibly after performing
+		 * tuple routing.
+		 */
+		mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
+		PinTupleDesc(tupDesc);
+		mtstate->mt_conflproj_tupdesc = tupDesc;
+
+		/*
+		 * Just like the "existing tuple" slot, we'll defer deciding which
+		 * tupleDesc to use for this slot to a point where tuple routing has
+		 * been performed.
+		 */
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflictSetProj =
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..70ddb225a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -51,7 +51,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
-extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_parent(Oid relid, bool getroot);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 03a599ad57..93f490233e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,6 +90,14 @@ typedef struct PartitionDispatchData *PartitionDispatch;
  *								given leaf partition's rowtype after that
  *								partition is chosen for insertion by
  *								tuple-routing.
+ * partition_conflproj_tdescs	Array of TupleDescs per partition, each
+ *								describing the record type of the ON CONFLICT
+ *								DO UPDATE SET target list as applied to a
+ *								given partition
+ * partition_arbiter_indexes	Array of Lists with each slot containing the
+ *								list of OIDs of a given partition's indexes
+ *								that are to be used as arbiter indexes for
+ *								ON CONFLICT checking
  *-----------------------
  */
 typedef struct PartitionTupleRouting
@@ -106,6 +114,8 @@ typedef struct PartitionTupleRouting
 	int			num_subplan_partition_offsets;
 	TupleTableSlot *partition_tuple_slot;
 	TupleTableSlot *root_tuple_slot;
+	TupleDesc *partition_conflproj_tdescs;
+	List	  **partition_arbiter_indexes;
 } PartitionTupleRouting;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d9e591802f..88ef2b71f3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -993,6 +993,7 @@ typedef struct ModifyTableState
 	List	   *mt_excludedtlist;	/* the excluded pseudo relation's tlist  */
 	TupleTableSlot *mt_conflproj;	/* CONFLICT ... SET ... projection target */
 
+	TupleDesc	mt_conflproj_tupdesc; /* tuple descriptor for it */
 	/* Tuple-routing support info */
 	struct PartitionTupleRouting *mt_partition_tuple_routing;
 
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 99be9ac6e9..f53ac6bdf1 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#26Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#24)
Re: ON CONFLICT DO UPDATE for partitioned tables

(2018/03/20 13:30), Amit Langote wrote:

On 2018/03/19 21:59, Etsuro Fujita wrote:

(2018/03/18 13:17), Alvaro Herrera wrote:

Alvaro Herrera wrote:
The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing. I fear we may be doing
redundant and/or misplaced work. I'll look into it next week.

I'm still reviewing the patches, but I really agree on that point. As
Pavan mentioned upthread, the onConflictSet tlist for the root parent,
from which we create a translated onConflictSet tlist for a partition,
would have already been processed by expand_targetlist() to contain all
missing columns as well, so I think we could create the tlist for the
partition by simply re-ordering the expression-converted tlist (ie,
conv_setproj) based on the conversion map for the partition. The attached
patch defines a function for that, which could be called instead of calling
adjust_and_expand_partition_tlist(). This would allow us to get rid of
planner changes from the patches. Maybe I'm missing something, though.

Thanks for the patch. I can confirm your proposed
adjust_onconflictset_tlist() is enough to replace the combination of
adjust_inherited_tlist() and expand_targetlist() (aka
adjust_and_expand_partition_tlist()), thus rendering the planner changes
in this patch unnecessary. I tested it with a partition tree involving
partitions of varying attribute numbers (dropped columns included) and it
seems to work as expected (as also exercised in regression tests) as shown
below.

Thanks for testing!

I have incorporated your patch in the main patch after updating the
comments a bit. Also, now that 6666ee49f49 is in [1], the transition
table related tests I proposed yesterday pass nicely. Instead of posting
as a separate patch, I have merged it with the main patch. So now that
planner refactoring is unnecessary, attached is just one patch.

Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
  			ItemPointerData conflictTid;
  			bool		specConflict;
  			List	   *arbiterIndexes;
+			PartitionTupleRouting *proute =
+										mtstate->mt_partition_tuple_routing;
-			arbiterIndexes = node->arbiterIndexes;
+			/* Use the appropriate list of arbiter indexes. */
+			if (mtstate->mt_partition_tuple_routing != NULL)
+			{
+				Assert(partition_index >= 0 && proute != NULL);
+				arbiterIndexes =
+						proute->partition_arbiter_indexes[partition_index];
+			}
+			else
+				arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to
have the arbiterindexes list in ResultRelInfo as well, as mentioned by
Alvaro upthread, or to re-add mt_arbiterindexes as before and set it to
proute->partition_arbiter_indexes[partition_index] before we get here,
maybe in ExecPrepareTupleRouting, in the case of tuple routing.

ExecOnConflictUpdate(ModifyTableState *mtstate,
ResultRelInfo *resultRelInfo,
+ TupleDesc onConflictSetTupdesc,
ItemPointer conflictTid,
TupleTableSlot *planSlot,
TupleTableSlot *excludedSlot,
@@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
ExecCheckHeapTupleVisible(estate, &tuple, buffer);

/* Store target's existing tuple in the state's dedicated slot */
+ ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);

/*
@@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
}

/* Project the new tuple version */
+ ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
ExecProject(resultRelInfo->ri_onConflictSetProj);

Can we do ExecSetSlotDescriptor for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing? That would make the API changes to ExecOnConflictUpdate
unnecessary.

@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
econtext = mtstate->ps.ps_ExprContext;
relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  We determine which
+		 * tupleDesc to use for this after we have determined which relation
+		 * the insert/update will be applied to, possibly after performing
+		 * tuple routing.
+		 */
+		mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
  		/* carried forward solely for the benefit of explain */
  		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState 
*estate, int eflags)
  		/* create target slot for UPDATE SET projection */
  		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
  								 relationDesc->tdhasoid);
+		PinTupleDesc(tupDesc);
+		mtstate->mt_conflproj_tupdesc = tupDesc;
+
+		/*
+		 * Just like the "existing tuple" slot, we'll defer deciding which
+		 * tupleDesc to use for this slot to a point where tuple routing has
+		 * been performed.
+		 */
  		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state, NULL);

If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing, as said above, we wouldn't need these changes. I think doing
that only in the case of tuple routing and keeping this as-is would be
better because that would save cycles in the normal case.

I'll look at other parts of the patch next.

Best regards,
Etsuro Fujita

#27Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Amit Langote (#25)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Tue, Mar 20, 2018 at 10:09 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/03/20 13:30, Amit Langote wrote:

I have incorporated your patch in the main patch after updating the
comments a bit. Also, now that 6666ee49f49 is in [1], the transition
table related tests I proposed yesterday pass nicely. Instead of posting
as a separate patch, I have merged it with the main patch. So now that
planner refactoring is unnecessary, attached is just one patch.

Sorry, I forgot to remove a hunk in the patch affecting
src/include/optimizer/prep.h. Fixed in the attached updated version.

Thanks for writing a new version. A few comments:

<listitem>
<para>
- Using the <literal>ON CONFLICT</literal> clause with partitioned
tables
- will cause an error if the conflict target is specified (see
- <xref linkend="sql-on-conflict" /> for more details on how the
clause
- works). Therefore, it is not possible to specify
- <literal>DO UPDATE</literal> as the alternative action, because
- specifying the conflict target is mandatory in that case. On the
other
- hand, specifying <literal>DO NOTHING</literal> as the alternative
action
- works fine provided the conflict target is not specified. In that
case,
- unique constraints (or exclusion constraints) of the individual leaf
- partitions are considered.
- </para>
- </listitem>

We should document somewhere that partition key update is not supported by
ON CONFLICT DO UPDATE.

 /*
  * get_partition_parent
+ * Obtain direct parent or topmost ancestor of given relation
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning
pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance
hierarchy.
  *
  * Note: Because this function assumes that the relation whose OID is
passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */

Given that most callers only look for the immediate parent, I wonder if it
makes sense to have a new function, get_partition_root(), instead of changing
the signature of the current function. That will reduce the footprint of this
patch quite a lot.

@@ -36,6 +38,7 @@ static char
*ExecBuildSlotPartitionKeyDescription(Relation rel,
  Datum *values,
  bool *isnull,
  int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap
*map);

We should name this function in a more generic way, given that it's going to
be used for other things too. What about adjust_partition_tlist?

+
+ /*
+ * If partition's tuple descriptor differs from the root parent,
+ * we need to adjust the onConflictSet target list to account for
+ * differences in attribute numbers.
+ */
+ if (map != NULL)
+ {
+ /*
+ * First convert Vars to contain partition's atttribute
+ * numbers.
+ */
+
+ /* Convert Vars referencing EXCLUDED pseudo-relation. */
+ onconflset = map_partition_varattnos(node->onConflictSet,
+ INNER_VAR,
+ partrel,
+ firstResultRel, NULL);

Are we not modifying node->onConflictSet in place? Or does
map_partition_varattnos() create a fresh copy before scribbling on the input?
If it's the former, then I guess that's a problem. If it's the latter, then we
ought to have comments explaining that.

+ tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+ PinTupleDesc(tupDesc);

Why do we need to pin the descriptor? If we do need to, why don't we need a
corresponding ReleaseTupleDesc() call?

I am still confused about whether partition_conflproj_tdescs really belongs in
PartitionTupleRouting or should be part of the ResultRelInfo. FWIW, for the
MERGE patch, I moved everything to a new struct and made it part of the
ResultRelInfo. If no re-mapping is necessary, we can just point to the struct
in the root relation's ResultRelInfo. Otherwise, create and populate a new one
in the partition relation's ResultRelInfo.

+ leaf_part_rri->ri_onConflictSetWhere =
+ ExecInitQual(onconflwhere, &mtstate->ps);
+ }

So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
ResultRelInfo. Why not move mt_conflproj_tupdesc, partition_conflproj_tdescs,
partition_arbiter_indexes etc. to the ResultRelInfo as well?

+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *

As I said above, we should disassociate this function from ON CONFLICT DO
UPDATE and rather have it as a general purpose facility.

+ * The expressions have already been fixed, but we have to make sure that
the
+ * target resnos match the partition.  In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *

Can we have some explanation regarding how the targetlist is reordered? I know
the function does that by updating the resno in place, but some explanation
would help. Also, should we add an assertion-build check to confirm that the
resultant list is actually ordered?

@@ -66,7 +67,8 @@ static TupleTableSlot
*ExecPrepareTupleRouting(ModifyTableState *mtstate,
EState *estate,
PartitionTupleRouting *proute,
ResultRelInfo *targetRelInfo,
- TupleTableSlot *slot);
+ TupleTableSlot *slot,
+ int *partition_index);
static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
@@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
EState *estate,
+ int partition_index,
bool canSetTag)
{
HeapTuple tuple;

If we move the list of arbiter indexes and the tuple desc to ResultRelInfo, as
suggested above, I think we can avoid making any API changes to
ExecPrepareTupleRouting and ExecInsert.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Etsuro Fujita (#26)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

Fujita-san, Pavan,

Thank you both for reviewing. Replying to both emails here.

On 2018/03/20 20:53, Etsuro Fujita wrote:

Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
             ItemPointerData conflictTid;
             bool        specConflict;
             List       *arbiterIndexes;
+            PartitionTupleRouting *proute =
+                                        mtstate->mt_partition_tuple_routing;
-            arbiterIndexes = node->arbiterIndexes;
+            /* Use the appropriate list of arbiter indexes. */
+            if (mtstate->mt_partition_tuple_routing != NULL)
+            {
+                Assert(partition_index >= 0 && proute != NULL);
+                arbiterIndexes =
+                        proute->partition_arbiter_indexes[partition_index];
+            }
+            else
+                arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to have
the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro
upthread, or to re-add mt_arbiterindexes as before and set it to
proute->partition_arbiter_indexes[partition_index] before we get here,
maybe in ExecPrepareTupleRouting, in the case of tuple routing.

It's a good idea. I somehow missed that Alvaro had already mentioned it.

In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere. I
propose we name the field ri_onConflictArbiterIndexes as done in the
updated patch.

 ExecOnConflictUpdate(ModifyTableState *mtstate,
                      ResultRelInfo *resultRelInfo,
+                     TupleDesc onConflictSetTupdesc,
                      ItemPointer conflictTid,
                      TupleTableSlot *planSlot,
                      TupleTableSlot *excludedSlot,
@@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     ExecCheckHeapTupleVisible(estate, &tuple, buffer);

     /* Store target's existing tuple in the state's dedicated slot */
+    ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation));
     ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);

     /*
@@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     }

     /* Project the new tuple version */
+    ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc);
     ExecProject(resultRelInfo->ri_onConflictSetProj);

Can we do ExecSetSlotDescriptor for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing?  That would make the API changes to ExecOnConflictUpdate
unnecessary.

That's a good idea too, so done.

@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
         econtext = mtstate->ps.ps_ExprContext;
         relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-        /* initialize slot for the existing tuple */
-        mtstate->mt_existing =
-            ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+        /*
+         * Initialize slot for the existing tuple.  We determine which
+         * tupleDesc to use for this after we have determined which relation
+         * the insert/update will be applied to, possibly after performing
+         * tuple routing.
+         */
+        mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state,
NULL);
         /* carried forward solely for the benefit of explain */
         mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
         /* create target slot for UPDATE SET projection */
         tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
                                  relationDesc->tdhasoid);
+        PinTupleDesc(tupDesc);
+        mtstate->mt_conflproj_tupdesc = tupDesc;
+
+        /*
+         * Just like the "existing tuple" slot, we'll defer deciding which
+         * tupleDesc to use for this slot to a point where tuple routing has
+         * been performed.
+         */
         mtstate->mt_conflproj =
-            ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+            ExecInitExtraTupleSlot(mtstate->ps.state, NULL);

If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing, as said above, we wouldn't need these changes.  I think doing that
only in the case of tuple routing and keeping this as-is would be better
because that would save cycles in the normal case.

Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in
ExecPrepareTupleRouting, because we shouldn't have more than one instance
of mtstate->mt_existing and mtstate->mt_conflproj slots.

As you also said above, I think you meant to say here that we do
ExecInitExtraTupleSlot only once for both mtstate->mt_existing and
mtstate->mt_conflproj in ExecInitModifyTable and only do
ExecSetSlotDescriptor in ExecPrepareTupleRouting. I have changed it so
that ExecInitModifyTable now both creates the slot and sets the descriptor
for non-tuple-routing cases and only creates but doesn't set the
descriptor in the tuple-routing case.

For ExecPrepareTupleRouting to be able to access the tupDesc of the
onConflictSet target list, I've added ri_onConflictSetProjTupDesc, which is
set by ExecInitPartitionInfo on the first call for a given partition. This was
also suggested by Pavan in his review.

Considering all of that, both mt_conflproj_tupdesc and
partition_conflproj_tdescs (the array in PartitionTupleRouting) no longer
exist in the patch. And since arbiterIndexes has been moved into
ResultRelInfo too, partition_arbiter_indexes (the array in
PartitionTupleRouting) is gone too.

On 2018/03/22 13:34, Pavan Deolasee wrote:

Thanks for writing a new version. A few comments:

<listitem>
<para>
- Using the <literal>ON CONFLICT</literal> clause with partitioned
tables
- will cause an error if the conflict target is specified (see
- <xref linkend="sql-on-conflict" /> for more details on how the
clause
- works). Therefore, it is not possible to specify
- <literal>DO UPDATE</literal> as the alternative action, because
- specifying the conflict target is mandatory in that case. On the
other
- hand, specifying <literal>DO NOTHING</literal> as the alternative
action
- works fine provided the conflict target is not specified. In that
case,
- unique constraints (or exclusion constraints) of the individual leaf
- partitions are considered.
- </para>
- </listitem>

We should document somewhere that partition key update is not supported by
ON CONFLICT DO UPDATE.

Agreed. I have added a line to the INSERT reference page to mention this
limitation.

/*
* get_partition_parent
+ * Obtain direct parent or topmost ancestor of given relation
*
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns direct inheritance parent of a partition by scanning
pg_inherits;
+ * or, if 'getroot' is true, the topmost parent in the inheritance
hierarchy.
*
* Note: Because this function assumes that the relation whose OID is
passed
* as an argument will have precisely one parent, it should only be called
* when it is known that the relation is a partition.
*/

Given that most callers only look for the immediate parent, I wonder if it
makes sense to have a new function, get_partition_root(), instead of changing
the signature of the current function. That will reduce the footprint of this
patch quite a lot.

It seems like a good idea, so done that way.
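
To illustrate the split being discussed, here is a minimal standalone sketch, not the code from the patch. It assumes a hypothetical in-memory array parent_of[] standing in for the pg_inherits lookup, and the sketch_* names are invented for illustration. The point is only that a separate root-finding helper can be layered on the unchanged direct-parent lookup, so existing callers keep their signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: the relation has no parent (it is a root table). */
#define NO_PARENT (-1)

/*
 * Stand-in for the direct-parent lookup. parent_of[i] holds the index of
 * relation i's direct parent, or NO_PARENT. In the real system this
 * information comes from a catalog scan; the array is an assumption made
 * to keep the sketch self-contained.
 */
int
sketch_partition_parent(const int *parent_of, size_t nrels, int rel)
{
    assert(rel >= 0 && (size_t) rel < nrels);
    return parent_of[rel];
}

/*
 * Separate helper that climbs the hierarchy one level at a time until it
 * reaches a relation without a parent, and returns that relation.
 */
int
sketch_partition_root(const int *parent_of, size_t nrels, int rel)
{
    int cur = rel;

    for (;;)
    {
        int parent = sketch_partition_parent(parent_of, nrels, cur);

        if (parent == NO_PARENT)
            return cur;
        cur = parent;
    }
}
```

With parent_of = {NO_PARENT, 0, 1}, relation 2 is a second-level partition: its direct parent is 1 and its root is 0, which is the multi-level case that the regression test with parted_conflict_test_4_1 exercises.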

@@ -36,6 +38,7 @@ static char
*ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
int maxfieldlen);
+static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap
*map);

We should name this function in a more generic way, given that it's going to
be used for other things too. What about adjust_partition_tlist?

I think that makes sense. We were trying to use adjust_inherited_tlist in
the earlier versions of this patch, so adjust_partition_tlist sounds like
a good name for this piece of code.

+
+ /*
+ * If partition's tuple descriptor differs from the root parent,
+ * we need to adjust the onConflictSet target list to account for
+ * differences in attribute numbers.
+ */
+ if (map != NULL)
+ {
+ /*
+ * First convert Vars to contain partition's atttribute
+ * numbers.
+ */
+
+ /* Convert Vars referencing EXCLUDED pseudo-relation. */
+ onconflset = map_partition_varattnos(node->onConflictSet,
+ INNER_VAR,
+ partrel,
+ firstResultRel, NULL);

Are we not modifying node->onConflictSet in place? Or does
map_partition_varattnos() create a fresh copy before scribbling on the input?
If it's the former, then I guess that's a problem. If it's the latter, then we
ought to have comments explaining that.

A copy is made before scribbling. Clarified that in the nearby comment.

+ tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+ PinTupleDesc(tupDesc);

Why do we need to pin the descriptor? If we do need to, why don't we need a
corresponding ReleaseTupleDesc() call?

PinTupleDesc was added in the patch as Alvaro had submitted it upthread, and
it wasn't clear to me either why it was needed. Looking at it closely, it
seems we can get rid of it, if only for the lack of a corresponding
ReleaseTupleDesc(). ExecSetSlotDescriptor() seems to take care of pinning
and releasing tuple descriptors that are passed to it. If some
partition's tupDesc remains pinned because it was the last one that was
passed to it, the final ExecResetTupleTable will take care of releasing it.

I have removed the instances of PinTupleDesc in the updated patch, but I'm
not sure why the loose PinTupleDesc() in the previous version of the patch
didn't cause reference leak warnings or some such.
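
The pin/release behaviour described above can be modelled with a small standalone sketch. It is not the executor code: the Sketch* types and sketch_* functions are invented for illustration, and the convention that a refcount of -1 means "not reference-counted" is an assumption of this sketch. It shows a slot that pins whatever descriptor it is given, releases the one it held before, and drops the last one on reset, so the caller never needs its own pin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; refcount -1 marks it as not reference-counted. */
typedef struct SketchTupleDesc
{
    int refcount;
} SketchTupleDesc;

/* Hypothetical slot that owns one pin on its current descriptor. */
typedef struct SketchSlot
{
    SketchTupleDesc *desc;
} SketchSlot;

void
sketch_pin(SketchTupleDesc *desc)
{
    if (desc->refcount >= 0)
        desc->refcount++;
}

void
sketch_release(SketchTupleDesc *desc)
{
    if (desc->refcount >= 0)
    {
        assert(desc->refcount > 0);
        desc->refcount--;
    }
}

/* Switch the slot to a new descriptor: release the old one, pin the new. */
void
sketch_set_slot_descriptor(SketchSlot *slot, SketchTupleDesc *desc)
{
    if (slot->desc != NULL)
        sketch_release(slot->desc);
    slot->desc = desc;
    sketch_pin(desc);
}

/* Final cleanup: release whichever descriptor was set last. */
void
sketch_reset_slot(SketchSlot *slot)
{
    if (slot->desc != NULL)
    {
        sketch_release(slot->desc);
        slot->desc = NULL;
    }
}
```

In this model, switching one slot between the descriptors of successive partitions leaves every descriptor at its starting count once the slot is reset, whereas an extra pin taken by the caller with no matching release would leave a counted descriptor one above its starting count.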

I am still confused about whether partition_conflproj_tdescs really belongs in
PartitionTupleRouting or should be part of the ResultRelInfo. FWIW, for the
MERGE patch, I moved everything to a new struct and made it part of the
ResultRelInfo. If no re-mapping is necessary, we can just point to the struct
in the root relation's ResultRelInfo. Otherwise, create and populate a new one
in the partition relation's ResultRelInfo.

+ leaf_part_rri->ri_onConflictSetWhere =
+ ExecInitQual(onconflwhere, &mtstate->ps);
+ }

So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
ResultRelInfo. Why not move mt_conflproj_tupdesc, partition_conflproj_tdescs,
partition_arbiter_indexes etc. to the ResultRelInfo as well?

I have moved both the projection tupdesc and the arbiter indexes list into
ResultRelInfo as I wrote above.

+
+/*
+ * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE
+ * operation for a given partition
+ *

As I said above, we should disassociate this function from ON CONFLICT DO
UPDATE and rather have it as a general purpose facility.

OK, have fixed the comment and the name as mentioned above.

+ * The expressions have already been fixed, but we have to make sure that the
+ * target resnos match the partition.  In some cases, this can force us to
+ * re-order the tlist to preserve resno ordering.
+ *

Can we have some explanation regarding how the targetlist is reordered? I
know the function does that by updating the resno in place, but some
explanation would help. Also, should we add an assertion-build check to
confirm that the resultant list is actually ordered?

OK, added a comment and also the assertion-build check on the order of
entries.
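
The reordering can be sketched in isolation as follows. This is a
simplified illustration, not the function from the patch: ToyEntry and
toy_adjust_tlist are hypothetical names, a plain array stands in for the
List, and attr_map is assumed to hold, for each partition attribute, the
1-based parent attribute number it corresponds to, or 0 if the attribute is
dropped in the partition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TargetEntry; only what the sketch needs. */
typedef struct ToyEntry
{
	int			resno;		/* 1-based position in the result row */
	int			is_dummy;	/* 1 if a NULL placeholder for a dropped column */
} ToyEntry;

/*
 * Fill out[0..natts-1] with entries in partition attribute order.
 *
 * parent_tlist is indexed by (parent attno - 1) and is modified in place.
 * dummies provides natts entries of storage for placeholders.
 */
void
toy_adjust_tlist(ToyEntry *parent_tlist, const int *attr_map, int natts,
				 ToyEntry *dummies, ToyEntry **out)
{
	int			attno;
	int			i;

	for (attno = 1; attno <= natts; attno++)
	{
		ToyEntry   *tle;

		if (attr_map[attno - 1] != 0)
		{
			/* Reuse the parent's entry, renumbered to the partition's attno. */
			tle = &parent_tlist[attr_map[attno - 1] - 1];
			tle->resno = attno;
		}
		else
		{
			/* Dropped in the partition: emit a placeholder at this attno. */
			tle = &dummies[attno - 1];
			tle->resno = attno;
			tle->is_dummy = 1;
		}

		out[attno - 1] = tle;
	}

	/* Sanity check: resnos must be strictly increasing. */
	for (i = 1; i < natts; i++)
		assert(out[i]->resno > out[i - 1]->resno);
}
```

Because the loop walks the partition's attributes in order and assigns
resno from the loop counter, the output comes out sorted by construction.
The trailing check only guards against later changes breaking that
property.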

@@ -66,7 +67,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
EState *estate,
PartitionTupleRouting *proute,
ResultRelInfo *targetRelInfo,
- TupleTableSlot *slot);
+ TupleTableSlot *slot,
+ int *partition_index);
static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
@@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
EState *estate,
+ int partition_index,
bool canSetTag)
{
HeapTuple tuple;

If we move the list of arbiter indexes and the tuple desc to ResultRelInfo,
as suggested above, I think we can avoid making any API changes to
ExecPrepareTupleRouting and ExecInsert.

Those API changes are no longer part of the patch.

Attached please find an updated version.

Thanks,
Amit

Attachments:

v8-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From edf4d0d6081a00f9eef5d6e8fae1db56ecc0fdeb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v8] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml                         |  15 --
 doc/src/sgml/ref/insert.sgml                  |   8 +
 src/backend/catalog/partition.c               |  83 +++++++--
 src/backend/executor/execMain.c               |   5 +
 src/backend/executor/execPartition.c          | 253 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  65 ++++++-
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   1 +
 src/include/nodes/execnodes.h                 |   6 +
 src/test/regress/expected/insert_conflict.out |  73 ++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  64 ++++++-
 src/test/regress/sql/triggers.sql             |  33 ++++
 13 files changed, 569 insertions(+), 77 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO <replaceable class="parameter">table_name</replaceable> [ AS <replac
     not duplicate each other in terms of attributes constrained by an
     arbiter index or constraint.
    </para>
+
+   <para>
+    Note that it is currently not supported for the
+    <literal>ON CONFLICT DO UPDATE</literal> clause of an
+    <command>INSERT</command> applied to a partitioned table to update the
+    partition key of a conflicting row such that it requires the row be moved
+    to a new partition.
+   </para>
    <tip>
     <para>
      It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..bfe559490e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1377,6 +1378,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent of given relation
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
@@ -1387,14 +1389,59 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, false);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_root_parent
+ *		Obtain topmost ancestor of given relation
+ *
+ * Returns the topmost inheritance parent of a partition by scanning
+ * pg_inherits
+ *
+ * Note: Because this function assumes that the relation whose OID is passed
+ * as an argument will have precisely one parent, it should only be called
+ * when it is known that the relation is a partition.
+ */
+Oid
+get_partition_root_parent(Oid relid)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, true);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find root parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
+{
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1405,18 +1452,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 91ba939bdc..5439c44770 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1341,11 +1341,16 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		resultRelInfo->ri_FdwRoutine = GetFdwRoutineForRelation(resultRelationDesc, true);
 	else
 		resultRelInfo->ri_FdwRoutine = NULL;
+
+	/* The following fields are set later if needed */
 	resultRelInfo->ri_FdwState = NULL;
 	resultRelInfo->ri_usesFdwDirectModify = false;
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
 	resultRelInfo->ri_projectReturning = NULL;
+	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
+	resultRelInfo->ri_onConflictSetProj = NULL;
+	resultRelInfo->ri_onConflictSetWhere = NULL;
 
 	/*
 	 * Partition constraint, which also includes the partition constraint of
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..69efb13c4f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -475,9 +476,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +485,144 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *arbiterIndexes = NIL;
+
+		/* Generate a list of arbiter indexes for the partition. */
+		foreach(lc, resultRelInfo->ri_onConflictArbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to arbiterIndexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_root_parent(indexOid))
+					arbiterIndexes = lappend_oid(arbiterIndexes, indexOid);
+			}
+		}
+		leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes;
+
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			List	 *onconflset;
+			TupleDesc tupDesc;
+
+			Assert(node->onConflictSet != NIL);
+
+			/*
+			 * If partition's tuple descriptor differs from the root parent,
+			 * we need to adjust the onConflictSet target list to account for
+			 * differences in attribute numbers.
+			 */
+			if (map != NULL)
+			{
+				/*
+				 * First convert Vars to contain partition's attribute
+				 * numbers.
+				 */
+
+				/*
+				 * Convert Vars referencing EXCLUDED pseudo-relation.
+				 *
+				 * Note that node->onConflictSet itself remains unmodified,
+				 * because a copy is made before changing any nodes.
+				 */
+				onconflset = map_partition_varattnos(node->onConflictSet,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				/* Convert Vars referencing main target relation. */
+				onconflset = map_partition_varattnos(onconflset,
+													 firstVarno,
+													 partrel,
+													 firstResultRel, NULL);
+
+				/*
+				 * The original list wouldn't contain entries for the
+				 * partition's dropped attributes, which must be accounted
+				 * for because targetlist must have all the attributes of the
+				 * underlying table including the dropped ones.  Fix that and
+				 * reorder target list entries if their resnos change as a
+				 * result of the adjustment.
+				 */
+				onconflset = adjust_partition_tlist(onconflset, map);
+			}
+			else
+				/* Just reuse the original one. */
+				onconflset = node->onConflictSet;
+
+			/*
+			 * We must set mtstate->mt_conflproj's tuple descriptor to this
+			 * before trying to use it for projection.
+			 */
+			tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+			leaf_part_rri->ri_onConflictSetProj =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+			leaf_part_rri->ri_onConflictSetProjTupDesc = tupDesc;
+
+			if (node->onConflictWhere)
+			{
+				if (map != NULL)
+				{
+					/*
+					 * Convert the Vars to contain partition's attribute
+					 * numbers
+					 */
+					List *onconflwhere;
+
+					/*
+					 * Convert Vars referencing EXCLUDED pseudo-relation.
+					 *
+					 * Note that node->onConflictWhere itself remains
+					 * unmodified, because a copy is made before changing any
+					 * nodes.
+					 */
+					onconflwhere = map_partition_varattnos((List *)
+														node->onConflictWhere,
+														INNER_VAR,
+														partrel,
+														firstResultRel, NULL);
+					/* Convert Vars referencing main target relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														onconflwhere,
+														firstVarno,
+														partrel,
+														firstResultRel, NULL);
+					leaf_part_rri->ri_onConflictSetWhere =
+						ExecInitQual(onconflwhere, &mtstate->ps);
+				}
+				else
+					/* Just reuse the original one. */
+					leaf_part_rri->ri_onConflictSetWhere =
+						resultRelInfo->ri_onConflictSetWhere;
+			}
+		}
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1082,94 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * adjust_partition_tlist
+ *		Adjust the targetlist entries for a given partition to account for
+ *		attribute differences between parent and the partition
+ *
+ * The expressions have already been fixed, but we have to make sure that the
+ * target resnos match the partition's attribute numbers.  This results in
+ * generating a copy of the original target list in which the entries appear
+ * in sorted order of resno, including both the existing entries (that may
+ * have their resno changed in-place) and the newly added entries.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_partition_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != 0)
+		{
+			/*
+			 * Use the existing entry from the parent, but make sure to
+			 * update the resno to match the partition's attno.
+			 */
+			Assert(!att_tup->attisdropped);
+
+			/* Get the corresponding tlist entry from the given tlist */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+
+			/* Get the resno right */
+			if (tle->resno != attrno)
+				tle->resno = attrno;
+		}
+		else
+		{
+			/*
+			 * This corresponds to a dropped attribute in the partition, for
+			 * which we generate a dummy entry with resno matching the
+			 * partition's attno.
+			 */
+			Node	   *expr;
+
+			Assert(att_tup->attisdropped);
+
+			/* Insert NULL for dropped column */
+			expr = (Node *) makeConst(INT4OID,
+									  -1,
+									  InvalidOid,
+									  sizeof(int32),
+									  (Datum) 0,
+									  true, /* isnull */
+									  true /* byval */ );
+
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	/* Sanity check on the order of entries in the new tlist. */
+#ifdef USE_ASSERT_CHECKING
+	{
+		TargetEntry *prev = NULL;
+		ListCell *lc;
+
+		foreach(lc, new_tlist)
+		{
+			TargetEntry *cur = lfirst(lc);
+
+			Assert(prev == NULL || cur->resno > prev->resno);
+			prev = cur;
+		}
+	}
+#endif
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..d205cebe0f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -422,7 +422,7 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = node->arbiterIndexes;
+			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -1056,6 +1056,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1639,6 +1651,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo *targetRelInfo,
 						TupleTableSlot *slot)
 {
+	ModifyTable *node;
 	int			partidx;
 	ResultRelInfo *partrel;
 	HeapTuple	tuple;
@@ -1720,6 +1733,19 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	/* Initialize information needed to handle ON CONFLICT DO UPDATE. */
+	Assert(mtstate != NULL);
+	node = (ModifyTable *) mtstate->ps.plan;
+	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	{
+		Assert(mtstate->mt_existing != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_existing,
+							  RelationGetDescr(partrel->ri_RelationDesc));
+		Assert(mtstate->mt_conflproj != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_conflproj,
+							  partrel->ri_onConflictSetProjTupDesc);
+	}
+
 	return slot;
 }
 
@@ -2347,11 +2373,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
+	/* Set the list of arbiter indexes if needed for ON CONFLICT */
+	resultRelInfo = mtstate->resultRelInfo;
+	if (node->onConflictAction != ONCONFLICT_NONE)
+		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+
 	/*
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	resultRelInfo = mtstate->resultRelInfo;
 	if (node->onConflictAction == ONCONFLICT_UPDATE)
 	{
 		ExprContext *econtext;
@@ -2368,9 +2398,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  If we'll be performing
+		 * tuple routing, the tuple descriptor to use for this will be
+		 * determined based on which relation the update is actually applied
+		 * to, so we don't set its tuple descriptor here.
+		 */
+		if (mtstate->mt_partition_tuple_routing == NULL)
+			mtstate->mt_existing =
+				ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		else
+			mtstate->mt_existing =
+				ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2417,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
-		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+
+		/*
+		 * Just like the "existing tuple" slot, we leave this slot's
+		 * tuple descriptor set to NULL in the tuple routing case.
+		 */
+		if (mtstate->mt_partition_tuple_routing == NULL)
+		{
+			mtstate->mt_conflproj =
+				ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			resultRelInfo->ri_onConflictSetProjTupDesc = tupDesc;
+		}
+		else
+			mtstate->mt_conflproj =
+				ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* build UPDATE SET projection state */
 		resultRelInfo->ri_onConflictSetProj =
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..287642b01b 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -52,6 +52,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
 extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_root_parent(Oid relid);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d9e591802f..4836a1e0cf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -412,9 +412,15 @@ typedef struct ResultRelInfo
 	/* for computing a RETURNING list */
 	ProjectionInfo *ri_projectReturning;
 
+	/* list of arbiter indexes to use to check conflicts */
+	List		   *ri_onConflictArbiterIndexes;
+
 	/* for computing ON CONFLICT DO UPDATE SET */
 	ProjectionInfo *ri_onConflictSetProj;
 
+	/* TupleDesc describing the result of the above */
+	TupleDesc	ri_onConflictSetProjTupDesc;
+
 	/* list of ON CONFLICT DO UPDATE exprs (qual) */
 	ExprState  *ri_onConflictSetWhere;
 
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..a9677f06e6 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,67 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 99be9ac6e9..f53ac6bdf1 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..73122479a3 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,59 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#29Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Amit Langote (#28)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Thu, Mar 22, 2018 at 3:01 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Why do we need to pin the descriptor? If we do need to, why don't we need a
corresponding ReleaseTupleDesc() call?

PinTupleDesc() was added in the patch as Alvaro had submitted it upthread,
and it wasn't clear to me either why it was needed. Looking at it closely,
it seems we can get rid of it, if only for the lack of a corresponding
ReleaseTupleDesc(). ExecSetSlotDescriptor() seems to take care of pinning
and releasing tuple descriptors that are passed to it. If some
partition's tupDesc remains pinned because it was the last one that was
passed to it, the final ExecResetTupleTable will take care of releasing it.

I have removed the instances of PinTupleDesc in the updated patch, but I'm
not sure why the loose PinTupleDesc() in the previous version of the patch
didn't cause reference leak warnings or some such.

Yeah, it wasn't clear to me either, but I did not investigate. Maybe
Alvaro knows better.
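
The pin/release behavior discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyDesc, ToySlot and the toy_* functions are hypothetical names invented for illustration, not PostgreSQL's TupleDesc or TupleTableSlot code. It merely mimics a slot that releases the previously installed descriptor and pins the new one, as described for ExecSetSlotDescriptor().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted tuple descriptor (hypothetical). */
typedef struct ToyDesc
{
	int			refcount;
} ToyDesc;

/* Toy stand-in for a slot that holds at most one descriptor (hypothetical). */
typedef struct ToySlot
{
	ToyDesc    *desc;
} ToySlot;

/* Take one reference on the descriptor. */
static void
toy_pin(ToyDesc *desc)
{
	desc->refcount++;
}

/* Drop one reference; releasing an unpinned descriptor is a bug. */
static void
toy_release(ToyDesc *desc)
{
	assert(desc->refcount > 0);
	desc->refcount--;
}

/*
 * Install a descriptor in the slot: the previously installed one (if any)
 * is released, and the new one (if any) is pinned.
 */
static void
toy_set_slot_descriptor(ToySlot *slot, ToyDesc *desc)
{
	if (slot->desc != NULL)
		toy_release(slot->desc);
	slot->desc = desc;
	if (desc != NULL)
		toy_pin(desc);
}

/* Final cleanup: whatever descriptor is still installed gets released. */
static void
toy_reset_slot(ToySlot *slot)
{
	toy_set_slot_descriptor(slot, NULL);
}
```

In this model, a caller that switches the slot between partitions' descriptors never needs to pin anything itself. A stray toy_pin() with no matching toy_release() would leave the count above zero after toy_reset_slot(), which is the kind of leak one might expect a loose PinTupleDesc() to cause.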

I am still confused about whether partition_conflproj_tdescs really belongs
to PartitionTupleRouting or should be a part of the ResultRelInfo. FWIW, for
the MERGE patch, I moved everything to a new struct and made it part of the
ResultRelInfo. If no re-mapping is necessary, we can just point to the struct
in the root relation's ResultRelInfo. Otherwise, create and populate a new one
in the partition relation's ResultRelInfo.

+ leaf_part_rri->ri_onConflictSetWhere =
+ ExecInitQual(onconflwhere, &mtstate->ps);
+ }

So ri_onConflictSetWhere and ri_onConflictSetProj are part of the
ResultRelInfo. Why not move mt_conflproj_tupdesc, partition_conflproj_tdescs,
partition_arbiter_indexes etc. to the ResultRelInfo as well?

I have moved both the projection tupdesc and the arbiter indexes list into
ResultRelInfo as I wrote above.

Thanks. It's looking much better now. I think we can possibly move all ON
CONFLICT related members to a separate structure and just copy the pointer
to the structure if (map == NULL). That might make the code a bit more tidy.
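
The "copy the pointer if (map == NULL)" idea can be sketched in isolation. The following is a minimal illustration, not the patch's code: ToyOnConflictState, ToyRelInfo, toy_init_leaf_on_conflict() and the parent_to_child array are all hypothetical names, and the "state" is reduced to a single column number. A NULL map stands for "the partition's column layout matches the root's".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for all ON CONFLICT DO UPDATE state of a relation. */
typedef struct ToyOnConflictState
{
	int			set_attno;		/* 1-based column number assigned by SET */
} ToyOnConflictState;

/* Hypothetical stand-in for a per-relation result info struct. */
typedef struct ToyRelInfo
{
	ToyOnConflictState *on_conflict;
} ToyRelInfo;

/*
 * Set up the leaf partition's ON CONFLICT state.
 *
 * parent_to_child[i] gives the partition's column number for the root's
 * column i + 1.  NULL means the layouts are identical, in which case the
 * leaf simply shares the root's state instead of building its own.
 */
static void
toy_init_leaf_on_conflict(ToyRelInfo *leaf, const ToyRelInfo *root,
						  const int *parent_to_child)
{
	assert(root->on_conflict != NULL);

	if (parent_to_child == NULL)
	{
		/* No re-mapping needed: just copy the pointer. */
		leaf->on_conflict = root->on_conflict;
		return;
	}

	/* Layout differs: build a private, translated copy for this leaf. */
	leaf->on_conflict = malloc(sizeof(ToyOnConflictState));
	if (leaf->on_conflict == NULL)
		abort();
	leaf->on_conflict->set_attno =
		parent_to_child[root->on_conflict->set_attno - 1];
}
```

For a root declared as (a, b) and a partition declared as (b, a), the map would be {2, 1}, so a SET on the root's column 2 becomes a SET on the partition's column 1. The design benefit is that the common case, where every partition matches the root, costs one pointer assignment per partition.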

Is there anything that needs to be done for transition tables? I checked
and didn't see anything, but please check.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#30Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Pavan Deolasee (#29)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/22 20:48, Pavan Deolasee wrote:

Thanks. It's looking much better now.

Thanks.

I think we can possibly move all ON
CONFLICT related members to a separate structure and just copy the pointer
to the structure if (map == NULL). That might make the code a bit more tidy.

OK, I tried that in the attached updated patch.

Is there anything that needs to be done for transition tables? I checked
and didn't see anything, but please check.

There doesn't seem to be anything that this patch has to do for transition
tables. If you look at the tests I added in triggers.sql which exercise
INSERT ON CONFLICT's interaction with transition tables, you can see that
we get the same output for a partitioned table as we get for a normal table.

Thanks,
Amit

Attachments:

v9-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (text/plain; charset=UTF-8)
From d385c307fbe98935661d7b983229eb5b2e2e6436 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 20 Mar 2018 10:09:38 +0900
Subject: [PATCH v9] Fix ON CONFLICT to work with partitioned tables

Author: Amit Langote, Alvaro Herrera, Etsuro Fujita
---
 doc/src/sgml/ddl.sgml                         |  15 --
 doc/src/sgml/ref/insert.sgml                  |   8 +
 src/backend/catalog/partition.c               |  83 +++++++--
 src/backend/executor/execMain.c               |   4 +
 src/backend/executor/execPartition.c          | 252 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  82 +++++++--
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   1 +
 src/include/nodes/execnodes.h                 |  25 ++-
 src/test/regress/expected/insert_conflict.out |  86 +++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  72 +++++++-
 src/test/regress/sql/triggers.sql             |  33 ++++
 13 files changed, 616 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO <replaceable class="parameter">table_name</replaceable> [ AS <replac
     not duplicate each other in terms of attributes constrained by an
     arbiter index or constraint.
    </para>
+
+   <para>
+    Note that an <command>INSERT</command> with an
+    <literal>ON CONFLICT DO UPDATE</literal> clause applied to a partitioned
+    table currently cannot update the partition key of a conflicting row in
+    a way that would require the row to be moved to a new
+    partition.
+   </para>
    <tip>
     <para>
      It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..bfe559490e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -192,6 +192,7 @@ static int	get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int	get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
 								 Datum *values, bool *isnull);
+static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot);
 
 /*
  * RelationBuildPartitionDesc
@@ -1377,6 +1378,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent of given relation
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
@@ -1387,14 +1389,59 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
-	Form_pg_inherits form;
-	Relation	catalogRelation;
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, false);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_root_parent
+ *		Obtain topmost ancestor of given relation
+ *
+ * Returns the topmost inheritance parent of a partition by scanning
+ * pg_inherits
+ *
+ * Note: Because this function assumes that the relation whose OID is passed
+ * as an argument will have precisely one parent, it should only be called
+ * when it is known that the relation is a partition.
+ */
+Oid
+get_partition_root_parent(Oid relid)
+{
+	Relation	inhRel;
+	Oid		parentOid;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	parentOid = get_partition_parent_recurse(inhRel, relid, true);
+	if (parentOid == InvalidOid)
+		elog(ERROR, "could not find root parent of relation %u", relid);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return parentOid;
+}
+
+/*
+ * get_partition_parent_recurse
+ *		Recursive part of get_partition_parent
+ */
+static Oid
+get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot)
+{
 	SysScanDesc scan;
 	ScanKeyData key[2];
 	HeapTuple	tuple;
-	Oid			result;
-
-	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
+	Oid			result = InvalidOid;
 
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
@@ -1405,18 +1452,26 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	/* Obtain the direct parent, and release resources before recursing */
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
-
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent;
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
+
+	/*
+	 * If we were asked to recurse, do so now.  Except that if we didn't get a
+	 * valid parent, then the 'relid' argument was already the topmost parent,
+	 * so return that.
+	 */
+	if (getroot)
+	{
+		if (OidIsValid(result))
+			return get_partition_parent_recurse(inhRel, result, getroot);
+		else
+			return relid;
+	}
 
 	return result;
 }
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 890067757c..352553da4b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1349,11 +1349,15 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		resultRelInfo->ri_FdwRoutine = GetFdwRoutineForRelation(resultRelationDesc, true);
 	else
 		resultRelInfo->ri_FdwRoutine = NULL;
+
+	/* The following fields are set later if needed */
 	resultRelInfo->ri_FdwState = NULL;
 	resultRelInfo->ri_usesFdwDirectModify = false;
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
 	resultRelInfo->ri_projectReturning = NULL;
+	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
+	resultRelInfo->ri_onConflictSet = NULL;
 
 	/*
 	 * Partition constraint, which also includes the partition constraint of
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..d92879442c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -475,9 +476,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +485,143 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * Initialize information about this partition that's needed to handle
+	 * the ON CONFLICT clause.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell *lc;
+		List	 *arbiterIndexes = NIL;
+
+		/* Generate a list of arbiter indexes for the partition. */
+		foreach(lc, resultRelInfo->ri_onConflictArbiterIndexes)
+		{
+			Oid		parentArbiterIndexOid = lfirst_oid(lc);
+			int		i;
+
+			/*
+			 * Find parentArbiterIndexOid's child in this partition and add it
+			 * to arbiterIndexes.
+			 */
+			for (i = 0; i < leaf_part_rri->ri_NumIndices; i++)
+			{
+				Relation index = leaf_part_rri->ri_IndexRelationDescs[i];
+				Oid		 indexOid = RelationGetRelid(index);
+
+				if (parentArbiterIndexOid ==
+					get_partition_root_parent(indexOid))
+					arbiterIndexes = lappend_oid(arbiterIndexes, indexOid);
+			}
+		}
+		leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes;
+
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			Assert(node->onConflictSet != NIL);
+			Assert(resultRelInfo->ri_onConflictSet != NULL);
+
+			/*
+			 * If partition's tuple descriptor matches exactly with the root
+			 * parent, we can simply use the parent's ON CONFLICT SET state.
+			 */
+			if (map == NULL)
+				leaf_part_rri->ri_onConflictSet =
+											resultRelInfo->ri_onConflictSet;
+			else
+			{
+				List	 *onconflset;
+				TupleDesc tupDesc;
+
+				/*
+				 * We need to translate expressions (Vars and TargetEntry's)
+				 * in onConflictSet and onConflictWhere to account for
+				 * differences in attribute numbers between the partition and
+				 * the root parent.
+				 */
+				leaf_part_rri->ri_onConflictSet =
+										palloc0(sizeof(OnConflictSetState));
+
+				/*
+				 * We need to call map_partition_varattnos twice -- first to
+				 * convert Vars referencing the EXCLUDED pseudo-relation
+				 * (varno == INNER_VAR) and then Vars referencing main target
+				 * relation (varno == firstVarno).
+				 *
+				 * Note that node->onConflictSet itself remains unmodified
+				 * here, because a copy is made before changing any nodes.
+				 */
+				onconflset = map_partition_varattnos(node->onConflictSet,
+													 INNER_VAR,
+													 partrel,
+													 firstResultRel, NULL);
+				onconflset = map_partition_varattnos(onconflset,
+													 firstVarno,
+													 partrel,
+													 firstResultRel, NULL);
+
+				/*
+				 * The original list wouldn't contain entries for the
+				 * partition's dropped attributes, which must be accounted for
+				 * because the targetlist must have all the attributes of the
+				 * underlying table including the dropped ones.  Fix that and
+				 * reorder target list entries if their resnos change as a
+				 * result of the adjustment.
+				 */
+				onconflset = adjust_partition_tlist(onconflset, map);
+
+				/*
+				 * Caller must set mtstate->mt_conflproj's tuple descriptor to
+				 * this one before trying to use it for projection.
+				 */
+				tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+				leaf_part_rri->ri_onConflictSet->proj =
+						ExecBuildProjectionInfo(onconflset, econtext,
+												mtstate->mt_conflproj,
+												&mtstate->ps, partrelDesc);
+				leaf_part_rri->ri_onConflictSet->projTupDesc = tupDesc;
+
+				if (node->onConflictWhere)
+				{
+					/*
+					 * Convert the Vars to contain partition's attribute
+					 * numbers
+					 */
+					List *onconflwhere;
+
+					/*
+					 * Just like for onConflictSet, we need to call
+					 * map_partition_varattnos twice.
+					 *
+					 * Again node->onConflictWhere itself remains unchanged,
+					 * because a copy is made before changing any nodes.
+					 */
+					onconflwhere = map_partition_varattnos((List *)
+														node->onConflictWhere,
+														INNER_VAR,
+														partrel,
+														firstResultRel, NULL);
+					/* Convert Vars referencing main target relation. */
+					onconflwhere = map_partition_varattnos((List *)
+														onconflwhere,
+														firstVarno,
+														partrel,
+														firstResultRel, NULL);
+					leaf_part_rri->ri_onConflictSet->where =
+						ExecInitQual(onconflwhere, &mtstate->ps);
+				}
+			}
+		}
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1081,94 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * adjust_partition_tlist
+ *		Adjust the targetlist entries for a given partition to account for
+ *		attribute differences between parent and the partition
+ *
+ * The expressions have already been fixed, but we have to make sure that the
+ * target resnos match the partition's attribute numbers.  This results in
+ * generating a copy of the original target list in which the entries appear
+ * in sorted order of resno, including both the existing entries (that may
+ * have their resno changed in-place) and the newly added entries.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_partition_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	int			numattrs = tupdesc->natts;
+	int			attrno;
+
+	for (attrno = 1; attrno <= numattrs; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != 0)
+		{
+			/*
+			 * Use the existing entry from the parent, but make sure to
+			 * update the resno to match the partition's attno.
+			 */
+			Assert(!att_tup->attisdropped);
+
+			/* Get the corresponding tlist entry from the given tlist */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+
+			/* Get the resno right */
+			if (tle->resno != attrno)
+				tle->resno = attrno;
+		}
+		else
+		{
+			/*
+			 * This corresponds to a dropped attribute in the partition, for
+			 * which we generate a dummy entry with resno matching the
+			 * partition's attno.
+			 */
+			Node	   *expr;
+
+			Assert(att_tup->attisdropped);
+
+			/* Insert NULL for dropped column */
+			expr = (Node *) makeConst(INT4OID,
+									  -1,
+									  InvalidOid,
+									  sizeof(int32),
+									  (Datum) 0,
+									  true, /* isnull */
+									  true /* byval */ );
+
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	/* Sanity check on the order of entries in the new tlist. */
+#ifdef USE_ASSERT_CHECKING
+	{
+		TargetEntry *prev = NULL;
+		ListCell *lc;
+
+		foreach(lc, new_tlist)
+		{
+			TargetEntry *cur = lfirst(lc);
+
+			Assert(prev == NULL || cur->resno > prev->resno);
+			prev = cur;
+		}
+	}
+#endif
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..a2cd276b1d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -422,7 +422,7 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = node->arbiterIndexes;
+			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -1056,6 +1056,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1313,7 +1325,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 {
 	ExprContext *econtext = mtstate->ps.ps_ExprContext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
-	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflictSetWhere;
+	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflictSet->where;
 	HeapTupleData tuple;
 	HeapUpdateFailureData hufd;
 	LockTupleMode lockmode;
@@ -1462,7 +1474,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
-	ExecProject(resultRelInfo->ri_onConflictSetProj);
+	ExecProject(resultRelInfo->ri_onConflictSet->proj);
 
 	/*
 	 * Note that it is possible that the target tuple has been modified in
@@ -1639,6 +1651,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo *targetRelInfo,
 						TupleTableSlot *slot)
 {
+	ModifyTable *node;
 	int			partidx;
 	ResultRelInfo *partrel;
 	HeapTuple	tuple;
@@ -1720,6 +1733,19 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	/* Initialize information needed to handle ON CONFLICT DO UPDATE. */
+	Assert(mtstate != NULL);
+	node = (ModifyTable *) mtstate->ps.plan;
+	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	{
+		Assert(mtstate->mt_existing != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_existing,
+							  RelationGetDescr(partrel->ri_RelationDesc));
+		Assert(mtstate->mt_conflproj != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_conflproj,
+							  partrel->ri_onConflictSet->projTupDesc);
+	}
+
 	return slot;
 }
 
@@ -2347,11 +2373,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
+	/* Set the list of arbiter indexes if needed for ON CONFLICT */
+	resultRelInfo = mtstate->resultRelInfo;
+	if (node->onConflictAction != ONCONFLICT_NONE)
+		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+
 	/*
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	resultRelInfo = mtstate->resultRelInfo;
 	if (node->onConflictAction == ONCONFLICT_UPDATE)
 	{
 		ExprContext *econtext;
@@ -2368,21 +2398,51 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
-		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		/*
+		 * Initialize slot for the existing tuple.  If we'll be performing
+		 * tuple routing, the tuple descriptor to use for this will be
+		 * determined based on which relation the update is actually applied
+		 * to, so we don't set its tuple descriptor here.
+		 */
+		if (mtstate->mt_partition_tuple_routing == NULL)
+			mtstate->mt_existing =
+				ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+		else
+			mtstate->mt_existing =
+				ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
 
+		/* create state for DO UPDATE SET operation */
+		resultRelInfo->ri_onConflictSet = palloc0(sizeof(OnConflictSetState));
+
 		/* create target slot for UPDATE SET projection */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
-		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+
+		/*
+		 * Just like the "existing tuple" slot, we leave this slot's
+		 * tuple descriptor set to NULL in the tuple routing case.
+		 */
+		if (mtstate->mt_partition_tuple_routing == NULL)
+			mtstate->mt_conflproj =
+				ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+		else
+			mtstate->mt_conflproj =
+				ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
+
+		/*
+		 * Although, we keep this tuple descriptor around so that for the
+		 * common case where partitions have the same descriptor as the root
+		 * parent (this table), we don't end up regenerating it needlessly.
+		 * ExecPrepareTupleRouting still has to set mtstate->mt_conflproj's
+		 * descriptor though.
+		 */
+		resultRelInfo->ri_onConflictSet->projTupDesc = tupDesc;
 
 		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflictSetProj =
+		resultRelInfo->ri_onConflictSet->proj =
 			ExecBuildProjectionInfo(node->onConflictSet, econtext,
 									mtstate->mt_conflproj, &mtstate->ps,
 									relationDesc);
@@ -2395,7 +2455,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			qualexpr = ExecInitQual((List *) node->onConflictWhere,
 									&mtstate->ps);
 
-			resultRelInfo->ri_onConflictSetWhere = qualexpr;
+			resultRelInfo->ri_onConflictSet->where = qualexpr;
 		}
 	}
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..287642b01b 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -52,6 +52,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
 extern Oid	get_partition_parent(Oid relid);
+extern Oid	get_partition_root_parent(Oid relid);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bf2616a95e..7d32927289 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -363,6 +363,24 @@ typedef struct JunkFilter
 } JunkFilter;
 
 /*
+ * OnConflictSetState
+ *
+ * Contains execution time state of a ON CONFLICT DO UPDATE operation, which
+ * includes the state of projection, tuple descriptor of the projection, and
+ * WHERE quals if any.
+ */
+typedef struct OnConflictSetState
+{	/* for computing ON CONFLICT DO UPDATE SET */
+	ProjectionInfo	*proj;
+
+	/* TupleDesc describing the result of the above */
+	TupleDesc		projTupDesc;
+
+	/* list of ON CONFLICT DO UPDATE exprs (qual) */
+	ExprState	   *where;
+}	OnConflictSetState;
+
+/*
  * ResultRelInfo
  *
  * Whenever we update an existing relation, we have to update indexes on the
@@ -424,11 +442,10 @@ typedef struct ResultRelInfo
 	/* for computing a RETURNING list */
 	ProjectionInfo *ri_projectReturning;
 
-	/* for computing ON CONFLICT DO UPDATE SET */
-	ProjectionInfo *ri_onConflictSetProj;
+	/* list of arbiter indexes to use to check conflicts */
+	List		   *ri_onConflictArbiterIndexes;
 
-	/* list of ON CONFLICT DO UPDATE exprs (qual) */
-	ExprState  *ri_onConflictSetWhere;
+	OnConflictSetState *ri_onConflictSet;
 
 	/* partition check expression */
 	List	   *ri_PartitionCheck;
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..6f1e3094d7 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,80 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 1 | b
+ 2 | a
+ 4 | b
+(3 rows)
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 99be9ac6e9..f53ac6bdf1 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..a25cd718a5 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,67 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 3354f4899f..3773c6bc98 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#31Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Amit Langote (#28)
Re: ON CONFLICT DO UPDATE for partitioned tables

(2018/03/22 18:31), Amit Langote wrote:

On 2018/03/20 20:53, Etsuro Fujita wrote:

Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
ItemPointerData conflictTid;
bool        specConflict;
List       *arbiterIndexes;
+            PartitionTupleRouting *proute =
+                                        mtstate->mt_partition_tuple_routing;
-            arbiterIndexes = node->arbiterIndexes;
+            /* Use the appropriate list of arbiter indexes. */
+            if (mtstate->mt_partition_tuple_routing != NULL)
+            {
+                Assert(partition_index >= 0 && proute != NULL);
+                arbiterIndexes =
+                        proute->partition_arbiter_indexes[partition_index];
+            }
+            else
+                arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to have
the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro
upthread, or to re-add mt_arbiterindexes as before and set it to
proute->partition_arbiter_indexes[partition_index] before we get here,
maybe in ExecPrepareTupleRouting, in the case of tuple routing.

It's a good idea. I somehow missed that Alvaro had already mentioned it.

In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere. I
propose we name the field ri_onConflictArbiterIndexes as done in the
updated patch.

I like that naming.

@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
econtext = mtstate->ps.ps_ExprContext;
relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-        /* initialize slot for the existing tuple */
-        mtstate->mt_existing =
-            ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+        /*
+         * Initialize slot for the existing tuple.  We determine which
+         * tupleDesc to use for this after we have determined which relation
+         * the insert/update will be applied to, possibly after performing
+         * tuple routing.
+         */
+        mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, NULL);
/* carried forward solely for the benefit of explain */
mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* create target slot for UPDATE SET projection */
tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
relationDesc->tdhasoid);
+        PinTupleDesc(tupDesc);
+        mtstate->mt_conflproj_tupdesc = tupDesc;
+
+        /*
+         * Just like the "existing tuple" slot, we'll defer deciding which
+         * tupleDesc to use for this slot to a point where tuple routing has
+         * been performed.
+         */
mtstate->mt_conflproj =
-            ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+            ExecInitExtraTupleSlot(mtstate->ps.state, NULL);

If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing, as said above, we wouldn't need these changes. I think doing that
only in the case of tuple routing and keeping this as-is would be better
because that would save cycles in the normal case.

Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in
ExecPrepareTupleRouting, because we shouldn't have more than one instance
of mtstate->mt_existing and mtstate->mt_conflproj slots.

Yeah, I think so too. What I meant to say here was
ExecSetSlotDescriptor, not ExecInitExtraTupleSlot, as you said below.
Sorry for the inaccuracy. I guess I was too tired when writing
those comments.

As you also said above, I think you meant to say here that we do
ExecInitExtraTupleSlot only once for both mtstate->mt_existing and
mtstate->mt_conflproj in ExecInitModifyTable and only do
ExecSetSlotDescriptor in ExecPrepareTupleRouting.

That's right.

I have changed it so
that ExecInitModifyTable now both creates the slot and sets the descriptor
for non-tuple-routing cases and only creates but doesn't set the
descriptor in the tuple-routing case.

IMHO, I don't see much value in modifying the code that way, because we
do ExecSetSlotDescriptor for mt_existing and mt_conflproj in
ExecPrepareTupleRouting for every inserted tuple anyway. So I would leave
that as-is, to keep it simple.

For ExecPrepareTupleRouting to be able to access the tupDesc of the
onConflictSet target list, I've added ri_onConflictSetProjTupDesc which is
set by ExecInitPartitionInfo on first call for a given partition. This was
also suggested by Pavan in his review.

Seems like a good idea.

Here are some comments on the latest version of the patch:

+               /*
+                * Caller must set mtstate->mt_conflproj's tuple descriptor to
+                * this one before trying to use it for projection.
+                */
+               tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+               leaf_part_rri->ri_onConflictSet->proj =
+                       ExecBuildProjectionInfo(onconflset, econtext,
+                                               mtstate->mt_conflproj,
+                                               &mtstate->ps, partrelDesc);

ExecBuildProjectionInfo is called without setting the tuple descriptor
of mtstate->mt_conflproj to tupDesc. That might work at least for now,
but I think it would be good to set it appropriately to make this
future-proof.

+            * This corresponds to a dropped attribute in the partition, for
+            * which we enerate a dummy entry with resno matching the
+            * partition's attno.

s/enerate/generate/

+ * OnConflictSetState
+ *
+ * Contains execution time state of a ON CONFLICT DO UPDATE operation, which
+ * includes the state of projection, tuple descriptor of the projection, and
+ * WHERE quals if any.

s/a ON/an ON/

+typedef struct OnConflictSetState
+{  /* for computing ON CONFLICT DO UPDATE SET */

This is nitpicking, but this wouldn't follow the project style, so I
think that needs re-formatting.

I'll look at the patch a little bit more early next week.

Thanks for updating the patch!

Best regards,
Etsuro Fujita

#32Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#30)
Re: ON CONFLICT DO UPDATE for partitioned tables

Thanks for these changes. I'm going over this now, with the intention of
pushing it shortly.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#30)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

I made a bunch of further edits and I think this v10 is ready to push.
Before doing so I'll give it a final look, particularly because of the
new elog(ERROR) I added. Post-commit review is of course always
appreciated.

The most notable change came about because I noticed that if you mention
an intermediate partition level in the INSERT command, and the index is on
the top level, arbiter index selection fails to find the correct index:
it walks all the way to the top instead of stopping in the middle, as it
should. (The command still worked because it ended up with an empty
arbiter index list.)

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

Because of this, I added a new check that the partition's arbiter index
list is the same length as the parent's; if not, throw an error. I couldn't
get it to fire (so it's just an elog, not an ereport), but maybe I just
didn't try any weird enough scenarios.
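
To illustrate the ancestor-list idea in isolation, here is a minimal
standalone sketch. It is not the patch code: the array parent_of is an
assumed stand-in for a pg_inherits scan, and collect_ancestors and
levels_up_to are hypothetical names. It walks iteratively where the patch
recurses, and it returns the ancestors nearest-first, which is the order
in which the patch appends them with lappend_oid.

```c
#include <assert.h>

#define TOY_INVALID_REL (-1)

/*
 * Toy stand-in for the pg_inherits lookup (an assumption of this sketch):
 * parent_of[rel] is the direct parent of rel, or TOY_INVALID_REL when rel
 * is the topmost table.  Fills 'ancestors' and returns how many were found.
 */
static int
collect_ancestors(const int *parent_of, int relid, int *ancestors, int max)
{
	int			n = 0;
	int			cur = parent_of[relid];

	assert(max > 0);

	/* Walk upward until there is no parent left (or the buffer is full). */
	while (cur != TOY_INVALID_REL && n < max)
	{
		ancestors[n++] = cur;	/* nearest ancestor first */
		cur = parent_of[cur];
	}
	return n;
}

/*
 * Number of levels between the leaf and 'target' (the table named in the
 * INSERT), or 0 if target is not an ancestor at all.  Index mapping only
 * needs to cover this many levels, not the whole list.
 */
static int
levels_up_to(const int *ancestors, int n, int target)
{
	int			i;

	for (i = 0; i < n; i++)
	{
		if (ancestors[i] == target)
			return i + 1;
	}
	return 0;
}
```

With parent_of = {-1, 0, 1, 0}, relation 2 has ancestors 1 and then 0. An
INSERT naming relation 1 needs one level of mapping, while one naming
relation 0 needs two; stopping at the named relation is the point of the
fix described above.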

Other changes:

* I added a copyObject() call for nodes we're operating upon. Maybe
this is unnecessary but the comments claimed "we're working on a copy"
and I couldn't find any place where we were actually making one.
Anyway it seems sane to make a copy, because we're scribbling on those
nodes ... I hope I didn't introduce any serious memory leaks.

* I made the new OnConflictSetState thing into a proper node. Like
ResultRelInfo, it doesn't have any niceties like nodeToString support,
but it seems saner this way (palloc -> makeNode). I reworked the
formatting of that struct definition too, and renamed members.

* I removed an assertion block at the bottom of adjust_partition_tlist.
It seemed quite pointless, since it was just about checking that the
resno values were sorted, but by construction we already know that
they are indeed sorted ...

* General code style improvements, comment rewording, etc.
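
As a rough illustration of why the resno values come out sorted "by
construction", here is a toy model of the target-list adjustment. It is a
sketch under stated assumptions, not adjust_partition_tlist itself:
ToyEntry, toy_adjust_tlist and toy_resnos_sorted are hypothetical, the
parent list is assumed to hold one entry per parent attribute in attno
order, and part_to_parent is an assumed partition-to-parent attribute map
where 0 means "no counterpart" (for example, a dropped column in the
partition).

```c
#include <assert.h>

#define TOY_INVALID_ATTNO 0

typedef struct ToyEntry
{
	int			resno;		/* target column number */
	int			is_dummy;	/* 1 for a placeholder entry */
	int			expr_id;	/* hypothetical handle for the SET expression */
} ToyEntry;

/*
 * Build the partition's list by looping over the partition's attribute
 * numbers in increasing order.  part_to_parent[i] is the parent attno for
 * partition attno i + 1, or TOY_INVALID_ATTNO.  Returns the entry count.
 */
static int
toy_adjust_tlist(const ToyEntry *parent_tlist, const int *part_to_parent,
				 int part_natts, ToyEntry *out)
{
	int			attno;

	for (attno = 1; attno <= part_natts; attno++)
	{
		int			parent_attno = part_to_parent[attno - 1];

		if (parent_attno != TOY_INVALID_ATTNO)
		{
			/* Reuse the parent's entry, renumbered for the partition. */
			out[attno - 1] = parent_tlist[parent_attno - 1];
			out[attno - 1].resno = attno;
		}
		else
		{
			/* No counterpart: dummy entry whose resno is the attno. */
			out[attno - 1].resno = attno;
			out[attno - 1].is_dummy = 1;
			out[attno - 1].expr_id = -1;
		}
	}
	return part_natts;
}

/* Returns 1 if resno values are strictly increasing, else 0. */
static int
toy_resnos_sorted(const ToyEntry *tlist, int n)
{
	int			i;

	for (i = 1; i < n; i++)
	{
		if (tlist[i - 1].resno >= tlist[i].resno)
			return 0;
	}
	return 1;
}
```

Because each output entry gets resno equal to the loop counter, the sorted
check can never fail in this model, which matches the reasoning for
dropping the assertion block. The swapped-column case mirrors the
regression test where the partition is declared as (b, a) and the parent
as (a, b).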

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v10-0001-on-conflict.patch (text/plain; charset=us-ascii)
From d87cd6154fe026f7641e98c8a43683b208a61f5b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 22 Mar 2018 19:12:57 -0300
Subject: [PATCH v10] on conflict

---
 doc/src/sgml/ddl.sgml                         |  15 --
 doc/src/sgml/ref/insert.sgml                  |   8 +
 src/backend/catalog/partition.c               |  88 ++++++++--
 src/backend/executor/execMain.c               |   4 +
 src/backend/executor/execPartition.c          | 229 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  74 +++++++--
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   1 +
 src/include/nodes/execnodes.h                 |  22 ++-
 src/include/nodes/nodes.h                     |   1 +
 src/test/regress/expected/insert_conflict.out | 108 ++++++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  95 +++++++++--
 src/test/regress/sql/triggers.sql             |  33 ++++
 14 files changed, 635 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO <replaceable class="parameter">table_name</replaceable> [ AS <replac
     not duplicate each other in terms of attributes constrained by an
     arbiter index or constraint.
    </para>
+
+   <para>
+    Note that it is currently not supported for the
+    <literal>ON CONFLICT DO UPDATE</literal> clause of an
+    <command>INSERT</command> applied to a partitioned table to update the
+    partition key of a conflicting row such that it requires the row be moved
+    to a new partition.
+   </para>
    <tip>
     <para>
      It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..b00a986432 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,10 @@ typedef struct PartitionRangeBound
 	bool		lower;			/* this is the lower (vs upper) bound */
 } PartitionRangeBound;
 
+
+static Oid	get_partition_parent_worker(Relation inhRel, Oid relid);
+static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
+							   List **ancestors);
 static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
 static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
 							   void *arg);
@@ -1377,6 +1381,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent of given relation
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
@@ -1387,15 +1392,34 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
-	Form_pg_inherits form;
 	Relation	catalogRelation;
-	SysScanDesc scan;
-	ScanKeyData key[2];
-	HeapTuple	tuple;
 	Oid			result;
 
 	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
 
+	result = get_partition_parent_worker(catalogRelation, relid);
+
+	if (!OidIsValid(result))
+		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+
+	heap_close(catalogRelation, AccessShareLock);
+
+	return result;
+}
+
+/*
+ * get_partition_parent_worker
+ *		Scan the pg_inherits relation to return the OID of the parent of the
+ *		given relation
+ */
+static Oid
+get_partition_parent_worker(Relation inhRel, Oid relid)
+{
+	SysScanDesc scan;
+	ScanKeyData key[2];
+	Oid			result = InvalidOid;
+	HeapTuple	tuple;
+
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
 				BTEqualStrategyNumber, F_OIDEQ,
@@ -1405,23 +1429,65 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+	if (HeapTupleIsValid(tuple))
+	{
+		Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(tuple);
 
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
+		result = form->inhparent;
+	}
 
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
 
 	return result;
 }
 
 /*
+ * get_partition_ancestors
+ *		Obtain ancestors of given relation
+ *
+ * Returns a list of ancestors of the given relation.
+ *
+ * Note: Because this function assumes that the relation whose OID is passed
+ * as an argument and each ancestor will have precisely one parent, it should
+ * only be called when it is known that the relation is a partition.
+ */
+List *
+get_partition_ancestors(Oid relid)
+{
+	List	   *result = NIL;
+	Relation	inhRel;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	get_partition_ancestors_worker(inhRel, relid, &result);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return result;
+}
+
+/*
+ * get_partition_ancestors_worker
+ *		recursive worker for get_partition_ancestors
+ */
+static void
+get_partition_ancestors_worker(Relation inhRel, Oid relid, List **ancestors)
+{
+	Oid			parentOid;
+
+	/* Recursion ends at the topmost level, i.e., when there's no parent */
+	parentOid = get_partition_parent_worker(inhRel, relid);
+	if (parentOid == InvalidOid)
+		return;
+
+	*ancestors = lappend_oid(*ancestors, parentOid);
+	get_partition_ancestors_worker(inhRel, parentOid, ancestors);
+}
+
+/*
  * get_qual_from_partbound
  *		Given a parser node for partition bound, return the list of executable
  *		expressions as partition constraint
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 890067757c..250aa1eaaf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1349,11 +1349,15 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		resultRelInfo->ri_FdwRoutine = GetFdwRoutineForRelation(resultRelationDesc, true);
 	else
 		resultRelInfo->ri_FdwRoutine = NULL;
+
+	/* The following fields are set later if needed */
 	resultRelInfo->ri_FdwState = NULL;
 	resultRelInfo->ri_usesFdwDirectModify = false;
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
 	resultRelInfo->ri_projectReturning = NULL;
+	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
+	resultRelInfo->ri_onConflict = NULL;
 
 	/*
 	 * Partition constraint, which also includes the partition constraint of
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..f363103b2a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -475,9 +476,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +485,144 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * If there is an ON CONFLICT clause, initialize state for it.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell   *lc;
+		List	   *arbiterIndexes = NIL;
+
+		/*
+		 * If there is a list of arbiter indexes, map it to a list of indexes
+		 * in the partition.  We do that by scanning the partition's index
+		 * list and searching for ancestry relationships to each index in the
+		 * ancestor table.
+		 */
+		if (list_length(resultRelInfo->ri_onConflictArbiterIndexes) > 0)
+		{
+			List	   *childIdxs;
+
+			childIdxs = RelationGetIndexList(leaf_part_rri->ri_RelationDesc);
+
+			foreach(lc, childIdxs)
+			{
+				Oid			childIdx = lfirst_oid(lc);
+				List	   *ancestors;
+				ListCell   *lc2;
+
+				ancestors = get_partition_ancestors(childIdx);
+				foreach(lc2, resultRelInfo->ri_onConflictArbiterIndexes)
+				{
+					if (list_member_oid(ancestors, lfirst_oid(lc2)))
+						arbiterIndexes = lappend_oid(arbiterIndexes, childIdx);
+				}
+				list_free(ancestors);
+			}
+		}
+
+		/*
+		 * If the resulting lists are of unequal length, something is wrong.
+		 * (This shouldn't happen, since arbiter index selection should not
+		 * pick up an invalid index.)
+		 */
+		if (list_length(resultRelInfo->ri_onConflictArbiterIndexes) !=
+			list_length(arbiterIndexes))
+			elog(ERROR, "invalid arbiter index list");
+		leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes;
+
+		/*
+		 * In the DO UPDATE case, we have some more state to initialize.
+		 */
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			Assert(node->onConflictSet != NIL);
+			Assert(resultRelInfo->ri_onConflict != NULL);
+
+			/*
+			 * If the partition's tuple descriptor matches exactly the root
+			 * parent (the common case), we can simply re-use the parent's ON
+			 * CONFLICT SET state, skipping a bunch of work.  Otherwise, we
+			 * need to create state specific to this partition.
+			 */
+			if (map == NULL)
+				leaf_part_rri->ri_onConflict = resultRelInfo->ri_onConflict;
+			else
+			{
+				List	   *onconflset;
+				TupleDesc	tupDesc;
+				bool		found_whole_row;
+
+				leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
+
+				/*
+				 * Translate expressions in onConflictSet to account for
+				 * different attribute numbers.  For that, map partition
+				 * varattnos twice: first to catch the EXCLUDED
+				 * pseudo-relation (INNER_VAR), and second to handle the main
+				 * target relation (firstVarno).
+				 */
+				onconflset = (List *) copyObject((Node *) node->onConflictSet);
+				onconflset =
+					map_partition_varattnos(onconflset, INNER_VAR, partrel,
+											firstResultRel, &found_whole_row);
+				Assert(!found_whole_row);
+				onconflset =
+					map_partition_varattnos(onconflset, firstVarno, partrel,
+											firstResultRel, &found_whole_row);
+				Assert(!found_whole_row);
+
+				/* Finally, adjust this tlist to match the partition. */
+				onconflset = adjust_partition_tlist(onconflset, map);
+
+				/*
+				 * Build UPDATE SET's projection info.  The user of this
+				 * projection is responsible for setting the slot's tupdesc!
+				 * We set aside a tupdesc that's good for the common case of a
+				 * partition that's tupdesc-equal to the partitioned table;
+				 * partitions of different tupdescs must generate their own.
+				 */
+				tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+				leaf_part_rri->ri_onConflict->oc_ProjInfo =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+				leaf_part_rri->ri_onConflict->oc_ProjTupdesc = tupDesc;
+
+				/*
+				 * If there is a WHERE clause, initialize state where it will
+				 * be evaluated, mapping the attribute numbers appropriately.
+				 * As with onConflictSet, we need to map partition varattnos
+				 * to the partition's tupdesc.
+				 */
+				if (node->onConflictWhere)
+				{
+					List	   *clause;
+
+					clause = copyObject((List *) node->onConflictWhere);
+					clause = map_partition_varattnos(clause, INNER_VAR,
+													 partrel, firstResultRel,
+													 &found_whole_row);
+					Assert(!found_whole_row);
+					clause = map_partition_varattnos(clause, firstVarno,
+													 partrel, firstResultRel,
+													 &found_whole_row);
+					Assert(!found_whole_row);
+					leaf_part_rri->ri_onConflict->oc_WhereClause =
+						ExecInitQual((List *) clause, &mtstate->ps);
+				}
+			}
+		}
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1082,70 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * adjust_partition_tlist
+ *		Adjust the targetlist entries for a given partition to account for
+ *		attribute differences between parent and the partition
+ *
+ * The expressions have already been fixed, but here we fix the list to make
+ * target resnos match the partition's attribute numbers.  This results in a
+ * copy of the original target list in which the entries appear in resno
+ * order, including both the existing entries (that may have their resno
+ * changed in-place) and the newly added entries for columns that don't exist
+ * in the parent.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_partition_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	AttrNumber	attrno;
+
+	for (attrno = 1; attrno <= tupdesc->natts; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != InvalidAttrNumber)
+		{
+			Assert(!att_tup->attisdropped);
+
+			/*
+			 * Use the corresponding entry from the parent's tlist, adjusting
+			 * the resno to match the partition's attno.
+			 */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+			tle->resno = attrno;
+		}
+		else
+		{
+			Const	   *expr;
+
+			/*
+			 * For a dropped attribute in the partition, generate a dummy
+			 * entry with resno matching the partition's attno.
+			 */
+			Assert(att_tup->attisdropped);
+			expr = makeConst(INT4OID,
+							 -1,
+							 InvalidOid,
+							 sizeof(int32),
+							 (Datum) 0,
+							 true,	/* isnull */
+							 true /* byval */ );
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..1b09868ff8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -422,7 +422,7 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = node->arbiterIndexes;
+			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -1056,6 +1056,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1313,7 +1325,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 {
 	ExprContext *econtext = mtstate->ps.ps_ExprContext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
-	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflictSetWhere;
+	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
 	HeapTupleData tuple;
 	HeapUpdateFailureData hufd;
 	LockTupleMode lockmode;
@@ -1462,7 +1474,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
-	ExecProject(resultRelInfo->ri_onConflictSetProj);
+	ExecProject(resultRelInfo->ri_onConflict->oc_ProjInfo);
 
 	/*
 	 * Note that it is possible that the target tuple has been modified in
@@ -1639,6 +1651,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo *targetRelInfo,
 						TupleTableSlot *slot)
 {
+	ModifyTable *node;
 	int			partidx;
 	ResultRelInfo *partrel;
 	HeapTuple	tuple;
@@ -1720,6 +1733,19 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	/* Initialize information needed to handle ON CONFLICT DO UPDATE. */
+	Assert(mtstate != NULL);
+	node = (ModifyTable *) mtstate->ps.plan;
+	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	{
+		Assert(mtstate->mt_existing != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_existing,
+							  RelationGetDescr(partrel->ri_RelationDesc));
+		Assert(mtstate->mt_conflproj != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_conflproj,
+							  partrel->ri_onConflict->oc_ProjTupdesc);
+	}
+
 	return slot;
 }
 
@@ -2347,11 +2373,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
+	/* Set the list of arbiter indexes if needed for ON CONFLICT */
+	resultRelInfo = mtstate->resultRelInfo;
+	if (node->onConflictAction != ONCONFLICT_NONE)
+		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+
 	/*
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	resultRelInfo = mtstate->resultRelInfo;
 	if (node->onConflictAction == ONCONFLICT_UPDATE)
 	{
 		ExprContext *econtext;
@@ -2368,34 +2398,54 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
+		/*
+		 * Initialize slot for the existing tuple.  If we'll be performing
+		 * tuple routing, the tuple descriptor to use for this will be
+		 * determined based on which relation the update is actually applied
+		 * to, so we don't set its tuple descriptor here.
+		 */
 		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state,
+								   mtstate->mt_partition_tuple_routing ?
+								   NULL : relationDesc);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
 
-		/* create target slot for UPDATE SET projection */
+		/* create state for DO UPDATE SET operation */
+		resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
+
+		/*
+		 * Create the tuple slot for the UPDATE SET projection.
+		 *
+		 * Just like mt_existing above, we leave it without a tuple descriptor
+		 * in the case of partition tuple routing, so that it can be
+		 * changed by ExecPrepareTupleRouting.  In that case, we still save
+		 * the tupdesc in the parent's state: it can be reused by partitions
+		 * with an identical descriptor to the parent.
+		 */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state,
+								   mtstate->mt_partition_tuple_routing ?
+								   NULL : tupDesc);
+		resultRelInfo->ri_onConflict->oc_ProjTupdesc = tupDesc;
 
 		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflictSetProj =
+		resultRelInfo->ri_onConflict->oc_ProjInfo =
 			ExecBuildProjectionInfo(node->onConflictSet, econtext,
 									mtstate->mt_conflproj, &mtstate->ps,
 									relationDesc);
 
-		/* build DO UPDATE WHERE clause expression */
+		/* initialize state to evaluate the WHERE clause, if any */
 		if (node->onConflictWhere)
 		{
 			ExprState  *qualexpr;
 
 			qualexpr = ExecInitQual((List *) node->onConflictWhere,
 									&mtstate->ps);
-
-			resultRelInfo->ri_onConflictSetWhere = qualexpr;
+			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
 		}
 	}
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..cd15faa7a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -52,6 +52,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
 extern Oid	get_partition_parent(Oid relid);
+extern List *get_partition_ancestors(Oid relid);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bf2616a95e..2c2d2823c0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -363,6 +363,20 @@ typedef struct JunkFilter
 } JunkFilter;
 
 /*
+ * OnConflictSetState
+ *
+ * Executor state of an ON CONFLICT DO UPDATE operation.
+ */
+typedef struct OnConflictSetState
+{
+	NodeTag		type;
+
+	ProjectionInfo *oc_ProjInfo;	/* for ON CONFLICT DO UPDATE SET */
+	TupleDesc	oc_ProjTupdesc; /* TupleDesc for the above projection */
+	ExprState  *oc_WhereClause; /* state for the WHERE clause */
+} OnConflictSetState;
+
+/*
  * ResultRelInfo
  *
  * Whenever we update an existing relation, we have to update indexes on the
@@ -424,11 +438,11 @@ typedef struct ResultRelInfo
 	/* for computing a RETURNING list */
 	ProjectionInfo *ri_projectReturning;
 
-	/* for computing ON CONFLICT DO UPDATE SET */
-	ProjectionInfo *ri_onConflictSetProj;
+	/* list of arbiter indexes to use to check conflicts */
+	List	   *ri_onConflictArbiterIndexes;
 
-	/* list of ON CONFLICT DO UPDATE exprs (qual) */
-	ExprState  *ri_onConflictSetWhere;
+	/* ON CONFLICT evaluation state */
+	OnConflictSetState *ri_onConflict;
 
 	/* partition check expression */
 	List	   *ri_PartitionCheck;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..443de22704 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -34,6 +34,7 @@ typedef enum NodeTag
 	T_ExprContext,
 	T_ProjectionInfo,
 	T_JunkFilter,
+	T_OnConflictSetState,
 	T_ResultRelInfo,
 	T_EState,
 	T_TupleTableSlot,
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..2d7061fa1b 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,102 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 1 | b
+ 2 | a
+ 4 | b
+(3 rows)
+
 drop table parted_conflict_test;
+-- test behavior of inserting a conflicting tuple into an intermediate
+-- partitioning level
+create table parted_conflict (a int primary key, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
+-- same thing, but this time try to use an index that's created not in the
+-- partition
+create table parted_conflict (a int, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create unique index on only parted_conflict_1 (a);
+create unique index on only parted_conflict (a);
+alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+drop table parted_conflict;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 53e7ae41ba..f534d0db18 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2624,6 +2624,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..6c50fd61eb 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,90 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
+
+-- test behavior of inserting a conflicting tuple into an intermediate
+-- partitioning level
+create table parted_conflict (a int primary key, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
+
+-- same thing, but this time try to use an index that's created not in the
+-- partition
+create table parted_conflict (a int, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create unique index on only parted_conflict_1 (a);
+create unique index on only parted_conflict (a);
+alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 8be893bd1e..9d3e0ef707 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1983,6 +1983,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#34Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#33)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/24 9:23, Alvaro Herrera wrote:

I made a bunch of further edits and I think this v10 is ready to push.
Before doing so I'll give it a final look, particularly because of the
new elog(ERROR) I added. Post-commit review is of course always
appreciated.

The most notable change came about because I noticed that if you mention an
intermediate partition level in the INSERT command, and the index is on
the top level, arbiter index selection fails to find the correct index
because it walks all the way to the top instead of stopping in the
middle, as it should (the command was still working because it ended up
with an empty arbiter index list).

Good catch!

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

I wondered whether an is_partition_ancestor(partrelid, ancestorid) function
would be enough, instead of creating a list of ancestors and then looping
over it as you've done, but maybe what you have here is fine.
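
To make the two options concrete, here is a minimal sketch in plain C, not
PostgreSQL code. It assumes a hypothetical in-memory parent_of[] array in
place of pg_inherits scans, and small array indexes in place of OIDs.
collect_ancestors() mirrors the shape of the ancestor-list approach, while
this is_partition_ancestor() is only a guess at what the predicate suggested
above might look like.

```c
#include <assert.h>

#define INVALID_REL (-1)

/*
 * Hypothetical stand-in for pg_inherits: parent_of[i] is the parent of
 * relation i, or INVALID_REL for the root of the partition tree.
 *
 *   0 (root) -> 1 (intermediate) -> 2 (leaf)
 *   0 (root) -> 3 (leaf)
 */
static const int parent_of[] = {INVALID_REL, 0, 1, 0};

#define NUM_RELS ((int) (sizeof(parent_of) / sizeof(parent_of[0])))

/*
 * List form: collect the ancestors of relid, nearest parent first, into
 * out[].  Returns the number of ancestors written (at most max).
 */
static int
collect_ancestors(int relid, int *out, int max)
{
    int         n = 0;
    int         cur;

    assert(relid >= 0 && relid < NUM_RELS);
    for (cur = parent_of[relid];
         cur != INVALID_REL && n < max;
         cur = parent_of[cur])
        out[n++] = cur;

    return n;
}

/*
 * Predicate form: walk upward and stop as soon as ancestorid is seen.
 * Nothing is allocated, but every call repeats the walk.
 */
static int
is_partition_ancestor(int relid, int ancestorid)
{
    int         cur;

    assert(relid >= 0 && relid < NUM_RELS);
    for (cur = parent_of[relid]; cur != INVALID_REL; cur = parent_of[cur])
    {
        if (cur == ancestorid)
            return 1;
    }

    return 0;
}
```

With this tree, collect_ancestors(2, buf, 4) fills buf with {1, 0}, and
is_partition_ancestor(3, 1) is false. The list form does one upward walk per
relation and can then be tested against several candidates; the predicate
repeats the walk for each candidate.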

Because of this, I added a new check that the partition's arbiter index
list is the same length as the parent's; if not, throw an error. I couldn't
get it to fire (so it's just an elog, not an ereport), but maybe I just
didn't try any weird enough scenarios.
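
As an illustration of that check, here is a minimal sketch in plain C, not
the patch's code. It assumes a hypothetical idx_parent[] array recording
which parent index each index is attached to; the names
index_descends_from() and map_arbiter_indexes() are invented for this
example. A return value of -1 stands in for the elog(ERROR) case.

```c
#include <assert.h>

#define INVALID_IDX (-1)

/*
 * Hypothetical index-attachment table: idx_parent[i] is the index that
 * index i is attached to, or INVALID_IDX if it is attached to nothing.
 *
 *   0 (root index) <- 1 (intermediate index) <- 2 (leaf index)
 *   3 (index that exists only on the leaf)
 */
static const int idx_parent[] = {INVALID_IDX, 0, 1, INVALID_IDX};

/* Does idx have ancestor somewhere above it in the attachment chain? */
static int
index_descends_from(int idx, int ancestor)
{
    int         cur;

    for (cur = idx_parent[idx]; cur != INVALID_IDX; cur = idx_parent[cur])
    {
        if (cur == ancestor)
            return 1;
    }

    return 0;
}

/*
 * Map the arbiter indexes of the table named in the INSERT to indexes of
 * the leaf partition.  out[] must have room for nleaf * narbiters entries.
 * Returns the number of mapped indexes, or -1 if the mapped list is not
 * the same length as the input list.
 */
static int
map_arbiter_indexes(const int *arbiters, int narbiters,
                    const int *leaf_idxs, int nleaf,
                    int *out)
{
    int         n = 0;
    int         i;
    int         j;

    for (i = 0; i < nleaf; i++)
    {
        for (j = 0; j < narbiters; j++)
        {
            if (index_descends_from(leaf_idxs[i], arbiters[j]))
                out[n++] = leaf_idxs[i];
        }
    }

    return (n == narbiters) ? n : -1;
}
```

In this model, an arbiter list naming either the root index (0) or the
intermediate index (1) maps to leaf index 2, and an empty arbiter list maps
to an empty list. Naming index 3, which nothing descends from, gives the
length mismatch.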

Other changes:

* I added a copyObject() call for nodes we're operating upon. Maybe
this is unnecessary but the comments claimed "we're working on a copy"
and I couldn't find any place where we were actually making one.
Anyway it seems sane to make a copy, because we're scribbling on those
nodes ... I hope I didn't introduce any serious memory leaks.

That seems fine as ExecInitPartitionInfo allocates in the query context
(es_query_cxt).

* I made the new OnConflictSetState thing into a proper node. Like
ResultRelInfo, it doesn't have any niceties like nodeToString support,
but it seems saner this way (palloc -> makeNode). I reworked the
formatting of that struct definition too, and renamed members.

Looks good, thanks.

* I removed an assertion block at the bottom of adjust_partition_tlist.
It seemed quite pointless, since it was just about checking that the
resno values were sorted, but by construction we already know that
they are indeed sorted ...

Hmm yes.
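
The "sorted by construction" point can be seen in a minimal sketch, written
in plain C rather than against the executor's List and TargetEntry types.
TlistEntry, adjust_tlist() and resnos_are_sorted() are hypothetical names
for this example. Unlike adjust_partition_tlist(), the sketch copies entries
by value instead of scribbling on the input.

```c
#include <assert.h>

#define INVALID_ATTR 0

typedef struct TlistEntry
{
    int         resno;          /* target column number, 1-based */
    int         is_dummy;       /* 1 if placeholder for a dropped column */
    int         parent_attno;   /* parent column it came from, or 0 */
} TlistEntry;

/*
 * Build the partition's target list from the parent's.  attr_map[i] holds
 * the parent attribute number for partition attribute i + 1, or
 * INVALID_ATTR when that partition column is dropped.  out[] must have
 * room for natts entries.  Returns the number of entries written.
 */
static int
adjust_tlist(const TlistEntry *parent_tlist, const int *attr_map,
             int natts, TlistEntry *out)
{
    int         attrno;

    for (attrno = 1; attrno <= natts; attrno++)
    {
        TlistEntry  e;

        if (attr_map[attrno - 1] != INVALID_ATTR)
        {
            /* Reuse the parent's entry, renumbered for the partition. */
            e = parent_tlist[attr_map[attrno - 1] - 1];
            e.resno = attrno;
        }
        else
        {
            /* Dropped column in the partition: emit a dummy entry. */
            e.resno = attrno;
            e.is_dummy = 1;
            e.parent_attno = 0;
        }

        /* Entry attrno always lands in slot attrno - 1. */
        out[attrno - 1] = e;
    }

    return natts;
}

/* The property the removed assertion block was checking. */
static int
resnos_are_sorted(const TlistEntry *tlist, int n)
{
    int         i;

    for (i = 1; i < n; i++)
    {
        if (tlist[i - 1].resno >= tlist[i].resno)
            return 0;
    }

    return 1;
}
```

Because the loop runs attrno from 1 to natts and writes entry attrno into
slot attrno - 1, resnos_are_sorted() cannot fail on the output, whatever the
attribute map looks like.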

* General code style improvements, comment rewording, etc.

There was one comment in the review Fujita-san posted on Friday [1] that
doesn't seem to be addressed in v10, and I think it probably should be. It
was this comment:

"ExecBuildProjectionInfo is called without setting the tuple descriptor of
mtstate->mt_conflproj to tupDesc. That might work at least for now, but I
think it's a good thing to set it appropriately to make that future proof."

All of his other comments seem to have been taken care of in v10. I have
fixed the above one in the attached updated version.

Thanks,
Amit

[1]: /messages/by-id/5AB4DEB6.2020901@lab.ntt.co.jp

Attachments:

v11-0001-on-conflict.patch (text/plain; charset=UTF-8)
From 7c0e1432f8f9516647126eab962656c13e691f3e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 22 Mar 2018 19:12:57 -0300
Subject: [PATCH v11] on conflict

---
 doc/src/sgml/ddl.sgml                         |  15 --
 doc/src/sgml/ref/insert.sgml                  |   8 +
 src/backend/catalog/partition.c               |  88 ++++++++--
 src/backend/executor/execMain.c               |   4 +
 src/backend/executor/execPartition.c          | 230 ++++++++++++++++++++++++--
 src/backend/executor/nodeModifyTable.c        |  74 +++++++--
 src/backend/parser/analyze.c                  |   7 -
 src/include/catalog/partition.h               |   1 +
 src/include/nodes/execnodes.h                 |  22 ++-
 src/include/nodes/nodes.h                     |   1 +
 src/test/regress/expected/insert_conflict.out | 108 ++++++++++--
 src/test/regress/expected/triggers.out        |  33 ++++
 src/test/regress/sql/insert_conflict.sql      |  95 +++++++++--
 src/test/regress/sql/triggers.sql             |  33 ++++
 14 files changed, 636 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
      <listitem>
       <para>
-       Using the <literal>ON CONFLICT</literal> clause with partitioned tables
-       will cause an error if the conflict target is specified (see
-       <xref linkend="sql-on-conflict" /> for more details on how the clause
-       works).  Therefore, it is not possible to specify
-       <literal>DO UPDATE</literal> as the alternative action, because
-       specifying the conflict target is mandatory in that case.  On the other
-       hand, specifying <literal>DO NOTHING</literal> as the alternative action
-       works fine provided the conflict target is not specified.  In that case,
-       unique constraints (or exclusion constraints) of the individual leaf
-       partitions are considered.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When an <command>UPDATE</command> causes a row to move from one
        partition to another, there is a chance that another concurrent
        <command>UPDATE</command> or <command>DELETE</command> misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO <replaceable class="parameter">table_name</replaceable> [ AS <replac
     not duplicate each other in terms of attributes constrained by an
     arbiter index or constraint.
    </para>
+
+   <para>
+    Note that it is currently not supported for the
+    <literal>ON CONFLICT DO UPDATE</literal> clause of an
+    <command>INSERT</command> applied to a partitioned table to update the
+    partition key of a conflicting row such that it requires the row be moved
+    to a new partition.
+   </para>
    <tip>
     <para>
      It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..b00a986432 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,10 @@ typedef struct PartitionRangeBound
 	bool		lower;			/* this is the lower (vs upper) bound */
 } PartitionRangeBound;
 
+
+static Oid	get_partition_parent_worker(Relation inhRel, Oid relid);
+static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
+							   List **ancestors);
 static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
 static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
 							   void *arg);
@@ -1377,6 +1381,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 
 /*
  * get_partition_parent
+ *		Obtain direct parent of given relation
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
@@ -1387,15 +1392,34 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
-	Form_pg_inherits form;
 	Relation	catalogRelation;
-	SysScanDesc scan;
-	ScanKeyData key[2];
-	HeapTuple	tuple;
 	Oid			result;
 
 	catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
 
+	result = get_partition_parent_worker(catalogRelation, relid);
+
+	if (!OidIsValid(result))
+		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+
+	heap_close(catalogRelation, AccessShareLock);
+
+	return result;
+}
+
+/*
+ * get_partition_parent_worker
+ *		Scan the pg_inherits relation to return the OID of the parent of the
+ *		given relation
+ */
+static Oid
+get_partition_parent_worker(Relation inhRel, Oid relid)
+{
+	SysScanDesc scan;
+	ScanKeyData key[2];
+	Oid			result = InvalidOid;
+	HeapTuple	tuple;
+
 	ScanKeyInit(&key[0],
 				Anum_pg_inherits_inhrelid,
 				BTEqualStrategyNumber, F_OIDEQ,
@@ -1405,23 +1429,65 @@ get_partition_parent(Oid relid)
 				BTEqualStrategyNumber, F_INT4EQ,
 				Int32GetDatum(1));
 
-	scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+	scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true,
 							  NULL, 2, key);
-
 	tuple = systable_getnext(scan);
-	if (!HeapTupleIsValid(tuple))
-		elog(ERROR, "could not find tuple for parent of relation %u", relid);
+	if (HeapTupleIsValid(tuple))
+	{
+		Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(tuple);
 
-	form = (Form_pg_inherits) GETSTRUCT(tuple);
-	result = form->inhparent;
+		result = form->inhparent;
+	}
 
 	systable_endscan(scan);
-	heap_close(catalogRelation, AccessShareLock);
 
 	return result;
 }
 
 /*
+ * get_partition_ancestors
+ *		Obtain ancestors of given relation
+ *
+ * Returns a list of ancestors of the given relation.
+ *
+ * Note: Because this function assumes that the relation whose OID is passed
+ * as an argument and each ancestor will have precisely one parent, it should
+ * only be called when it is known that the relation is a partition.
+ */
+List *
+get_partition_ancestors(Oid relid)
+{
+	List	   *result = NIL;
+	Relation	inhRel;
+
+	inhRel = heap_open(InheritsRelationId, AccessShareLock);
+
+	get_partition_ancestors_worker(inhRel, relid, &result);
+
+	heap_close(inhRel, AccessShareLock);
+
+	return result;
+}
+
+/*
+ * get_partition_ancestors_worker
+ *		recursive worker for get_partition_ancestors
+ */
+static void
+get_partition_ancestors_worker(Relation inhRel, Oid relid, List **ancestors)
+{
+	Oid			parentOid;
+
+	/* Recursion ends at the topmost level, ie., when there's no parent */
+	parentOid = get_partition_parent_worker(inhRel, relid);
+	if (parentOid == InvalidOid)
+		return;
+
+	*ancestors = lappend_oid(*ancestors, parentOid);
+	get_partition_ancestors_worker(inhRel, parentOid, ancestors);
+}
+
+/*
  * get_qual_from_partbound
  *		Given a parser node for partition bound, return the list of executable
  *		expressions as partition constraint
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f47c691d12..68f6450ee6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1347,11 +1347,15 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 		resultRelInfo->ri_FdwRoutine = GetFdwRoutineForRelation(resultRelationDesc, true);
 	else
 		resultRelInfo->ri_FdwRoutine = NULL;
+
+	/* The following fields are set later if needed */
 	resultRelInfo->ri_FdwState = NULL;
 	resultRelInfo->ri_usesFdwDirectModify = false;
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
 	resultRelInfo->ri_projectReturning = NULL;
+	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
+	resultRelInfo->ri_onConflict = NULL;
 
 	/*
 	 * Partition constraint, which also includes the partition constraint of
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ce9a4e16cf..9b67722a1e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -15,10 +15,12 @@
 #include "postgres.h"
 
 #include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "utils/lsyscache.h"
 #include "utils/rls.h"
 #include "utils/ruleutils.h"
@@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 Datum *values,
 									 bool *isnull,
 									 int maxfieldlen);
+static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
 
 /*
  * ExecSetupPartitionTupleRouting - sets up information needed during
@@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	int			num_update_rri = 0,
 				update_rri_index = 0;
 	PartitionTupleRouting *proute;
+	int			nparts;
+	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Get the information about the partition tree after locking all the
@@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	proute->partition_dispatch_info =
 		RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch,
 										 &leaf_parts);
-	proute->num_partitions = list_length(leaf_parts);
-	proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions *
-												   sizeof(ResultRelInfo *));
+	proute->num_partitions = nparts = list_length(leaf_parts);
+	proute->partitions =
+		(ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *));
 	proute->parent_child_tupconv_maps =
-		(TupleConversionMap **) palloc0(proute->num_partitions *
-										sizeof(TupleConversionMap *));
-	proute->partition_oids = (Oid *) palloc(proute->num_partitions *
-											sizeof(Oid));
+		(TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *));
+	proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid));
 
 	/* Set up details specific to the type of tuple routing we are doing. */
-	if (mtstate && mtstate->operation == CMD_UPDATE)
+	if (node && node->operation == CMD_UPDATE)
 	{
-		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
-
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
 		proute->subplan_partition_offsets =
@@ -475,9 +476,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 									&mtstate->ps, RelationGetDescr(partrel));
 	}
 
-	Assert(proute->partitions[partidx] == NULL);
-	proute->partitions[partidx] = leaf_part_rri;
-
 	/*
 	 * Save a tuple conversion map to convert a tuple routed to this partition
 	 * from the parent's type to the partition's.
@@ -487,6 +485,145 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 							   RelationGetDescr(partrel),
 							   gettext_noop("could not convert row type"));
 
+	/*
+	 * If there is an ON CONFLICT clause, initialize state for it.
+	 */
+	if (node && node->onConflictAction != ONCONFLICT_NONE)
+	{
+		TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
+		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+		Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+		TupleDesc	partrelDesc = RelationGetDescr(partrel);
+		ExprContext *econtext = mtstate->ps.ps_ExprContext;
+		ListCell   *lc;
+		List	   *arbiterIndexes = NIL;
+
+		/*
+		 * If there is a list of arbiter indexes, map it to a list of indexes
+		 * in the partition.  We do that by scanning the partition's index
+		 * list and searching for ancestry relationships to each index in the
+		 * ancestor table.
+		 */
+		if (list_length(resultRelInfo->ri_onConflictArbiterIndexes) > 0)
+		{
+			List	   *childIdxs;
+
+			childIdxs = RelationGetIndexList(leaf_part_rri->ri_RelationDesc);
+
+			foreach(lc, childIdxs)
+			{
+				Oid			childIdx = lfirst_oid(lc);
+				List	   *ancestors;
+				ListCell   *lc2;
+
+				ancestors = get_partition_ancestors(childIdx);
+				foreach(lc2, resultRelInfo->ri_onConflictArbiterIndexes)
+				{
+					if (list_member_oid(ancestors, lfirst_oid(lc2)))
+						arbiterIndexes = lappend_oid(arbiterIndexes, childIdx);
+				}
+				list_free(ancestors);
+			}
+		}
+
+		/*
+		 * If the resulting lists are of unequal length, something is wrong.
+		 * (This shouldn't happen, since arbiter index selection should not
+		 * pick up an invalid index.)
+		 */
+		if (list_length(resultRelInfo->ri_onConflictArbiterIndexes) !=
+			list_length(arbiterIndexes))
+			elog(ERROR, "invalid arbiter index list");
+		leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes;
+
+		/*
+		 * In the DO UPDATE case, we have some more state to initialize.
+		 */
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
+		{
+			Assert(node->onConflictSet != NIL);
+			Assert(resultRelInfo->ri_onConflict != NULL);
+
+			/*
+			 * If the partition's tuple descriptor matches exactly the root
+			 * parent (the common case), we can simply re-use the parent's ON
+			 * CONFLICT SET state, skipping a bunch of work.  Otherwise, we
+			 * need to create state specific to this partition.
+			 */
+			if (map == NULL)
+				leaf_part_rri->ri_onConflict = resultRelInfo->ri_onConflict;
+			else
+			{
+				List	   *onconflset;
+				TupleDesc	tupDesc;
+				bool		found_whole_row;
+
+				leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
+
+				/*
+				 * Translate expressions in onConflictSet to account for
+				 * different attribute numbers.  For that, map partition
+				 * varattnos twice: first to catch the EXCLUDED
+				 * pseudo-relation (INNER_VAR), and second to handle the main
+				 * target relation (firstVarno).
+				 */
+				onconflset = (List *) copyObject((Node *) node->onConflictSet);
+				onconflset =
+					map_partition_varattnos(onconflset, INNER_VAR, partrel,
+											firstResultRel, &found_whole_row);
+				Assert(!found_whole_row);
+				onconflset =
+					map_partition_varattnos(onconflset, firstVarno, partrel,
+											firstResultRel, &found_whole_row);
+				Assert(!found_whole_row);
+
+				/* Finally, adjust this tlist to match the partition. */
+				onconflset = adjust_partition_tlist(onconflset, map);
+
+				/*
+				 * Build UPDATE SET's projection info.  The user of this
+				 * projection is responsible for setting the slot's tupdesc!
+				 * We set aside a tupdesc that's good for the common case of a
+				 * partition that's tupdesc-equal to the partitioned table;
+				 * partitions of different tupdescs must generate their own.
+				 */
+				tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid);
+				ExecSetSlotDescriptor(mtstate->mt_conflproj, tupDesc);
+				leaf_part_rri->ri_onConflict->oc_ProjInfo =
+					ExecBuildProjectionInfo(onconflset, econtext,
+											mtstate->mt_conflproj,
+											&mtstate->ps, partrelDesc);
+				leaf_part_rri->ri_onConflict->oc_ProjTupdesc = tupDesc;
+
+				/*
+				 * If there is a WHERE clause, initialize state where it will
+				 * be evaluated, mapping the attribute numbers appropriately.
+				 * As with onConflictSet, we need to map partition varattnos
+				 * to the partition's tupdesc.
+				 */
+				if (node->onConflictWhere)
+				{
+					List	   *clause;
+
+					clause = copyObject((List *) node->onConflictWhere);
+					clause = map_partition_varattnos(clause, INNER_VAR,
+													 partrel, firstResultRel,
+													 &found_whole_row);
+					Assert(!found_whole_row);
+					clause = map_partition_varattnos(clause, firstVarno,
+													 partrel, firstResultRel,
+													 &found_whole_row);
+					Assert(!found_whole_row);
+					leaf_part_rri->ri_onConflict->oc_WhereClause =
+						ExecInitQual((List *) clause, &mtstate->ps);
+				}
+			}
+		}
+	}
+
+	Assert(proute->partitions[partidx] == NULL);
+	proute->partitions[partidx] = leaf_part_rri;
+
 	MemoryContextSwitchTo(oldContext);
 
 	return leaf_part_rri;
@@ -946,3 +1083,70 @@ ExecBuildSlotPartitionKeyDescription(Relation rel,
 
 	return buf.data;
 }
+
+/*
+ * adjust_partition_tlist
+ *		Adjust the targetlist entries for a given partition to account for
+ *		attribute differences between parent and the partition
+ *
+ * The expressions have already been fixed, but here we fix the list to make
+ * target resnos match the partition's attribute numbers.  This results in a
+ * copy of the original target list in which the entries appear in resno
+ * order, including both the existing entries (that may have their resno
+ * changed in-place) and the newly added entries for columns that don't exist
+ * in the parent.
+ *
+ * Scribbles on the input tlist, so callers must make sure to make a copy
+ * before passing it to us.
+ */
+static List *
+adjust_partition_tlist(List *tlist, TupleConversionMap *map)
+{
+	List	   *new_tlist = NIL;
+	TupleDesc	tupdesc = map->outdesc;
+	AttrNumber *attrMap = map->attrMap;
+	AttrNumber	attrno;
+
+	for (attrno = 1; attrno <= tupdesc->natts; attrno++)
+	{
+		Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1);
+		TargetEntry *tle;
+
+		if (attrMap[attrno - 1] != InvalidAttrNumber)
+		{
+			Assert(!att_tup->attisdropped);
+
+			/*
+			 * Use the corresponding entry from the parent's tlist, adjusting
+			 * the resno to match the partition's attno.
+			 */
+			tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1);
+			tle->resno = attrno;
+		}
+		else
+		{
+			Const	   *expr;
+
+			/*
+			 * For a dropped attribute in the partition, generate a dummy
+			 * entry with resno matching the partition's attno.
+			 */
+			Assert(att_tup->attisdropped);
+			expr = makeConst(INT4OID,
+							 -1,
+							 InvalidOid,
+							 sizeof(int32),
+							 (Datum) 0,
+							 true,	/* isnull */
+							 true /* byval */ );
+			tle = makeTargetEntry((Expr *) expr,
+								  attrno,
+								  pstrdup(NameStr(att_tup->attname)),
+								  false);
+		}
+
+		new_tlist = lappend(new_tlist, tle);
+	}
+
+	return new_tlist;
+}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4fa2d7265f..1b09868ff8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -422,7 +422,7 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = node->arbiterIndexes;
+			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -1056,6 +1056,18 @@ lreplace:;
 			TupleConversionMap *tupconv_map;
 
 			/*
+			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+			 * original row to migrate to a different partition.  Maybe this
+			 * can be implemented some day, but it seems a fringe feature with
+			 * little redeeming value.
+			 */
+			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("invalid ON UPDATE specification"),
+						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+			/*
 			 * When an UPDATE is run on a leaf partition, we will not have
 			 * partition tuple routing set up. In that case, fail with
 			 * partition constraint violation error.
@@ -1313,7 +1325,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 {
 	ExprContext *econtext = mtstate->ps.ps_ExprContext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
-	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflictSetWhere;
+	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
 	HeapTupleData tuple;
 	HeapUpdateFailureData hufd;
 	LockTupleMode lockmode;
@@ -1462,7 +1474,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	}
 
 	/* Project the new tuple version */
-	ExecProject(resultRelInfo->ri_onConflictSetProj);
+	ExecProject(resultRelInfo->ri_onConflict->oc_ProjInfo);
 
 	/*
 	 * Note that it is possible that the target tuple has been modified in
@@ -1639,6 +1651,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo *targetRelInfo,
 						TupleTableSlot *slot)
 {
+	ModifyTable *node;
 	int			partidx;
 	ResultRelInfo *partrel;
 	HeapTuple	tuple;
@@ -1720,6 +1733,19 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 							  proute->partition_tuple_slot,
 							  &slot);
 
+	/* Initialize information needed to handle ON CONFLICT DO UPDATE. */
+	Assert(mtstate != NULL);
+	node = (ModifyTable *) mtstate->ps.plan;
+	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	{
+		Assert(mtstate->mt_existing != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_existing,
+							  RelationGetDescr(partrel->ri_RelationDesc));
+		Assert(mtstate->mt_conflproj != NULL);
+		ExecSetSlotDescriptor(mtstate->mt_conflproj,
+							  partrel->ri_onConflict->oc_ProjTupdesc);
+	}
+
 	return slot;
 }
 
@@ -2347,11 +2373,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
+	/* Set the list of arbiter indexes if needed for ON CONFLICT */
+	resultRelInfo = mtstate->resultRelInfo;
+	if (node->onConflictAction != ONCONFLICT_NONE)
+		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+
 	/*
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	resultRelInfo = mtstate->resultRelInfo;
 	if (node->onConflictAction == ONCONFLICT_UPDATE)
 	{
 		ExprContext *econtext;
@@ -2368,34 +2398,54 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		econtext = mtstate->ps.ps_ExprContext;
 		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
 
-		/* initialize slot for the existing tuple */
+		/*
+		 * Initialize slot for the existing tuple.  If we'll be performing
+		 * tuple routing, the tuple descriptor to use for this will be
+		 * determined based on which relation the update is actually applied
+		 * to, so we don't set its tuple descriptor here.
+		 */
 		mtstate->mt_existing =
-			ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state,
+								   mtstate->mt_partition_tuple_routing ?
+								   NULL : relationDesc);
 
 		/* carried forward solely for the benefit of explain */
 		mtstate->mt_excludedtlist = node->exclRelTlist;
 
-		/* create target slot for UPDATE SET projection */
+		/* create state for DO UPDATE SET operation */
+		resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
+
+		/*
+		 * Create the tuple slot for the UPDATE SET projection.
+		 *
+		 * Just like mt_existing above, we leave it without a tuple descriptor
+		 * in the case of partitioning tuple routing, so that it can be
+		 * changed by ExecPrepareTupleRouting.  In that case, we still save
+		 * the tupdesc in the parent's state: it can be reused by partitions
+		 * with an identical descriptor to the parent.
+		 */
 		tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
 								 relationDesc->tdhasoid);
 		mtstate->mt_conflproj =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+			ExecInitExtraTupleSlot(mtstate->ps.state,
+								   mtstate->mt_partition_tuple_routing ?
+								   NULL : tupDesc);
+		resultRelInfo->ri_onConflict->oc_ProjTupdesc = tupDesc;
 
 		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflictSetProj =
+		resultRelInfo->ri_onConflict->oc_ProjInfo =
 			ExecBuildProjectionInfo(node->onConflictSet, econtext,
 									mtstate->mt_conflproj, &mtstate->ps,
 									relationDesc);
 
-		/* build DO UPDATE WHERE clause expression */
+		/* initialize state to evaluate the WHERE clause, if any */
 		if (node->onConflictWhere)
 		{
 			ExprState  *qualexpr;
 
 			qualexpr = ExecInitQual((List *) node->onConflictWhere,
 									&mtstate->ps);
-
-			resultRelInfo->ri_onConflictSetWhere = qualexpr;
+			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
 		}
 	}
 
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index cf1a34e41a..a4b5aaef44 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate,
 		TargetEntry *te;
 		int			attno;
 
-		if (targetrel->rd_partdesc)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("%s cannot be applied to partitioned table \"%s\"",
-							"ON CONFLICT DO UPDATE",
-							RelationGetRelationName(targetrel))));
-
 		/*
 		 * All INSERT expressions have been parsed, get ready for potentially
 		 * existing SET statements that need to be processed like an UPDATE.
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..cd15faa7a1 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -52,6 +52,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
 extern void check_new_partition_bound(char *relname, Relation parent,
 						  PartitionBoundSpec *spec);
 extern Oid	get_partition_parent(Oid relid);
+extern List *get_partition_ancestors(Oid relid);
 extern List *get_qual_from_partbound(Relation rel, Relation parent,
 						PartitionBoundSpec *spec);
 extern List *map_partition_varattnos(List *expr, int fromrel_varno,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bf2616a95e..2c2d2823c0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -363,6 +363,20 @@ typedef struct JunkFilter
 } JunkFilter;
 
 /*
+ * OnConflictSetState
+ *
+ * Executor state of an ON CONFLICT DO UPDATE operation.
+ */
+typedef struct OnConflictSetState
+{
+	NodeTag		type;
+
+	ProjectionInfo *oc_ProjInfo;	/* for ON CONFLICT DO UPDATE SET */
+	TupleDesc	oc_ProjTupdesc; /* TupleDesc for the above projection */
+	ExprState  *oc_WhereClause; /* state for the WHERE clause */
+} OnConflictSetState;
+
+/*
  * ResultRelInfo
  *
  * Whenever we update an existing relation, we have to update indexes on the
@@ -424,11 +438,11 @@ typedef struct ResultRelInfo
 	/* for computing a RETURNING list */
 	ProjectionInfo *ri_projectReturning;
 
-	/* for computing ON CONFLICT DO UPDATE SET */
-	ProjectionInfo *ri_onConflictSetProj;
+	/* list of arbiter indexes to use to check conflicts */
+	List	   *ri_onConflictArbiterIndexes;
 
-	/* list of ON CONFLICT DO UPDATE exprs (qual) */
-	ExprState  *ri_onConflictSetWhere;
+	/* ON CONFLICT evaluation state */
+	OnConflictSetState *ri_onConflict;
 
 	/* partition check expression */
 	List	   *ri_PartitionCheck;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..443de22704 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -34,6 +34,7 @@ typedef enum NodeTag
 	T_ExprContext,
 	T_ProjectionInfo,
 	T_JunkFilter,
+	T_OnConflictSetState,
 	T_ResultRelInfo,
 	T_EState,
 	T_TupleTableSlot,
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2650faedee..2d7061fa1b 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -786,16 +786,102 @@ select * from selfconflict;
 (3 rows)
 
 drop table selfconflict;
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
-ERROR:  ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test"
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 2 | b
+(1 row)
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 3 | b
+(1 row)
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 4 | b
+(1 row)
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 5 | b
+(1 row)
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+ a | b 
+---+---
+ 1 | b
+ 2 | a
+ 4 | b
+(3 rows)
+
 drop table parted_conflict_test;
+-- test behavior of inserting a conflicting tuple into an intermediate
+-- partitioning level
+create table parted_conflict (a int primary key, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
+-- same thing, but this time try to use an index that's created not in the
+-- partition
+create table parted_conflict (a int, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create unique index on only parted_conflict_1 (a);
+create unique index on only parted_conflict (a);
+alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+drop table parted_conflict;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 53e7ae41ba..f534d0db18 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2624,6 +2624,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
 NOTICE:  trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
 NOTICE:  trigger = my_table_insert_trig, new table = <NULL>
 --
+-- now using a partitioned table
+--
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL>
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB)
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD)
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+NOTICE:  trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD)
+NOTICE:  trigger = iocdu_tt_parted_insert_trig, new table = <NULL>
+drop table iocdu_tt_parted;
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 32c647e3f8..6c50fd61eb 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -472,15 +472,90 @@ select * from selfconflict;
 
 drop table selfconflict;
 
--- check that the following works:
--- insert into partitioned_table on conflict do nothing
-create table parted_conflict_test (a int, b char) partition by list (a);
-create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1);
+-- check ON CONFLICT handling with partitioned tables
+create table parted_conflict_test (a int unique, b char) partition by list (a);
+create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2);
+
+-- no indexes required here
 insert into parted_conflict_test values (1, 'a') on conflict do nothing;
-insert into parted_conflict_test values (1, 'a') on conflict do nothing;
--- however, on conflict do update is not supported yet
-insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a;
--- but it works OK if we target the partition directly
-insert into parted_conflict_test_1 values (1) on conflict (b) do
-update set a = excluded.a;
+
+-- index on a required, which does exist in parent
+insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing;
+insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b;
+
+-- index on b required, which doesn't exist in parent
+insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- targeting partition directly will work
+insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a;
+
+-- should see (2, 'b')
+select * from parted_conflict_test order by a;
+
+-- now check that DO UPDATE works correctly for target partition with
+-- different attribute numbers
+create table parted_conflict_test_2 (b char, a int unique);
+alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3);
+truncate parted_conflict_test;
+insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b;
+
+-- should see (3, 'b')
+select * from parted_conflict_test order by a;
+
+-- case where parent will have a dropped column, but the partition won't
+alter table parted_conflict_test drop b, add b char;
+create table parted_conflict_test_3 partition of parted_conflict_test for values in (4);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (4, 'b')
+select * from parted_conflict_test order by a;
+
+-- case with multi-level partitioning
+create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a);
+create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5);
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b;
+insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a';
+
+-- should see (5, 'b')
+select * from parted_conflict_test order by a;
+
+-- test with multiple rows
+truncate parted_conflict_test;
+insert into parted_conflict_test (a, b) values (1, 'a'), (2, 'a'), (4, 'a') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+insert into parted_conflict_test (a, b) values (1, 'b'), (2, 'c'), (4, 'b') on conflict (a) do update set b = excluded.b where excluded.b = 'b';
+
+-- should see (1, 'b'), (2, 'a'), (4, 'b')
+select * from parted_conflict_test order by a;
+
 drop table parted_conflict_test;
+
+-- test behavior of inserting a conflicting tuple into an intermediate
+-- partitioning level
+create table parted_conflict (a int primary key, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
+
+-- same thing, but this time try to use an index that's created not in the
+-- partition
+create table parted_conflict (a int, b text) partition by range (a);
+create table parted_conflict_1 partition of parted_conflict for values from (0) to (1000) partition by range (a);
+create unique index on only parted_conflict_1 (a);
+create unique index on only parted_conflict (a);
+alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
+create table parted_conflict_1_1 partition of parted_conflict_1 for values from (0) to (500);
+insert into parted_conflict values (40, 'forty');
+insert into parted_conflict_1 values (40, 'cuarenta')
+  on conflict (a) do update set b = excluded.b;
+drop table parted_conflict;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 8be893bd1e..9d3e0ef707 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1983,6 +1983,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD')
   update set b = my_table.b || ':' || excluded.b;
 
 --
+-- now using a partitioned table
+--
+
+create table iocdu_tt_parted (a int primary key, b text) partition by list (a);
+create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1);
+create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2);
+create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3);
+create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4);
+create trigger iocdu_tt_parted_insert_trig
+  after insert on iocdu_tt_parted referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger iocdu_tt_parted_update_trig
+  after update on iocdu_tt_parted referencing old table as old_table new table as new_table
+  for each statement execute procedure dump_update();
+
+-- inserts only
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- mixture of inserts and updates
+insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+-- updates only
+insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD')
+  on conflict (a) do
+  update set b = iocdu_tt_parted.b || ':' || excluded.b;
+
+drop table iocdu_tt_parted;
+
+--
 -- Verify that you can't create a trigger with transition tables for
 -- more than one event.
 --
-- 
2.11.0

#35Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#34)
Re: ON CONFLICT DO UPDATE for partitioned tables

Pushed now.

Amit Langote wrote:

On 2018/03/24 9:23, Alvaro Herrera wrote:

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

I wondered whether an is_partition_ancestor(partrelid, ancestorid) check
would be enough, instead of creating a list of ancestors and then looping
over it as you've done, but maybe what you have here is fine.

Yeah, I wondered about doing it that way too (since you can stop looking
early), but decided that I didn't like repeatedly opening pg_inherits
for each level. Anyway the most common case is a single level, and in
rare cases two levels ... I don't think we're going to see much more
than that. So it doesn't matter too much. We can refine later anyway,
if this becomes a hot spot (I doubt it TBH).
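
To make the trade-off between the two approaches concrete, here is a minimal sketch in plain C. It is a toy model, not the code in the patch: the toy_* names, the ToyOid type, and the OIDs 100 to 102 are all made up for illustration, and a small in-memory array stands in for pg_inherits, so the cost of opening the catalog at each level is not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ToyOid;        /* hypothetical stand-in for Oid */
#define TOY_INVALID_OID 0u
#define TOY_MAX_DEPTH 8

/* Hypothetical child -> parent pairs, standing in for pg_inherits rows. */
typedef struct
{
    ToyOid child;
    ToyOid parent;
} ToyInherits;

static const ToyInherits toy_inherits[] = {
    {101, 100},     /* parted_conflict_1   -> parted_conflict   */
    {102, 101},     /* parted_conflict_1_1 -> parted_conflict_1 */
};

/* Return the direct parent of relid, or TOY_INVALID_OID if it has none. */
static ToyOid
toy_parent_of(ToyOid relid)
{
    for (size_t i = 0; i < sizeof(toy_inherits) / sizeof(toy_inherits[0]); i++)
        if (toy_inherits[i].child == relid)
            return toy_inherits[i].parent;
    return TOY_INVALID_OID;
}

/* Approach 1: collect every ancestor, nearest first; returns the count. */
static size_t
toy_get_ancestors(ToyOid relid, ToyOid *out, size_t cap)
{
    size_t n = 0;

    for (ToyOid cur = toy_parent_of(relid);
         cur != TOY_INVALID_OID;
         cur = toy_parent_of(cur))
    {
        assert(n < cap);            /* caller must size the buffer */
        out[n++] = cur;
    }
    return n;
}

/* Approach 2: stop as soon as the wanted ancestor is seen. */
static int
toy_is_ancestor(ToyOid relid, ToyOid ancestor)
{
    for (ToyOid cur = toy_parent_of(relid);
         cur != TOY_INVALID_OID;
         cur = toy_parent_of(cur))
        if (cur == ancestor)
            return 1;
    return 0;
}
```

With one or two levels, as expected in practice, both functions visit the same handful of entries; the early exit only pays off for deep hierarchies.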

* General code style improvements, comment rewording, etc.

There was one comment in Fujita-san's review he posted on Friday [1] that
doesn't seem to be addressed in v10, which I think we probably should. It
was this comment:

"ExecBuildProjectionInfo is called without setting the tuple descriptor of
mtstate->mt_conflproj to tupDesc. That might work at least for now, but I
think it's a good thing to set it appropriately to make that future proof."

All of his other comments seem to have been taken care of in v10. I have
fixed the above one in the attached updated version.

I was of two minds about this item myself; we don't use the tupdesc for
anything at that point and I expect more things would break if we
required that. But I don't think it hurts, so I kept it.

The one thing I wasn't terribly in love with is the four calls to
map_partition_varattnos(), creating the attribute map four times ... but
we already have it in the TupleConversionMap, no? Looks like we could
save a bunch of work there.

And a final item is: can we have a whole-row expression in the clauses?
We currently don't handle those either, not even to throw an error.
[figures a test case] ... and now that I test it, it does crash!

create table part (a int primary key, b text) partition by range (a);
create table part1 (b text, a int not null);
alter table part attach partition part1 for values from (1) to (1000);
insert into part values (1, 'two') on conflict (a)
do update set b = format('%s (was %s)', excluded.b, part.b)
where part.* *<> (1, text 'two');

I think this means we should definitely handle found_whole_row. (If you
create part1 in the normal way, it works as you'd expect.)

I'm going to close a few other things first, then come back to this.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#36Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#35)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/26 23:20, Alvaro Herrera wrote:

Pushed now.

Thank you!

Amit Langote wrote:

On 2018/03/24 9:23, Alvaro Herrera wrote:

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

I wondered whether an is_partition_ancestor(partrelid, ancestorid) check
would be enough, instead of creating a list of ancestors and then looping
over it as you've done, but maybe what you have here is fine.

Yeah, I wondered about doing it that way too (since you can stop looking
early), but decided that I didn't like repeatedly opening pg_inherits
for each level. Anyway the most common case is a single level, and in
rare cases two levels ... I don't think we're going to see much more
than that. So it doesn't matter too much. We can refine later anyway,
if this becomes a hot spot (I doubt it TBH).

Yes, I suppose.

* General code style improvements, comment rewording, etc.

There was one comment in Fujita-san's review he posted on Friday [1] that
doesn't seem to be addressed in v10, which I think we probably should. It
was this comment:

"ExecBuildProjectionInfo is called without setting the tuple descriptor of
mtstate->mt_conflproj to tupDesc. That might work at least for now, but I
think it's a good thing to set it appropriately to make that future proof."

All of his other comments seem to have been taken care of in v10. I have
fixed the above one in the attached updated version.

I was of two minds about this item myself; we don't use the tupdesc for
anything at that point and I expect more things would break if we
required that. But I don't think it hurts, so I kept it.

The one thing I wasn't terribly in love with is the four calls to
map_partition_varattnos(), creating the attribute map four times ... but
we already have it in the TupleConversionMap, no? Looks like we could
save a bunch of work there.

Hmm, actually we can't use that map, assuming you're talking about the
following map:

TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];

We can only use that to tell if we need converting expressions (as we
currently do), but it cannot be used to actually convert the expressions.
The map in question is for use by do_convert_tuple(), not to map varattnos
in Vars using map_variable_attnos().
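
One way to picture the distinction is the following toy model in plain C. It is only a sketch under assumptions: the toy_* functions and the three-column layouts are invented for illustration and are not PostgreSQL's structures. The assumption it illustrates is that a tuple-conversion map is looked up by output (partition) column and yields the parent column to copy from, while a map for rewriting Vars is looked up by parent attribute number and yields the partition attribute number, so the two are inverses of each other.

```c
#include <assert.h>
#include <string.h>

#define NATTS 3

/* Return the 1-based position of name in cols, or 0 if it is absent. */
static int
toy_attno_of(const char *const cols[], int ncols, const char *name)
{
    for (int i = 0; i < ncols; i++)
        if (strcmp(cols[i], name) == 0)
            return i + 1;
    return 0;
}

/*
 * result[i] = attribute number in "from" of the column found at
 * position i + 1 in "to".  Swapping the arguments gives the inverse map.
 */
static void
toy_build_map(const char *const to[], const char *const from[],
              int ncols, int *result)
{
    for (int i = 0; i < ncols; i++)
        result[i] = toy_attno_of(from, ncols, to[i]);
}

/* Rewrite one parent-side column reference into the child's numbering. */
static int
toy_map_var(const int *parent_to_child, int ncols, int parent_attno)
{
    assert(parent_attno >= 1 && parent_attno <= ncols);
    return parent_to_child[parent_attno - 1];
}

/* Copy values from the parent's column order into the child's. */
static void
toy_convert_tuple(const int *child_from_parent, int ncols,
                  const int *parent_vals, int *child_vals)
{
    for (int i = 0; i < ncols; i++)
        child_vals[i] = parent_vals[child_from_parent[i] - 1];
}
```

For a parent laid out as (a, b, c) and a partition laid out as (c, a, b), toy_build_map(child, parent, ...) produces the conversion-style map and toy_build_map(parent, child, ...) produces the Var-style map; the two arrays differ, which is why one cannot simply be passed where the other is expected.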

But it's definitely a bit undesirable to have various
map_partition_varattnos() calls within ExecInitPartitionInfo() to
initialize the same information (the map) multiple times.

And a final item is: can we have a whole-row expression in the clauses?
We currently don't handle those either, not even to throw an error.
[figures a test case] ... and now that I test it, it does crash!

create table part (a int primary key, b text) partition by range (a);
create table part1 (b text, a int not null);
alter table part attach partition part1 for values from (1) to (1000);
insert into part values (1, 'two') on conflict (a)
do update set b = format('%s (was %s)', excluded.b, part.b)
where part.* *<> (1, text 'two');

I think this means we should definitely handle found_whole_row. (If you
create part1 in the normal way, it works as you'd expect.)

I agree. That means we simply remove the Assert after the
map_partition_varattnos call.

I'm going to close a few other things first, then come back to this.

Attached find a patch to fix the whole-row expression issue. I added your
test to insert_conflict.sql.

Thanks,
Amit

Attachments:

partition-on-conflict-wholerow-fix.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 9a13188649..f1a972e235 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -557,7 +557,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 			{
 				List	   *onconflset;
 				TupleDesc	tupDesc;
-				bool		found_whole_row;
 
 				leaf_part_rri->ri_onConflict = makeNode(OnConflictSetState);
 
@@ -571,12 +570,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				onconflset = (List *) copyObject((Node *) node->onConflictSet);
 				onconflset =
 					map_partition_varattnos(onconflset, INNER_VAR, partrel,
-											firstResultRel, &found_whole_row);
-				Assert(!found_whole_row);
+											firstResultRel, NULL);
 				onconflset =
 					map_partition_varattnos(onconflset, firstVarno, partrel,
-											firstResultRel, &found_whole_row);
-				Assert(!found_whole_row);
+											firstResultRel, NULL);
 
 				/* Finally, adjust this tlist to match the partition. */
 				onconflset = adjust_partition_tlist(onconflset, map);
@@ -609,12 +606,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 					clause = copyObject((List *) node->onConflictWhere);
 					clause = map_partition_varattnos(clause, INNER_VAR,
 													 partrel, firstResultRel,
-													 &found_whole_row);
-					Assert(!found_whole_row);
+													 NULL);
 					clause = map_partition_varattnos(clause, firstVarno,
 													 partrel, firstResultRel,
-													 &found_whole_row);
-					Assert(!found_whole_row);
+													 NULL);
 					leaf_part_rri->ri_onConflict->oc_WhereClause =
 						ExecInitQual((List *) clause, &mtstate->ps);
 				}
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2d7061fa1b..66ca1839bc 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -884,4 +884,20 @@ insert into parted_conflict values (40, 'forty');
 insert into parted_conflict_1 values (40, 'cuarenta')
   on conflict (a) do update set b = excluded.b;
 ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- test whole-row Vars in ON CONFLICT expressions
+create unique index on parted_conflict (a, b);
+alter table parted_conflict add c int;
+truncate parted_conflict;
+insert into parted_conflict values (50, 'cuarenta', 1);
+insert into parted_conflict values (50, 'cuarenta', 2)
+  on conflict (a, b) do update set (a, b, c) = row(excluded.*)
+  where parted_conflict = (50, text 'cuarenta', 1) and
+        excluded = (50, text 'cuarenta', 2);
+-- should see (50, 'cuarenta', 2)
+select * from parted_conflict order by a;
+ a  |    b     | c 
+----+----------+---
+ 50 | cuarenta | 2
+(1 row)
+
 drop table parted_conflict;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 6c50fd61eb..fb30530a54 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -558,4 +558,18 @@ create table parted_conflict_1_1 partition of parted_conflict_1 for values from
 insert into parted_conflict values (40, 'forty');
 insert into parted_conflict_1 values (40, 'cuarenta')
   on conflict (a) do update set b = excluded.b;
+
+-- test whole-row Vars in ON CONFLICT expressions
+create unique index on parted_conflict (a, b);
+alter table parted_conflict add c int;
+truncate parted_conflict;
+insert into parted_conflict values (50, 'cuarenta', 1);
+insert into parted_conflict values (50, 'cuarenta', 2)
+  on conflict (a, b) do update set (a, b, c) = row(excluded.*)
+  where parted_conflict = (50, text 'cuarenta', 1) and
+        excluded = (50, text 'cuarenta', 2);
+
+-- should see (50, 'cuarenta', 2)
+select * from parted_conflict order by a;
+
 drop table parted_conflict;
#37Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#36)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/03/27 13:27, Amit Langote wrote:

On 2018/03/26 23:20, Alvaro Herrera wrote:

The one thing I wasn't terribly in love with is the four calls to
map_partition_varattnos(), creating the attribute map four times ... but
we already have it in the TupleConversionMap, no? Looks like we could
save a bunch of work there.

Hmm, actually we can't use that map, assuming you're talking about the
following map:

TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];

We can only use that to tell if we need converting expressions (as we
currently do), but it cannot be used to actually convert the expressions.
The map in question is for use by do_convert_tuple(), not to map varattnos
in Vars using map_variable_attnos().

But it's definitely a bit undesirable to have various
map_partition_varattnos() calls within ExecInitPartitionInfo() to
initialize the same information (the map) multiple times.

I will try to think of doing something about this later this week.

And a final item is: can we have a whole-row expression in the clauses?
We currently don't handle those either, not even to throw an error.
[figures a test case] ... and now that I test it, it does crash!

create table part (a int primary key, b text) partition by range (a);
create table part1 (b text, a int not null);
alter table part attach partition part1 for values from (1) to (1000);
insert into part values (1, 'two') on conflict (a)
do update set b = format('%s (was %s)', excluded.b, part.b)
where part.* *<> (1, text 'two');

I think this means we should definitely handle found_whole_row. (If you
create part1 in the normal way, it works as you'd expect.)

I agree. That means we simply remove the Assert after the
map_partition_varattnos call.

I'm going to close a few other things first, then come back to this.

Attached find a patch to fix the whole-row expression issue. I added your
test to insert_conflict.sql.

Adding this to the open items list.

Thanks,
Amit

#38Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#37)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/10 11:56, Amit Langote wrote:

On 2018/03/27 13:27, Amit Langote wrote:

On 2018/03/26 23:20, Alvaro Herrera wrote:

The one thing I wasn't terribly in love with is the four calls to
map_partition_varattnos(), creating the attribute map four times ... but
we already have it in the TupleConversionMap, no? Looks like we could
save a bunch of work there.

Hmm, actually we can't use that map, assuming you're talking about the
following map:

TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];

We can only use that to tell if we need converting expressions (as we
currently do), but it cannot be used to actually convert the expressions.
The map in question is for use by do_convert_tuple(), not to map varattnos
in Vars using map_variable_attnos().

But it's definitely a bit undesirable to have various
map_partition_varattnos() calls within ExecInitPartitionInfo() to
initialize the same information (the map) multiple times.

I will try to think of doing something about this later this week.

The solution I came up with is to call map_variable_attnos() directly,
instead of going through map_partition_varattnos() every time, after first
creating the attribute map ourselves.
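
To illustrate the idea (this is a toy sketch, not code from the patch): the map is built once from the two column layouts and then reused for every expression list that needs converting. ToyDesc, toy_build_attmap() and toy_map_attnos() are hypothetical stand-ins for the tuple descriptor, convert_tuples_by_name_map() and map_variable_attnos(); real Vars are reduced here to bare attribute numbers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef short AttrNumber;

/* Hypothetical stand-in for a tuple descriptor: column names only. */
typedef struct ToyDesc
{
	int			natts;
	const char **names;
} ToyDesc;

/*
 * Build map[i] = 1-based position in indesc of the column named like
 * outdesc's column i (0 if there is no such column).  Toy code: aborts
 * if the allocation fails.
 */
static AttrNumber *
toy_build_attmap(const ToyDesc *indesc, const ToyDesc *outdesc)
{
	AttrNumber *map = calloc((size_t) outdesc->natts, sizeof(AttrNumber));

	if (map == NULL)
		abort();
	for (int i = 0; i < outdesc->natts; i++)
	{
		for (int j = 0; j < indesc->natts; j++)
		{
			if (strcmp(outdesc->names[i], indesc->names[j]) == 0)
			{
				map[i] = (AttrNumber) (j + 1);
				break;
			}
		}
	}
	return map;
}

/* Rewrite parent attribute numbers to partition attribute numbers, in place. */
static void
toy_map_attnos(AttrNumber *attnos, int n, const AttrNumber *map, int maplen)
{
	for (int k = 0; k < n; k++)
	{
		assert(attnos[k] >= 1 && attnos[k] <= maplen);
		attnos[k] = map[attnos[k] - 1];
	}
}
```

With a parent laid out as (a, b) and a partition laid out as (b, a), the single map can be handed to toy_map_attnos() once per clause (WITH CHECK OPTION, RETURNING, ON CONFLICT SET, ON CONFLICT WHERE) without being rebuilt.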

And a final item is: can we have a whole-row expression in the clauses?
We currently don't handle those either, not even to throw an error.
[figures a test case] ... and now that I test it, it does crash!

create table part (a int primary key, b text) partition by range (a);
create table part1 (b text, a int not null);
alter table part attach partition part1 for values from (1) to (1000);
insert into part values (1, 'two') on conflict (a)
do update set b = format('%s (was %s)', excluded.b, part.b)
where part.* *<> (1, text 'two');

I think this means we should definitely handle found_whole_row. (If you
create part1 in the normal way, it works as you'd expect.)

I agree. That means we simply remove the Assert after the
map_partition_varattnos call.

I'm going to close a few other things first, then come back to this.

Attached find a patch to fix the whole-row expression issue. I added your
test to insert_conflict.sql.

Combined the above patch into the attached patch.

Thanks,
Amit

Attachments:

v2-0001-Couple-of-fixes-for-ExecInitPartitionInfo.patch (text/plain; charset=UTF-8)
From a90decd69a42bebdb6e07c8268686c0500f8c48e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 16 Apr 2018 17:31:42 +0900
Subject: [PATCH v2] Couple of fixes for ExecInitPartitionInfo

First, avoid repeated calling of map_partition_varattnos.  To do that,
generate the rootrel -> partrel attribute conversion map ourselves
just once and call map_variable_attnos() directly with it.

Second, support conversion of whole-row variables that appear in
ON CONFLICT expressions.  Add relevant test.
---
 src/backend/executor/execPartition.c          | 88 ++++++++++++++++++++-------
 src/test/regress/expected/insert_conflict.out | 16 +++++
 src/test/regress/sql/insert_conflict.sql      | 14 +++++
 3 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 218645d43b..1727e111bb 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
 #include "partitioning/partprune.h"
+#include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
@@ -309,6 +310,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				partrel;
 	ResultRelInfo *leaf_part_rri;
 	MemoryContext oldContext;
+	AttrNumber *part_attnos = NULL;
+	bool		found_whole_row;
 
 	/*
 	 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -397,8 +400,19 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		/*
 		 * Convert Vars in it to contain this partition's attribute numbers.
 		 */
-		wcoList = map_partition_varattnos(wcoList, firstVarno,
-										  partrel, firstResultRel, NULL);
+		part_attnos =
+			convert_tuples_by_name_map(RelationGetDescr(partrel),
+									   RelationGetDescr(firstResultRel),
+									   gettext_noop("could not convert row type"));
+		wcoList = (List *)
+				map_variable_attnos((Node *) wcoList,
+									firstVarno, 0,
+									part_attnos,
+									RelationGetDescr(firstResultRel)->natts,
+									RelationGetForm(partrel)->reltype,
+									&found_whole_row);
+		/* We ignore the value of found_whole_row. */
+
 		foreach(ll, wcoList)
 		{
 			WithCheckOption *wco = castNode(WithCheckOption, lfirst(ll));
@@ -446,9 +460,20 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		/*
 		 * Convert Vars in it to contain this partition's attribute numbers.
 		 */
-		returningList = map_partition_varattnos(returningList, firstVarno,
-												partrel, firstResultRel,
-												NULL);
+		if (part_attnos == NULL)
+			part_attnos =
+				convert_tuples_by_name_map(RelationGetDescr(partrel),
+										   RelationGetDescr(firstResultRel),
+										   gettext_noop("could not convert row type"));
+		returningList = (List *)
+				map_variable_attnos((Node *) returningList,
+									firstVarno, 0,
+									part_attnos,
+									RelationGetDescr(firstResultRel)->natts,
+									RelationGetForm(partrel)->reltype,
+									&found_whole_row);
+		/* We ignore the value of found_whole_row. */
+
 		leaf_part_rri->ri_returningList = returningList;
 
 		/*
@@ -549,14 +574,27 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				 * target relation (firstVarno).
 				 */
 				onconflset = (List *) copyObject((Node *) node->onConflictSet);
-				onconflset =
-					map_partition_varattnos(onconflset, INNER_VAR, partrel,
-											firstResultRel, &found_whole_row);
-				Assert(!found_whole_row);
-				onconflset =
-					map_partition_varattnos(onconflset, firstVarno, partrel,
-											firstResultRel, &found_whole_row);
-				Assert(!found_whole_row);
+				if (part_attnos == NULL)
+					part_attnos =
+						convert_tuples_by_name_map(RelationGetDescr(partrel),
+												   RelationGetDescr(firstResultRel),
+												   gettext_noop("could not convert row type"));
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										INNER_VAR, 0,
+										part_attnos,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										firstVarno, 0,
+										part_attnos,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
 
 				/* Finally, adjust this tlist to match the partition. */
 				onconflset = adjust_partition_tlist(onconflset, map);
@@ -587,14 +625,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 					List	   *clause;
 
 					clause = copyObject((List *) node->onConflictWhere);
-					clause = map_partition_varattnos(clause, INNER_VAR,
-													 partrel, firstResultRel,
-													 &found_whole_row);
-					Assert(!found_whole_row);
-					clause = map_partition_varattnos(clause, firstVarno,
-													 partrel, firstResultRel,
-													 &found_whole_row);
-					Assert(!found_whole_row);
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											INNER_VAR, 0,
+											part_attnos,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											firstVarno, 0,
+											part_attnos,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
 					leaf_part_rri->ri_onConflict->oc_WhereClause =
 						ExecInitQual((List *) clause, &mtstate->ps);
 				}
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 2d7061fa1b..66ca1839bc 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -884,4 +884,20 @@ insert into parted_conflict values (40, 'forty');
 insert into parted_conflict_1 values (40, 'cuarenta')
   on conflict (a) do update set b = excluded.b;
 ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification
+-- test whole-row Vars in ON CONFLICT expressions
+create unique index on parted_conflict (a, b);
+alter table parted_conflict add c int;
+truncate parted_conflict;
+insert into parted_conflict values (50, 'cuarenta', 1);
+insert into parted_conflict values (50, 'cuarenta', 2)
+  on conflict (a, b) do update set (a, b, c) = row(excluded.*)
+  where parted_conflict = (50, text 'cuarenta', 1) and
+        excluded = (50, text 'cuarenta', 2);
+-- should see (50, 'cuarenta', 2)
+select * from parted_conflict order by a;
+ a  |    b     | c 
+----+----------+---
+ 50 | cuarenta | 2
+(1 row)
+
 drop table parted_conflict;
diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql
index 6c50fd61eb..fb30530a54 100644
--- a/src/test/regress/sql/insert_conflict.sql
+++ b/src/test/regress/sql/insert_conflict.sql
@@ -558,4 +558,18 @@ create table parted_conflict_1_1 partition of parted_conflict_1 for values from
 insert into parted_conflict values (40, 'forty');
 insert into parted_conflict_1 values (40, 'cuarenta')
   on conflict (a) do update set b = excluded.b;
+
+-- test whole-row Vars in ON CONFLICT expressions
+create unique index on parted_conflict (a, b);
+alter table parted_conflict add c int;
+truncate parted_conflict;
+insert into parted_conflict values (50, 'cuarenta', 1);
+insert into parted_conflict values (50, 'cuarenta', 2)
+  on conflict (a, b) do update set (a, b, c) = row(excluded.*)
+  where parted_conflict = (50, text 'cuarenta', 1) and
+        excluded = (50, text 'cuarenta', 2);
+
+-- should see (50, 'cuarenta', 2)
+select * from parted_conflict order by a;
+
 drop table parted_conflict;
-- 
2.11.0

#39Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#38)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

The solution I came up with is to call map_variable_attnos() directly,
instead of going through map_partition_varattnos() every time, after first
creating the attribute map ourselves.

Yeah, sounds good. I added a tweak: if the tupledescs are equal, there
should be no need to do any mapping.

(Minor adjustment to the test: "cuarenta" means forty, so I changed the
new test to say "cincuenta" instead, which means fifty).

Pushed now, thanks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#39)
Re: ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Pushed now, thanks.

Buildfarm doesn't like this even a little bit.

regards, tom lane

#41Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#39)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/17 4:10, Alvaro Herrera wrote:

Amit Langote wrote:

The solution I came up with is to call map_variable_attnos() directly,
instead of going through map_partition_varattnos() every time, after first
creating the attribute map ourselves.

Yeah, sounds good. I added a tweak: if the tupledescs are equal, there
should be no need to do any mapping.

Thanks for the commit!

About the equalTupleDescs()-based optimization you added -- it seems to me
that it *always* returns false if you pass it the tupledescs of two
different tables, which is what we do in this case. That's because
equalTupleDescs has this:

    if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
        return false;

So, it fails to optimize as you were hoping it would.
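
A toy model of why the check cannot succeed for two different tables (a sketch only; ToyTupleDesc and toy_equal_descs() are hypothetical, and the real equalTupleDescs() compares considerably more than this). The one thing taken from the snippet above is the tdtypeid comparison: each table has its own row type, so two descriptors with identical columns still compare unequal.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical, heavily reduced tuple descriptor. */
typedef struct ToyTupleDesc
{
	int			natts;
	Oid			tdtypeid;	/* row type of the table that owns the descriptor */
	const Oid  *atttypids;	/* per-column type ids */
} ToyTupleDesc;

static bool
toy_equal_descs(const ToyTupleDesc *d1, const ToyTupleDesc *d2)
{
	if (d1->natts != d2->natts)
		return false;
	/* Different tables have different row types, so this fails first. */
	if (d1->tdtypeid != d2->tdtypeid)
		return false;
	for (int i = 0; i < d1->natts; i++)
	{
		if (d1->atttypids[i] != d2->atttypids[i])
			return false;
	}
	return true;
}
```

Comparing a parent and a partition that share the same column array but carry different (made-up) row type ids returns false here, which mirrors the behaviour described above.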

Instead of doing this, I think we should try to make
convert_tuples_by_name_map() a bit smarter by integrating the logic in
convert_tuples_by_name() that's used to conclude whether tuple conversion is
necessary. So, if it turns out that the tuple descriptors passed to
convert_tuples_by_name_map() contain the same number of attributes and the
individual attributes are at the same positions, we signal to the caller
that no conversion is necessary by returning NULL.
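
As a rough sketch of that contract (toy code under simplifying assumptions, not the patched function): toy_attmap_or_null() below is a hypothetical reduction that matches columns by name only, ignores dropped columns and type checks, and leaves unmatched columns at 0 instead of raising an error. It returns NULL when both layouts have the same number of columns at the same positions, and a map otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef short AttrNumber;

/* Hypothetical stand-in for a tuple descriptor: column names only. */
typedef struct ToyDesc
{
	int			natts;
	const char **names;
} ToyDesc;

/*
 * Return a malloc'd map from outdesc positions to 1-based indesc positions,
 * or NULL if every column sits at the same position in both descriptors.
 * Toy code: aborts if the allocation fails.
 */
static AttrNumber *
toy_attmap_or_null(const ToyDesc *indesc, const ToyDesc *outdesc)
{
	AttrNumber *map = calloc((size_t) outdesc->natts, sizeof(AttrNumber));
	bool		all_attrpos_same = true;

	if (map == NULL)
		abort();
	for (int i = 0; i < outdesc->natts; i++)
	{
		for (int j = 0; j < indesc->natts; j++)
		{
			if (strcmp(outdesc->names[i], indesc->names[j]) == 0)
			{
				map[i] = (AttrNumber) (j + 1);
				if (i != j)
					all_attrpos_same = false;
				break;
			}
		}
	}

	/* Same column count and same positions: the caller needs no mapping. */
	if (all_attrpos_same && indesc->natts == outdesc->natts)
	{
		free(map);
		return NULL;
	}
	return map;
}
```

A caller then treats NULL as "use the original expressions as is" and only runs the attribute-number rewrite when a map comes back.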

Attached find a patch that does that. When working on this, I noticed
that when recursing for inheritance children, ATPrepAlterColumnType()
would use an AlterTableCmd (cmd) that's already scribbled on as if it were
the original.
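
One possible reading of that hazard, as a toy sketch (ToyCmd and both functions are hypothetical, and the command is reduced to a single attribute number): once the per-child rewrite is skipped for children that need no map, a child processed after a remapped sibling inherits the sibling's rewritten command unless each child starts from its own copy of the original.

```c
#include <assert.h>
#include <stddef.h>

typedef short AttrNumber;

/* Hypothetical, heavily reduced ALTER TABLE subcommand. */
typedef struct ToyCmd
{
	AttrNumber	default_attno;	/* column referenced by the expression */
} ToyCmd;

/* Problematic shape: the shared cmd variable is overwritten per child. */
static void
toy_prep_children_scribbling(ToyCmd cmd, const AttrNumber *maps[],
							 int nchildren, ToyCmd *out)
{
	AttrNumber	orig_attno = cmd.default_attno;

	for (int c = 0; c < nchildren; c++)
	{
		if (maps[c] != NULL)
			cmd.default_attno = maps[c][orig_attno - 1];	/* scribbles on cmd */
		out[c] = cmd;		/* a later map-less child sees the scribbled value */
	}
}

/* Safer shape: each child works on its own copy of the original command. */
static void
toy_prep_children_fixed(ToyCmd cmd, const AttrNumber *maps[],
						int nchildren, ToyCmd *out)
{
	for (int c = 0; c < nchildren; c++)
	{
		ToyCmd		child_cmd = cmd;

		if (maps[c] != NULL)
			child_cmd.default_attno = maps[c][cmd.default_attno - 1];
		out[c] = child_cmd;
	}
}
```

With a first child whose columns are swapped and a second child that needs no map, the first function hands the second child the first child's attribute number, while the second function hands it the original one.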

Thanks,
Amit

Attachments:

v1-0001-Optimize-convert_tuples_by_name_map-usage-a-bit.patch (text/plain; charset=UTF-8)
From ec390e3d5d25e53d39d0be30961e9e272a5cf88e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 17 Apr 2018 15:51:42 +0900
Subject: [PATCH v1] Optimize convert_tuples_by_name_map usage a bit

---
 src/backend/access/common/tupconvert.c |  74 +++++++-------------
 src/backend/catalog/index.c            |  31 +++++----
 src/backend/catalog/partition.c        |  13 ++--
 src/backend/catalog/pg_constraint.c    |   5 +-
 src/backend/commands/indexcmds.c       |  14 ++--
 src/backend/commands/tablecmds.c       |  32 +++++----
 src/backend/executor/execPartition.c   | 121 ++++++++++++++-------------------
 src/backend/parser/parse_utilcmd.c     |   9 +--
 8 files changed, 139 insertions(+), 160 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index 2d0d2f4b32..6f3de2d8f2 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -214,57 +214,20 @@ convert_tuples_by_name(TupleDesc indesc,
 	TupleConversionMap *map;
 	AttrNumber *attrMap;
 	int			n = outdesc->natts;
-	int			i;
-	bool		same;
 
 	/* Verify compatibility and prepare attribute-number map */
 	attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);
 
 	/*
-	 * Check to see if the map is one-to-one, in which case we need not do a
-	 * tuple conversion.  We must also insist that both tupdescs either
-	 * specify or don't specify an OID column, else we need a conversion to
-	 * add/remove space for that.  (For some callers, presence or absence of
-	 * an OID column perhaps would not really matter, but let's be safe.)
+	 * If attributes are at the same positions in the input and output
+	 * descriptors, there is no need for tuple conversion.  Also, we must also
+	 * insist that both tupdescs either specify or don't specify an OID column,
+	 * else we need a conversion to add/remove space for that.  (For some
+	 * callers, presence or absence of an OID column perhaps would not really
+	 * matter, but let's be safe.)
 	 */
-	if (indesc->natts == outdesc->natts &&
-		indesc->tdhasoid == outdesc->tdhasoid)
-	{
-		same = true;
-		for (i = 0; i < n; i++)
-		{
-			Form_pg_attribute inatt;
-			Form_pg_attribute outatt;
-
-			if (attrMap[i] == (i + 1))
-				continue;
-
-			/*
-			 * If it's a dropped column and the corresponding input column is
-			 * also dropped, we needn't convert.  However, attlen and attalign
-			 * must agree.
-			 */
-			inatt = TupleDescAttr(indesc, i);
-			outatt = TupleDescAttr(outdesc, i);
-			if (attrMap[i] == 0 &&
-				inatt->attisdropped &&
-				inatt->attlen == outatt->attlen &&
-				inatt->attalign == outatt->attalign)
-				continue;
-
-			same = false;
-			break;
-		}
-	}
-	else
-		same = false;
-
-	if (same)
-	{
-		/* Runtime conversion is not needed */
-		pfree(attrMap);
+	if (attrMap == NULL && indesc->tdhasoid == outdesc->tdhasoid)
 		return NULL;
-	}
 
 	/* Prepare the map structure */
 	map = (TupleConversionMap *) palloc(sizeof(TupleConversionMap));
@@ -285,9 +248,10 @@ convert_tuples_by_name(TupleDesc indesc,
 
 /*
  * Return a palloc'd bare attribute map for tuple conversion, matching input
- * and output columns by name.  (Dropped columns are ignored in both input and
- * output.)  This is normally a subroutine for convert_tuples_by_name, but can
- * be used standalone.
+ * and output columns by name or NULL if the attributes appear at the same
+ * positions in input and output (Dropped columns are ignored in both input
+ * and output.)  This is normally a subroutine for convert_tuples_by_name, but
+ * can be used standalone.
  */
 AttrNumber *
 convert_tuples_by_name_map(TupleDesc indesc,
@@ -297,12 +261,13 @@ convert_tuples_by_name_map(TupleDesc indesc,
 	AttrNumber *attrMap;
 	int			n;
 	int			i;
+	bool		all_attrpos_same = true;
 
 	n = outdesc->natts;
 	attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber));
 	for (i = 0; i < n; i++)
 	{
-		Form_pg_attribute outatt = TupleDescAttr(outdesc, i);
+		Form_pg_attribute outatt= TupleDescAttr(outdesc, i);
 		char	   *attname;
 		Oid			atttypid;
 		int32		atttypmod;
@@ -331,6 +296,8 @@ convert_tuples_by_name_map(TupleDesc indesc,
 									   format_type_be(outdesc->tdtypeid),
 									   format_type_be(indesc->tdtypeid))));
 				attrMap[i] = (AttrNumber) (j + 1);
+				if (i != j)
+					all_attrpos_same = false;
 				break;
 			}
 		}
@@ -344,6 +311,17 @@ convert_tuples_by_name_map(TupleDesc indesc,
 							   format_type_be(indesc->tdtypeid))));
 	}
 
+	/*
+	 * No need of mapping if both descriptors have the same number of
+	 * attributes and individual attributes all at the same positions in both
+	 * descriptors.
+	 */
+	if (all_attrpos_same && indesc->natts == outdesc->natts)
+	{
+		pfree(attrMap);
+		return NULL;
+	}
+
 	return attrMap;
 }
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index dec4265d68..ed0c2f6c9b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1848,13 +1848,18 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	 */
 	for (i = 0; i < info1->ii_NumIndexAttrs; i++)
 	{
-		if (maplen < info2->ii_IndexAttrNumbers[i])
+		AttrNumber	attnum1 = info1->ii_IndexAttrNumbers[i];
+		AttrNumber  attnum2;
+
+		if (attmap != NULL && maplen < info2->ii_IndexAttrNumbers[i])
 			elog(ERROR, "incorrect attribute map");
 
+		attnum2 = attmap != NULL
+						? attmap[info2->ii_IndexAttrNumbers[i] - 1]
+						: info2->ii_IndexAttrNumbers[i];
+
 		/* ignore expressions at this stage */
-		if ((info1->ii_IndexAttrNumbers[i] != InvalidAttrNumber) &&
-			(attmap[info2->ii_IndexAttrNumbers[i] - 1] !=
-			info1->ii_IndexAttrNumbers[i]))
+		if (attnum1 != InvalidAttrNumber && attnum2 != attnum1)
 			return false;
 
 		/* collation and opfamily is not valid for including columns */
@@ -1876,11 +1881,12 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	if (info1->ii_Expressions != NIL)
 	{
 		bool	found_whole_row;
-		Node   *mapped;
+		Node   *mapped = (Node *) info2->ii_Expressions;
 
-		mapped = map_variable_attnos((Node *) info2->ii_Expressions,
-									 1, 0, attmap, maplen,
-									 InvalidOid, &found_whole_row);
+		if (attmap != NULL)
+			mapped = map_variable_attnos((Node *) info2->ii_Expressions,
+										 1, 0, attmap, maplen,
+										 InvalidOid, &found_whole_row);
 		if (found_whole_row)
 		{
 			/*
@@ -1900,11 +1906,12 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	if (info1->ii_Predicate != NULL)
 	{
 		bool	found_whole_row;
-		Node   *mapped;
+		Node   *mapped = (Node *) info2->ii_Predicate;
 
-		mapped = map_variable_attnos((Node *) info2->ii_Predicate,
-									 1, 0, attmap, maplen,
-									 InvalidOid, &found_whole_row);
+		if (attmap != NULL)
+			mapped = map_variable_attnos((Node *) info2->ii_Predicate,
+										 1, 0, attmap, maplen,
+										 InvalidOid, &found_whole_row);
 		if (found_whole_row)
 		{
 			/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index de801ad788..0db7d0d8e3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -178,12 +178,13 @@ map_partition_varattnos(List *expr, int fromrel_varno,
 		part_attnos = convert_tuples_by_name_map(RelationGetDescr(to_rel),
 												 RelationGetDescr(from_rel),
 												 gettext_noop("could not convert row type"));
-		expr = (List *) map_variable_attnos((Node *) expr,
-											fromrel_varno, 0,
-											part_attnos,
-											RelationGetDescr(from_rel)->natts,
-											RelationGetForm(to_rel)->reltype,
-											&my_found_whole_row);
+		if (part_attnos != NULL)
+			expr = (List *) map_variable_attnos((Node *) expr,
+												fromrel_varno, 0,
+												part_attnos,
+												RelationGetDescr(from_rel)->natts,
+												RelationGetForm(to_rel)->reltype,
+												&my_found_whole_row);
 	}
 
 	if (found_whole_row)
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index c5b5395791..46fe53cc8b 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -488,7 +488,7 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 		memcpy(conkey, ARR_DATA_PTR(arr), nelem * sizeof(AttrNumber));
 
 		for (i = 0; i < nelem; i++)
-			mapped_conkey[i] = attmap[conkey[i] - 1];
+			mapped_conkey[i] = attmap ? attmap[conkey[i] - 1] : conkey[i];
 
 		datum = fastgetattr(tuple, Anum_pg_constraint_confkey,
 							tupdesc, &isnull);
@@ -621,7 +621,8 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 	}
 	systable_endscan(scan);
 
-	pfree(attmap);
+	if (attmap)
+		pfree(attmap);
 
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f2dcc1c51f..797bb54d63 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -918,7 +918,7 @@ DefineIndex(Oid relationId,
 					convert_tuples_by_name_map(RelationGetDescr(childrel),
 											   parentDesc,
 											   gettext_noop("could not convert row type"));
-				maplen = parentDesc->natts;
+				maplen = attmap != NULL ? parentDesc->natts : 0;
 
 
 				foreach(cell, childidxs)
@@ -992,10 +992,11 @@ DefineIndex(Oid relationId,
 					IndexStmt  *childStmt = copyObject(stmt);
 					bool		found_whole_row;
 
-					childStmt->whereClause =
-						map_variable_attnos(stmt->whereClause, 1, 0,
-											attmap, maplen,
-											InvalidOid, &found_whole_row);
+					if (attmap != NULL)
+						childStmt->whereClause =
+							map_variable_attnos(stmt->whereClause, 1, 0,
+												attmap, maplen,
+												InvalidOid, &found_whole_row);
 					if (found_whole_row)
 						elog(ERROR, "cannot convert whole-row table reference");
 
@@ -1009,7 +1010,8 @@ DefineIndex(Oid relationId,
 								false, quiet);
 				}
 
-				pfree(attmap);
+				if (attmap)
+					pfree(attmap);
 			}
 
 			/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c1a9bda433..6f37dd153a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9220,6 +9220,7 @@ ATPrepAlterColumnType(List **wqueue,
 		{
 			Oid			childrelid = lfirst_oid(child);
 			Relation	childrel;
+			AlterTableCmd *child_cmd = cmd;
 
 			if (childrelid == relid)
 				continue;
@@ -9238,24 +9239,26 @@ ATPrepAlterColumnType(List **wqueue,
 				bool		found_whole_row;
 
 				/* create a copy to scribble on */
-				cmd = copyObject(cmd);
+				child_cmd = copyObject(cmd);
 
 				attmap = convert_tuples_by_name_map(RelationGetDescr(childrel),
 													RelationGetDescr(rel),
 													gettext_noop("could not convert row type"));
-				((ColumnDef *) cmd->def)->cooked_default =
-					map_variable_attnos(def->cooked_default,
-										1, 0,
-										attmap, RelationGetDescr(rel)->natts,
-										InvalidOid, &found_whole_row);
+				if (attmap != NULL)
+					((ColumnDef *) child_cmd->def)->cooked_default =
+						map_variable_attnos(def->cooked_default,
+											1, 0,
+											attmap, RelationGetDescr(rel)->natts,
+											InvalidOid, &found_whole_row);
 				if (found_whole_row)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot convert whole-row table reference"),
 							 errdetail("USING expression contains a whole-row table reference.")));
-				pfree(attmap);
+				if (attmap)
+					pfree(attmap);
 			}
-			ATPrepCmd(wqueue, childrel, cmd, false, true, lockmode);
+			ATPrepCmd(wqueue, childrel, child_cmd, false, true, lockmode);
 			relation_close(childrel, NoLock);
 		}
 	}
@@ -14381,6 +14384,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 		Relation	idxRel = index_open(idx, AccessShareLock);
 		IndexInfo  *info;
 		AttrNumber *attmap;
+		int			maplen;
 		bool		found = false;
 		Oid			constraintOid;
 
@@ -14399,6 +14403,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 		attmap = convert_tuples_by_name_map(RelationGetDescr(attachrel),
 											RelationGetDescr(rel),
 											gettext_noop("could not convert row type"));
+		maplen = attmap != NULL ? RelationGetDescr(rel)->natts : 0;
 		constraintOid = get_relation_idx_constraint_oid(RelationGetRelid(rel), idx);
 
 		/*
@@ -14420,8 +14425,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 								 idxRel->rd_indcollation,
 								 attachrelIdxRels[i]->rd_opfamily,
 								 idxRel->rd_opfamily,
-								 attmap,
-								 RelationGetDescr(rel)->natts))
+								 attmap, maplen))
 			{
 				/*
 				 * If this index is being created in the parent because of a
@@ -14828,6 +14832,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 		IndexInfo  *childInfo;
 		IndexInfo  *parentInfo;
 		AttrNumber *attmap;
+		int			maplen;
 		bool		found;
 		int			i;
 		PartitionDesc partDesc;
@@ -14874,13 +14879,13 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 		attmap = convert_tuples_by_name_map(RelationGetDescr(partTbl),
 											RelationGetDescr(parentTbl),
 											gettext_noop("could not convert row type"));
+		maplen = attmap != NULL ? RelationGetDescr(partTbl)->natts : 0;
 		if (!CompareIndexInfo(childInfo, parentInfo,
 							  partIdx->rd_indcollation,
 							  parentIdx->rd_indcollation,
 							  partIdx->rd_opfamily,
 							  parentIdx->rd_opfamily,
-							  attmap,
-							  RelationGetDescr(partTbl)->natts))
+							  attmap, maplen))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("cannot attach index \"%s\" as a partition of index \"%s\"",
@@ -14917,7 +14922,8 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 			ConstraintSetParentConstraint(cldConstrId, constraintOid);
 		update_relispartition(NULL, partIdxId, true);
 
-		pfree(attmap);
+		if (attmap)
+			pfree(attmap);
 
 		validatePartitionedIndex(parentIdx, parentTbl);
 	}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 23a74bc3d9..a077aee470 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -311,9 +311,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 	Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
 	ResultRelInfo *leaf_part_rri;
 	MemoryContext oldContext;
-	AttrNumber *part_attnos = NULL;
+	AttrNumber *attmap;
 	bool		found_whole_row;
-	bool		equalTupdescs;
 
 	/*
 	 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -361,9 +360,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 						(node != NULL &&
 						 node->onConflictAction != ONCONFLICT_NONE));
 
-	/* if tuple descs are identical, we don't need to map the attrs */
-	equalTupdescs = equalTupleDescs(RelationGetDescr(partrel),
-									RelationGetDescr(firstResultRel));
+	/*
+	 * Get a attribute conversion map to convert expressions, which if NULL,
+	 * the original expressions can be used as is.
+	 */
+	attmap = convert_tuples_by_name_map(RelationGetDescr(partrel),
+										RelationGetDescr(firstResultRel),
+										gettext_noop("could not convert row type"));
 
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
@@ -405,16 +408,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		/*
 		 * Convert Vars in it to contain this partition's attribute numbers.
 		 */
-		if (!equalTupdescs)
+		if (attmap != NULL)
 		{
-			part_attnos =
-				convert_tuples_by_name_map(RelationGetDescr(partrel),
-										   RelationGetDescr(firstResultRel),
-										   gettext_noop("could not convert row type"));
 			wcoList = (List *)
 				map_variable_attnos((Node *) wcoList,
 									firstVarno, 0,
-									part_attnos,
+									attmap,
 									RelationGetDescr(firstResultRel)->natts,
 									RelationGetForm(partrel)->reltype,
 									&found_whole_row);
@@ -464,25 +463,18 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		 */
 		returningList = linitial(node->returningLists);
 
-		if (!equalTupdescs)
-		{
-			/*
-			 * Convert Vars in it to contain this partition's attribute numbers.
-			 */
-			if (part_attnos == NULL)
-				part_attnos =
-					convert_tuples_by_name_map(RelationGetDescr(partrel),
-											   RelationGetDescr(firstResultRel),
-											   gettext_noop("could not convert row type"));
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		if (attmap != NULL)
 			returningList = (List *)
 				map_variable_attnos((Node *) returningList,
 									firstVarno, 0,
-									part_attnos,
+									attmap,
 									RelationGetDescr(firstResultRel)->natts,
 									RelationGetForm(partrel)->reltype,
 									&found_whole_row);
-			/* We ignore the value of found_whole_row. */
-		}
+		/* We ignore the value of found_whole_row. */
 
 		leaf_part_rri->ri_returningList = returningList;
 
@@ -565,7 +557,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 			 * CONFLICT SET state, skipping a bunch of work.  Otherwise, we
 			 * need to create state specific to this partition.
 			 */
-			if (map == NULL)
+			if (attmap == NULL)
 				leaf_part_rri->ri_onConflict = resultRelInfo->ri_onConflict;
 			else
 			{
@@ -583,33 +575,25 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				 * target relation (firstVarno).
 				 */
 				onconflset = (List *) copyObject((Node *) node->onConflictSet);
-				if (!equalTupdescs)
-				{
-					if (part_attnos == NULL)
-						part_attnos =
-							convert_tuples_by_name_map(RelationGetDescr(partrel),
-													   RelationGetDescr(firstResultRel),
-													   gettext_noop("could not convert row type"));
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											INNER_VAR, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											firstVarno, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										INNER_VAR, 0,
+										attmap,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										firstVarno, 0,
+										attmap,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
 
-					/* Finally, adjust this tlist to match the partition. */
-					onconflset = adjust_partition_tlist(onconflset, map);
-				}
+				/* Finally, adjust this tlist to match the partition. */
+				onconflset = adjust_partition_tlist(onconflset, map);
 
 				/*
 				 * Build UPDATE SET's projection info.  The user of this
@@ -637,25 +621,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 					List	   *clause;
 
 					clause = copyObject((List *) node->onConflictWhere);
-					if (!equalTupdescs)
-					{
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												INNER_VAR, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											INNER_VAR, 0,
+											attmap,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											firstVarno, 0,
+											attmap,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
 						/* We ignore the value of found_whole_row. */
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												firstVarno, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
-						/* We ignore the value of found_whole_row. */
-					}
 					leaf_part_rri->ri_onConflict->oc_WhereClause =
 						ExecInitQual((List *) clause, &mtstate->ps);
 				}
@@ -663,6 +644,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		}
 	}
 
+	if (attmap)
+		pfree(attmap);
 	Assert(proute->partitions[partidx] == NULL);
 	proute->partitions[partidx] = leaf_part_rri;
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index c6f3628def..378331e216 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1504,10 +1504,11 @@ generateClonedIndexStmt(RangeVar *heapRel, Oid heapRelid, Relation source_idx,
 			indexpr_item = lnext(indexpr_item);
 
 			/* Adjust Vars to match new table's column numbering */
-			indexkey = map_variable_attnos(indexkey,
-										   1, 0,
-										   attmap, attmap_length,
-										   InvalidOid, &found_whole_row);
+			if (attmap != NULL)
+				indexkey = map_variable_attnos(indexkey,
+											   1, 0,
+											   attmap, attmap_length,
+											   InvalidOid, &found_whole_row);
 
 			/* As in transformTableLikeClause, reject whole-row variables */
 			if (found_whole_row)
-- 
2.11.0

#42Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#41)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/17 16:45, Amit Langote wrote:

Instead of doing this, I think we should try to make
convert_tuples_by_name_map() a bit smarter by integrating the logic in
convert_tuples_by_name() that's used to conclude whether tuple conversion is
necessary. So, if it turns out that the tuple descriptors passed to
convert_tuples_by_name_map() contain the same number of attributes and the
individual attributes are at the same positions, we signal to the caller
that no conversion is necessary by returning NULL.

Attached find a patch that does that.
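
The "return NULL when nothing needs remapping" idea can be sketched in isolation as below. This is only a toy model, not the patch's code: ToyDesc and toy_build_attr_map() are hypothetical names, a descriptor is reduced to a list of column names, and dropped columns and type checks are ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a tuple descriptor: only column names.
 * Real descriptors also carry types, dropped-column flags, and so on.
 */
typedef struct
{
	int			natts;
	const char **names;
} ToyDesc;

/*
 * Map each output column i to the 1-based position of the input column
 * with the same name (0 if there is none).  Return NULL when both
 * descriptors have the same number of columns and every column sits at
 * the same position, i.e. when no remapping is needed.  The caller
 * frees a non-NULL result.
 */
static int *
toy_build_attr_map(const ToyDesc *in, const ToyDesc *out)
{
	int		   *map;
	int			all_same = 1;

	assert(in != NULL && out != NULL && out->natts > 0);

	map = calloc((size_t) out->natts, sizeof(int));
	if (map == NULL)
		abort();

	for (int i = 0; i < out->natts; i++)
	{
		for (int j = 0; j < in->natts; j++)
		{
			if (strcmp(out->names[i], in->names[j]) == 0)
			{
				map[i] = j + 1;
				if (i != j)
					all_same = 0;
				break;
			}
		}
		/* An unmatched column also means the layouts differ. */
		if (map[i] == 0)
			all_same = 0;
	}

	if (all_same && in->natts == out->natts)
	{
		free(map);
		return NULL;
	}
	return map;
}
```

With this shape, every caller has to treat a NULL result as "use the expressions or tuples as they are", which is why the patch touches so many call sites.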

I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved. Like this case:

create table parent (a int);
create table child () inherits (parent) with oids;
insert into parent values (1);
insert into child values (1);
analyze parent;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That's because convert_tuples_by_name(), which is called by
acquire_inherited_sample_rows(), returns a TupleConversionMap whose attrMap
is set to NULL, and do_convert_tuple() may then try to dereference that NULL
attrMap. In this case, the parent and child tables have the same user
attributes, so the patched convert_tuples_by_name_map() returns NULL; but
since their hasoids settings don't match, a TupleConversionMap is still
returned, with its attrMap set to NULL. To fix that, I taught
do_convert_tuple() to ignore the map if it's NULL. Also, free_conversion_map()
shouldn't try to free attrMap if it's NULL.
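
A standalone sketch of "ignore the map if it's NULL" follows. It is a toy model, not the patch's code: toy_convert_row() is a hypothetical name, and it assumes the layout where input values are stored starting at slot 1 and slot 0 is a dummy "no such column" entry, so that a map entry of 0 selects the dummy. Under that assumption the identity fallback for output column i is input slot i + 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy input values into output order.
 *
 * invalues: slot 0 is a dummy, slots 1..n hold the input columns
 *           (assumed layout, see the text above).
 * attr_map: for each output column, the 1-based input slot to read,
 *           or NULL meaning "same positions, no remapping".
 */
static void
toy_convert_row(const int *invalues, const int *attr_map,
				int *outvalues, int outnatts)
{
	assert(invalues != NULL && outvalues != NULL);

	for (int i = 0; i < outnatts; i++)
	{
		/* With no map, output column i comes from input slot i + 1. */
		int			j = attr_map ? attr_map[i] : i + 1;

		outvalues[i] = invalues[j];
	}
}
```

The point of the sketch is only that a NULL map has to be handled at the place where the map is indexed; freeing the map needs the same NULL check.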

Attached updated patch.

Thanks,
Amit

Attachments:

v2-0001-Optimize-convert_tuples_by_name_map-usage-a-bit.patch (text/plain; charset=UTF-8)
From d5cc2db9bd610523915d1512c2fcad84e8bae3b6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 17 Apr 2018 15:51:42 +0900
Subject: [PATCH v2] Optimize convert_tuples_by_name_map usage a bit

---
 src/backend/access/common/tupconvert.c |  79 ++++++++-------------
 src/backend/catalog/index.c            |  31 +++++----
 src/backend/catalog/partition.c        |  13 ++--
 src/backend/catalog/pg_constraint.c    |   5 +-
 src/backend/commands/indexcmds.c       |  14 ++--
 src/backend/commands/tablecmds.c       |  32 +++++----
 src/backend/executor/execPartition.c   | 122 ++++++++++++++-------------------
 src/backend/parser/parse_utilcmd.c     |   9 +--
 8 files changed, 143 insertions(+), 162 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index 2d0d2f4b32..2ee4a7f40f 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -214,57 +214,20 @@ convert_tuples_by_name(TupleDesc indesc,
 	TupleConversionMap *map;
 	AttrNumber *attrMap;
 	int			n = outdesc->natts;
-	int			i;
-	bool		same;
 
 	/* Verify compatibility and prepare attribute-number map */
 	attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);
 
 	/*
-	 * Check to see if the map is one-to-one, in which case we need not do a
-	 * tuple conversion.  We must also insist that both tupdescs either
-	 * specify or don't specify an OID column, else we need a conversion to
-	 * add/remove space for that.  (For some callers, presence or absence of
-	 * an OID column perhaps would not really matter, but let's be safe.)
+	 * If attributes are at the same positions in the input and output
+	 * descriptors, there is no need for tuple conversion.  We must also
+	 * insist that both tupdescs either specify or don't specify an OID column,
+	 * else we need a conversion to add/remove space for that.  (For some
+	 * callers, presence or absence of an OID column perhaps would not really
+	 * matter, but let's be safe.)
 	 */
-	if (indesc->natts == outdesc->natts &&
-		indesc->tdhasoid == outdesc->tdhasoid)
-	{
-		same = true;
-		for (i = 0; i < n; i++)
-		{
-			Form_pg_attribute inatt;
-			Form_pg_attribute outatt;
-
-			if (attrMap[i] == (i + 1))
-				continue;
-
-			/*
-			 * If it's a dropped column and the corresponding input column is
-			 * also dropped, we needn't convert.  However, attlen and attalign
-			 * must agree.
-			 */
-			inatt = TupleDescAttr(indesc, i);
-			outatt = TupleDescAttr(outdesc, i);
-			if (attrMap[i] == 0 &&
-				inatt->attisdropped &&
-				inatt->attlen == outatt->attlen &&
-				inatt->attalign == outatt->attalign)
-				continue;
-
-			same = false;
-			break;
-		}
-	}
-	else
-		same = false;
-
-	if (same)
-	{
-		/* Runtime conversion is not needed */
-		pfree(attrMap);
+	if (attrMap == NULL && indesc->tdhasoid == outdesc->tdhasoid)
 		return NULL;
-	}
 
 	/* Prepare the map structure */
 	map = (TupleConversionMap *) palloc(sizeof(TupleConversionMap));
@@ -285,9 +248,10 @@ convert_tuples_by_name(TupleDesc indesc,
 
 /*
  * Return a palloc'd bare attribute map for tuple conversion, matching input
- * and output columns by name.  (Dropped columns are ignored in both input and
- * output.)  This is normally a subroutine for convert_tuples_by_name, but can
- * be used standalone.
+ * and output columns by name, or NULL if the attributes appear at the same
+ * positions in input and output.  (Dropped columns are ignored in both input
+ * and output.)  This is normally a subroutine for convert_tuples_by_name, but
+ * can be used standalone.
  */
 AttrNumber *
 convert_tuples_by_name_map(TupleDesc indesc,
@@ -297,12 +261,13 @@ convert_tuples_by_name_map(TupleDesc indesc,
 	AttrNumber *attrMap;
 	int			n;
 	int			i;
+	bool		all_attrpos_same = true;
 
 	n = outdesc->natts;
 	attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber));
 	for (i = 0; i < n; i++)
 	{
-		Form_pg_attribute outatt = TupleDescAttr(outdesc, i);
+		Form_pg_attribute outatt= TupleDescAttr(outdesc, i);
 		char	   *attname;
 		Oid			atttypid;
 		int32		atttypmod;
@@ -331,6 +296,8 @@ convert_tuples_by_name_map(TupleDesc indesc,
 									   format_type_be(outdesc->tdtypeid),
 									   format_type_be(indesc->tdtypeid))));
 				attrMap[i] = (AttrNumber) (j + 1);
+				if (i != j)
+					all_attrpos_same = false;
 				break;
 			}
 		}
@@ -344,6 +311,17 @@ convert_tuples_by_name_map(TupleDesc indesc,
 							   format_type_be(indesc->tdtypeid))));
 	}
 
+	/*
+	 * No mapping is needed if both descriptors have the same number of
+	 * attributes and all individual attributes are at the same positions in
+	 * both descriptors.
+	 */
+	if (all_attrpos_same && indesc->natts == outdesc->natts)
+	{
+		pfree(attrMap);
+		return NULL;
+	}
+
 	return attrMap;
 }
 
@@ -373,7 +351,7 @@ do_convert_tuple(HeapTuple tuple, TupleConversionMap *map)
 	 */
 	for (i = 0; i < outnatts; i++)
 	{
-		int			j = attrMap[i];
+		int			j = attrMap ? attrMap[i] : i + 1;
 
 		outvalues[i] = invalues[j];
 		outisnull[i] = inisnull[j];
@@ -392,7 +370,8 @@ void
 free_conversion_map(TupleConversionMap *map)
 {
 	/* indesc and outdesc are not ours to free */
-	pfree(map->attrMap);
+	if (map->attrMap)
+		pfree(map->attrMap);
 	pfree(map->invalues);
 	pfree(map->inisnull);
 	pfree(map->outvalues);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index dec4265d68..ed0c2f6c9b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1848,13 +1848,18 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	 */
 	for (i = 0; i < info1->ii_NumIndexAttrs; i++)
 	{
-		if (maplen < info2->ii_IndexAttrNumbers[i])
+		AttrNumber	attnum1 = info1->ii_IndexAttrNumbers[i];
+		AttrNumber  attnum2;
+
+		if (attmap != NULL && maplen < info2->ii_IndexAttrNumbers[i])
 			elog(ERROR, "incorrect attribute map");
 
+		attnum2 = attmap != NULL
+						? attmap[info2->ii_IndexAttrNumbers[i] - 1]
+						: info2->ii_IndexAttrNumbers[i];
+
 		/* ignore expressions at this stage */
-		if ((info1->ii_IndexAttrNumbers[i] != InvalidAttrNumber) &&
-			(attmap[info2->ii_IndexAttrNumbers[i] - 1] !=
-			info1->ii_IndexAttrNumbers[i]))
+		if (attnum1 != InvalidAttrNumber && attnum2 != attnum1)
 			return false;
 
 		/* collation and opfamily is not valid for including columns */
@@ -1876,11 +1881,12 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	if (info1->ii_Expressions != NIL)
 	{
 		bool	found_whole_row;
-		Node   *mapped;
+		Node   *mapped = (Node *) info2->ii_Expressions;
 
-		mapped = map_variable_attnos((Node *) info2->ii_Expressions,
-									 1, 0, attmap, maplen,
-									 InvalidOid, &found_whole_row);
+		if (attmap != NULL)
+			mapped = map_variable_attnos((Node *) info2->ii_Expressions,
+										 1, 0, attmap, maplen,
+										 InvalidOid, &found_whole_row);
 		if (found_whole_row)
 		{
 			/*
@@ -1900,11 +1906,12 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	if (info1->ii_Predicate != NULL)
 	{
 		bool	found_whole_row;
-		Node   *mapped;
+		Node   *mapped = (Node *) info2->ii_Predicate;
 
-		mapped = map_variable_attnos((Node *) info2->ii_Predicate,
-									 1, 0, attmap, maplen,
-									 InvalidOid, &found_whole_row);
+		if (attmap != NULL)
+			mapped = map_variable_attnos((Node *) info2->ii_Predicate,
+										 1, 0, attmap, maplen,
+										 InvalidOid, &found_whole_row);
 		if (found_whole_row)
 		{
 			/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index de801ad788..0db7d0d8e3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -178,12 +178,13 @@ map_partition_varattnos(List *expr, int fromrel_varno,
 		part_attnos = convert_tuples_by_name_map(RelationGetDescr(to_rel),
 												 RelationGetDescr(from_rel),
 												 gettext_noop("could not convert row type"));
-		expr = (List *) map_variable_attnos((Node *) expr,
-											fromrel_varno, 0,
-											part_attnos,
-											RelationGetDescr(from_rel)->natts,
-											RelationGetForm(to_rel)->reltype,
-											&my_found_whole_row);
+		if (part_attnos != NULL)
+			expr = (List *) map_variable_attnos((Node *) expr,
+												fromrel_varno, 0,
+												part_attnos,
+												RelationGetDescr(from_rel)->natts,
+												RelationGetForm(to_rel)->reltype,
+												&my_found_whole_row);
 	}
 
 	if (found_whole_row)
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index c5b5395791..46fe53cc8b 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -488,7 +488,7 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 		memcpy(conkey, ARR_DATA_PTR(arr), nelem * sizeof(AttrNumber));
 
 		for (i = 0; i < nelem; i++)
-			mapped_conkey[i] = attmap[conkey[i] - 1];
+			mapped_conkey[i] = attmap ? attmap[conkey[i] - 1] : conkey[i];
 
 		datum = fastgetattr(tuple, Anum_pg_constraint_confkey,
 							tupdesc, &isnull);
@@ -621,7 +621,8 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 	}
 	systable_endscan(scan);
 
-	pfree(attmap);
+	if (attmap)
+		pfree(attmap);
 
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f2dcc1c51f..797bb54d63 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -918,7 +918,7 @@ DefineIndex(Oid relationId,
 					convert_tuples_by_name_map(RelationGetDescr(childrel),
 											   parentDesc,
 											   gettext_noop("could not convert row type"));
-				maplen = parentDesc->natts;
+				maplen = attmap != NULL ? parentDesc->natts : 0;
 
 
 				foreach(cell, childidxs)
@@ -992,10 +992,11 @@ DefineIndex(Oid relationId,
 					IndexStmt  *childStmt = copyObject(stmt);
 					bool		found_whole_row;
 
-					childStmt->whereClause =
-						map_variable_attnos(stmt->whereClause, 1, 0,
-											attmap, maplen,
-											InvalidOid, &found_whole_row);
+					if (attmap != NULL)
+						childStmt->whereClause =
+							map_variable_attnos(stmt->whereClause, 1, 0,
+												attmap, maplen,
+												InvalidOid, &found_whole_row);
 					if (found_whole_row)
 						elog(ERROR, "cannot convert whole-row table reference");
 
@@ -1009,7 +1010,8 @@ DefineIndex(Oid relationId,
 								false, quiet);
 				}
 
-				pfree(attmap);
+				if (attmap)
+					pfree(attmap);
 			}
 
 			/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c1a9bda433..6f37dd153a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9220,6 +9220,7 @@ ATPrepAlterColumnType(List **wqueue,
 		{
 			Oid			childrelid = lfirst_oid(child);
 			Relation	childrel;
+			AlterTableCmd *child_cmd = cmd;
 
 			if (childrelid == relid)
 				continue;
@@ -9238,24 +9239,26 @@ ATPrepAlterColumnType(List **wqueue,
 				bool		found_whole_row;
 
 				/* create a copy to scribble on */
-				cmd = copyObject(cmd);
+				child_cmd = copyObject(cmd);
 
 				attmap = convert_tuples_by_name_map(RelationGetDescr(childrel),
 													RelationGetDescr(rel),
 													gettext_noop("could not convert row type"));
-				((ColumnDef *) cmd->def)->cooked_default =
-					map_variable_attnos(def->cooked_default,
-										1, 0,
-										attmap, RelationGetDescr(rel)->natts,
-										InvalidOid, &found_whole_row);
+				if (attmap != NULL)
+					((ColumnDef *) child_cmd->def)->cooked_default =
+						map_variable_attnos(def->cooked_default,
+											1, 0,
+											attmap, RelationGetDescr(rel)->natts,
+											InvalidOid, &found_whole_row);
 				if (found_whole_row)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot convert whole-row table reference"),
 							 errdetail("USING expression contains a whole-row table reference.")));
-				pfree(attmap);
+				if (attmap)
+					pfree(attmap);
 			}
-			ATPrepCmd(wqueue, childrel, cmd, false, true, lockmode);
+			ATPrepCmd(wqueue, childrel, child_cmd, false, true, lockmode);
 			relation_close(childrel, NoLock);
 		}
 	}
@@ -14381,6 +14384,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 		Relation	idxRel = index_open(idx, AccessShareLock);
 		IndexInfo  *info;
 		AttrNumber *attmap;
+		int			maplen;
 		bool		found = false;
 		Oid			constraintOid;
 
@@ -14399,6 +14403,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 		attmap = convert_tuples_by_name_map(RelationGetDescr(attachrel),
 											RelationGetDescr(rel),
 											gettext_noop("could not convert row type"));
+		maplen = attmap != NULL ? RelationGetDescr(rel)->natts : 0;
 		constraintOid = get_relation_idx_constraint_oid(RelationGetRelid(rel), idx);
 
 		/*
@@ -14420,8 +14425,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 								 idxRel->rd_indcollation,
 								 attachrelIdxRels[i]->rd_opfamily,
 								 idxRel->rd_opfamily,
-								 attmap,
-								 RelationGetDescr(rel)->natts))
+								 attmap, maplen))
 			{
 				/*
 				 * If this index is being created in the parent because of a
@@ -14828,6 +14832,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 		IndexInfo  *childInfo;
 		IndexInfo  *parentInfo;
 		AttrNumber *attmap;
+		int			maplen;
 		bool		found;
 		int			i;
 		PartitionDesc partDesc;
@@ -14874,13 +14879,13 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 		attmap = convert_tuples_by_name_map(RelationGetDescr(partTbl),
 											RelationGetDescr(parentTbl),
 											gettext_noop("could not convert row type"));
+		maplen = attmap != NULL ? RelationGetDescr(partTbl)->natts : 0;
 		if (!CompareIndexInfo(childInfo, parentInfo,
 							  partIdx->rd_indcollation,
 							  parentIdx->rd_indcollation,
 							  partIdx->rd_opfamily,
 							  parentIdx->rd_opfamily,
-							  attmap,
-							  RelationGetDescr(partTbl)->natts))
+							  attmap, maplen))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("cannot attach index \"%s\" as a partition of index \"%s\"",
@@ -14917,7 +14922,8 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 			ConstraintSetParentConstraint(cldConstrId, constraintOid);
 		update_relispartition(NULL, partIdxId, true);
 
-		pfree(attmap);
+		if (attmap)
+			pfree(attmap);
 
 		validatePartitionedIndex(parentIdx, parentTbl);
 	}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 23a74bc3d9..f600af3f0e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -311,9 +311,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 	Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
 	ResultRelInfo *leaf_part_rri;
 	MemoryContext oldContext;
-	AttrNumber *part_attnos = NULL;
+	AttrNumber *attmap;
 	bool		found_whole_row;
-	bool		equalTupdescs;
 
 	/*
 	 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -361,9 +360,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 						(node != NULL &&
 						 node->onConflictAction != ONCONFLICT_NONE));
 
-	/* if tuple descs are identical, we don't need to map the attrs */
-	equalTupdescs = equalTupleDescs(RelationGetDescr(partrel),
-									RelationGetDescr(firstResultRel));
+	/*
+	 * Get an attribute conversion map to convert expressions; if it is NULL,
+	 * the original expressions can be used as is.
+	 */
+	attmap = convert_tuples_by_name_map(RelationGetDescr(partrel),
+										RelationGetDescr(firstResultRel),
+										gettext_noop("could not convert row type"));
 
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
@@ -405,16 +408,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		/*
 		 * Convert Vars in it to contain this partition's attribute numbers.
 		 */
-		if (!equalTupdescs)
+		if (attmap != NULL)
 		{
-			part_attnos =
-				convert_tuples_by_name_map(RelationGetDescr(partrel),
-										   RelationGetDescr(firstResultRel),
-										   gettext_noop("could not convert row type"));
 			wcoList = (List *)
 				map_variable_attnos((Node *) wcoList,
 									firstVarno, 0,
-									part_attnos,
+									attmap,
 									RelationGetDescr(firstResultRel)->natts,
 									RelationGetForm(partrel)->reltype,
 									&found_whole_row);
@@ -464,25 +463,18 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		 */
 		returningList = linitial(node->returningLists);
 
-		if (!equalTupdescs)
-		{
-			/*
-			 * Convert Vars in it to contain this partition's attribute numbers.
-			 */
-			if (part_attnos == NULL)
-				part_attnos =
-					convert_tuples_by_name_map(RelationGetDescr(partrel),
-											   RelationGetDescr(firstResultRel),
-											   gettext_noop("could not convert row type"));
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		if (attmap != NULL)
 			returningList = (List *)
 				map_variable_attnos((Node *) returningList,
 									firstVarno, 0,
-									part_attnos,
+									attmap,
 									RelationGetDescr(firstResultRel)->natts,
 									RelationGetForm(partrel)->reltype,
 									&found_whole_row);
-			/* We ignore the value of found_whole_row. */
-		}
+		/* We ignore the value of found_whole_row. */
 
 		leaf_part_rri->ri_returningList = returningList;
 
@@ -565,7 +557,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 			 * CONFLICT SET state, skipping a bunch of work.  Otherwise, we
 			 * need to create state specific to this partition.
 			 */
-			if (map == NULL)
+			if (attmap == NULL)
 				leaf_part_rri->ri_onConflict = resultRelInfo->ri_onConflict;
 			else
 			{
@@ -583,33 +575,26 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				 * target relation (firstVarno).
 				 */
 				onconflset = (List *) copyObject((Node *) node->onConflictSet);
-				if (!equalTupdescs)
-				{
-					if (part_attnos == NULL)
-						part_attnos =
-							convert_tuples_by_name_map(RelationGetDescr(partrel),
-													   RelationGetDescr(firstResultRel),
-													   gettext_noop("could not convert row type"));
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											INNER_VAR, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											firstVarno, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										INNER_VAR, 0,
+										attmap,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										firstVarno, 0,
+										attmap,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
 
-					/* Finally, adjust this tlist to match the partition. */
-					onconflset = adjust_partition_tlist(onconflset, map);
-				}
+				/* Finally, adjust this tlist to match the partition. */
+				Assert(map != NULL && map->attrMap != NULL);
+				onconflset = adjust_partition_tlist(onconflset, map);
 
 				/*
 				 * Build UPDATE SET's projection info.  The user of this
@@ -637,25 +622,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 					List	   *clause;
 
 					clause = copyObject((List *) node->onConflictWhere);
-					if (!equalTupdescs)
-					{
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												INNER_VAR, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											INNER_VAR, 0,
+											attmap,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											firstVarno, 0,
+											attmap,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
 						/* We ignore the value of found_whole_row. */
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												firstVarno, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
-						/* We ignore the value of found_whole_row. */
-					}
 					leaf_part_rri->ri_onConflict->oc_WhereClause =
 						ExecInitQual((List *) clause, &mtstate->ps);
 				}
@@ -663,6 +645,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		}
 	}
 
+	if (attmap)
+		pfree(attmap);
 	Assert(proute->partitions[partidx] == NULL);
 	proute->partitions[partidx] = leaf_part_rri;
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index c6f3628def..378331e216 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1504,10 +1504,11 @@ generateClonedIndexStmt(RangeVar *heapRel, Oid heapRelid, Relation source_idx,
 			indexpr_item = lnext(indexpr_item);
 
 			/* Adjust Vars to match new table's column numbering */
-			indexkey = map_variable_attnos(indexkey,
-										   1, 0,
-										   attmap, attmap_length,
-										   InvalidOid, &found_whole_row);
+			if (attmap != NULL)
+				indexkey = map_variable_attnos(indexkey,
+											   1, 0,
+											   attmap, attmap_length,
+											   InvalidOid, &found_whole_row);
 
 			/* As in transformTableLikeClause, reject whole-row variables */
 			if (found_whole_row)
-- 
2.11.0

#43Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#41)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

Attached find a patch that does that. When working on this, I noticed
that when recursing for inheritance children, ATPrepAlterColumnType()
would use an AlterTableCmd (cmd) that's already scribbled on as if it were
the original.

While I agree that the code here is in poor style, there is no live bug
here, because the only thing that is changed each time is the copy's
cmd->def, and its value is not obtained from the scribbled 'cmd' -- it's
obtained from the passed-in cmd->def, which is unmodified.
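
The argument can be illustrated with a small standalone model of that loop shape. This is a hedged sketch, not the real function: ToyDef, ToyCmd, toy_copy_cmd() and toy_recurse() are hypothetical names, and a plain integer stands in for the cooked default expression.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
	int			cooked_default;
} ToyDef;

typedef struct
{
	ToyDef	   *def;
} ToyCmd;

/* Deep copy, standing in for copying the whole command node. */
static ToyCmd *
toy_copy_cmd(const ToyCmd *src)
{
	ToyCmd	   *copy = malloc(sizeof(ToyCmd));
	ToyDef	   *def = malloc(sizeof(ToyDef));

	if (copy == NULL || def == NULL)
		abort();
	*def = *src->def;
	copy->def = def;
	return copy;
}

static void
toy_free_cmd(ToyCmd *cmd)
{
	free(cmd->def);
	free(cmd);
}

/*
 * 'cmd' is reassigned to a fresh copy for every child, so later
 * iterations copy an already scribbled-on command.  The value written
 * into each copy is always derived from 'def', captured before the
 * loop, so every child still sees a result based on the original.
 * Returns the original def's value after the loop.
 */
static int
toy_recurse(ToyCmd *cmd, const int *child_offsets, int nchildren,
			int *results)
{
	ToyCmd	   *orig = cmd;
	ToyDef	   *def = cmd->def; /* captured once, never modified */

	assert(def != NULL);

	for (int i = 0; i < nchildren; i++)
	{
		ToyCmd	   *copy = toy_copy_cmd(cmd);

		if (cmd != orig)
			toy_free_cmd(cmd);
		cmd = copy;

		cmd->def->cooked_default = def->cooked_default + child_offsets[i];
		results[i] = cmd->def->cooked_default;
	}

	if (cmd != orig)
		toy_free_cmd(cmd);
	return def->cooked_default;
}
```

If the new value were computed from the scribbled copy instead of from 'def', the per-child offsets would accumulate from one child to the next; reading from 'def' is what keeps the existing code correct despite its style.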

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#44Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#42)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved. Like this case:

This looks too big a patch to pursue now. I'm inclined to just remove
the equalTupdesc changes.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#45Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#43)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/18 0:02, Alvaro Herrera wrote:

Amit Langote wrote:

Attached find a patch that does that. When working on this, I noticed
that when recursing for inheritance children, ATPrepAlterColumnType()
would use an AlterTableCmd (cmd) that's already scribbled on as if it were
the original.

While I agree that the code here is in poor style, there is no live bug
here, because the only thing that is changed each time is the copy's
cmd->def, and its value is not obtained from the scribbled 'cmd' -- it's
obtained from the passed-in cmd->def, which is unmodified.

Ah, you're right. The original cmd->def itself remains intact.

Thanks,
Amit

#46Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#44)
1 attachment(s)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/18 0:04, Alvaro Herrera wrote:

Amit Langote wrote:

I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved. Like this case:

This looks too big a patch to pursue now. I'm inclined to just remove
the equalTupdesc changes.

OK. Here is the patch that removes the equalTupdesc optimization.

I will add the rest of the patch to the next CF after starting a new
thread for it sometime later.

Thanks,
Amit

Attachments:

v1-0001-Remove-equalTupDescs-based-optimization-in-ExecIn.patch (text/plain; charset=UTF-8)
From ceaba0f59653be237f9bafee47fe82205db6fe14 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Apr 2018 13:22:49 +0900
Subject: [PATCH v1] Remove equalTupDescs-based optimization in
 ExecInitPartitionInfo

---
 src/backend/executor/execPartition.c | 151 ++++++++++++++++-------------------
 1 file changed, 67 insertions(+), 84 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 23a74bc3d9..a2f6b29cd5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -313,7 +313,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 	MemoryContext oldContext;
 	AttrNumber *part_attnos = NULL;
 	bool		found_whole_row;
-	bool		equalTupdescs;
 
 	/*
 	 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -361,10 +360,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 						(node != NULL &&
 						 node->onConflictAction != ONCONFLICT_NONE));
 
-	/* if tuple descs are identical, we don't need to map the attrs */
-	equalTupdescs = equalTupleDescs(RelationGetDescr(partrel),
-									RelationGetDescr(firstResultRel));
-
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
 	 * didn't build the withCheckOptionList for partitions within the planner,
@@ -405,21 +400,18 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		/*
 		 * Convert Vars in it to contain this partition's attribute numbers.
 		 */
-		if (!equalTupdescs)
-		{
-			part_attnos =
-				convert_tuples_by_name_map(RelationGetDescr(partrel),
-										   RelationGetDescr(firstResultRel),
-										   gettext_noop("could not convert row type"));
-			wcoList = (List *)
-				map_variable_attnos((Node *) wcoList,
-									firstVarno, 0,
-									part_attnos,
-									RelationGetDescr(firstResultRel)->natts,
-									RelationGetForm(partrel)->reltype,
-									&found_whole_row);
-			/* We ignore the value of found_whole_row. */
-		}
+		part_attnos =
+			convert_tuples_by_name_map(RelationGetDescr(partrel),
+									   RelationGetDescr(firstResultRel),
+									   gettext_noop("could not convert row type"));
+		wcoList = (List *)
+			map_variable_attnos((Node *) wcoList,
+								firstVarno, 0,
+								part_attnos,
+								RelationGetDescr(firstResultRel)->natts,
+								RelationGetForm(partrel)->reltype,
+								&found_whole_row);
+		/* We ignore the value of found_whole_row. */
 
 		foreach(ll, wcoList)
 		{
@@ -464,25 +456,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 		 */
 		returningList = linitial(node->returningLists);
 
-		if (!equalTupdescs)
-		{
-			/*
-			 * Convert Vars in it to contain this partition's attribute numbers.
-			 */
-			if (part_attnos == NULL)
-				part_attnos =
-					convert_tuples_by_name_map(RelationGetDescr(partrel),
-											   RelationGetDescr(firstResultRel),
-											   gettext_noop("could not convert row type"));
-			returningList = (List *)
-				map_variable_attnos((Node *) returningList,
-									firstVarno, 0,
-									part_attnos,
-									RelationGetDescr(firstResultRel)->natts,
-									RelationGetForm(partrel)->reltype,
-									&found_whole_row);
-			/* We ignore the value of found_whole_row. */
-		}
+		/*
+		 * Convert Vars in it to contain this partition's attribute numbers.
+		 */
+		if (part_attnos == NULL)
+			part_attnos =
+				convert_tuples_by_name_map(RelationGetDescr(partrel),
+										   RelationGetDescr(firstResultRel),
+										   gettext_noop("could not convert row type"));
+		returningList = (List *)
+			map_variable_attnos((Node *) returningList,
+								firstVarno, 0,
+								part_attnos,
+								RelationGetDescr(firstResultRel)->natts,
+								RelationGetForm(partrel)->reltype,
+								&found_whole_row);
+		/* We ignore the value of found_whole_row. */
 
 		leaf_part_rri->ri_returningList = returningList;
 
@@ -583,33 +572,30 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 				 * target relation (firstVarno).
 				 */
 				onconflset = (List *) copyObject((Node *) node->onConflictSet);
-				if (!equalTupdescs)
-				{
-					if (part_attnos == NULL)
-						part_attnos =
-							convert_tuples_by_name_map(RelationGetDescr(partrel),
-													   RelationGetDescr(firstResultRel),
-													   gettext_noop("could not convert row type"));
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											INNER_VAR, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
-					onconflset = (List *)
-						map_variable_attnos((Node *) onconflset,
-											firstVarno, 0,
-											part_attnos,
-											RelationGetDescr(firstResultRel)->natts,
-											RelationGetForm(partrel)->reltype,
-											&found_whole_row);
-					/* We ignore the value of found_whole_row. */
+				if (part_attnos == NULL)
+					part_attnos =
+						convert_tuples_by_name_map(RelationGetDescr(partrel),
+												   RelationGetDescr(firstResultRel),
+												   gettext_noop("could not convert row type"));
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										INNER_VAR, 0,
+										part_attnos,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
+				onconflset = (List *)
+					map_variable_attnos((Node *) onconflset,
+										firstVarno, 0,
+										part_attnos,
+										RelationGetDescr(firstResultRel)->natts,
+										RelationGetForm(partrel)->reltype,
+										&found_whole_row);
+				/* We ignore the value of found_whole_row. */
 
-					/* Finally, adjust this tlist to match the partition. */
-					onconflset = adjust_partition_tlist(onconflset, map);
-				}
+				/* Finally, adjust this tlist to match the partition. */
+				onconflset = adjust_partition_tlist(onconflset, map);
 
 				/*
 				 * Build UPDATE SET's projection info.  The user of this
@@ -637,25 +623,22 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 					List	   *clause;
 
 					clause = copyObject((List *) node->onConflictWhere);
-					if (!equalTupdescs)
-					{
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												INNER_VAR, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
-						/* We ignore the value of found_whole_row. */
-						clause = (List *)
-							map_variable_attnos((Node *) clause,
-												firstVarno, 0,
-												part_attnos,
-												RelationGetDescr(firstResultRel)->natts,
-												RelationGetForm(partrel)->reltype,
-												&found_whole_row);
-						/* We ignore the value of found_whole_row. */
-					}
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											INNER_VAR, 0,
+											part_attnos,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
+					clause = (List *)
+						map_variable_attnos((Node *) clause,
+											firstVarno, 0,
+											part_attnos,
+											RelationGetDescr(firstResultRel)->natts,
+											RelationGetForm(partrel)->reltype,
+											&found_whole_row);
+					/* We ignore the value of found_whole_row. */
 					leaf_part_rri->ri_onConflict->oc_WhereClause =
 						ExecInitQual((List *) clause, &mtstate->ps);
 				}
-- 
2.11.0

#47Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#46)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

On 2018/04/18 0:04, Alvaro Herrera wrote:

Amit Langote wrote:

I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved. Like this case:

This looks too big a patch to pursue now. I'm inclined to just remove
the equalTupdesc changes.

OK. Here is the patch that removes equalTupdesc optimization.

Hmm. If we modify (during pg12, of course -- not now) partition tables
that are created identical to their parent table so that they share the
pg_type row, this would become useful. Unless there is a reason why that
change is completely unworkable, I'd just leave it there. (I claim that
it works like that only because it used to work like that, not because
it's impossible to make work the other way.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#48Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#47)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/18 22:40, Alvaro Herrera wrote:

Amit Langote wrote:

On 2018/04/18 0:04, Alvaro Herrera wrote:

Amit Langote wrote:

I just confirmed my hunch that this wouldn't somehow do the right thing
when the OID system column is involved. Like this case:

This looks too big a patch to pursue now. I'm inclined to just remove
the equalTupdesc changes.

OK. Here is the patch that removes equalTupdesc optimization.

Hmm. If we modify (during pg12, of course -- not now) partition tables
that are created identical to their parent table so that they share the
pg_type row, this would become useful. Unless there is a reason why that
change is completely unworkable, I'd just leave it there. (I claim that
it works like that only because it used to work like that, not because
it's impossible to make work the other way.)

Yeah, I too have wondered in the past what it would take to make
equalTupleDescs() return true for parent and partitions. Maybe we can make
it work by looking a bit harder than I did then.

Although, just leaving it there now would mean we're adding a few cycles
needlessly in the PG 11 code. Why not add that optimization when we
surely know it can work?
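
To make the cost being discussed concrete, here is a minimal, self-contained
sketch of the trade-off. It is not PostgreSQL code: ToyTupleDesc,
toy_descs_identical() and toy_build_attno_map() are hypothetical stand-ins,
loosely modelled on how equalTupleDescs() and convert_tuples_by_name_map()
are used in the patch above. The map has one entry per parent column, giving
the matching 1-based column number in the partition, or 0 if there is none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a tuple descriptor: only the column names (hypothetical). */
typedef struct ToyTupleDesc
{
	int			natts;
	const char **attnames;
} ToyTupleDesc;

/* True when both descriptors list the same column names in the same order. */
bool
toy_descs_identical(const ToyTupleDesc *a, const ToyTupleDesc *b)
{
	if (a->natts != b->natts)
		return false;
	for (int i = 0; i < a->natts; i++)
	{
		if (strcmp(a->attnames[i], b->attnames[i]) != 0)
			return false;
	}
	return true;
}

/*
 * Build a name-based map from parent columns to partition columns.
 * map[i] is the 1-based partition attno for parent column i + 1, or 0
 * when the partition has no column of that name.  The caller frees it.
 */
int *
toy_build_attno_map(const ToyTupleDesc *part, const ToyTupleDesc *parent)
{
	int		   *map;

	assert(part != NULL && parent != NULL && parent->natts > 0);

	map = calloc((size_t) parent->natts, sizeof(int));
	if (map == NULL)
		return NULL;

	for (int i = 0; i < parent->natts; i++)
	{
		for (int j = 0; j < part->natts; j++)
		{
			if (strcmp(parent->attnames[i], part->attnames[j]) == 0)
			{
				map[i] = j + 1;
				break;
			}
		}
	}
	return map;
}
```

With parent columns (a, b, c) and a partition laid out as (c, a, b), the map
comes out as {2, 3, 1}. When the two layouts are identical the map is the
identity, which is the case the removed equalTupdescs shortcut tried to skip.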

Thanks,
Amit

#49Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#48)
Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote wrote:

Yeah, I too have wondered in the past what it would take to make
equalTupleDescs() return true for parent and partitions. Maybe we can make
it work by looking a bit harder than I did then.

How about simply relaxing the tdtypeid test from equalTupleDescs? I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

If we just remove the tdtypeid test, check-world passes.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#50Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#49)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Thu, Apr 19, 2018 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Amit Langote wrote:

Yeah, I too have wondered in the past what it would take to make
equalTupleDescs() return true for parent and partitions. Maybe we can make
it work by looking a bit harder than I did then.

How about simply relaxing the tdtypeid test from equalTupleDescs? I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

That strikes me as a very scary thing to do. There's code all over the
system that may have non-obvious assumptions about the behavior of
equalTupleDescs(), and I don't think we can have any confidence that
nothing will break unless we do a detailed audit of all that code.

If we just remove the tdtypeid test, check-world passes.

That does not reassure me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#51Peter Geoghegan
In reply to: Robert Haas (#50)
Re: ON CONFLICT DO UPDATE for partitioned tables

On Thu, Apr 19, 2018 at 12:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Apr 19, 2018 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

How about simply relaxing the tdtypeid test from equalTupleDescs? I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

That strikes me as a very scary thing to do. There's code all over the
system that may have non-obvious assumptions about the behavior of
equalTupleDescs(), and I don't think we can have any confidence that
nothing will break unless we do a detailed audit of all that code.

+1. I think that it is plainly a bad idea to do something like that at
this point in the cycle.

--
Peter Geoghegan

#52Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#49)
Re: ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera wrote:

Amit Langote wrote:

Yeah, I too have wondered in the past what it would take to make
equalTupleDescs() return true for parent and partitions. Maybe we can make
it work by looking a bit harder than I did then.

How about simply relaxing the tdtypeid test from equalTupleDescs? I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

After looking at the code, I'm a bit nervous about doing this, because I
don't fully understand what is going on in typcache, or what the
HeapTupleHeaderGetTypeId macro is really doing. I'm afraid that if we
confuse a table's tupdesc with that of one of its partitions, something
entirely random might end up happening.

Maybe this is completely off-base, but if so I'd like to have proof.
So I'm thinking of reverting that patch instead, per your patch.

While composing this we got emails from Robert and Peter G suggesting
the same too, so consider it done.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#53Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#52)
Re: ON CONFLICT DO UPDATE for partitioned tables

On 2018/04/20 4:40, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Amit Langote wrote:

Yeah, I too have wondered in the past what it would take to make
equalTupleDescs() return true for parent and partitions. Maybe we can make
it work by looking a bit harder than I did then.

How about simply relaxing the tdtypeid test from equalTupleDescs? I
haven't looked deeply but I think just checking whether or not both are
RECORDOID might be sufficient, for typecache purposes.

After looking at the code, I'm a bit nervous about doing this, because I
don't fully understand what is going on in typcache, or what the
HeapTupleHeaderGetTypeId macro is really doing. I'm afraid that if we
confuse a table's tupdesc with that of one of its partitions, something
entirely random might end up happening.

Maybe this is completely off-base, but if so I'd like to have proof.
So I'm thinking of reverting that patch instead, per your patch.

While composing this we got emails from Robert and Peter G suggesting
the same too, so consider it done.

Thank you for committing the patch.

Regards,
Amit