partition routing layering in nodeModifyTable.c

Started by Andres Freund over 6 years ago, 84 messages
#1 Andres Freund
andres@anarazel.de

Hi,

While discussing partition related code with David in [1], I again was
confused by the layering of partition related code in
nodeModifyTable.c.

1) How come partition routing is done outside of ExecInsert()?

        case CMD_INSERT:
            /* Prepare for tuple routing if needed. */
            if (proute)
                slot = ExecPrepareTupleRouting(node, estate, proute,
                                               resultRelInfo, slot);
            slot = ExecInsert(node, slot, planSlot,
                              estate, node->canSetTag);
            /* Revert ExecPrepareTupleRouting's state change. */
            if (proute)
                estate->es_result_relation_info = resultRelInfo;
            break;

That already seems like a layering violation, but it's made worse by
ExecUpdate() having its partition handling solely inside - including another
call to ExecInsert(), including the surrounding partition setup code.

And even worse, after all that, ExecInsert() still contains partitioning
code.

It seems to me that if we just moved the ExecPrepareTupleRouting() into
ExecInsert(), we could remove the duplication.

2) The contents of the
/*
* If a partition check failed, try to move the row into the right
* partition.
*/
if (partition_constraint_failed)

block ought to be moved to a separate function (maybe
ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already
complicated enough without dealing with the partition move.

3) How come we reset estate->es_result_relation_info after partition
routing, but not the mtstate wide changes by
ExecPrepareTupleRouting()? Note that its comment says:

* Caller must revert the estate changes after executing the insertion!
* In mtstate, transition capture changes may also need to be reverted.

ExecUpdate() contains

/*
* Updates set the transition capture map only when a new subplan
* is chosen. But for inserts, it is set for each row. So after
* INSERT, we need to revert back to the map created for UPDATE;
* otherwise the next UPDATE will incorrectly use the one created
* for INSERT. So first save the one created for UPDATE.
*/
if (mtstate->mt_transition_capture)
saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

but as I read the code, that's not really true? It's
ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate().

4)
    /*
     * If this insert is the result of a partition key update that moved the
     * tuple to a new partition, put this row into the transition NEW TABLE,
     * if there is one. We need to do this separately for DELETE and INSERT
     * because they happen on different tables.
     */
    ar_insert_trig_tcs = mtstate->mt_transition_capture;
    if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
        && mtstate->mt_transition_capture->tcs_update_new_table)
    {
        ExecARUpdateTriggers(estate, resultRelInfo, NULL,
                             NULL,
                             slot,
                             NULL,
                             mtstate->mt_transition_capture);

        /*
         * We've already captured the NEW TABLE row, so make sure any AR
         * INSERT trigger fired below doesn't capture it again.
         */
        ar_insert_trig_tcs = NULL;
    }

    /* AFTER ROW INSERT Triggers */
    ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
                         ar_insert_trig_tcs);

Besides not using the just defined ar_insert_trig_tcs and instead
repeatedly referring to mtstate->mt_transition_capture, wouldn't this be
easier to understand if there were an if/else, instead of resetting
ar_insert_trig_tcs? If the block were

    /*
     * triggers behave differently depending on this being an insert as
     * part of a partition move, or an insertion proper.
     */
    if (mtstate->operation == CMD_UPDATE)
    {
        /*
         * If this insert is the result of a partition key update that moved the
         * tuple to a new partition, put this row into the transition NEW TABLE,
         * if there is one. We need to do this separately for DELETE and INSERT
         * because they happen on different tables.
         */
        ExecARUpdateTriggers(estate, resultRelInfo, NULL,
                             NULL,
                             slot,
                             NULL,
                             mtstate->mt_transition_capture);

        /*
         * But we do want to fire plain per-row INSERT triggers on the
         * new table. By not passing in transition_capture we prevent
         * ....
         */
        ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
                             NULL);
    }
    else
    {
        /* AFTER ROW INSERT Triggers */
        ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
                             ar_insert_trig_tcs);
    }

it seems like it'd be quite a bit clearer (although I do think the
comments also need a fair bit of polishing independent of this proposed
change).

Greetings,

Andres Freund

[1]: /messages/by-id/CAKJS1f-YObQJTbncGJGRZ6gSFiS+gw_Y5kvrpR=vEnFKH17AVA@mail.gmail.com

#2 Amit Langote
amitlangote09@gmail.com
In reply to: Andres Freund (#1)
Re: partition routing layering in nodeModifyTable.c

Hi Andres,

On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:

1) How come partition routing is done outside of ExecInsert()?

case CMD_INSERT:
/* Prepare for tuple routing if needed. */
if (proute)
slot = ExecPrepareTupleRouting(node, estate, proute,
resultRelInfo, slot);
slot = ExecInsert(node, slot, planSlot,
estate, node->canSetTag);
/* Revert ExecPrepareTupleRouting's state change. */
if (proute)
estate->es_result_relation_info = resultRelInfo;
break;

That already seems like a layering violation,

The decision to move partition routing out of ExecInsert() came about
when we encountered a bug [1] whereby ExecInsert() would fail to reset
estate->es_result_relation_info back to the root table if it had to
take an abnormal path out (early return), of which there are quite a
few instances. The first solution I came up with was to add a goto
label for the code to reset estate->es_result_relation_info and jump
to it from the various places that do an early return, which was
complained about as reducing readability. So, the solution we
eventually settled on in 6666ee49f was to perform ResultRelInfos
switching at a higher level.
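
To make the failure mode described above concrete, here is a minimal, self-contained sketch. It is not PostgreSQL code: ToyRelInfo, ToyEState and the toy_* functions are hypothetical stand-ins invented for illustration. It only shows how an early return can leave a pointer in shared state aimed at the wrong relation, and how restoring it in a single caller-side place avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's ResultRelInfo/EState. */
typedef struct ToyRelInfo
{
    const char *name;
} ToyRelInfo;

typedef struct ToyEState
{
    ToyRelInfo *current_rel;    /* plays the role of es_result_relation_info */
} ToyEState;

/*
 * Buggy shape: the function switches the shared pointer to the partition
 * and restores it only on the normal exit path.  The early return skips
 * the restore.
 */
bool
toy_insert_buggy(ToyEState *estate, ToyRelInfo *partition, bool skip_row)
{
    ToyRelInfo *saved = estate->current_rel;

    estate->current_rel = partition;
    if (skip_row)
        return false;           /* current_rel still points at the partition */
    estate->current_rel = saved;
    return true;
}

/* Inner worker: may return early, never restores the shared pointer. */
bool
toy_insert_inner(ToyEState *estate, ToyRelInfo *partition, bool skip_row)
{
    assert(partition != NULL);
    estate->current_rel = partition;
    return !skip_row;
}

/*
 * Caller-side save/restore: whichever path the inner worker takes, the
 * restore runs exactly once, here.
 */
bool
toy_insert_routed(ToyEState *estate, ToyRelInfo *partition, bool skip_row)
{
    ToyRelInfo *saved = estate->current_rel;
    bool        inserted = toy_insert_inner(estate, partition, skip_row);

    estate->current_rel = saved;
    return inserted;
}
```

With skip_row set, toy_insert_buggy() leaves current_rel pointing at the partition, while toy_insert_routed() hands it back pointing at the original relation.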

but it's made worse by
ExecUpdate() having its partition handling solely inside - including another
call to ExecInsert(), including the surrounding partition setup code.

And even worse, after all that, ExecInsert() still contains partitioning
code.

AFAIK, it's only to check the partition constraint when necessary.
Partition routing complexity is totally outside, but based on what you
write in point 4 below there's a bit more...

It seems to me that if we just moved the ExecPrepareTupleRouting() into
ExecInsert(), we could remove the duplication.

I agree that there's duplication here. Given what I wrote above, I
can think of doing this: move all of ExecInsert()'s code into
ExecInsertInternal() and make the former instead look like this:

static TupleTableSlot *
ExecInsert(ModifyTableState *mtstate,
           TupleTableSlot *slot,
           TupleTableSlot *planSlot,
           EState *estate,
           bool canSetTag)
{
    PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
    ResultRelInfo *resultRelInfo = estate->es_result_relation_info;

    /* Prepare for tuple routing if needed. */
    if (proute)
        slot = ExecPrepareTupleRouting(mtstate, estate, proute, resultRelInfo,
                                       slot);

    slot = ExecInsertInternal(mtstate, slot, planSlot, estate,
                              mtstate->canSetTag);

    /* Revert ExecPrepareTupleRouting's state change. */
    if (proute)
        estate->es_result_relation_info = resultRelInfo;

    return slot;
}

2) The contents of the
/*
* If a partition check failed, try to move the row into the right
* partition.
*/
if (partition_constraint_failed)

block ought to be moved to a separate function (maybe
ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already
complicated enough without dealing with the partition move.

I tend to agree with this. Adding Amit Khandekar in case he wants to
chime in about this.

3) How come we reset estate->es_result_relation_info after partition
routing, but not the mtstate wide changes by
ExecPrepareTupleRouting()? Note that its comment says:

* Caller must revert the estate changes after executing the insertion!
* In mtstate, transition capture changes may also need to be reverted.

ExecUpdate() contains

/*
* Updates set the transition capture map only when a new subplan
* is chosen. But for inserts, it is set for each row. So after
* INSERT, we need to revert back to the map created for UPDATE;
* otherwise the next UPDATE will incorrectly use the one created
* for INSERT. So first save the one created for UPDATE.
*/
if (mtstate->mt_transition_capture)
saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

but as I read the code, that's not really true? It's
ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate().

Calling ExecPrepareTupleRouting() is considered a part of a given
INSERT operation, so anything it does is to facilitate the INSERT. In
this case, which map to assign to tcs_map can only be determined after
a partition is chosen and determining the partition (routing) is a job
of ExecPrepareTupleRouting(). Perhaps, we need to update the comment
here a bit.

4)
/*
* If this insert is the result of a partition key update that moved the
* tuple to a new partition, put this row into the transition NEW TABLE,
* if there is one. We need to do this separately for DELETE and INSERT
* because they happen on different tables.
*/
ar_insert_trig_tcs = mtstate->mt_transition_capture;
if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
&& mtstate->mt_transition_capture->tcs_update_new_table)
{
ExecARUpdateTriggers(estate, resultRelInfo, NULL,
NULL,
slot,
NULL,
mtstate->mt_transition_capture);

/*
* We've already captured the NEW TABLE row, so make sure any AR
* INSERT trigger fired below doesn't capture it again.
*/
ar_insert_trig_tcs = NULL;
}

/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
ar_insert_trig_tcs);

Besides not using the just defined ar_insert_trig_tcs and instead
repeatedly referring to mtstate->mt_transition_capture, wouldn't this be
easier to understand if there were an if/else, instead of resetting
ar_insert_trig_tcs? If the block were

/*
* triggers behave differently depending on this being an insert as
* part of a partition move, or an insertion proper.
*/
if (mtstate->operation == CMD_UPDATE)
{
/*
* If this insert is the result of a partition key update that moved the
* tuple to a new partition, put this row into the transition NEW TABLE,
* if there is one. We need to do this separately for DELETE and INSERT
* because they happen on different tables.
*/
ExecARUpdateTriggers(estate, resultRelInfo, NULL,
NULL,
slot,
NULL,
mtstate->mt_transition_capture);

/*
* But we do want to fire plain per-row INSERT triggers on the
* new table. By not passing in transition_capture we prevent
* ....
*/
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
NULL);
}
else
{
/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
ar_insert_trig_tcs);
}

Maybe you meant to use mtstate->mt_transition_capture instead of
ar_insert_trig_tcs in the else block. We don't need
ar_insert_trig_tcs at all.

it seems like it'd be quite a bit clearer (although I do think the
comments also need a fair bit of polishing independent of this proposed
change).

Fwiw, I agree with your proposed restructuring, although I'd let Amit
Kh chime in as he'd be more familiar with this code. I wasn't aware
of this partitioning-related bit being present in ExecInsert().

Would you like me to write a patch for some or all items?

Thanks,
Amit

[1]: /messages/by-id/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp

#3 Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#2)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-07-18 14:24:29 +0900, Amit Langote wrote:

On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:

1) How come partition routing is done outside of ExecInsert()?

case CMD_INSERT:
/* Prepare for tuple routing if needed. */
if (proute)
slot = ExecPrepareTupleRouting(node, estate, proute,
resultRelInfo, slot);
slot = ExecInsert(node, slot, planSlot,
estate, node->canSetTag);
/* Revert ExecPrepareTupleRouting's state change. */
if (proute)
estate->es_result_relation_info = resultRelInfo;
break;

That already seems like a layering violation,

The decision to move partition routing out of ExecInsert() came about
when we encountered a bug [1] whereby ExecInsert() would fail to reset
estate->es_result_relation_info back to the root table if it had to
take an abnormal path out (early return), of which there are quite a
few instances. The first solution I came up with was to add a goto
label for the code to reset estate->es_result_relation_info and jump
to it from the various places that do an early return, which was
complained about as reducing readability. So, the solution we
eventually settled on in 6666ee49f was to perform ResultRelInfos
switching at a higher level.

I think that was the wrong path, given that the code now lives in
multiple places. Without even a comment explaining that if one has to be
changed, the other has to be changed too.

It seems to me that if we just moved the ExecPrepareTupleRouting() into
ExecInsert(), we could remove the duplication.

I agree that there's duplication here. Given what I wrote above, I
can think of doing this: move all of ExecInsert()'s code into
ExecInsertInternal() and make the former instead look like this:

For me just having the gotos is cleaner than that here.

But perhaps the right fix would be to not have ExecPrepareTupleRouting()
change global state at all, and instead change it much more locally
inside ExecInsert(), around the calls that need it to be set
differently.

Or perhaps the actually correct fix is to remove es_result_relation_info
altogether, and just pass it down the places that need it - we've a lot
more code setting it than using the value. And it'd not be hard to
actually pass it to the places that read it. Given all the
setting/resetting of it it's pretty obvious that a query-global resource
isn't the right place for it.
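
As a rough illustration of the difference between reading the result relation from query-global state and receiving it as an argument, consider the following self-contained sketch. The types and toy_* functions are hypothetical and are not PostgreSQL's; they only contrast the two calling conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's ResultRelInfo/EState. */
typedef struct ToyRelInfo
{
    int         num_indices;
} ToyRelInfo;

typedef struct ToyEState
{
    ToyRelInfo *current_rel;    /* plays the role of es_result_relation_info */
} ToyEState;

/*
 * Global-state style: the answer depends on whatever the last caller left
 * in estate->current_rel, so every caller must set it (and reset it) first.
 */
int
toy_count_indexes_global(const ToyEState *estate)
{
    assert(estate->current_rel != NULL);
    return estate->current_rel->num_indices;
}

/*
 * Explicit style: the relation to operate on is named at the call site,
 * and nothing in the shared state needs to be switched or restored.
 */
int
toy_count_indexes_explicit(const ToyRelInfo *rel)
{
    assert(rel != NULL);
    return rel->num_indices;
}
```

In the explicit variant a caller can work on a partition without touching the shared pointer at all, so there is nothing to forget to reset afterwards.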

/*
* triggers behave differently depending on this being an insert as
* part of a partition move, or an insertion proper.
*/
if (mtstate->operation == CMD_UPDATE)
{
/*
* If this insert is the result of a partition key update that moved the
* tuple to a new partition, put this row into the transition NEW TABLE,
* if there is one. We need to do this separately for DELETE and INSERT
* because they happen on different tables.
*/
ExecARUpdateTriggers(estate, resultRelInfo, NULL,
NULL,
slot,
NULL,
mtstate->mt_transition_capture);

/*
* But we do want to fire plain per-row INSERT triggers on the
* new table. By not passing in transition_capture we prevent
* ....
*/
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
NULL);
}
else
{
/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
ar_insert_trig_tcs);
}

Maybe you meant to use mtstate->mt_transition_capture instead of
ar_insert_trig_tcs in the else block. We don't need
ar_insert_trig_tcs at all.

Yes, it was just an untested example of how the code could be made
clearer.

it seems like it'd be quite a bit clearer (although I do think the
comments also need a fair bit of polishing independent of this proposed
change).

Fwiw, I agree with your proposed restructuring, although I'd let Amit
Kh chime in as he'd be more familiar with this code. I wasn't aware
of this partitioning-related bit being present in ExecInsert().

Would you like me to write a patch for some or all items?

Yes, that would be awesome.

Greetings,

Andres Freund

#4 Amit Langote
amitlangote09@gmail.com
In reply to: Andres Freund (#3)
Re: partition routing layering in nodeModifyTable.c

On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-18 14:24:29 +0900, Amit Langote wrote:

On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:

1) How come partition routing is done outside of ExecInsert()?

case CMD_INSERT:
/* Prepare for tuple routing if needed. */
if (proute)
slot = ExecPrepareTupleRouting(node, estate, proute,
resultRelInfo, slot);
slot = ExecInsert(node, slot, planSlot,
estate, node->canSetTag);
/* Revert ExecPrepareTupleRouting's state change. */
if (proute)
estate->es_result_relation_info = resultRelInfo;
break;

That already seems like a layering violation,

The decision to move partition routing out of ExecInsert() came about
when we encountered a bug [1] whereby ExecInsert() would fail to reset
estate->es_result_relation_info back to the root table if it had to
take an abnormal path out (early return), of which there are quite a
few instances. The first solution I came up with was to add a goto
label for the code to reset estate->es_result_relation_info and jump
to it from the various places that do an early return, which was
complained about as reducing readability. So, the solution we
eventually settled on in 6666ee49f was to perform ResultRelInfos
switching at a higher level.

I think that was the wrong path, given that the code now lives in
multiple places. Without even a comment explaining that if one has to be
changed, the other has to be changed too.

It seems to me that if we just moved the ExecPrepareTupleRouting() into
ExecInsert(), we could remove the duplication.

I agree that there's duplication here. Given what I wrote above, I
can think of doing this: move all of ExecInsert()'s code into
ExecInsertInternal() and make the former instead look like this:

For me just having the gotos is cleaner than that here.

But perhaps the right fix would be to not have ExecPrepareTupleRouting()
change global state at all, and instead change it much more locally
inside ExecInsert(), around the calls that need it to be set
differently.

Or perhaps the actually correct fix is to remove es_result_relation_info
altogether, and just pass it down the places that need it - we've a lot
more code setting it than using the value. And it'd not be hard to
actually pass it to the places that read it. Given all the
setting/resetting of it it's pretty obvious that a query-global resource
isn't the right place for it.

I tend to agree that managing state through es_result_relation_info
across various operations on a result relation has turned a bit messy
at this point. That said, while most of the places that access the
currently active result relation from es_result_relation_info can be
easily modified to receive it directly, the FDW API BeginDirectModify
poses a bit of a challenge. BeginDirectModify() is called via
ExecInitForeignScan() that in turn can't be changed to add a result
relation (Index or ResultRelInfo *) argument, so the only way left for
BeginDirectModify() is to access it via es_result_relation_info.

Maybe we can do to ExecPrepareTupleRouting() what you say -- remove
all code in it that changes ModifyTable-global and EState-global
states. Also, maybe call ExecPrepareTupleRouting() inside
ExecInsert() at the beginning instead of outside of it. I agree that
setting and reverting global state around the exact piece of code
that needs it is better for clarity. All of that assuming
you're not saying that we scrap ExecPrepareTupleRouting altogether.

Thoughts? Other opinions?

Would you like me to write a patch for some or all items?

Yes, that would be awesome.

OK, I will try to post a patch soon.

Thanks,
Amit

#5 Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#4)
Re: partition routing layering in nodeModifyTable.c

On Thu, Jul 18, 2019 at 4:51 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-18 14:24:29 +0900, Amit Langote wrote:

On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:

1) How come partition routing is done outside of ExecInsert()?

case CMD_INSERT:
/* Prepare for tuple routing if needed. */
if (proute)
slot = ExecPrepareTupleRouting(node, estate, proute,
resultRelInfo, slot);
slot = ExecInsert(node, slot, planSlot,
estate, node->canSetTag);
/* Revert ExecPrepareTupleRouting's state change. */
if (proute)
estate->es_result_relation_info = resultRelInfo;
break;

That already seems like a layering violation,

The decision to move partition routing out of ExecInsert() came about
when we encountered a bug [1] whereby ExecInsert() would fail to reset
estate->es_result_relation_info back to the root table if it had to
take an abnormal path out (early return), of which there are quite a
few instances. The first solution I came up with was to add a goto
label for the code to reset estate->es_result_relation_info and jump
to it from the various places that do an early return, which was
complained about as reducing readability. So, the solution we
eventually settled on in 6666ee49f was to perform ResultRelInfos
switching at a higher level.

I think that was the wrong path, given that the code now lives in
multiple places. Without even a comment explaining that if one has to be
changed, the other has to be changed too.

I thought this would be OK because we have the
ExecPrepareTupleRouting() call in just two places in a single source
file, at least currently.

Or perhaps the actually correct fix is to remove es_result_relation_info
altogether, and just pass it down the places that need it - we've a lot
more code setting it than using the value. And it'd not be hard to
actually pass it to the places that read it. Given all the
setting/resetting of it it's pretty obvious that a query-global resource
isn't the right place for it.

I tend to agree that managing state through es_result_relation_info
across various operations on a result relation has turned a bit messy
at this point. That said, while most of the places that access the
currently active result relation from es_result_relation_info can be
easily modified to receive it directly, the FDW API BeginDirectModify
poses a bit of a challenge. BeginDirectModify() is called via
ExecInitForeignScan() that in turn can't be changed to add a result
relation (Index or ResultRelInfo *) argument, so the only way left for
BeginDirectModify() is to access it via es_result_relation_info.

That's right. I'm not sure that's a good idea, because I think other
extensions also might look at es_result_relation_info, and if so,
removing es_result_relation_info altogether would require the
extension authors to update their extensions without any benefit,
which I think isn't a good thing.

Best regards,
Etsuro Fujita

#6 Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#4)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Thu, Jul 18, 2019 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-18 14:24:29 +0900, Amit Langote wrote:

On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote:

Or perhaps the actually correct fix is to remove es_result_relation_info
altogether, and just pass it down the places that need it - we've a lot
more code setting it than using the value. And it'd not be hard to
actually pass it to the places that read it. Given all the
setting/resetting of it it's pretty obvious that a query-global resource
isn't the right place for it.

Would you like me to write a patch for some or all items?

Yes, that would be awesome.

OK, I will try to post a patch soon.

Attached are two patches.

The first one (0001) deals with reducing the core executor's reliance
on es_result_relation_info to access the currently active result
relation, in favor of receiving it from the caller as a function
argument. So no piece of core code relies on it being correctly set
anymore. It still needs to be set correctly for the third-party code
such as FDWs. Also, because the partition routing related suggestions
upthread are closely tied into this, especially those around
ExecInsert(), I've included them in the same patch. I chose to keep
the function ExecPrepareTupleRouting, even though it's now only called
from ExecInsert(), to preserve the readability of the latter.

The second patch (0002) implements some rearrangement of the UPDATE
tuple movement code, addressing point 2 in the first email of
this thread. Mainly, the block of code in ExecUpdate() that implements
row movement proper has been moved into a function called ExecMove().
It also contains the cosmetic improvements suggested in point 4.

Thanks,
Amit

Attachments:

v1-0001-Reduce-es_result_relation_info-usage.patch (application/octet-stream)
From 95de45e03e62abc284deb9ba1107e3b84ee7dfc9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v1 1/2] Reduce es_result_relation_info usage

Change many places that access the currently active result relation
using es_result_relation_info to receive it directly via function
arguments.  Maintaining that state in es_result_relation_info has
become cumbersome, especially with partitioning where each partition
gets its own result relation info.  Having to set and reset it across
arbitrary operations has caused bugs in the past.

We still need to set it before calling extension code, such as FDWs,
because the existing interfaces leave them with no option but to
access the result relation via EState.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/nodeModifyTable.c   | 187 ++++++++++++++-----------------
 src/backend/replication/logical/worker.c |  39 ++++---
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   8 +-
 9 files changed, 147 insertions(+), 162 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..2f682de785 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,9 +2445,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2480,7 +2477,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2845,7 +2843,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3117,11 +3114,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3225,7 +3217,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3296,7 +3288,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fc1c4dfa4c..cedaecc6b9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1745,7 +1745,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1875,7 +1874,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d8b695d897..4a92ae17b4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+		/* Result relation has changed, so update EState reference too. */
+		estate->es_result_relation_info = resultRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +411,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +447,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +509,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +570,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +610,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +697,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +718,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1054,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1063,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1076,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1102,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1139,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1189,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,8 +1215,8 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate, false, false /* canSetTag */ ,
 					   true /* changingPart */ , &tuple_deleted, &epqslot);
 
 			/*
@@ -1245,16 +1258,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1274,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1451,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1691,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1848,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1989,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,14 +2039,9 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
+	 * Save the result relation in EState, because that's the only place
+	 * where external modules such as FDWs may find it.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	estate->es_result_relation_info = resultRelInfo;
 
 	/*
@@ -2084,6 +2077,9 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
+				/*
+				 * Result relation has changed, so update EState reference too.
+				 */
 				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
@@ -2129,7 +2125,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
 								  true, node->canSetTag,
 								  false /* changingPart */ , NULL, NULL);
 				break;
@@ -2242,15 +2231,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2254,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2296,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2362,8 +2338,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			operation == CMD_UPDATE)
 			update_tuple_routing_needed = true;
 
-		/* Now init the plan for this result rel */
+		/*
+		 * Save the result relation in EState, because that's the only place
+		 * where external modules such as FDWs may find it.  (See
+		 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify()
+		 * as one example.)
+		 */
 		estate->es_result_relation_info = resultRelInfo;
+
+		/* Now init the plan for this result rel */
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2370,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..7df3e78b22 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -173,10 +173,10 @@ ensure_transaction(void)
  * This is based on similar code in copy.c
  */
 static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(LogicalRepRelMapEntry *rel,
+						   ResultRelInfo **resultRelInfo)
 {
 	EState	   *estate;
-	ResultRelInfo *resultRelInfo;
 	RangeTblEntry *rte;
 
 	estate = CreateExecutorState();
@@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	*resultRelInfo = makeNode(ResultRelInfo);
+	InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0);
 
-	estate->es_result_relations = resultRelInfo;
+	estate->es_result_relations = *resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s)
 	}
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +603,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +664,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -696,7 +697,7 @@ apply_handle_update(StringInfo s)
 	check_relation_updatable(rel);
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +706,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +748,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +765,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +788,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -815,7 +818,7 @@ apply_handle_delete(StringInfo s)
 	check_relation_updatable(rel);
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +827,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +855,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +867,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 98bdcbcef5..9ce58049f5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,13 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
+
+	/*
+	 * Currently active result relation.  The core code no longer uses this
+	 * value, but it's still maintained for the convenience of third party
+	 * code.
+	 */
+	ResultRelInfo *es_result_relation_info;
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
-- 
2.11.0
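
For readers skimming the diff above, here is a minimal sketch of the pattern it applies: the callee receives the result relation as an explicit argument instead of reading a "currently active" pointer out of shared executor state. It is an illustration only. `DemoRelInfo`, `DemoEState`, and all four functions are hypothetical stand-ins invented for this sketch, not the real `ResultRelInfo`, `EState`, or any PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real PostgreSQL structs. */
typedef struct DemoRelInfo
{
	int			num_indices;
} DemoRelInfo;

typedef struct DemoEState
{
	DemoRelInfo *active_rel;	/* plays the role of a "currently active" pointer */
} DemoEState;

/* Old style: the callee finds its target relation via shared state. */
static int
count_indices_implicit(DemoEState *estate)
{
	assert(estate->active_rel != NULL);
	return estate->active_rel->num_indices;
}

/* New style: the caller names the target relation explicitly. */
static int
count_indices_explicit(DemoRelInfo *rel, DemoEState *estate)
{
	(void) estate;				/* kept only to mirror the patched signatures */
	return rel->num_indices;
}

/*
 * Old-style caller: it must point the shared state at the partition and
 * remember to revert it afterwards, or later code sees the wrong relation.
 */
static int
route_implicit(DemoEState *estate, DemoRelInfo *partition)
{
	DemoRelInfo *saved = estate->active_rel;
	int			n;

	estate->active_rel = partition;
	n = count_indices_implicit(estate);
	estate->active_rel = saved; /* the easy-to-forget revert step */
	return n;
}

/* New-style caller: nothing to save, nothing to revert. */
static int
route_explicit(DemoEState *estate, DemoRelInfo *partition)
{
	return count_indices_explicit(partition, estate);
}
```

Both callers produce the same result, but only the implicit one has a revert step that can be forgotten, which is the kind of bookkeeping the patch tries to remove from the core code paths.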

Attachment: v1-0002-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 8155bcd9dda6973963faa2149cb7a59d9f76194b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v1 2/2] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecMove() which must be retried until
it finishes the movement.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 327 ++++++++++++++++++---------------
 1 file changed, 180 insertions(+), 147 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4a92ae17b4..25a587723d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -623,31 +622,29 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If this insert is a part of a partition key update, put this row into
+	 * the UPDATE trigger's NEW TABLE instead of that of an INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -713,7 +710,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -958,32 +954,29 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If this delete is a part of a partition key update, put this row into
+	 * the UPDATE trigger's OLD TABLE instead of that of a DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1030,6 +1023,139 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecMove
+ *		Move an updated tuple from the input result relation to the
+ *		new partition of its root parent table
+ *
+ *	This works by first deleting the tuple from the input result relation
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo.
+ *
+ *	Returns true if it's detected that the tuple we're trying to move has
+ *	been concurrently updated.
+ */
+static bool
+ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+		 ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot,
+		 EPQState *epqstate, bool canSetTag, TupleTableSlot **slot,
+		 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, *slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate, false, false /* canSetTag */ ,
+			   true /* changingPart */ , &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return false;
+		else
+		{
+			*slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return true;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		*slot = execute_attr_map_slot(tupconv_map->attrMap,
+									  *slot,
+									  mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, *slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return false;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1183,116 +1309,23 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecMove() will first DELETE the row from the current partition
+			 * and then insert it back into the root table, which will
+			 * re-route it to the correct partition.  The first part may have
+			 * to be repeated if ExecMove() detects that the tuple we're
+			 * trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+			retry = ExecMove(mtstate, resultRelInfo, tupleid, oldtuple,
+							 planSlot, epqstate, canSetTag,
+							 &slot, &inserted_tuple);
+			if (retry)
+				goto lreplace;
 
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
-			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
-			}
-
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0

#7Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#6)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-07-19 17:52:20 +0900, Amit Langote wrote:

Attached are two patches.

Awesome.

The first one (0001) deals with reducing the core executor's reliance
on es_result_relation_info to access the currently active result
relation, in favor of receiving it from the caller as a function
argument. So no piece of core code relies on it being correctly set
anymore. It still needs to be set correctly for the third-party code
such as FDWs.

I'm inclined to just remove it. There's not much code out there relying
on it, as far as I can tell. Most FDWs don't support the direct modify
API, and that's afaict the case where one needs to use
es_result_relation_info?

In fact, I searched through all FDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers
that are on github and in the first few categories (up to and including
"file wrappers"), and there was only one reference to
es_result_relation_info, and that just in comments in a test:
https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type=
which I think was just copied from our source code.

IOW, we should just change the direct modify calls to get the relevant
ResultRelInfo or something in that vein (perhaps just the relevant
RT index?).

pglogical also references it, but just because it creates its own
EState afaict.

@@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
*		ExecInsert
*
*		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
*
*		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
* ----------------------------------------------------------------
*/
static TupleTableSlot *
ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
EState *estate,
bool canSetTag)
{
-	ResultRelInfo *resultRelInfo;
Relation	resultRelationDesc;
List	   *recheckIndexes = NIL;
TupleTableSlot *result = NULL;
TransitionCaptureState *ar_insert_trig_tcs;
ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+		/* Result relation has changed, so update EState reference too. */
+		estate->es_result_relation_info = resultRelInfo;
+	}

I think by removing es_result_relation_info entirely, this would look
cleaner.

@@ -1271,18 +1274,18 @@ lreplace:;
mtstate->mt_root_tuple_slot);

/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
*/
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

Wonder if we could remove the need for this somehow, it's still pretty
darn ugly. Thomas, perhaps you have some insights?

To me the need to modify this ModifyTable-wide state on a per-subplan
and even per-partition basis indicates that the data structures are in
the wrong place.

@@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate)
switch (operation)
{
case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
break;
case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
break;
case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
true, node->canSetTag,
false /* changingPart */ , NULL, NULL);
break;

This reminds me of another complaint: ExecDelete and ExecInsert() have
gotten more boolean parameters for partition moving, but only one of
them is explained with a comment (/* changingPart */) - think we should
do that for all.

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..7df3e78b22 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -173,10 +173,10 @@ ensure_transaction(void)
* This is based on similar code in copy.c
*/
static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(LogicalRepRelMapEntry *rel,
+						   ResultRelInfo **resultRelInfo)
{
EState	   *estate;
-	ResultRelInfo *resultRelInfo;
RangeTblEntry *rte;

estate = CreateExecutorState();
@@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
rte->rellockmode = AccessShareLock;
ExecInitRangeTable(estate, list_make1(rte));

-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	*resultRelInfo = makeNode(ResultRelInfo);
+	InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relations = resultRelInfo;
+	estate->es_result_relations = *resultRelInfo;
estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;

estate->es_output_cid = GetCurrentCommandId(true);

@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
static void
apply_handle_insert(StringInfo s)
{
+ ResultRelInfo *resultRelInfo;
LogicalRepRelMapEntry *rel;
LogicalRepTupleData newtup;
LogicalRepRelId relid;
@@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s)
}

/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);

Hm. It kinda seems cleaner if we were to instead return the relevant
index, rather than the entire ResultRelInfo, as an output from
create_estate_for_relation(). Makes it clearer that it's still in the
EState.
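
To illustrate the "return the index" idea, here is a minimal sketch in plain C. It assumes simplified stand-in types: EStateSketch, ResultRelInfoSketch, create_estate_sketch() and get_result_rel() are hypothetical names, not the real executor structs or the actual worker.c code. The point is only that the caller gets a position in the EState's array and resolves it there, which makes the ownership visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for ResultRelInfo and EState; not the real structs. */
typedef struct ResultRelInfoSketch
{
	int			relid;			/* identifies the target relation */
} ResultRelInfoSketch;

typedef struct EStateSketch
{
	ResultRelInfoSketch *es_result_relations;	/* array owned by the EState */
	int			es_num_result_relations;
} EStateSketch;

/*
 * Build an executor state for one target relation and report the position
 * of its entry in es_result_relations, rather than handing back a pointer.
 * Returns NULL on allocation failure.
 */
static EStateSketch *
create_estate_sketch(int relid, int *result_rel_index)
{
	EStateSketch *estate = calloc(1, sizeof(EStateSketch));

	if (estate == NULL)
		return NULL;
	estate->es_result_relations = calloc(1, sizeof(ResultRelInfoSketch));
	if (estate->es_result_relations == NULL)
	{
		free(estate);
		return NULL;
	}
	estate->es_result_relations[0].relid = relid;
	estate->es_num_result_relations = 1;
	*result_rel_index = 0;
	return estate;
}

/* The caller resolves the index against the EState when it needs the entry. */
static ResultRelInfoSketch *
get_result_rel(EStateSketch *estate, int index)
{
	assert(index >= 0 && index < estate->es_num_result_relations);
	return &estate->es_result_relations[index];
}
```

As noted further down the thread, an index like this only works for relations that actually live in the EState's array, so it would not cover partition children.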

Or perhaps we ought to compute it in a separate step? Then that'd be
more amenable to support replicating into partition roots.

/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If this insert is a part of a partition key update, put this row into
+	 * the UPDATE trigger's NEW TABLE instead of that of an INSERT trigger.
*/
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
*/
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);

While a tiny bit more code, perhaps, this is considerably clearer
imo. Thanks.

+/*
+ *	ExecMove
+ *		Move an updated tuple from the input result relation to the
+ *		new partition of its root parent table
+ *
+ *	This works by first deleting the tuple from the input result relation
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo.
+ *
+ *	Returns true if it's detected that the tuple we're trying to move has
+ *	been concurrently updated.
+ */
+static bool
+ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+		 ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot,
+		 EPQState *epqstate, bool canSetTag, TupleTableSlot **slot,
+		 TupleTableSlot **inserted_tuple)
+{

I know that it was one of the names I proposed, but now that I'm
thinking about it again, it sounds too generic. Perhaps
ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since
there's only one reference the longer name wouldn't be painful.

+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate, false, false /* canSetTag */ ,
+			   true /* changingPart */ , &tuple_deleted, &epqslot);

Here again it'd be nice if all the booleans would be explained with a
comment.

Greetings,

Andres Freund

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: partition routing layering in nodeModifyTable.c

On 2019-Jul-19, Andres Freund wrote:

On 2019-07-19 17:52:20 +0900, Amit Langote wrote:

The first one (0001) deals with reducing the core executor's reliance
on es_result_relation_info to access the currently active result
relation, in favor of receiving it from the caller as a function
argument. So no piece of core code relies on it being correctly set
anymore. It still needs to be set correctly for the third-party code
such as FDWs.

I'm inclined to just remove it. There's not much code out there relying
on it, as far as I can tell. Most FDWs don't support the direct modify
API, and that's afaict the case where one needs to use
es_result_relation_info?

Yeah, I too agree with removing it; since our code doesn't use it, it
seems very likely that it will become slightly out of sync with reality
and we'd not notice until some FDW misbehaves weirdly.

-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
true, node->canSetTag,
false /* changingPart */ , NULL, NULL);
break;

This reminds me of another complaint: ExecDelete and ExecInsert() have
gotten more boolean parameters for partition moving, but only one of
them is explained with a comment (/* changingPart */) - think we should
do that for all.

Maybe change the API to use a flags bitmask?

(IMO the placement of the comment inside the function call, making the
comma appear preceded with a space, looks ugly. If we want to add
comments, let's put each param on its own line with the comment beyond
the comma. That's what we do in other places where this pattern is
used.)
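
For comparison, the two styles under discussion could look roughly like this. This is only a sketch in plain C: the DeleteFlags enum, its DELETE_* members, and the helper functions are hypothetical names invented for illustration, and the three booleans merely mirror the processReturning, canSetTag and changingPart arguments that ExecDelete() takes in the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for ExecDelete()'s boolean arguments. */
typedef enum DeleteFlags
{
	DELETE_PROCESS_RETURNING = 1 << 0,
	DELETE_CAN_SET_TAG = 1 << 1,
	DELETE_CHANGING_PART = 1 << 2
} DeleteFlags;

/* Boolean style: folds the three arguments into the equivalent bitmask. */
static int
exec_delete_bools(bool processReturning, bool canSetTag, bool changingPart)
{
	int			flags = 0;

	if (processReturning)
		flags |= DELETE_PROCESS_RETURNING;
	if (canSetTag)
		flags |= DELETE_CAN_SET_TAG;
	if (changingPart)
		flags |= DELETE_CHANGING_PART;
	return flags;
}

/* Bitmask style: the callee tests named bits instead of positional bools. */
static bool
delete_is_changing_part(int flags)
{
	return (flags & DELETE_CHANGING_PART) != 0;
}

/*
 * Call site for the row-movement DELETE in the commented-boolean style,
 * with each parameter on its own line and the comment after the comma.
 */
static int
row_movement_delete_call(void)
{
	return exec_delete_bools(false,		/* processReturning */
							 false,		/* canSetTag */
							 true);		/* changingPart */
}
```

With a bitmask, the same call would read as passing DELETE_CHANGING_PART alone, so nothing needs a comment, at the cost of changing the function signature.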

/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);

Hm. It kinda seems cleaner if we were to instead return the relevant
index, rather than the entire ResultRelInfo, as an output from
create_estate_for_relation(). Makes it clearer that it's still in the
EState.

Yeah.

Or perhaps we ought to compute it in a separate step? Then that'd be
more amenable to support replicating into partition roots.

I'm not quite seeing the shape that you're imagining this would take.
I vote not to mess with that for this patch; I bet that we'll have to
change a few other things in this code when we add better support for
partitioning in logical replication.

+/*
+ *	ExecMove
+ *		Move an updated tuple from the input result relation to the
+ *		new partition of its root parent table
+ *
+ *	This works by first deleting the tuple from the input result relation
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo.
+ *
+ *	Returns true if it's detected that the tuple we're trying to move has
+ *	been concurrently updated.
+ */
+static bool
+ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+		 ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot,
+		 EPQState *epqstate, bool canSetTag, TupleTableSlot **slot,
+		 TupleTableSlot **inserted_tuple)
+{

I know that it was one of the names I proposed, but now that I'm
thinking about it again, it sounds too generic. Perhaps
ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since
there's only one reference the longer name wouldn't be painful.

That name sounds good. Isn't the return convention backwards? Sounds
like "true" should mean that it succeeded.
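
One possible shape for a "true means success" convention, sketched in plain C under loud assumptions: MoveAttempt, cross_partition_update_sketch() and update_with_retry() are hypothetical stand-ins, and the DELETE and INSERT halves are reduced to flags. The retry decision moves to an output parameter, and the loop plays the role of the "goto lreplace" in ExecUpdate().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one attempt to move a row across partitions. */
typedef struct MoveAttempt
{
	bool		tuple_deleted;	/* did the DELETE half succeed? */
	bool		epq_refetched;	/* did a concurrent update hand back a
								 * re-fetched row version? */
} MoveAttempt;

/*
 * Returns true if the row was moved.  On false, *retry tells the caller
 * whether to redo its checks with the re-fetched row or to give up.
 */
static bool
cross_partition_update_sketch(const MoveAttempt *attempt, bool *retry)
{
	*retry = false;
	if (!attempt->tuple_deleted)
	{
		*retry = attempt->epq_refetched;
		return false;
	}
	/* The INSERT half would run here. */
	return true;
}

/* Caller loop; returns the number of rows moved (0 or 1). */
static int
update_with_retry(const MoveAttempt *attempts, size_t nattempts)
{
	for (size_t i = 0; i < nattempts; i++)
	{
		bool		retry;

		if (cross_partition_update_sketch(&attempts[i], &retry))
			return 1;
		if (!retry)
			return 0;
	}
	return 0;
}
```

The trade-off is an extra output parameter in exchange for a return value that reads naturally at the call site.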

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#8)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-07-19 17:11:10 -0400, Alvaro Herrera wrote:

On 2019-Jul-19, Andres Freund wrote:

-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
true, node->canSetTag,
false /* changingPart */ , NULL, NULL);
break;

This reminds me of another complaint: ExecDelete and ExecInsert() have
gotten more boolean parameters for partition moving, but only one of
them is explained with a comment (/* changingPart */) - think we should
do that for all.

Maybe change the API to use a flags bitmask?

(IMO the placement of the comment inside the function call, making the
comma appear preceded with a space, looks ugly. If we want to add
comments, let's put each param on its own line with the comment beyond
the comma. That's what we do in other places where this pattern is
used.)

Well, that's the pre-existing style, so I'd just have gone with
that. I'm not sure I buy there's much point in going for a bitmask, as
this is file-private code, not code where changing the signature
requires modifying multiple files.

/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel, &resultRelInfo);

Hm. It kinda seems cleaner if we were to instead return the relevant
index, rather than the entire ResultRelInfo, as an output from
create_estate_for_relation(). Makes it clearer that it's still in the
EState.

Yeah.

Or perhaps we ought to compute it in a separate step? Then that'd be
more amenable to support replicating into partition roots.

I'm not quite seeing the shape that you're imagining this would take.
I vote not to mess with that for this patch; I bet that we'll have to
change a few other things in this code when we add better support for
partitioning in logical replication.

Yea, I think it's fine to do that separately. If we wanted to support
partition roots as replication targets, we'd obviously need to do
something pretty similar to what ExecInsert()/ExecUpdate() already
do. And there we can't just reference an index in EState, as partition
children aren't in there.

I was kind of wondering whether, if we had a separate function for
getting the targeted ResultRelInfo, we'd be able to just extend that
function to support replication. But now that I think about it a bit
more, that'd be just scratching the surface...

We really ought to have the replication "sink" code share more code with
nodeModifyTable.c.

Greetings,

Andres Freund

#10Amit Langote
amitlangote09@gmail.com
In reply to: Andres Freund (#7)
3 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Hi Andres,

Sorry about the delay in replying as I was on vacation for the last few days.

On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote:

The first one (0001) deals with reducing the core executor's reliance
on es_result_relation_info to access the currently active result
relation, in favor of receiving it from the caller as a function
argument. So no piece of core code relies on it being correctly set
anymore. It still needs to be set correctly for the third-party code
such as FDWs.

I'm inclined to just remove it. There's not much code out there relying
on it, as far as I can tell. Most FDWs don't support the direct modify
API, and that's afaict the case where one needs to use
es_result_relation_info?

Right, only the direct modify API uses it.

In fact, I searched through all FDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers
that are on github and in the first few categories (up to and including
"file wrappers"), and there was only one reference to
es_result_relation_info, and that just in comments in a test:
https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type=
which I think was just copied from our source code.

IOW, we should just change the direct modify calls to get the relevant
ResultRelInfo or something in that vein (perhaps just the relevant
RT index?).

It seems easy to make one of the two functions that constitute the
direct modify API, IterateDirectModify(), access the result relation
from ForeignScanState by saving either the result relation RT index or
ResultRelInfo pointer itself into the ForeignScanState's FDW-private
area. For example, for postgres_fdw, one would simply add a new
member to PgFdwDirectModifyState struct.

Doing that for the other function BeginDirectModify() seems a bit more
involved. We could add a new field to ForeignScan, say
resultRelation, that's set by either PlanDirectModify() (the FDW code)
or make_modifytable() (the core code) if the ForeignScan node contains
the command for direct modification. BeginDirectModify() can then use
that value instead of relying on es_result_relation_info being set.

Thoughts? Fujita-san, do you have any opinion on whether that would
be a good idea?

pglogical also references it, but just because it creates its own
EState afaict.

That sounds easily manageable.

@@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
*           ExecInsert
*
*           For INSERT, we have to insert the tuple into the target relation
- *           and insert appropriate tuples into the index relations.
+ *           (or partition thereof) and insert appropriate tuples into the index
+ *           relations.
*
*           Returns RETURNING result if any, otherwise NULL.
+ *
+ *           This may change the currently active tuple conversion map in
+ *           mtstate->mt_transition_capture, so the callers must take care to
+ *           save the previous value to avoid losing track of it.
* ----------------------------------------------------------------
*/
static TupleTableSlot *
ExecInsert(ModifyTableState *mtstate,
+                ResultRelInfo *resultRelInfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot,
EState *estate,
bool canSetTag)
{
-     ResultRelInfo *resultRelInfo;
Relation        resultRelationDesc;
List       *recheckIndexes = NIL;
TupleTableSlot *result = NULL;
TransitionCaptureState *ar_insert_trig_tcs;
ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
OnConflictAction onconflict = node->onConflictAction;
+     PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+     /*
+      * If the input result relation is a partitioned table, find the leaf
+      * partition to insert the tuple into.
+      */
+     if (proute)
+     {
+             ResultRelInfo *partRelInfo;
+
+             slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+                                                                        resultRelInfo, slot,
+                                                                        &partRelInfo);
+             resultRelInfo = partRelInfo;
+             /* Result relation has changed, so update EState reference too. */
+             estate->es_result_relation_info = resultRelInfo;
+     }

I think by removing es_result_relation_info entirely, this would look
cleaner.

I agree. Maybe setting es_result_relation_info here isn't really
needed, because the ResultRelInfo is passed directly to
ExecForeignInsert. Still, some FDWs may rely on
es_result_relation_info being correctly set regardless. Again, the
only way to get them to stop doing so may be to remove it.

@@ -1271,18 +1274,18 @@ lreplace:;
mtstate->mt_root_tuple_slot);

/*
-                      * Prepare for tuple routing, making it look like we're inserting
-                      * into the root.
+                      * ExecInsert() may scribble on mtstate->mt_transition_capture,
+                      * so save the currently active map.
*/
+                     if (mtstate->mt_transition_capture)
+                             saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

Wonder if we could remove the need for this somehow, it's still pretty
darn ugly. Thomas, perhaps you have some insights?

To me the need to modify this ModifyTable-wide state on a per-subplan
and even per-partition basis indicates that the data structures are in
the wrong place.

I agree that having to ensure tcs_map is set correctly is cumbersome,
because it has to be reset every time the currently active result
relation changes. I think a better place for the map to be is
ResultRelInfo itself. The trigger code can just get the correct map
from the ResultRelInfo of the result relation it's processing.

Regarding that idea, the necessary map is already present in the
tuple-routing state struct that's embedded in the partition's
ResultRelInfo. But the UPDATE result relations that are never
processed as tuple routing targets don't have routing info initialized
(also think non-partition inheritance children), so we could add
another TupleConversionMap * field in ResultRelInfo. Attached patch
0003 implements that.
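
As a rough illustration of the lookup this enables, the sketch below uses reduced stand-in structs (assumptions, not the real definitions in execnodes.h) and a hypothetical helper named sketch_child_to_root_map(). It mirrors the choice made in the 0003 patch's AfterTriggerSaveEvent(): prefer the map in the tuple-routing info when the relation has one, otherwise fall back to the new ri_ChildToRootMap field.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the real executor structs (assumptions). */
typedef struct TupleConversionMap
{
    int         dummy;
} TupleConversionMap;

typedef struct PartitionRoutingInfo
{
    TupleConversionMap *pi_PartitionToRootMap;
} PartitionRoutingInfo;

typedef struct ResultRelInfo
{
    PartitionRoutingInfo *ri_PartitionInfo; /* set only for routing targets */
    TupleConversionMap *ri_ChildToRootMap;  /* field added by 0003 */
} ResultRelInfo;

/*
 * Hypothetical helper: return the map that converts a child tuple to the
 * root format, or NULL if no conversion is needed.
 */
static TupleConversionMap *
sketch_child_to_root_map(const ResultRelInfo *relinfo)
{
    assert(relinfo != NULL);

    /* Tuple-routing targets carry the map in their routing info. */
    if (relinfo->ri_PartitionInfo != NULL)
        return relinfo->ri_PartitionInfo->pi_PartitionToRootMap;

    /* Plain UPDATE result relations use the per-relation field. */
    return relinfo->ri_ChildToRootMap;
}
```

Because the map travels with the ResultRelInfo, nothing has to be saved and restored when the active result relation changes.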

With this change, we no longer need to track the map in a global
variable, that is, TransitionCaptureState no longer needs tcs_map. We
still have tcs_original_insert_tuple though, which must be set during
ExecInsert and reset after it's read by AfterTriggerSaveEvent. I have
moved the resetting of its value to right after where the originally
set value is read to make it clear that the value must be read only
once.
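
The read-once convention can be sketched as a small "take" helper. The structs are reduced stand-ins and sketch_take_original_insert_tuple() is a hypothetical name; the patch itself does the read and the reset inline in AfterTriggerSaveEvent().

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the real structs (assumptions). */
typedef struct TupleTableSlot
{
    int         id;
} TupleTableSlot;

typedef struct TransitionCaptureState
{
    TupleTableSlot *tcs_original_insert_tuple;
} TransitionCaptureState;

/*
 * Hypothetical helper: hand back the saved tuple and clear the field in
 * the same step, so a later caller can never see a stale tuple.
 */
static TupleTableSlot *
sketch_take_original_insert_tuple(TransitionCaptureState *tcs)
{
    TupleTableSlot *slot;

    assert(tcs != NULL);
    slot = tcs->tcs_original_insert_tuple;
    tcs->tcs_original_insert_tuple = NULL;
    return slot;
}
```

Doing the reset at the point of the read means callers such as the row movement code no longer need to clear the field themselves after ExecInsert() returns.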

@@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate)
switch (operation)
{
case CMD_INSERT:
-                             /* Prepare for tuple routing if needed. */
-                             if (proute)
-                                     slot = ExecPrepareTupleRouting(node, estate, proute,
-                                                                                                resultRelInfo, slot);
-                             slot = ExecInsert(node, slot, planSlot,
+                             slot = ExecInsert(node, resultRelInfo, slot, planSlot,
estate, node->canSetTag);
-                             /* Revert ExecPrepareTupleRouting's state change. */
-                             if (proute)
-                                     estate->es_result_relation_info = resultRelInfo;
break;
case CMD_UPDATE:
-                             slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-                                                               &node->mt_epqstate, estate, node->canSetTag);
+                             slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+                                                               planSlot, &node->mt_epqstate, estate,
+                                                               node->canSetTag);
break;
case CMD_DELETE:
-                             slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-                                                               &node->mt_epqstate, estate,
+                             slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+                                                               planSlot, &node->mt_epqstate, estate,
true, node->canSetTag,
false /* changingPart */ , NULL, NULL);
break;

This reminds me of another complaint: ExecDelete and ExecInsert() have
gotten more boolean parameters for partition moving, but only one of
them is explained with a comment (/* changingPart */) - think we should
do that for all.

Agree about the confusing state of ExecDelete call sites. I've
reformatted the calls to properly label the arguments (the changes are
contained in the revised 0001). I don't see many
partitioning-specific boolean parameters in ExecInsert though.
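
For reference, the labeling style looks roughly like the sketch below. sketch_delete() is a hypothetical function with three boolean parameters whose names are modeled on ExecDelete()'s; it is not the real function and only encodes its arguments so that the call can be checked.

```c
#include <stdbool.h>

/* Hypothetical function taking several easily confused booleans. */
static int
sketch_delete(bool processReturning, bool canSetTag, bool changingPart)
{
    /* Encode the flags as bits so each argument can be told apart. */
    return (processReturning ? 4 : 0) |
           (canSetTag ? 2 : 0) |
           (changingPart ? 1 : 0);
}

/* A call site where every boolean literal carries its parameter name. */
static int
sketch_row_movement_delete(void)
{
    return sketch_delete(false /* processReturning */ ,
                         false /* canSetTag */ ,
                         true /* changingPart */ );
}
```

With the labels in place, a reader does not have to look up the prototype to tell which literal maps to which parameter.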

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..7df3e78b22 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -173,10 +173,10 @@ ensure_transaction(void)
* This is based on similar code in copy.c
*/
static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(LogicalRepRelMapEntry *rel,
+                                                ResultRelInfo **resultRelInfo)
{
EState     *estate;
-     ResultRelInfo *resultRelInfo;
RangeTblEntry *rte;

estate = CreateExecutorState();
@@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
rte->rellockmode = AccessShareLock;
ExecInitRangeTable(estate, list_make1(rte));

-     resultRelInfo = makeNode(ResultRelInfo);
-     InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+     *resultRelInfo = makeNode(ResultRelInfo);
+     InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0);
-     estate->es_result_relations = resultRelInfo;
+     estate->es_result_relations = *resultRelInfo;
estate->es_num_result_relations = 1;
-     estate->es_result_relation_info = resultRelInfo;

estate->es_output_cid = GetCurrentCommandId(true);

@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
static void
apply_handle_insert(StringInfo s)
{
+ ResultRelInfo *resultRelInfo;
LogicalRepRelMapEntry *rel;
LogicalRepTupleData newtup;
LogicalRepRelId relid;
@@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s)
}

/* Initialize the executor state. */
-     estate = create_estate_for_relation(rel);
+     estate = create_estate_for_relation(rel, &resultRelInfo);

Hm. It kinda seems cleaner if we were to instead return the relevant
index, rather than the entire ResultRelInfo, as an output from
create_estate_for_relation(). Makes it clearer that it's still in the
EState.

For now, I've reverted these changes in favor of just doing this:

/* Initialize the executor state. */
estate = create_estate_for_relation(rel);
+ resultRelInfo = &estate->es_result_relations[0];

This seems OK as we know for sure that there is only one target relation.

Or perhaps we ought to compute it in a separate step? Then that'd be
more amenable to supporting replicating into partition roots.

If we think of create_estate_for_relation() being like InitPlan(),
then perhaps it makes sense to leave it as is. Any setup needed for
replicating into partition roots will have to be in a separate
function anyway.

+/*
+ *   ExecMove
+ *           Move an updated tuple from the input result relation to the
+ *           new partition of its root parent table
+ *
+ *   This works by first deleting the tuple from the input result relation
+ *   followed by inserting it into the root parent table, that is,
+ *   mtstate->rootResultRelInfo.
+ *
+ *   Returns true if it's detected that the tuple we're trying to move has
+ *   been concurrently updated.
+ */
+static bool
+ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+              ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot,
+              EPQState *epqstate, bool canSetTag, TupleTableSlot **slot,
+              TupleTableSlot **inserted_tuple)
+{

I know that it was one of the names I proposed, but now that I'm
thinking about it again, it sounds too generic. Perhaps
ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since
there's only one reference the longer name wouldn't be painful.

OK, I've renamed ExecMove to ExecCrossPartitionUpdate.

+     /*
+      * Row movement, part 1.  Delete the tuple, but skip RETURNING
+      * processing. We want to return rows from INSERT.
+      */
+     ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+                        epqstate, estate, false, false /* canSetTag */ ,
+                        true /* changingPart */ , &tuple_deleted, &epqslot);

Here again it'd be nice if all the booleans would be explained with a
comment.

Done too.

Attached updated 0001, 0002, and the new 0003 for transition tuple
conversion map related refactoring as explained above.

Thanks,
Amit

Attachments:

v2-0003-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 72c010158ea534e449fdbb998274951400415df0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v2 3/3] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.
---
 src/backend/commands/copy.c            |  31 ++-----
 src/backend/commands/trigger.c         |  21 +++--
 src/backend/executor/execPartition.c   |  23 +++--
 src/backend/executor/nodeModifyTable.c | 156 +++++++++------------------------
 src/include/commands/trigger.h         |  10 +--
 src/include/nodes/execnodes.h          |   6 ++
 6 files changed, 84 insertions(+), 163 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f682de785..5d02c67389 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3114,32 +3114,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..43f796172c 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from the global variable and set
+		 * the latter to NULL because any given tuple must be read only once.
+		 * Note that the TransitionCaptureState is shared across many calls
+		 * to this function.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..62b93f39d4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -167,7 +167,8 @@ static void ExecInitRoutingInfo(ModifyTableState *mtstate,
 								PartitionTupleRouting *proute,
 								PartitionDispatch dispatch,
 								ResultRelInfo *partRelInfo,
-								int partidx);
+								int partidx,
+								bool is_update_result_rel);
 static PartitionDispatch ExecInitPartitionDispatchInfo(EState *estate,
 													   PartitionTupleRouting *proute,
 													   Oid partoid, PartitionDispatch parent_pd, int partidx);
@@ -389,7 +390,7 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 						/* Set up the PartitionRoutingInfo for it */
 						ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
-											rri, partidx);
+											rri, partidx, true);
 					}
 				}
 
@@ -676,7 +677,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 
 	/* Set up information needed for routing tuples to the partition. */
 	ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
-						leaf_part_rri, partidx);
+						leaf_part_rri, partidx, false);
 
 	/*
 	 * If there is an ON CONFLICT clause, initialize state for it.
@@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 					PartitionTupleRouting *proute,
 					PartitionDispatch dispatch,
 					ResultRelInfo *partRelInfo,
-					int partidx)
+					int partidx,
+					bool is_update_result_rel)
 {
 	MemoryContext oldcxt;
 	PartitionRoutingInfo *partrouteinfo;
@@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		/* If partition is an update target, then we already got the map. */
+		if (is_update_result_rel)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 253fe96e6c..8f4cf1e616 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -74,8 +74,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +337,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1055,9 +1049,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1133,41 +1125,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1874,28 +1841,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1919,6 +1864,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1933,37 +1879,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2018,20 +1944,6 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
 	}
 }
 
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2137,17 +2049,6 @@ ExecModifyTable(PlanState *pstate)
 				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2316,6 +2217,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2338,8 +2240,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2349,6 +2256,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2422,6 +2336,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2446,13 +2375,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
 	 * Construct mapping from each of the per-subplan partition attnos to the
 	 * root attno.  This is required when during update row movement the tuple
 	 * descriptor of a source partition does not match the root partitioned
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9ce58049f5..ffa14d3248 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
-- 
2.11.0

v2-0002-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 2e62f7d72b4cc93a1bb5f3ae9ab75e09862734b3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v2 2/3] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 27ad69aa1c..253fe96e6c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -623,31 +622,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -713,7 +711,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -958,32 +955,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1030,6 +1025,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1183,119 +1325,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0

Attachment: v2-0001-Reduce-es_result_relation_info-usage.patch (application/octet-stream)
From cf9609ec037ae91352e734e6e5aa2a9a6a9939f6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v2 1/3] Reduce es_result_relation_info usage

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.

We still need to set it before calling third-party code, like FDWs,
because the existing interfaces leave them with no option but to
access the result relation via EState.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/nodeModifyTable.c   | 198 ++++++++++++++-----------------
 src/backend/replication/logical/worker.c |  26 ++--
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   8 +-
 9 files changed, 150 insertions(+), 157 deletions(-)
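
Note for readers of the archive: the difference between reading the target
relation from shared executor state and receiving it as a parameter, which
is what the commit message above argues for, can be illustrated with a
small sketch.  Everything here is hypothetical (ToyRelInfo, ToyEState and
the toy_* functions are not PostgreSQL names); it only models the
set-and-reset burden that es_result_relation_info places on callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyRelInfo
{
	const char *name;
	int			ninserted;
} ToyRelInfo;

/* Hypothetical stand-in for EState with a "currently active" relation. */
typedef struct ToyEState
{
	ToyRelInfo *current_rel;
} ToyEState;

/* Old style: the callee finds its target in shared state. */
static void
toy_insert_implicit(ToyEState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New style: the target is passed explicitly; shared state is untouched. */
static void
toy_insert_explicit(ToyRelInfo *rel, ToyEState *estate)
{
	(void) estate;				/* unused in this toy */
	rel->ninserted++;
}

/*
 * Routing to a partition, old style: the caller must switch the shared
 * pointer and remember to switch it back afterwards.
 */
static void
toy_route_implicit(ToyEState *estate, ToyRelInfo *partition)
{
	ToyRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	toy_insert_implicit(estate);
	estate->current_rel = saved;	/* omitting this corrupts later work */
}

/* Routing to a partition, new style: nothing to save or restore. */
static void
toy_route_explicit(ToyEState *estate, ToyRelInfo *partition)
{
	toy_insert_explicit(partition, estate);
}
```

In the explicit variant there is no window in which the shared pointer
refers to the partition, so a missed reset cannot leak into the processing
of the next row.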

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..2f682de785 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,9 +2445,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2480,7 +2477,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2845,7 +2843,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3117,11 +3114,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3225,7 +3217,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3296,7 +3288,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..27ad69aa1c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+		/* Result relation has changed, so update EState reference too. */
+		estate->es_result_relation_info = resultRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +411,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +447,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +509,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +570,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +610,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +697,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +718,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1054,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1063,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1076,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1102,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1139,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1189,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1215,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1261,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1277,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1454,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1694,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1851,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1992,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,14 +2042,9 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
+	 * Save the result relation in EState, because that's the only place
+	 * where external modules such as FDWs may find it.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	estate->es_result_relation_info = resultRelInfo;
 
 	/*
@@ -2084,6 +2080,9 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
+				/*
+				 * Result relation has changed, so update EState reference too.
+				 */
 				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
@@ -2129,7 +2128,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2210,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2236,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2259,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2301,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2362,8 +2343,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			operation == CMD_UPDATE)
 			update_tuple_routing_needed = true;
 
-		/* Now init the plan for this result rel */
+		/*
+		 * Save the result relation in EState, because that's the only place
+		 * where external modules such as FDWs may find it.  (See
+		 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify()
+		 * as one example.)
+		 */
 		estate->es_result_relation_info = resultRelInfo;
+
+		/* Now init the plan for this result rel */
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2375,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 98bdcbcef5..9ce58049f5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,13 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
+
+	/*
+	 * Currently active result relation.  The core code no longer uses this
+	 * value, but it's still maintained for the convenience of third party
+	 * code.
+	 */
+	ResultRelInfo *es_result_relation_info;
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
-- 
2.11.0

#11Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#10)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote:

The first one (0001) deals with reducing the core executor's reliance
on es_result_relation_info to access the currently active result
relation, in favor of receiving it from the caller as a function
argument. So no piece of core code relies on it being correctly set
anymore. It still needs to be set correctly for the third-party code
such as FDWs.

I'm inclined to just remove it. There's not much code out there relying
on it, as far as I can tell. Most FDWs don't support the direct modify
API, and that's afaict the case where one needs to use
es_result_relation_info?

Right, only the direct modify API uses it.

In fact, I searched through all FDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers
that are on github and in the first few categories (up to and including
"file wrappers"), and there was only one reference to
es_result_relation_info, and that just in comments in a test:
https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type=
which I think was just copied from our source code.

IOW, we should just change the direct modify calls to get the relevant
ResultRelationInfo or something in that vein (perhaps just the relevant
RT index?).

It seems easy to make one of the two functions that constitute the
direct modify API, IterateDirectModify(), access the result relation
from ForeignScanState by saving either the result relation RT index or
ResultRelInfo pointer itself into the ForeignScanState's FDW-private
area. For example, for postgres_fdw, one would simply add a new
member to PgFdwDirectModifyState struct.

Doing that for the other function BeginDirectModify() seems a bit more
involved. We could add a new field to ForeignScan, say
resultRelation, that's set by either PlanDirectModify() (the FDW code)
or make_modifytable() (the core code) if the ForeignScan node contains
the command for direct modification. BeginDirectModify() can then use
that value instead of relying on es_result_relation_info being set.

Thoughts? Fujita-san, do you have any opinion on whether that would
be a good idea?

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001. Patches that
were previously 0001, 0002, and 0003 are now 0002, 0003, and 0004,
respectively. 0002 is now a patch to "remove"
es_result_relation_info.
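
To make the shape of the revised API concrete, here is a minimal standalone sketch of the pattern 0001 applies to postgres_fdw: the begin callback receives the ResultRelInfo as an argument and saves it in the FDW-private state, and the iterate side reads it from there rather than from EState. The struct definitions below are simplified stand-ins rather than the real ones from execnodes.h, the demo* names are hypothetical, and calloc/free stand in for the backend's memory-context allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; the real types live in nodes/execnodes.h. */
typedef struct ResultRelInfo
{
	unsigned int ri_RangeTableIndex;	/* RT index of the target relation */
} ResultRelInfo;

typedef struct ForeignScanState
{
	void	   *fdw_state;		/* FDW-private state */
} ForeignScanState;

/* Hypothetical FDW-private state, modeled on PgFdwDirectModifyState. */
typedef struct DemoDirectModifyState
{
	ResultRelInfo *resultRelInfo;	/* relation being modified */
} DemoDirectModifyState;

/* Begin callback: takes the result relation explicitly and remembers it. */
void
demoBeginDirectModify(ForeignScanState *node, ResultRelInfo *rinfo, int eflags)
{
	DemoDirectModifyState *dmstate;

	(void) eflags;				/* unused in this sketch */
	assert(rinfo != NULL);

	dmstate = calloc(1, sizeof(DemoDirectModifyState));
	assert(dmstate != NULL);

	/* Save the ResultRelInfo for later callbacks. */
	dmstate->resultRelInfo = rinfo;
	node->fdw_state = dmstate;
}

/* Iterate-side helper: uses the saved pointer, not any EState field. */
unsigned int
demoIterateTargetIndex(ForeignScanState *node)
{
	DemoDirectModifyState *dmstate = (DemoDirectModifyState *) node->fdw_state;

	return dmstate->resultRelInfo->ri_RangeTableIndex;
}

/* End callback: release the private state. */
void
demoEndDirectModify(ForeignScanState *node)
{
	free(node->fdw_state);
	node->fdw_state = NULL;
}
```

With this arrangement the caller (ExecInitModifyTable() in the patch) decides which result relation the scan belongs to, so the FDW no longer depends on a global pointer being set at the right moment.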

Thanks,
Amit

Attachments:

v3-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch (application/octet-stream)
From fe4a05fcd0f16944ed27ee47a19393afa025f4be Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 31 Jul 2019 16:38:43 +0900
Subject: [PATCH v3 1/4] Revise BeginDirectModify API to pass ResultRelInfo
 directly

---
 contrib/postgres_fdw/postgres_fdw.c    | 22 ++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml           | 11 ++++++++++-
 src/backend/executor/nodeForeignscan.c |  7 +++----
 src/backend/executor/nodeModifyTable.c | 15 ++++++++++++---
 src/include/foreign/fdwapi.h           |  1 +
 5 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2556..1b60ff88b1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2340,7 +2345,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2375,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2421,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2473,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4043,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4190,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..04c2eccd1c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> also contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node,
      for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
     </para>
 
+    <note>
+     Also note that it's a good idea to store the <literal>rinfo</literal>
+     in the <structfield>fdw_state</structfield> for
+     <function>IterateDirectModify</function> to use.
+    </note>
+
     <para>
      If the <function>BeginDirectModify</function> pointer is set to
      <literal>NULL</literal>, no attempts to execute a direct modification on the
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..88d9f5da10 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,11 +221,10 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan.  Direct modification scans
+	 * are initialized elsewhere; see ExecInitModifyTable().
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
 	return scanstate;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..55ce709adb 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2370,9 +2370,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 
 		/* Also let FDWs init themselves for foreign-table result rels */
-		if (!resultRelInfo->ri_usesFdwDirectModify &&
-			resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
+		if (resultRelInfo->ri_usesFdwDirectModify)
+		{
+			ForeignScanState *scanstate;
+
+			Assert(IsA(mtstate->mt_plans[i], ForeignScanState));
+			scanstate = (ForeignScanState *) mtstate->mt_plans[i];
+			resultRelInfo->ri_FdwRoutine->BeginDirectModify(scanstate,
+															resultRelInfo,
+															eflags);
+		}
+		else if (resultRelInfo->ri_FdwRoutine != NULL &&
+				 resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
 		{
 			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
 
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
-- 
2.11.0

v3-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 88cb7d5a256f7d21268648f49595e513839030a2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v3 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   4 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 11 files changed, 128 insertions(+), 168 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..2f682de785 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,9 +2445,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2480,7 +2477,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2845,7 +2843,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3117,11 +3114,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3225,7 +3217,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3296,7 +3288,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..2e8802df07 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -858,9 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -904,7 +901,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d54e9..eaaf69bb93 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 55ce709adb..bf0679d753 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1052,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1061,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1074,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1100,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1137,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1187,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1213,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1259,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1275,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1452,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1692,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1849,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1990,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,17 +2040,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2084,7 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2129,7 +2116,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2198,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2224,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2247,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2289,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2363,7 +2332,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2396,8 +2364,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 98bdcbcef5..b527cd93ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
-- 
2.11.0
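
To illustrate the calling convention the patch above moves to (the caller picks the target ResultRelInfo once and passes it down, rather than callees reading estate->es_result_relation_info), here is a minimal standalone sketch. All Mock* types and mock_* functions are hypothetical stand-ins invented for this illustration; they are not the real executor structs or functions, and the "update" only computes a number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real PostgreSQL structs. */
typedef struct MockResultRelInfo
{
	int			relid;			/* which table this entry describes */
	int			nindexes_open;	/* > 0 while its "indexes" are open */
} MockResultRelInfo;

typedef struct MockEState
{
	MockResultRelInfo *result_relations;	/* array of target tables */
	int			num_result_relations;		/* length of array */
	/* note: no "currently active" pointer kept here */
} MockEState;

static void
mock_open_indices(MockResultRelInfo *rri)
{
	rri->nindexes_open++;
}

static void
mock_close_indices(MockResultRelInfo *rri)
{
	assert(rri->nindexes_open > 0);
	rri->nindexes_open--;
}

/*
 * The callee is told explicitly which relation to work on, so it does not
 * depend on anyone having set (or remembered to reset) shared state.
 */
static int
mock_simple_update(MockResultRelInfo *rri, MockEState *estate, int newval)
{
	/* The target must be one of the estate's result relations. */
	assert(rri >= estate->result_relations &&
		   rri < estate->result_relations + estate->num_result_relations);
	assert(rri->nindexes_open > 0);

	return rri->relid * 1000 + newval;
}

/* Caller in the style of apply_handle_update(): pick the target once. */
static int
mock_apply_update(MockEState *estate, int newval)
{
	MockResultRelInfo *rri = &estate->result_relations[0];
	int			result;

	mock_open_indices(rri);
	result = mock_simple_update(rri, estate, newval);
	mock_close_indices(rri);

	return result;
}
```

Because the target travels as an argument, there is nothing for a caller to "revert" after the call, which is the kind of bookkeeping the first message in this thread complains about.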

Attachment: v3-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 95eee81cff36bcbbfed75205d93f16dc1c53e1c5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v3 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.
---
 src/backend/commands/copy.c            |  31 ++-----
 src/backend/commands/trigger.c         |  21 +++--
 src/backend/executor/execPartition.c   |  23 +++--
 src/backend/executor/nodeModifyTable.c | 156 +++++++++------------------------
 src/include/commands/trigger.h         |  10 +--
 src/include/nodes/execnodes.h          |   6 ++
 6 files changed, 84 insertions(+), 163 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f682de785..5d02c67389 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3114,32 +3114,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..43f796172c 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from the global variable and set
+		 * the latter to NULL because any given tuple must be read only once.
+		 * Note that the TransitionCaptureState is shared across many calls
+		 * to this function.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..62b93f39d4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -167,7 +167,8 @@ static void ExecInitRoutingInfo(ModifyTableState *mtstate,
 								PartitionTupleRouting *proute,
 								PartitionDispatch dispatch,
 								ResultRelInfo *partRelInfo,
-								int partidx);
+								int partidx,
+								bool is_update_result_rel);
 static PartitionDispatch ExecInitPartitionDispatchInfo(EState *estate,
 													   PartitionTupleRouting *proute,
 													   Oid partoid, PartitionDispatch parent_pd, int partidx);
@@ -389,7 +390,7 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 						/* Set up the PartitionRoutingInfo for it */
 						ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
-											rri, partidx);
+											rri, partidx, true);
 					}
 				}
 
@@ -676,7 +677,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 
 	/* Set up information needed for routing tuples to the partition. */
 	ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
-						leaf_part_rri, partidx);
+						leaf_part_rri, partidx, false);
 
 	/*
 	 * If there is an ON CONFLICT clause, initialize state for it.
@@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 					PartitionTupleRouting *proute,
 					PartitionDispatch dispatch,
 					ResultRelInfo *partRelInfo,
-					int partidx)
+					int partidx,
+					bool is_update_result_rel)
 {
 	MemoryContext oldcxt;
 	PartitionRoutingInfo *partrouteinfo;
@@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		/* If partition is an update target, then we already got the map. */
+		if (is_update_result_rel)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index fad8b928bb..4868e6f3a6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -74,8 +74,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +337,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1053,9 +1047,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1131,41 +1123,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1872,28 +1839,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1917,6 +1862,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1931,37 +1877,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2016,20 +1942,6 @@ ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
 	}
 }
 
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2125,17 +2037,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2304,6 +2205,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2326,8 +2228,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2337,6 +2244,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2411,6 +2325,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2435,13 +2364,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
 	 * Construct mapping from each of the per-subplan partition attnos to the
 	 * root attno.  This is required when during update row movement the tuple
 	 * descriptor of a source partition does not match the root partitioned
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b527cd93ed..0e70a62b26 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
-- 
2.11.0
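
The core idea of the patch above, as I read it, is that the conversion map is looked up from the relation the tuple came from instead of being installed into (and later restored in) the shared TransitionCaptureState, and that tcs_original_insert_tuple is consumed on first read. A minimal standalone sketch of that shape follows. The Mock* types, the mock_* functions, and the use of a plain int as a "tuple" with an additive "map" are all assumptions made up for illustration; they are not the trigger.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real PostgreSQL structs. */
typedef struct MockMap
{
	int			offset;			/* "conversion" = add this to the tuple */
} MockMap;

typedef struct MockRoutingInfo
{
	MockMap    *partition_to_root;	/* cf. pi_PartitionToRootMap */
} MockRoutingInfo;

typedef struct MockRelInfo
{
	MockRoutingInfo *partition_info;	/* set for tuple-routing targets */
	MockMap    *child_to_root;			/* cf. ri_ChildToRootMap */
} MockRelInfo;

typedef struct MockCapture
{
	const int  *original_insert_tuple;	/* unconverted tuple, if any */
} MockCapture;

/* Pick the map from the relation itself; no shared "current map". */
static MockMap *
mock_choose_map(const MockRelInfo *rel)
{
	return rel->partition_info ?
		rel->partition_info->partition_to_root :
		rel->child_to_root;
}

/* Returns the value that would be stored in the transition table. */
static int
mock_capture_tuple(MockCapture *tcs, const MockRelInfo *rel, int child_tuple)
{
	const int  *original = tcs->original_insert_tuple;
	MockMap    *map = mock_choose_map(rel);

	/* Read once, then clear, since the capture state is shared. */
	tcs->original_insert_tuple = NULL;

	if (original != NULL)
		return *original;		/* already in root format */
	if (map != NULL)
		return child_tuple + map->offset;	/* convert child to root */
	return child_tuple;			/* formats already match */
}
```

With the map living on the relation, a caller such as ExecCrossPartitionUpdate() has no saved_tcs_map to stash and restore around the nested insert, which matches the lines the patch deletes.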

Attachment: v3-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 9f66668cddb46f52db65024b3d05cb6654604cb8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v3 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bf0679d753..fad8b928bb 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -956,32 +953,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1028,6 +1023,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1181,119 +1323,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
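
The return contract documented for ExecCrossPartitionUpdate() in the patch above (true when the move is finished or nothing is left to do, false when the caller must retry with *retry_slot) can be modeled outside the executor. The following is only a toy sketch under stated assumptions: MoveOutcome, toy_cross_partition_update() and toy_update() are hypothetical stand-ins rather than PostgreSQL functions, and pointers to plain integers play the role of tuple slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the DELETE half of a row move. */
typedef enum
{
	MOVE_DELETED,				/* old row deleted; INSERT may proceed */
	MOVE_GONE,					/* row already deleted; nothing to do */
	MOVE_CONCURRENTLY_UPDATED	/* someone else updated it; recheck */
} MoveOutcome;

/*
 * Toy stand-in for ExecCrossPartitionUpdate(): returns true when the caller
 * is done, false when it must retry using *retry_slot.
 */
static bool
toy_cross_partition_update(MoveOutcome outcome, const int *refetched,
						   const int *new_row,
						   const int **retry_slot, const int **inserted)
{
	*retry_slot = NULL;
	*inserted = NULL;

	switch (outcome)
	{
		case MOVE_GONE:
			return true;		/* skip the INSERT, report "done" */
		case MOVE_CONCURRENTLY_UPDATED:
			*retry_slot = refetched;
			return false;		/* caller must redo its checks */
		case MOVE_DELETED:
			break;
	}
	*inserted = new_row;		/* models ExecInsert() through the root */
	return true;
}

/*
 * Toy caller modelling the "goto lreplace" loop in ExecUpdate(): outcomes[]
 * lists what each successive attempt runs into.
 */
static const int *
toy_update(const MoveOutcome *outcomes, size_t noutcomes,
		   const int *refetched, const int *new_row, int *attempts)
{
	const int  *slot = new_row;
	const int  *retry_slot;
	const int  *inserted;

	*attempts = 0;
	for (size_t i = 0; i < noutcomes; i++)
	{
		(*attempts)++;
		if (toy_cross_partition_update(outcomes[i], refetched, slot,
									   &retry_slot, &inserted))
			return inserted;
		assert(retry_slot != NULL);
		slot = retry_slot;		/* retry with the re-fetched row */
	}
	return NULL;
}
```

In this model a concurrent update followed by a successful delete takes two attempts and ends up inserting the re-fetched row, while a row that is already gone yields no inserted row and no retry.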

#12Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#11)
Re: partition routing layering in nodeModifyTable.c

On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote:

IOW, we should just change the direct modify calls to get the relevant
ResultRelationInfo or something in that vein (perhaps just the relevant
RT index?).

It seems easy to make one of the two functions that constitute the
direct modify API, IterateDirectModify(), access the result relation
from ForeignScanState by saving either the result relation RT index or
ResultRelInfo pointer itself into the ForeignScanState's FDW-private
area. For example, for postgres_fdw, one would simply add a new
member to PgFdwDirectModifyState struct.

Doing that for the other function BeginDirectModify() seems a bit more
involved. We could add a new field to ForeignScan, say
resultRelation, that's set by either PlanDirectModify() (the FDW code)
or make_modifytable() (the core code) if the ForeignScan node contains
the command for direct modification. BeginDirectModify() can then use
that value instead of relying on es_result_relation_info being set.

Thoughts? Fujita-san, do you have any opinion on whether that would
be a good idea?

I'm still not sure that it's a good idea to remove
es_result_relation_info, but if I had to choose, I think the latter
would probably be better. I'm planning to rework direct
modification to base it on upper planner pathification so we can
perform direct modification without the ModifyTable node. (I'm not
sure we can really do this for inherited UPDATE/DELETE, though.) For
that rewrite, I'm thinking of calling BeginDirectModify() from the
ForeignScan node (i.e., ExecInitForeignScan()) as-is. The latter
approach would allow that without any changes and avoid changing that
API many times. That's why I think the latter would
probably be better.

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001. Patches that
were previously 0001, 0002, and 0003 are now 0002, 0003, and 0004,
respectively. 0002 is now a patch to "remove"
es_result_relation_info.

Sorry for speaking up this late.

Best regards,
Etsuro Fujita

#13Andres Freund
andres@anarazel.de
In reply to: Etsuro Fujita (#12)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-07-31 21:03:58 +0900, Etsuro Fujita wrote:

I'm still not sure that it's a good idea to remove
es_result_relation_info, but if I had to choose, I think the latter
would probably be better. I'm planning to rework direct
modification to base it on upper planner pathification so we can
perform direct modification without the ModifyTable node. (I'm not
sure we can really do this for inherited UPDATE/DELETE, though.) For
that rewrite, I'm thinking of calling BeginDirectModify() from the
ForeignScan node (i.e., ExecInitForeignScan()) as-is. The latter
approach would allow that without any changes and avoid changing that
API many times. That's why I think the latter would
probably be better.

I think if we did that, it'd become *more* urgent to remove
es_result_relation. Having more and more plan nodes change global
resources is a recipe for disaster.
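
A toy illustration of the hazard described above, written as a sketch rather than as PostgreSQL code: ToyEState.current_result_rel plays the role of es_result_relation_info, and all names (ToyRel, ToyEState, the toy_* functions) are hypothetical. It contrasts a callee that reads its target from executor-wide state, which every routing step must remember to restore, with one that receives the target as an explicit argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyRel
{
	int			id;
} ToyRel;

typedef struct ToyEState
{
	ToyRel	   *current_result_rel; /* stands in for es_result_relation_info */
} ToyEState;

/* Style 1: the callee finds its target through executor-wide state. */
static int
toy_insert_via_estate(const ToyEState *estate)
{
	assert(estate->current_result_rel != NULL);
	return estate->current_result_rel->id;
}

/*
 * Tuple routing under style 1: repoint the shared field, insert, and rely
 * on the restore step not being forgotten ("restore" models whether the
 * caller remembered it).
 */
static int
toy_routed_insert_via_estate(ToyEState *estate, ToyRel *partition,
							 bool restore)
{
	ToyRel	   *saved = estate->current_result_rel;
	int			target;

	estate->current_result_rel = partition;
	target = toy_insert_via_estate(estate);
	if (restore)
		estate->current_result_rel = saved;
	return target;
}

/* Style 2: the target is an explicit argument; there is nothing to restore. */
static int
toy_insert_explicit(const ToyRel *rel)
{
	return rel->id;
}
```

With style 1, skipping the restore once leaves every later caller pointed at the partition instead of the original relation; with style 2 that class of mistake cannot occur, because no shared field is modified.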

Greetings,

Andres Freund

#14Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#12)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

Thanks for the reply and sorry I didn't wait a bit more before posting
the patch.

On Wed, Jul 31, 2019 at 9:04 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote:

IOW, we should just change the direct modify calls to get the relevant
ResultRelationInfo or something in that vein (perhaps just the relevant
RT index?).

It seems easy to make one of the two functions that constitute the
direct modify API, IterateDirectModify(), access the result relation
from ForeignScanState by saving either the result relation RT index or
ResultRelInfo pointer itself into the ForeignScanState's FDW-private
area. For example, for postgres_fdw, one would simply add a new
member to PgFdwDirectModifyState struct.

Doing that for the other function BeginDirectModify() seems a bit more
involved. We could add a new field to ForeignScan, say
resultRelation, that's set by either PlanDirectModify() (the FDW code)
or make_modifytable() (the core code) if the ForeignScan node contains
the command for direct modification. BeginDirectModify() can then use
that value instead of relying on es_result_relation_info being set.

Thoughts? Fujita-san, do you have any opinion on whether that would
be a good idea?

I'm still not sure that it's a good idea to remove
es_result_relation_info, but if I had to choose, I think the latter
would probably be better.

Could you please clarify what you meant by the "latter"?

If it's the approach of adding a resultRelation Index field to
ForeignScan node, I tried and had to give up, realizing that we don't
maintain ResultRelInfos in an array that is indexable by RT indexes.
It would've worked if es_result_relations had mirrored es_range_table,
although that probably complicates how the individual ModifyTable
nodes attach to that array. In any case, given this discussion,
further hacking on a global variable like es_result_relations may be a
course we might not want to pursue.
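
To make the "mirroring es_range_table" idea concrete, here is a minimal sketch, assuming heavily simplified stand-ins for the executor structs. ToyEState, ToyResultRelInfo, by_rti, toy_build_rti_map() and toy_result_rel_by_rti() are hypothetical names introduced only for illustration; nothing like them is in the patches under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Index;		/* 1-based range table index; 0 = invalid */

/* Hypothetical, simplified stand-ins for executor structs. */
typedef struct ToyResultRelInfo
{
	Index		ri_RangeTableIndex;
} ToyResultRelInfo;

typedef struct ToyEState
{
	Index		range_table_size;	/* number of range table entries */
	ToyResultRelInfo *result_relations; /* dense array, one per target */
	int			num_result_relations;
	ToyResultRelInfo **by_rti;	/* sparse array mirroring the range table */
} ToyEState;

/*
 * Build the sparse mirror.  Returns 0 on success, -1 if the range table is
 * empty, allocation fails, or a target carries an out-of-range RT index.
 */
static int
toy_build_rti_map(ToyEState *estate)
{
	if (estate->range_table_size == 0)
		return -1;

	estate->by_rti = calloc(estate->range_table_size,
							sizeof(ToyResultRelInfo *));
	if (estate->by_rti == NULL)
		return -1;

	for (int i = 0; i < estate->num_result_relations; i++)
	{
		Index		rti = estate->result_relations[i].ri_RangeTableIndex;

		if (rti == 0 || rti > estate->range_table_size)
		{
			free(estate->by_rti);
			estate->by_rti = NULL;
			return -1;
		}
		estate->by_rti[rti - 1] = &estate->result_relations[i];
	}
	return 0;
}

/* Constant-time lookup; NULL if rti is not a result relation. */
static ToyResultRelInfo *
toy_result_rel_by_rti(const ToyEState *estate, Index rti)
{
	if (rti == 0 || rti > estate->range_table_size)
		return NULL;
	return estate->by_rti[rti - 1];
}
```

The trade-off in this sketch is one pointer per range table entry in exchange for a lookup that does not scan the result relation array.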

I'm planning to rework direct
modification to base it on upper planner pathification so we can
perform direct modification without the ModifyTable node. (I'm not
sure we can really do this for inherited UPDATE/DELETE, though.) For
that rewrite, I'm thinking of calling BeginDirectModify() from the
ForeignScan node (i.e., ExecInitForeignScan()) as-is. The latter
approach would allow that without any changes and avoid changing that
API many times. That's why I think the latter would
probably be better.

Will the new planning approach you're thinking of get rid of needing
any result relations at all (and so the ResultRelInfos in the
executor)?

Thanks,
Amit

#15Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#14)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Thu, Aug 1, 2019 at 10:33 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jul 31, 2019 at 9:04 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote:

IOW, we should just change the direct modify calls to get the relevant
ResultRelationInfo or something in that vein (perhaps just the relevant
RT index?).

It seems easy to make one of the two functions that constitute the
direct modify API, IterateDirectModify(), access the result relation
from ForeignScanState by saving either the result relation RT index or
ResultRelInfo pointer itself into the ForeignScanState's FDW-private
area. For example, for postgres_fdw, one would simply add a new
member to PgFdwDirectModifyState struct.

Doing that for the other function BeginDirectModify() seems a bit more
involved. We could add a new field to ForeignScan, say
resultRelation, that's set by either PlanDirectModify() (the FDW code)
or make_modifytable() (the core code) if the ForeignScan node contains
the command for direct modification. BeginDirectModify() can then use
that value instead of relying on es_result_relation_info being set.

Thoughts? Fujita-san, do you have any opinion on whether that would
be a good idea?

I'm still not sure that it's a good idea to remove
es_result_relation_info, but if I had to choose, I think the latter
would probably be better.

Could you please clarify what you meant by the "latter"?

If it's the approach of adding a resultRelation Index field to
ForeignScan node, I tried and had to give up, realizing that we don't
maintain ResultRelInfos in an array that is indexable by RT indexes.
It would've worked if es_result_relations had mirrored es_range_table,
although that probably complicates how the individual ModifyTable
nodes attach to that array. In any case, given this discussion,
further hacking on a global variable like es_result_relations may be a
course we might not want to pursue.

Yeah, I mean that approach. To get the ResultRelInfo, I think we
could search through es_result_relations in BeginDirectModify() for
the ResultRelInfo matching the resultRelation added to the
ForeignScan, as in the attached patch, which follows your proposal.
ExecFindResultRelInfo() added by the patch wouldn't work efficiently
for inherited UPDATE/DELETE where there are many children that are
foreign tables, but I think that would probably be OK because in most
use cases, including sharding, the number of such children would be
fewer than 100 or so. To improve the efficiency for cases where
there are a lot more such children, however, I think it would be an
option to do something about global variables so that we can access
the ResultRelInfos by RT indexes more efficiently, because IMO that
would not be against the point here, i.e., removing the dependency
on es_result_relation_info. Maybe I'm missing something, though.

I'm planning to rework direct
modification to base it on upper planner pathification so we can
perform direct modification without the ModifyTable node. (I'm not
sure we can really do this for inherited UPDATE/DELETE, though.) For
that rewrite, I'm thinking of calling BeginDirectModify() from the
ForeignScan node (i.e., ExecInitForeignScan()) as-is. The latter
approach would allow that without any changes and avoid changing that
API many times. That's why I think the latter would
probably be better.

Will the new planning approach you're thinking of get rid of needing
any result relations at all (and so the ResultRelInfos in the
executor)?

I think the new planning approach would still need result relations
and ResultRelInfos in the executor as-is; and the FDW would probably
use the ResultRelInfo for the foreign table created by the core. Some
of the ResultRelInfo data would probably need to be initialized by the
FDW itself, though (e.g., WCO constraints and/or RETURNING, if any).

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-dont-depend-on-es_result_relation_info.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2556..9982007082 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -197,6 +197,7 @@ typedef struct PgFdwModifyState
 typedef struct PgFdwDirectModifyState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	ResultRelInfo *relInfo;		/* ResultRelInfo for the foreign table */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -2303,6 +2304,7 @@ postgresPlanDirectModify(PlannerInfo *root,
 	 * Update the operation info.
 	 */
 	fscan->operation = operation;
+	fscan->resultRelation = resultRelation;
 
 	/*
 	 * Update the fdw_exprs list that will be available to the executor.
@@ -2368,7 +2370,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = fsplan->resultRelation;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2380,6 +2382,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	table = GetForeignTable(RelationGetRelid(dmstate->rel));
 	user = GetUserMapping(userid, table->serverid);
 
+	/* Get ResultRelInfo for foreign table. */
+	dmstate->relInfo = ExecFindResultRelInfo(estate, rtindex, false);
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -2463,7 +2468,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->relInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4038,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->relInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4185,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->relInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..4d0d0553be 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1349,6 +1349,31 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
+/*
+ * ExecFindResultRelInfo -- find the ResultRelInfo struct for given rangetable
+ * index
+ *
+ * This function only searches through the query result relations.  If no such
+ * struct, either return NULL or throw error depending on missing_ok
+ */
+ResultRelInfo *
+ExecFindResultRelInfo(EState *estate, Index rti, bool missing_ok)
+{
+	ResultRelInfo *rInfo = estate->es_result_relations;
+	int			nr  = estate->es_num_result_relations;
+
+	while (nr > 0)
+	{
+		if (rInfo->ri_RangeTableIndex == rti)
+			return rInfo;
+		rInfo++;
+		nr--;
+	}
+	if (!missing_ok)
+		elog(ERROR, "failed to find ResultRelInfo for rangetable index %u", rti);
+	return NULL;
+}
+
 /*
  * ExecGetTriggerResultRel
  *		Get a ResultRelInfo for a trigger target relation.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6414aded0e..9342b99aae 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -751,6 +751,7 @@ _copyForeignScan(const ForeignScan *from)
 	 * copy remainder of node
 	 */
 	COPY_SCALAR_FIELD(operation);
+	COPY_SCALAR_FIELD(resultRelation);
 	COPY_SCALAR_FIELD(fs_server);
 	COPY_NODE_FIELD(fdw_exprs);
 	COPY_NODE_FIELD(fdw_private);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 86c31a48c9..4dadb9ff7c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -688,6 +688,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_ENUM_FIELD(operation, CmdType);
+	WRITE_UINT_FIELD(resultRelation);
 	WRITE_OID_FIELD(fs_server);
 	WRITE_NODE_FIELD(fdw_exprs);
 	WRITE_NODE_FIELD(fdw_private);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62..04080c752a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1976,6 +1976,7 @@ _readForeignScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_ENUM_FIELD(operation, CmdType);
+	READ_UINT_FIELD(resultRelation);
 	READ_OID_FIELD(fs_server);
 	READ_NODE_FIELD(fdw_exprs);
 	READ_NODE_FIELD(fdw_private);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c6b8553a08..cc5be409ec 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5415,6 +5415,7 @@ make_foreignscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->operation = CMD_SELECT;
+	node->resultRelation = 0;
 	/* fs_server will be filled in by create_foreignscan_plan */
 	node->fs_server = InvalidOid;
 	node->fdw_exprs = fdw_exprs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dc11f098e0..25c1e7dfc7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1158,6 +1158,10 @@ set_foreignscan_references(PlannerInfo *root,
 	if (fscan->scan.scanrelid > 0)
 		fscan->scan.scanrelid += rtoffset;
 
+	/* Adjust resultRelation if it's valid */
+	if (fscan->resultRelation > 0)
+		fscan->resultRelation += rtoffset;
+
 	if (fscan->fdw_scan_tlist != NIL || fscan->scan.scanrelid == 0)
 	{
 		/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..a0533df779 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -184,6 +184,7 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Index resultRelationIndex,
 							  Relation partition_root,
 							  int instrument_options);
+extern ResultRelInfo *ExecFindResultRelInfo(EState *estate, Index rti, bool missing_ok);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
 extern void ExecCleanUpTriggerState(EState *estate);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 70f8b8e22b..218626254f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -609,6 +609,8 @@ typedef struct ForeignScan
 {
 	Scan		scan;
 	CmdType		operation;		/* SELECT/INSERT/UPDATE/DELETE */
+	Index		resultRelation;	/* rtable index of target relation for
+								 * INSERT/UPDATE/DELETE; 0 for SELECT */
 	Oid			fs_server;		/* OID of foreign server */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
 	List	   *fdw_private;	/* private data for FDW */
#16Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#11)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-07-31 17:04:38 +0900, Amit Langote wrote:

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001. Patches that
were previously 0001, 0002, and 0003 are now 0002, 0003, and 0004,
respectively. 0002 is now a patch to "remove"
es_result_relation_info.

Thanks! Some minor quibbles aside, the non FDW patches look good to me.

Fujita-san, do you have any comments on the FDW API change? Or anybody
else?

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().
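
A rough sketch of what "determine it at plan time" could look like, assuming simplified stand-in structs. The field resultRelIndex on the scan node, the Toy* types, and both toy_* functions are hypothetical and exist only for illustration. The planner-side step records the position of the scan's target in the flat list of result relations once, and the executor-side step then indexes the ResultRelInfo array directly, with no search and no reliance on a "current result relation" field.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;		/* 1-based range table index; 0 = none */

/* Hypothetical, simplified stand-ins for planner and executor structs. */
typedef struct ToyPlannedStmt
{
	const Index *resultRelations;	/* RT indexes of all target relations */
	int			nresultRelations;
} ToyPlannedStmt;

typedef struct ToyForeignScan
{
	Index		resultRelation; /* RT index of the target, 0 for SELECT */
	int			resultRelIndex; /* assumed field: position in
								 * resultRelations, or -1 */
} ToyForeignScan;

typedef struct ToyResultRelInfo
{
	Index		ri_RangeTableIndex;
} ToyResultRelInfo;

/* Plan time (think setrefs.c): resolve the RT index to an array position. */
static void
toy_set_foreignscan_references(const ToyPlannedStmt *stmt,
							   ToyForeignScan *fscan)
{
	fscan->resultRelIndex = -1;
	if (fscan->resultRelation == 0)
		return;					/* plain scan, no target relation */

	for (int i = 0; i < stmt->nresultRelations; i++)
	{
		if (stmt->resultRelations[i] == fscan->resultRelation)
		{
			fscan->resultRelIndex = i;
			break;
		}
	}
}

/* Executor init time: a direct array access, independent of ModifyTable. */
static ToyResultRelInfo *
toy_init_foreign_scan(ToyResultRelInfo *result_relations, int nrels,
					  const ToyForeignScan *fscan)
{
	if (fscan->resultRelIndex < 0)
		return NULL;
	assert(fscan->resultRelIndex < nrels);
	return &result_relations[fscan->resultRelIndex];
}
```

Because the lookup cost is paid once during planning, the scan node in this sketch can obtain its ResultRelInfo during its own initialization, regardless of which nodes sit between it and ModifyTable.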

Patch 0004 is such a nice improvement. Besides getting rid of a substantial
amount of code, it also makes the control flow a lot easier to read.

@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
* If there are no triggers in 'trigdesc' that request relevant transition
* tables, then return NULL.
*
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
*
* Note that we copy the flags from a parent table into this struct (rather
* than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
*/
if (row_trigger && transition_capture != NULL)
{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
bool		delete_old_table = transition_capture->tcs_delete_old_table;
bool		update_old_table = transition_capture->tcs_update_old_table;
bool		update_new_table = transition_capture->tcs_update_new_table;
bool		insert_new_table = transition_capture->tcs_insert_new_table;
/*
+		 * Get the originally inserted tuple from the global variable and set
+		 * the latter to NULL because any given tuple must be read only once.
+		 * Note that the TransitionCaptureState is shared across many calls
+		 * to this function.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;

Maybe I'm missing something, but original_insert_tuple is not a global
variable?

@@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
PartitionTupleRouting *proute,
PartitionDispatch dispatch,
ResultRelInfo *partRelInfo,
-					int partidx)
+					int partidx,
+					bool is_update_result_rel)
{
MemoryContext oldcxt;
PartitionRoutingInfo *partrouteinfo;
@@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
if (mtstate &&
(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		/* If partition is an update target, then we already got the map. */
+		if (is_update_result_rel)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
}

Hm, isn't is_update_result_rel just ModifyTable->operation == CMD_UPDATE?

Greetings,

Andres Freund

#17Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andres Freund (#16)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-31 17:04:38 +0900, Amit Langote wrote:

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001.

Fujita-san, do you have any comments on the FDW API change? Or anybody
else?

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

I'd vote for that; I created a patch for that [1].

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().

I think so too.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/CAPmGK15=oFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ@mail.gmail.com

#18Andres Freund
andres@anarazel.de
In reply to: Etsuro Fujita (#17)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-08-03 05:20:35 +0900, Etsuro Fujita wrote:

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-31 17:04:38 +0900, Amit Langote wrote:

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001.

Fujita-san, do you have any comments on the FDW API change? Or anybody
else?

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

I'd vote for that; I created a patch for that [1].

[1] /messages/by-id/CAPmGK15=oFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ@mail.gmail.com

Oh, missed that. But that's not quite what I'm proposing. I don't like
ExecFindResultRelInfo at all. What's the point of it? Its introduction
is still an API break - I don't understand why that break is better than
just passing the ResultRelInfo directly to BeginDirectModify()? I want
to again remark that BeginForeignModify() does get the ResultRelInfo -
it should have been done the same when adding direct modify.

Even if you need the loop - which I don't think is right - it should
live somewhere that individual FDWs don't have to care about.

- Andres

#19Andres Freund
andres@anarazel.de
In reply to: Etsuro Fujita (#15)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-08-01 18:38:09 +0900, Etsuro Fujita wrote:

On Thu, Aug 1, 2019 at 10:33 AM Amit Langote <amitlangote09@gmail.com> wrote:

If it's the approach of adding a resultRelation Index field to
ForeignScan node, I tried and had to give up, realizing that we don't
maintain ResultRelInfos in an array that is indexable by RT indexes.
It would've worked if es_result_relations had mirrored es_range_table,
although that probably complicates how the individual ModifyTable
nodes attach to that array.

We know at plan time what the resultRelation offset for a
ModifyTable node is. We just need to transport that to the respective
foreign scan node, and update it properly in setrefs? Then we can index
es_result_relations without any additional mapping?

Maybe I'm missing something? I think all we need to do is to have
setrefs.c:set_plan_refs() iterate over ->fdwDirectModifyPlans or such,
and set the respective node's result_relation_offset or whatever we're
naming it to splan->resultRelIndex + offset from fdwDirectModifyPlans?
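A minimal sketch of the offset computation described above, using toy structs rather than the real planner nodes. The names `ToyForeignScan`, `ToyModifyTable`, and `toy_set_result_rel_indexes` are hypothetical and only illustrate the arithmetic; the actual setrefs.c change may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real PostgreSQL node definitions. */
typedef struct ToyForeignScan
{
    int resultRelIndex;     /* -1 until assigned */
} ToyForeignScan;

typedef struct ToyModifyTable
{
    int resultRelIndex;     /* position of this node's first result rel */
    int nplans;             /* number of subplans */
    ToyForeignScan **plans; /* NULL entry: subplan is not a direct-modify scan */
} ToyModifyTable;

/*
 * For each direct-modify subplan, record where its result relation sits in
 * the flat, query-wide result relation array: the ModifyTable's own base
 * offset plus the subplan's position under that ModifyTable.
 */
static void
toy_set_result_rel_indexes(ToyModifyTable *mt)
{
    assert(mt != NULL);
    for (int i = 0; i < mt->nplans; i++)
    {
        if (mt->plans[i] != NULL)
            mt->plans[i]->resultRelIndex = mt->resultRelIndex + i;
    }
}
```

With a base offset of 4 and direct-modify scans at subplan positions 0 and 2, the scans would get indexes 4 and 6, which can then index the result relation array without any extra mapping.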

In any case, given this discussion, further hacking on a global
variable like es_result_relations may be a course we might not want
to pursue.

I don't think es_result_relations really is a problem - it doesn't have to
change while processing individual subplans / partitions / whatnot. If
we needed a mapping between rtis and result indexes, I'd not see a
problem. Doubtful it's needed though.

There's a fundamental difference between EState->es_result_relations and
EState->es_result_relation_info. The former stays static during the
whole query once initialized, whereas es_result_relation_info changes
depending on which relation we're processing. The latter is what makes
the code more complicated, because we cannot ever return early etc.
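The distinction can be illustrated with a small sketch. It assumes toy types (`ToyRel`, `ToyEState`) and hypothetical function names; it is not the executor code, only a model of why a mutable "current relation" field forces a save/restore discipline that an explicit parameter avoids.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyRel
{
    int id;
} ToyRel;

typedef struct ToyEState
{
    ToyRel *result_relations;     /* fixed for the whole query */
    ToyRel *result_relation_info; /* "current" rel, mutated while executing */
} ToyEState;

/* Style 1: the callee reads the mutable field. */
static int
toy_insert_implicit(ToyEState *estate)
{
    return estate->result_relation_info->id;
}

/* The caller must switch the field and restore it on every exit path. */
static int
toy_route_implicit(ToyEState *estate, int partidx)
{
    ToyRel *saved = estate->result_relation_info;
    int     result;

    estate->result_relation_info = &estate->result_relations[partidx];
    result = toy_insert_implicit(estate);
    estate->result_relation_info = saved;   /* an early return would skip this */
    return result;
}

/* Style 2: the target relation is passed explicitly; nothing to restore. */
static int
toy_insert_explicit(ToyRel *rel)
{
    return rel->id;
}

static int
toy_route_explicit(ToyEState *estate, int partidx)
{
    return toy_insert_explicit(&estate->result_relations[partidx]);
}
```

In the second style an early return is harmless, because no shared state was changed on the way in.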

Similarly, ModifyTableState->mt_per_subplan_tupconv_maps is not a
problem, it stays static, but e.g. mtstate->mt_transition_capture is a
problem, because we have to change for each subplan / routing /
partition movement.

Greetings,

Andres Freund

#20Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andres Freund (#18)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Sat, Aug 3, 2019 at 5:31 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 05:20:35 +0900, Etsuro Fujita wrote:

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-31 17:04:38 +0900, Amit Langote wrote:

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001.

Fujita-san, do you have any comments on the FDW API change? Or anybody
else?

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

I'd vote for that; I created a patch for that [1].

[1] /messages/by-id/CAPmGK15=oFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ@mail.gmail.com

Oh, missed that. But that's not quite what I'm proposing.

Sorry, I misread your message. I think I was too tired.

I don't like
ExecFindResultRelInfo at all. What's the point of it? Its introduction
is still an API break - I don't understand why that break is better than
just passing the ResultRelInfo directly to BeginDirectModify()?

What API does that function break? The point of that function was to
keep the direct modify layering/API as-is, because 1) I too felt the
same way about the move of BeginDirectModify() to nodeModifyTable.c,
and 2) I was thinking that when rewriting direct modify with upper
planner pathification so that we can perform it without ModifyTable,
we could still use the existing layering/API as-is, leading to smaller
changes to the core for that.

I want
to again remark that BeginForeignModify() does get the ResultRelInfo -
it should have been done the same when adding direct modify.

Might have been so.

Even if you need the loop - which I don't think is right - it should
live somewhere that individual FDWs don't have to care about.

I was thinking to use hash lookup in ExecFindResultRelInfo() when
es_result_relations is very long, but I think the setrefs.c approach
you mentioned above might be much better. Will consider that.

Best regards,
Etsuro Fujita

#21Andres Freund
andres@anarazel.de
In reply to: Etsuro Fujita (#20)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote:

I don't like
ExecFindResultRelInfo at all. What's the point of it? Its introduction
is still an API break - I don't understand why that break is better than
just passing the ResultRelInfo directly to BeginDirectModify()?

What API does that function break?

You need to call it, whereas previously you did not need to call it. The
effort to change an FDW to get one more parameter, or to call that
function is about the same.

Greetings,

Andres Freund

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#21)
Re: partition routing layering in nodeModifyTable.c

Andres Freund <andres@anarazel.de> writes:

On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote:

What API does that function break?

You need to call it, whereas previously you did not need to call it. The
effort to change an FDW to get one more parameter, or to call that
function is about the same.

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Adding a function would make sense, perhaps, if only a minority of FDWs
need to do so. It'd still be risky if the need to do so could be missed
in light testing.
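A small sketch of why a changed callback signature surfaces at build time. All names here (`ToyScanState`, `ToyResultRelInfo`, `ToyFdwRoutine`, `toy_begin_direct_modify`) are hypothetical stand-ins, not the real fdwapi.h declarations.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyScanState
{
    int rtindex_seen;
} ToyScanState;

typedef struct ToyResultRelInfo
{
    int rangeTableIndex;
} ToyResultRelInfo;

/* New-style callback type: the extra parameter is part of the signature. */
typedef void (*ToyBeginDirectModify_function) (ToyScanState *node,
                                               ToyResultRelInfo *rinfo,
                                               int eflags);

typedef struct ToyFdwRoutine
{
    ToyBeginDirectModify_function BeginDirectModify;
} ToyFdwRoutine;

/* An FDW updated for the new API receives the target relation directly. */
static void
toy_begin_direct_modify(ToyScanState *node, ToyResultRelInfo *rinfo, int eflags)
{
    (void) eflags;
    node->rtindex_seen = rinfo->rangeTableIndex;
}

/*
 * An FDW that still declared the old two-argument form,
 *     void old_begin(ToyScanState *node, int eflags);
 * would draw an incompatible-pointer-type diagnostic on this initializer.
 * A newly required helper call, by contrast, can be forgotten silently.
 */
static const ToyFdwRoutine toy_routine = {toy_begin_direct_modify};
```

How loudly the mismatch is reported depends on the compiler and its flags; the point is only that the mismatch is visible without running anything.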

regards, tom lane

#23Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote:

What API does that function break?

You need to call it, whereas previously you did not need to call it. The
effort to change an FDW to get one more parameter, or to call that
function is about the same.

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Greetings,

Andres Freund

#24Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Andres Freund (#23)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote:

What API does that function break?

You need to call it, whereas previously you did not need to call it. The
effort to change an FDW to get one more parameter, or to call that
function is about the same.

I got the point.

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Agreed.

Best regards,
Etsuro Fujita

#25Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#24)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Agreed.

So, is it correct to think that the consensus is to add a parameter to
BeginDirectModify()?

Also, avoid changing where BeginDirectModify() is called from, like my
patch did, only to have easy access to the ResultRelInfo to pass. We
can do that by augmenting ForeignScan node to add the information
needed to fetch the ResultRelInfo efficiently from
ExecInitForeignScan() itself. That information is the ordinal
position of a given result relation in PlannedStmt.resultRelations,
not the RT index as we were discussing.
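To make the difference concrete, here is a sketch contrasting the two lookups on a toy array. `ToyResultRelInfo`, `toy_find_by_rtindex`, and `toy_fetch_by_ordinal` are hypothetical names; the sketch assumes only that the result relation array is ordered like the plan's result relation list and is not indexable by RT index.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyResultRelInfo
{
    int rangeTableIndex;    /* RT index of the relation; not its array slot */
} ToyResultRelInfo;

/* Lookup by RT index: the array is not keyed by RT index, so scan it. */
static ToyResultRelInfo *
toy_find_by_rtindex(ToyResultRelInfo *rels, int nrels, int rtindex)
{
    for (int i = 0; i < nrels; i++)
    {
        if (rels[i].rangeTableIndex == rtindex)
            return &rels[i];
    }
    return NULL;
}

/* Lookup by ordinal position recorded at plan time: a direct array access. */
static ToyResultRelInfo *
toy_fetch_by_ordinal(ToyResultRelInfo *rels, int nrels, int resultRelIndex)
{
    assert(resultRelIndex >= 0 && resultRelIndex < nrels);
    return &rels[resultRelIndex];
}
```

The ordinal lookup needs no loop and no hash table, which is why storing the position in the plan node is attractive when the result relation list is long.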

Thanks,
Amit

#26Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#25)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Agreed.

So, is it correct to think that the consensus is to add a parameter to
BeginDirectModify()?

I think so.

Also, avoid changing where BeginDirectModify() is called from, like my
patch did, only to have easy access to the ResultRelInfo to pass. We
can do that by augmenting ForeignScan node to add the information
needed to fetch the ResultRelInfo efficiently from
ExecInitForeignScan() itself.

I think so.

That information is the ordinal
position of a given result relation in PlannedStmt.resultRelations,
not the RT index as we were discussing.

Yeah, that would be what Andres is proposing, which I think is much
better than what I proposed using the RT index.

Could you update your patch?

Best regards,
Etsuro Fujita

#27Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#26)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

Thanks for the quick follow up.

On Mon, Aug 5, 2019 at 2:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Agreed.

So, is it correct to think that the consensus is to add a parameter to
BeginDirectModify()?

I think so.

Also, avoid changing where BeginDirectModify() is called from, like my
patch did, only to have easy access to the ResultRelInfo to pass. We
can do that by augmenting ForeignScan node to add the information
needed to fetch the ResultRelInfo efficiently from
ExecInitForeignScan() itself.

I think so.

That information is the ordinal
position of a given result relation in PlannedStmt.resultRelations,
not the RT index as we were discussing.

Yeah, that would be what Andres is proposing, which I think is much
better than what I proposed using the RT index.

Could you update your patch?

OK, I will do that. I'll reply with the updated patches to an
upthread email of Andres' [1], where he also comments on the other
patches.

Thanks,
Amit

[1]: /messages/by-id/20190802180138.64zcircokw2upaho@alap3.anarazel.de

#28Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#27)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Mon, Aug 5, 2019 at 2:36 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Aug 5, 2019 at 2:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-08-03 13:48:01 -0400, Tom Lane wrote:

If those are the choices, adding a parameter is clearly the preferable
solution, because it makes the API breakage obvious at compile.

Right. I think it's a *bit* less clear in this case because we'd also
remove the field that such FDWs with direct modify support would use
now (EState.es_result_relation_info).

But I think it's also just plainly a better API to use the
parameter. Even if, in contrast to the BeginDirectModify at hand,
BeginForeignModify didn't already accept it. Requiring a function call to
gather information that just about every realistic implementation is
going to need doesn't make sense.

Agreed.

So, is it correct to think that the consensus is to add a parameter to
BeginDirectModify()?

I think so.

Also, avoid changing where BeginDirectModify() is called from, like my
patch did, only to have easy access to the ResultRelInfo to pass. We
can do that by augmenting ForeignScan node to add the information
needed to fetch the ResultRelInfo efficiently from
ExecInitForeignScan() itself.

I think so.

That information is the ordinal
position of a given result relation in PlannedStmt.resultRelations,
not the RT index as we were discussing.

Yeah, that would be what Andres is proposing, which I think is much
better than what I proposed using the RT index.

Could you update your patch?

OK, I will do that. I'll reply with the updated patches to an
upthread email of Andres' [1], where he also comments on the other
patches.

Thanks! Will review the updated version of the FDW patch, at least.

Best regards,
Etsuro Fujita

#29Amit Langote
amitlangote09@gmail.com
In reply to: Andres Freund (#16)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Hi Andres, Fujita-san,

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:

On 2019-07-31 17:04:38 +0900, Amit Langote wrote:

I looked into trying to do the things I mentioned above and it seems
to me that revising BeginDirectModify()'s API to receive the
ResultRelInfo directly as Andres suggested might be the best way
forward. I've implemented that in the attached 0001. Patches that
were previously 0001, 0002, and 0003 are now 0002, 0003, and 0004,
respectively. 0002 is now a patch to "remove"
es_result_relation_info.

Thanks! Some minor quibbles aside, the non FDW patches look good to me.

Fujita-san, do you have any comments on the FDW API change? Or anybody
else?

Based on the discussion, I have updated the patch.

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

The patch adds a resultRelIndex field to ForeignScan node, which is
set to >= 0 value for non-SELECT queries. I first thought to set it
only if direct modification is being used, but maybe it'd be simpler
to set it even if direct modification is not used. To set it, the
patch teaches set_plan_refs() to initialize resultRelIndex of
ForeignScan plans that appear under ModifyTable. Fujita-san said he
plans to revise the planning of direct-modification style queries to
not require a ModifyTable node anymore, but maybe he'll just need to
add similar code elsewhere but not outside setrefs.c.

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().

I too think that it would've been great if we could call both
BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but
the former's API seems to be designed to be called from
ExecInitModifyTable from the get-go. Maybe we should leave that
as-is?

Patch 04 is such a nice improvement. Besides getting rid of a substantial
amount of code, it also makes the control flow a lot easier to read.

Thanks.

@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
* If there are no triggers in 'trigdesc' that request relevant transition
* tables, then return NULL.
*
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
*
* Note that we copy the flags from a parent table into this struct (rather
* than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
*/
if (row_trigger && transition_capture != NULL)
{
-             TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-             TupleConversionMap *map = transition_capture->tcs_map;
+             TupleTableSlot *original_insert_tuple;
+             PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+             TupleConversionMap *map = pinfo ?
+                                                             pinfo->pi_PartitionToRootMap :
+                                                             relinfo->ri_ChildToRootMap;
bool            delete_old_table = transition_capture->tcs_delete_old_table;
bool            update_old_table = transition_capture->tcs_update_old_table;
bool            update_new_table = transition_capture->tcs_update_new_table;
bool            insert_new_table = transition_capture->tcs_insert_new_table;
/*
+              * Get the originally inserted tuple from the global variable and set
+              * the latter to NULL because any given tuple must be read only once.
+              * Note that the TransitionCaptureState is shared across many calls
+              * to this function.
+              */
+             original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+             transition_capture->tcs_original_insert_tuple = NULL;

Maybe I'm missing something, but original_insert_tuple is not a global
variable?

I really meant to refer to the fact that it's maintained in a
ModifyTable-global struct. I've updated this comment a bit.

@@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
PartitionTupleRouting *proute,
PartitionDispatch dispatch,
ResultRelInfo *partRelInfo,
-                                     int partidx)
+                                     int partidx,
+                                     bool is_update_result_rel)
{
MemoryContext oldcxt;
PartitionRoutingInfo *partrouteinfo;
@@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
if (mtstate &&
(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
{
-             partrouteinfo->pi_PartitionToRootMap =
-                     convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-                                                                RelationGetDescr(partRelInfo->ri_PartitionRoot),
-                                                                gettext_noop("could not convert row type"));
+             /* If partition is an update target, then we already got the map. */
+             if (is_update_result_rel)
+                     partrouteinfo->pi_PartitionToRootMap =
+                             partRelInfo->ri_ChildToRootMap;
+             else
+                     partrouteinfo->pi_PartitionToRootMap =
+                             convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+                                                                        RelationGetDescr(partRelInfo->ri_PartitionRoot),
+                                                                        gettext_noop("could not convert row type"));
}

Hm, isn't is_update_result_rel just ModifyTable->operation == CMD_UPDATE?

No. The operation being CMD_UPDATE doesn't mean that the
ResultRelInfo that is passed to ExecInitRoutingInfo() is an UPDATE
result rel. It could be a ResultRelInfo built by ExecFindPartition()
when a row needed to be moved into a partition that is not present in
the UPDATE result rels contained in ModifyTableState. Though I
realized that we don't really need to add a new parameter to figure
that out. Looking at ri_RangeTableIndex property of the passed-in
ResultRelInfo is enough to distinguish the two types of
ResultRelInfos. I've updated the patch that way.

I found more dead code related to transition capture setup, which I've
removed in the latest 0004. For example, the
mt_per_subplan_tupconv_maps array and the code in nodeModifyTable.c
that was used to initialize it.

Attached updated patches.

Thanks,
Amit

Attachments:

v4-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch (application/octet-stream)
From f7231c6d04da106321f8b43b1a55dc10b8cb2b82 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 31 Jul 2019 16:38:43 +0900
Subject: [PATCH v4 1/4] Revise BeginDirectModify API to pass ResultRelInfo
 directly

For ExecInitForeignScan() to efficiently get the ResultRelInfo to
pass to BeginDirectModify(), add a field to ForeignScan that gives
the index of a given result relation in the query's list of result
relations.
---
 contrib/postgres_fdw/postgres_fdw.c     | 22 ++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            | 11 ++++++++++-
 src/backend/executor/nodeForeignscan.c  | 13 ++++++++++---
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 28 ++++++++++++++++++++++------
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  4 ++++
 9 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2556..1b60ff88b1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2340,7 +2345,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2375,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2421,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2473,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4043,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4190,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..04c2eccd1c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -895,6 +898,14 @@ BeginDirectModify(ForeignScanState *node,
      for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
     </para>
 
+    <note>
+     <para>
+      It is a good idea to store <literal>rinfo</literal> in the
+      <structfield>fdw_state</structfield> for
+      <function>IterateDirectModify</function> to use.
+     </para>
+    </note>
+
     <para>
      If the <function>BeginDirectModify</function> pointer is set to
      <literal>NULL</literal>, no attempts to execute a direct modification on the
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..9824c16e09 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -223,10 +223,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		/* Perform initializations for a direct modification. */
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..ff78167f79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* may be set to a value >= 0 by set_plan_refs() */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..b233eb7dce 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -781,6 +781,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 		case T_ModifyTable:
 			{
 				ModifyTable *splan = (ModifyTable *) plan;
+				int		resultRelIndex;
 
 				Assert(splan->plan.targetlist == NIL);
 				Assert(splan->plan.qual == NIL);
@@ -877,12 +878,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
-				foreach(l, splan->plans)
-				{
-					lfirst(l) = set_plan_refs(root,
-											  (Plan *) lfirst(l),
-											  rtoffset);
-				}
 
 				/*
 				 * Append this ModifyTable node's final result relation RT
@@ -908,6 +903,27 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 						lappend_int(root->glob->rootResultRelations,
 									splan->rootRelation);
 				}
+
+				resultRelIndex = splan->resultRelIndex;
+				foreach(l, splan->plans)
+				{
+					lfirst(l) = set_plan_refs(root,
+											  (Plan *) lfirst(l),
+											  rtoffset);
+
+					/*
+					 * For foreign table result relations, save their index
+					 * in the global list of result relations into the
+					 * corresponding ForeignScan nodes.
+					 */
+					if (IsA(lfirst(l), ForeignScan))
+					{
+						ForeignScan *fscan = (ForeignScan *) lfirst(l);
+
+						fscan->resultRelIndex = resultRelIndex;
+					}
+					resultRelIndex++;
+				}
 			}
 			break;
 		case T_Append:
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..047dd73dd1 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,10 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* For non-SELECT operations, this contains
+								 * the offset of result relation in a
+								 * query-global list of result relations; -1
+								 * otherwise */
 } ForeignScan;
 
 /* ----------------
-- 
2.11.0
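
A minimal sketch of the pattern the patch above applies to postgres_fdw, using toy stand-in types rather than the real executor structs. Every Toy*/toy_* name is hypothetical and exists only for illustration; the only thing taken from the patch is the idea that BeginDirectModify receives the ResultRelInfo as a parameter and saves it in FDW-private state, so that IterateDirectModify no longer reads estate->es_result_relation_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo; not the real PostgreSQL struct. */
typedef struct ToyResultRelInfo
{
	int			range_table_index;
} ToyResultRelInfo;

/* Hypothetical stand-in for EState with its "current result relation". */
typedef struct ToyEState
{
	ToyResultRelInfo *current_result_rel;
} ToyEState;

/* Hypothetical stand-in for the FDW-private direct-modify state. */
typedef struct ToyDirectModifyState
{
	/* Saved once at Begin time, like dmstate->resultRelInfo in the patch. */
	ToyResultRelInfo *result_rel_info;
} ToyDirectModifyState;

/* Begin: take the result relation as a parameter and remember it. */
static void
toy_begin_direct_modify(ToyDirectModifyState *dmstate, ToyResultRelInfo *rinfo)
{
	assert(rinfo != NULL);
	dmstate->result_rel_info = rinfo;
}

/* Iterate, old style: depends on whatever the shared pointer holds now. */
static int
toy_iterate_via_global(const ToyEState *estate)
{
	return estate->current_result_rel->range_table_index;
}

/* Iterate, new style: uses the pointer saved by Begin. */
static int
toy_iterate_via_saved(const ToyDirectModifyState *dmstate)
{
	return dmstate->result_rel_info->range_table_index;
}

/*
 * Scenario: Begin runs for relation 1, then other code repoints the shared
 * pointer to relation 2 before Iterate runs.  Returns 1 if the shared-pointer
 * lookup now sees relation 2 while the saved pointer still sees relation 1.
 */
int
toy_saved_pointer_survives_repointing(void)
{
	ToyResultRelInfo rel1 = {1};
	ToyResultRelInfo rel2 = {2};
	ToyEState	estate = {&rel1};
	ToyDirectModifyState dmstate = {NULL};

	toy_begin_direct_modify(&dmstate, &rel1);

	/* Models some other code path switching the "current" relation. */
	estate.current_result_rel = &rel2;

	return toy_iterate_via_global(&estate) == 2 &&
		toy_iterate_via_saved(&dmstate) == 1;
}
```

Calling toy_saved_pointer_survives_repointing() from a test driver exercises both lookups. The sketch only models the pointer handling; it does not attempt to reproduce tuple routing, transition capture, or any real FDW callback signature.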

v4-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From bb48579528926d485a675bbedaac9dae8a4062ff Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v4 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   4 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 11 files changed, 128 insertions(+), 168 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4f04d122c3..2f682de785 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,9 +2445,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2480,7 +2477,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2845,7 +2843,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3117,11 +3114,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3225,7 +3217,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3296,7 +3288,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..2e8802df07 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -858,9 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -904,7 +901,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d54e9..eaaf69bb93 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..3316c089e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1052,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1061,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1074,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1100,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1137,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1187,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1213,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1259,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1275,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1452,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1692,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1849,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1990,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,17 +2040,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2084,7 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2129,7 +2116,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2198,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2224,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2247,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2289,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2363,7 +2332,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2355,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4ec78491f6..0b40d13bfa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
-- 
2.11.0
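
For readers skimming the diff above, the change it makes throughout is to stop relying on a "currently active result relation" pointer in the executor state and to pass the target relation to each callee instead. The following is only a minimal sketch of that calling convention, not PostgreSQL code: DemoResultRel, DemoEState, demo_insert() and demo_route_and_insert() are hypothetical stand-ins invented for illustration, and the modulo "routing" merely substitutes for real partition lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL structs. */
typedef struct DemoResultRel
{
	const char *name;			/* label for the target relation */
	int			ninserted;		/* rows "inserted" so far */
} DemoResultRel;

typedef struct DemoEState
{
	DemoResultRel *result_relations;	/* array of target relations */
	unsigned	num_result_relations;	/* length of array */
	/* note: no "currently active" pointer that callers must save/restore */
} DemoEState;

/* The callee is told its target explicitly and touches no shared pointer. */
static void
demo_insert(DemoResultRel *rel, DemoEState *estate)
{
	/* the target must be one of the relations known to this state */
	assert(rel >= estate->result_relations &&
		   rel < estate->result_relations + estate->num_result_relations);
	rel->ninserted++;
}

/*
 * Toy routing: choose a target from the key and pass it down.  Because
 * nothing in estate was modified, there is nothing to revert afterwards.
 */
static DemoResultRel *
demo_route_and_insert(DemoEState *estate, unsigned key)
{
	DemoResultRel *target;

	target = &estate->result_relations[key % estate->num_result_relations];
	demo_insert(target, estate);
	return target;
}
```

With this shape, a caller that routes a row to a different relation has no state to put back afterwards, which is what allows the save/restore code to be dropped from ExecModifyTable() in the diff above.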

Attachment: v4-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 54778e45d8ed52171b30dcc1819a1934269690f1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v4 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  21 +++-
 src/backend/executor/nodeModifyTable.c | 209 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 87 insertions(+), 212 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f682de785..5d02c67389 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3114,32 +3114,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..a8faa5e1e4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..743f54926a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -935,10 +935,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cbf3de6267..d327153a6a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1053,9 +1046,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1131,41 +1122,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1872,28 +1838,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1917,6 +1861,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1931,37 +1876,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1977,59 +1902,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc,
-								   gettext_noop("could not convert row type"));
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2125,17 +1997,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2304,6 +2165,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2326,8 +2188,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2337,6 +2204,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2402,6 +2276,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2426,26 +2315,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b40d13bfa..9571bbe328 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1136,9 +1142,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.11.0
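
The patch above replaces the per-subplan array mt_per_subplan_tupconv_maps in ModifyTableState with a ri_ChildToRootMap pointer stored in each ResultRelInfo, so a caller holding the ResultRelInfo no longer needs a subplan index to find the map. A minimal sketch of that idea follows, assuming simplified stand-in types: DemoConversionMap, DemoResultRelInfo and demo_convert_to_root are hypothetical names, not PostgreSQL APIs, and the attribute map here is a plain 0-based index array rather than PostgreSQL's real attrMap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap. */
typedef struct DemoConversionMap
{
	int			natts;		/* number of root columns */
	const int  *attr_map;	/* attr_map[i] = child column holding root column i */
} DemoConversionMap;

/* Hypothetical stand-in for ResultRelInfo: the map lives with the relation. */
typedef struct DemoResultRelInfo
{
	const char *relname;
	const DemoConversionMap *child_to_root_map;	/* NULL if layouts match */
} DemoResultRelInfo;

/*
 * Return the row in root format.  When no map is set the child row is
 * already in root format and is returned unchanged; otherwise the values
 * are copied into root_vals in root column order.
 */
static const int *
demo_convert_to_root(const DemoResultRelInfo *rri,
					 const int *child_vals, int *root_vals)
{
	const DemoConversionMap *map = rri->child_to_root_map;

	if (map == NULL)
		return child_vals;

	for (int i = 0; i < map->natts; i++)
		root_vals[i] = child_vals[map->attr_map[i]];

	return root_vals;
}
```

The NULL check mirrors the "if (tupconv_map != NULL)" test in the row movement code: conversion is skipped entirely when the partition and the root share a column layout.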

v4-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 91a24b9eaef9949b14197fef3f44d938ea78772b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v4 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3316c089e9..cbf3de6267 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -956,32 +953,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1028,6 +1023,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1181,119 +1323,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
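
The contract between ExecUpdate() and ExecCrossPartitionUpdate() in the patch above is: a true result means the caller is done (the row was moved, or it was already gone), and a false result means the row was concurrently updated and the caller must try again with the re-fetched row. A rough sketch of that protocol, under stated assumptions: all Demo* names are hypothetical, the outcome of each delete attempt is scripted by the caller, and the real code's re-check of the partition constraint after "goto lreplace" is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scripted result of one delete attempt (hypothetical). */
typedef enum
{
	DEMO_DELETED,				/* delete succeeded, go on to insert */
	DEMO_GONE,					/* row already deleted, nothing to do */
	DEMO_CONCURRENTLY_UPDATED	/* someone else updated it, must retry */
} DemoDeleteOutcome;

typedef struct DemoRow
{
	int			key;
} DemoRow;

typedef struct DemoState
{
	const DemoDeleteOutcome *outcomes;	/* one entry per delete attempt */
	int			attempt;	/* number of delete attempts so far */
	int			inserts;	/* number of inserts performed */
	DemoRow		refetched;	/* storage for the re-fetched row */
} DemoState;

/* Returns true when the caller is done, false when it must retry. */
static bool
demo_cross_partition_update(DemoState *st, const DemoRow *row,
							const DemoRow **retry_row,
							const DemoRow **inserted_row)
{
	*retry_row = NULL;
	*inserted_row = NULL;

	switch (st->outcomes[st->attempt++])
	{
		case DEMO_GONE:
			return true;	/* skip the insert as well */
		case DEMO_CONCURRENTLY_UPDATED:
			/* Pretend the concurrent updater bumped the key. */
			st->refetched.key = row->key + 1;
			*retry_row = &st->refetched;
			return false;
		case DEMO_DELETED:
			break;
	}

	/* Delete succeeded: "insert" the row via the root. */
	st->inserts++;
	*inserted_row = row;
	return true;
}

/* Caller side: loop until the helper reports that it is done. */
static const DemoRow *
demo_update(DemoState *st, const DemoRow *row)
{
	const DemoRow *retry_row;
	const DemoRow *inserted_row;

	while (!demo_cross_partition_update(st, row, &retry_row, &inserted_row))
		row = retry_row;

	return inserted_row;
}
```

Skipping the insert whenever the delete did not happen is what keeps one row from turning into two, as the long comment in the patch explains.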

#30Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#29)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Mon, Aug 5, 2019 at 6:16 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote:
Based on the discussion, I have updated the patch.

I'm a bit worried about the move of BeginDirectModify() into
nodeModifyTable.c - it just seems like an odd control flow to me. Not
allowing any intermediate nodes between ForeignScan and ModifyTable also
seems like an undesirable restriction for the future. I realize that we
already do that for BeginForeignModify() (just btw, that already accepts
resultRelInfo as a parameter, so being symmetrical for BeginDirectModify
makes sense), but it still seems like the wrong direction to me.

The need for that move, I assume, comes from needing to know the correct
ResultRelInfo, correct? I wonder if we shouldn't instead determine that
at plan time (in setrefs.c), somewhat similar to how we determine
ModifyTable.resultRelIndex. Doesn't look like that'd be too hard?

The patch adds a resultRelIndex field to the ForeignScan node, which is
set to a value >= 0 for non-SELECT queries.

Thanks for the updated patch!

I first thought to set it
only if direct modification is being used, but maybe it'd be simpler
to set it even if direct modification is not used. To set it, the
patch teaches set_plan_refs() to initialize resultRelIndex of
ForeignScan plans that appear under ModifyTable. Fujita-san said he
plans to revise the planning of direct-modification style queries to
not require a ModifyTable node anymore, but maybe he'll just need to
add similar code elsewhere but not outside setrefs.c.

Yeah, but I'm not sure this is a good idea:

@@ -877,12 +878,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
rc->rti += rtoffset;
rc->prti += rtoffset;
}
- foreach(l, splan->plans)
- {
- lfirst(l) = set_plan_refs(root,
- (Plan *) lfirst(l),
- rtoffset);
- }

                /*
                 * Append this ModifyTable node's final result relation RT
@@ -908,6 +903,27 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                        lappend_int(root->glob->rootResultRelations,
                                    splan->rootRelation);
                }
+
+               resultRelIndex = splan->resultRelIndex;
+               foreach(l, splan->plans)
+               {
+                   lfirst(l) = set_plan_refs(root,
+                                             (Plan *) lfirst(l),
+                                             rtoffset);
+
+                   /*
+                    * For foreign table result relations, save their index
+                    * in the global list of result relations into the
+                    * corresponding ForeignScan nodes.
+                    */
+                   if (IsA(lfirst(l), ForeignScan))
+                   {
+                       ForeignScan *fscan = (ForeignScan *) lfirst(l);
+
+                       fscan->resultRelIndex = resultRelIndex;
+                   }
+                   resultRelIndex++;
+               }
            }

because I still feel the same way as mentioned above by Andres. What
I'm thinking for the setrefs.c change is to modify ForeignScan (ie,
set_foreignscan_references) rather than ModifyTable, like the
attached. Maybe I'm missing something, but for direct modification
without ModifyTable, I think we would probably only have to modify
that function further so that it not only adjusts resultRelIndex but
does some extra work such as appending the result relation RT index to
root->glob->resultRelations as done for ModifyTable.
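
To make the two-step index computation in the attached patch concrete: PlanDirectModify records the subplan-local index, and set_foreignscan_references later adds the number of result relations already in the global list, which works because it runs before this ModifyTable's own result relations are appended. A simplified sketch, where DemoForeignScan and both functions are hypothetical stand-ins and the length of the global list is passed in as a plain integer:

```c
#include <assert.h>

/* Hypothetical stand-in for the ForeignScan plan node. */
typedef struct DemoForeignScan
{
	int			result_rel_index;	/* -1 unless a result relation */
} DemoForeignScan;

/* Plan time: remember the position within the ModifyTable's subplans. */
static void
demo_plan_direct_modify(DemoForeignScan *fscan, int subplan_index)
{
	fscan->result_rel_index = subplan_index;
}

/*
 * Reference-fixup time: shift the local index by the number of result
 * relations that earlier ModifyTable nodes already put in the global list.
 * An index of -1 (not a result relation) is left untouched.
 */
static void
demo_set_foreignscan_references(DemoForeignScan *fscan,
								int n_global_result_rels)
{
	if (fscan->result_rel_index >= 0)
		fscan->result_rel_index += n_global_result_rels;
}
```

With three result relations already registered, a ForeignScan that is subplan 1 of its ModifyTable ends up with global index 4, which is the slot ExecInitForeignScan would then look up in es_result_relations.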

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().

Sorry, previously, I mistakenly agreed with that. As I said before, I
think I was too tired.

I too think that it would've been great if we could call both
BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but
the former's API seems to be designed to be called from
ExecInitModifyTable from the get-go. Maybe we should leave that
as-is?

+1 for leaving that as-is; it seems reasonable to me to call
BeginForeignModify in ExecInitModifyTable, because the ForeignModify
API is designed based on an analogy with local table modifications, in
which case the initialization needed for performing
ExecInsert/ExecUpdate/ExecDelete is done in ModifyTable, not in the
underlying scan/join node.

@@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node,
for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
</para>

+    <note>
+     Also note that it's a good idea to store the <literal>rinfo</literal>
+     in the <structfield>fdw_state</structfield> for
+     <function>IterateDirectModify</function> to use.
+    </note>

Actually, if the FDW only supports direct modifications for queries
without RETURNING, it wouldn't need the rinfo in IterateDirectModify,
so I think we would probably need to update this note to say so. Having
said that, describing such a thing seems too detailed for the FDW
documentation. To avoid making the documentation verbose, wouldn't it be
better not to add this kind of thing at all?
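
For reference, the pattern the note describes (and that the postgres_fdw changes in the attached patch follow by adding a resultRelInfo field to PgFdwDirectModifyState) amounts to saving the pointer at Begin time and reading it back later. A stripped-down sketch with hypothetical Demo* types standing in for the executor structures; it uses malloc/free where a real FDW would use palloc in the appropriate memory context:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ResultRelInfo and ForeignScanState. */
typedef struct DemoResultRelInfo
{
	int			range_table_index;
} DemoResultRelInfo;

typedef struct DemoScanState
{
	void	   *fdw_state;	/* FDW-private state, opaque to the executor */
} DemoScanState;

/* FDW-private state that remembers which relation is being modified. */
typedef struct DemoDirectModifyState
{
	DemoResultRelInfo *result_rel_info;
} DemoDirectModifyState;

/* Begin: stash rinfo so later callbacks need no global executor state. */
static int
demo_begin_direct_modify(DemoScanState *node, DemoResultRelInfo *rinfo)
{
	DemoDirectModifyState *dmstate = malloc(sizeof(*dmstate));

	if (dmstate == NULL)
		return -1;
	dmstate->result_rel_info = rinfo;
	node->fdw_state = dmstate;
	return 0;
}

/* Iterate: read the relation back from the private state. */
static int
demo_iterate_direct_modify(DemoScanState *node)
{
	DemoDirectModifyState *dmstate = node->fdw_state;

	return dmstate->result_rel_info->range_table_index;
}

/* End: release the private state. */
static void
demo_end_direct_modify(DemoScanState *node)
{
	free(node->fdw_state);
	node->fdw_state = NULL;
}
```

An FDW that never processes RETURNING in its iterate callback could skip the saved pointer altogether, which is the case raised above.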

Note: the other change in the attached patch is that I modified
_readForeignScan accordingly.

Best regards,
Etsuro Fujita

Attachments:

v4-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf-efujita.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a205877d..a61c2d8b5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2336,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2350,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2426,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2478,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4048,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4195,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..04c2eccd1c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> also contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node,
      for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
     </para>
 
+    <note>
+     Also note that it's a good idea to store the <literal>rinfo</literal>
+     in the <structfield>fdw_state</structfield> for
+     <function>IterateDirectModify</function> to use.
+    </note>
+
     <para>
      If the <function>BeginDirectModify</function> pointer is set to
      <literal>NULL</literal>, no attempts to execute a direct modification on the
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..9824c16e09 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -223,10 +223,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		/* Perform initializations for a direct modification. */
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..ff78167f79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* this might be filled to a >= 0 value by set_plan_refs() */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..0bdfc50207 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1225,6 +1225,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..047dd73dd1 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,10 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* For non-SELECT operations, this contains
+								 * the offset of result relation in a
+								 * query-global list of result relations; -1
+								 * otherwise */
 } ForeignScan;
 
 /* ----------------
#31Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#29)
Re: partition routing layering in nodeModifyTable.c

Hi,

On 2019-08-05 18:16:10 +0900, Amit Langote wrote:

The patch adds a resultRelIndex field to ForeignScan node, which is
set to >= 0 value for non-SELECT queries. I first thought to set it
only if direct modification is being used, but maybe it'd be simpler
to set it even if direct modification is not used.

Yea, I think we should just always set it.

To set it, the
patch teaches set_plan_refs() to initialize resultRelIndex of
ForeignScan plans that appear under ModifyTable. Fujita-san said he
plans to revise the planning of direct-modification style queries to
not require a ModifyTable node anymore, but maybe he'll just need to
add similar code elsewhere but not outside setrefs.c.

I think I prefer the approach in Fujita-san's email. While not extremely
pretty either, it would allow for having nodes between the foreign scan
and the modify node.

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().

I too think that it would've been great if we could call both
BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but
the former's API seems to be designed to be called from
ExecInitModifyTable from the get-go. Maybe we should leave that
as-is?

Yea, we should leave it where it is. I think the API here is fairly
ugly, but it's probably not worth changing. And if we were to change it,
it'd need a lot bigger hammer.

Greetings,

Andres Freund

#32Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#30)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

Thanks a lot for the review.

On Tue, Aug 6, 2019 at 9:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Mon, Aug 5, 2019 at 6:16 PM Amit Langote <amitlangote09@gmail.com> wrote:

I first thought to set it
only if direct modification is being used, but maybe it'd be simpler
to set it even if direct modification is not used. To set it, the
patch teaches set_plan_refs() to initialize resultRelIndex of
ForeignScan plans that appear under ModifyTable. Fujita-san said he
plans to revise the planning of direct-modification style queries to
not require a ModifyTable node anymore, but maybe he'll just need to
add similar code elsewhere but not outside setrefs.c.

Yeah, but I'm not sure this is a good idea:

@@ -877,12 +878,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
rc->rti += rtoffset;
rc->prti += rtoffset;
}
- foreach(l, splan->plans)
- {
- lfirst(l) = set_plan_refs(root,
- (Plan *) lfirst(l),
- rtoffset);
- }

/*
* Append this ModifyTable node's final result relation RT
@@ -908,6 +903,27 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->rootResultRelations,
splan->rootRelation);
}
+
+               resultRelIndex = splan->resultRelIndex;
+               foreach(l, splan->plans)
+               {
+                   lfirst(l) = set_plan_refs(root,
+                                             (Plan *) lfirst(l),
+                                             rtoffset);
+
+                   /*
+                    * For foreign table result relations, save their index
+                    * in the global list of result relations into the
+                    * corresponding ForeignScan nodes.
+                    */
+                   if (IsA(lfirst(l), ForeignScan))
+                   {
+                       ForeignScan *fscan = (ForeignScan *) lfirst(l);
+
+                       fscan->resultRelIndex = resultRelIndex;
+                   }
+                   resultRelIndex++;
+               }
}

because I still feel the same way as mentioned above by Andres.

Reading Andres' emails again, I now understand why we shouldn't set
ForeignScan's resultRelIndex the way my patches did.

What
I'm thinking for the setrefs.c change is to modify ForeignScan (ie,
set_foreignscan_references) rather than ModifyTable, like the
attached.

Thanks for the patch. I have a couple of comments:

* I'm afraid that we've implicitly created an ordering constraint on
some code in set_plan_refs(). That is, a ModifyTable's plans now must
always be processed before adding its result relations to the global
list, which for good measure, should be written down somewhere; I
propose this comment in the ModifyTable's case block in set_plan_refs:

@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                     rc->rti += rtoffset;
                     rc->prti += rtoffset;
                 }
+                /*
+                 * Caution: Do not change the relative ordering of this loop
+                 * and the statement below that adds the result relations to
+                 * root->glob->resultRelations, because we need to use the
+                 * current value of list_length(root->glob->resultRelations)
+                 * in some plans.
+                 */
                 foreach(l, splan->plans)
                 {
                     lfirst(l) = set_plan_refs(root,

* Regarding setting ForeignScan.resultRelIndex even for non-direct
modifications, maybe that's not a good idea anymore. A foreign table
result relation might be involved in a local join, which prevents it
from being directly-modifiable and also hides the ForeignScan node
from being easily modifiable in PlanForeignModify. Maybe, we should
just interpret resultRelIndex as being set only when
direct-modification is feasible. Should we rename the field
accordingly to be self-documenting?

Please let me know your thoughts, so that I can modify the patch.
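
The ordering constraint from the first comment can be illustrated with a toy model. This is only a sketch, not the actual setrefs.c code: ToyGlobal, ToyForeignScan and the toy_* functions are hypothetical names, and the global list of result relations is reduced to its length. It assumes, as in the patch, that PlanDirectModify stored the subplan index in resultRelIndex and that the index is made query-global by adding the current list length before the ModifyTable appends its own result relations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for root->glob->resultRelations; only the length matters. */
typedef struct ToyGlobal
{
	int			n_result_rels;
} ToyGlobal;

/* Hypothetical stand-in for a ForeignScan plan node. */
typedef struct ToyForeignScan
{
	int			resultRelIndex; /* subplan-local index, or -1 if unused */
} ToyForeignScan;

/* Models the adjustment in set_foreignscan_references(): local -> global. */
static void
toy_set_foreignscan_references(const ToyGlobal *glob, ToyForeignScan *fscan)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += glob->n_result_rels;
}

/*
 * Models the ModifyTable case of set_plan_refs(): the subplans must be
 * processed first, and only then are the result relations appended.
 * Swapping the two steps would shift every index by nplans.
 */
static void
toy_set_modifytable_refs(ToyGlobal *glob, ToyForeignScan *subplans, int nplans)
{
	for (int i = 0; i < nplans; i++)
		toy_set_foreignscan_references(glob, &subplans[i]);

	glob->n_result_rels += nplans;
}
```

With two ModifyTable nodes of two subplans each, the second node's scans end up with indexes 2 and 3, while a scan left at -1 is not touched.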

Maybe I'm missing something, but for direct modification
without ModifyTable, I think we would probably only have to modify
that function further so that it not only adjusts resultRelIndex but
does some extra work such as appending the result relation RT index to
root->glob->resultRelations as done for ModifyTable.

Yeah, that seems reasonable.

Then we could just have BeginForeignModify, BeginDirectModify,
BeginForeignScan all be called from ExecInitForeignScan().

Sorry, previously, I mistakenly agreed with that. As I said before, I
think I was too tired.

I too think that it would've been great if we could call both
BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but
the former's API seems to be designed to be called from
ExecInitModifyTable from the get-go. Maybe we should leave that
as-is?

+1 for leaving that as-is; it seems reasonable to me to call
BeginForeignModify in ExecInitModifyTable, because the ForeignModify
API is designed based on an analogy with local table modifications, in
which case the initialization needed for performing
ExecInsert/ExecUpdate/ExecDelete is done in ModifyTable, not in the
underlying scan/join node.

Thanks for the explanation.

@@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node,
for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
</para>

+    <note>
+     Also note that it's a good idea to store the <literal>rinfo</literal>
+     in the <structfield>fdw_state</structfield> for
+     <function>IterateDirectModify</function> to use.
+    </note>

Actually, if the FDW only supports direct modifications for queries
without RETURNING, it wouldn't need the rinfo in IterateDirectModify,
so I think we would probably need to update this accordingly. Having said
that, it seems too detailed to me to describe such a thing in the FDW
documentation. To avoid making the documentation verbose, wouldn't it be
better not to add this kind of note at all?

Hmm OK. Perhaps, others who want to implement the direct modification
API can work that out by looking at postgres_fdw implementation.
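
For readers who don't want to dig through postgres_fdw, the pattern being discussed is roughly the following. This is a simplified sketch with made-up types (ToyResultRelInfo, ToyDirectModifyState, ToyForeignScanState) rather than the real executor structs: the Begin callback saves the rinfo it was given into the FDW's private state, and the Iterate callback reads it back from there instead of from estate->es_result_relation_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced versions of the executor structs. */
typedef struct ToyResultRelInfo
{
	int			ri_RangeTableIndex;
} ToyResultRelInfo;

typedef struct ToyDirectModifyState
{
	ToyResultRelInfo *resultRelInfo;	/* saved at Begin time */
} ToyDirectModifyState;

typedef struct ToyForeignScanState
{
	void	   *fdw_state;		/* FDW-private state */
} ToyForeignScanState;

/* Begin: remember which result relation this scan node modifies. */
static void
toy_begin_direct_modify(ToyForeignScanState *node, ToyResultRelInfo *rinfo)
{
	ToyDirectModifyState *dmstate = calloc(1, sizeof(*dmstate));

	if (dmstate == NULL)
		abort();
	dmstate->resultRelInfo = rinfo;
	node->fdw_state = dmstate;
}

/* Iterate: use the saved pointer; no global executor state is consulted. */
static int
toy_iterate_direct_modify(ToyForeignScanState *node)
{
	ToyDirectModifyState *dmstate = node->fdw_state;

	return dmstate->resultRelInfo->ri_RangeTableIndex;
}

/* End: release the private state. */
static void
toy_end_direct_modify(ToyForeignScanState *node)
{
	free(node->fdw_state);
	node->fdw_state = NULL;
}
```

An FDW that never processes RETURNING in its Iterate callback could skip saving the pointer, which is the reason the note was dropped from the documentation.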

Note: other change in the attached patch is that I modified
_readForeignScan accordingly.

Thanks.

Regards,
Amit

#33Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#32)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Aug 6, 2019 at 9:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

What
I'm thinking for the setrefs.c change is to modify ForeignScan (ie,
set_foreignscan_references) rather than ModifyTable, like the
attached.

Thanks for the patch. I have a couple of comments:

* I'm afraid that we've implicitly created an ordering constraint on
some code in set_plan_refs(). That is, a ModifyTable's plans now must
always be processed before adding its result relations to the global
list, which for good measure, should be written down somewhere; I
propose this comment in the ModifyTable's case block in set_plan_refs:

@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
rc->rti += rtoffset;
rc->prti += rtoffset;
}
+                /*
+                 * Caution: Do not change the relative ordering of this loop
+                 * and the statement below that adds the result relations to
+                 * root->glob->resultRelations, because we need to use the
+                 * current value of list_length(root->glob->resultRelations)
+                 * in some plans.
+                 */
foreach(l, splan->plans)
{
lfirst(l) = set_plan_refs(root,

+1

* Regarding setting ForeignScan.resultRelIndex even for non-direct
modifications, maybe that's not a good idea anymore. A foreign table
result relation might be involved in a local join, which prevents it
from being directly-modifiable and also hides the ForeignScan node
from being easily modifiable in PlanForeignModify. Maybe, we should
just interpret resultRelIndex as being set only when
direct-modification is feasible.

Yeah, I think so. When PlanForeignModify is used because, for example,
the foreign table result relation is involved in a local join, as you
mentioned, ForeignScan.operation is left unchanged (i.e.,
CMD_SELECT), so to me it's more understandable not to set
ForeignScan.resultRelIndex.
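
To make the intended semantics concrete, here is a rough sketch of the executor-side lookup under this interpretation. The Toy* types and toy_lookup_result_rel() are hypothetical simplifications, not the real ExecInitForeignScan code: a scan whose operation is a plain SELECT keeps resultRelIndex at -1 and gets no result relation, while a direct modification must carry a valid index into the query-global array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced command type; only SELECT vs. non-SELECT matters. */
typedef enum ToyCmdType
{
	TOY_CMD_SELECT,
	TOY_CMD_UPDATE
} ToyCmdType;

typedef struct ToyResultRelInfo
{
	int			ri_RangeTableIndex;
} ToyResultRelInfo;

/* Hypothetical stand-in for EState's array of result relations. */
typedef struct ToyEState
{
	ToyResultRelInfo *es_result_relations;
	int			es_num_result_relations;
} ToyEState;

typedef struct ToyForeignScan
{
	ToyCmdType	operation;
	int			resultRelIndex; /* -1 unless direct modification is used */
} ToyForeignScan;

/*
 * Returns the result relation a direct modification should operate on,
 * or NULL for an ordinary scan (including the PlanForeignModify case,
 * where operation stays SELECT and the index is never set).
 */
static ToyResultRelInfo *
toy_lookup_result_rel(const ToyEState *estate, const ToyForeignScan *node)
{
	if (node->operation == TOY_CMD_SELECT)
		return NULL;

	assert(node->resultRelIndex >= 0 &&
		   node->resultRelIndex < estate->es_num_result_relations);
	return &estate->es_result_relations[node->resultRelIndex];
}
```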

Should we rename the field
accordingly to be self-documenting?

IMO I like the name resultRelIndex, but do you have any better idea?

@@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node,
for <function>ExplainDirectModify</function> and <function>EndDirectModify</function>.
</para>

+    <note>
+     Also note that it's a good idea to store the <literal>rinfo</literal>
+     in the <structfield>fdw_state</structfield> for
+     <function>IterateDirectModify</function> to use.
+    </note>

Actually, if the FDW only supports direct modifications for queries
without RETURNING, it wouldn't need the rinfo in IterateDirectModify,
so I think we would probably need to update this accordingly. Having said
that, it seems too detailed to me to describe such a thing in the FDW
documentation. To avoid making the documentation verbose, wouldn't it be
better not to add this kind of note at all?

Hmm OK. Perhaps, others who want to implement the direct modification
API can work that out by looking at postgres_fdw implementation.

Yeah, I think so.

Best regards,
Etsuro Fujita

#34Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#33)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

Thanks for the quick follow up.

On Wed, Aug 7, 2019 at 11:30 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote:

* Regarding setting ForeignScan.resultRelIndex even for non-direct
modifications, maybe that's not a good idea anymore. A foreign table
result relation might be involved in a local join, which prevents it
from being directly-modifiable and also hides the ForeignScan node
from being easily modifiable in PlanForeignModify. Maybe, we should
just interpret resultRelIndex as being set only when
direct-modification is feasible.

Yeah, I think so. When PlanForeignModify is used because, for example,
the foreign table result relation is involved in a local join, as you
mentioned, ForeignScan.operation is left unchanged (i.e.,
CMD_SELECT), so to me it's more understandable not to set
ForeignScan.resultRelIndex.

OK.

Should we rename the field
accordingly to be self-documenting?

IMO I like the name resultRelIndex, but do you have any better idea?

On second thought, I'm fine with sticking to resultRelIndex. Trying
to make it self documenting might make the name very long.

Here are the updated patches.

Thanks,
Amit

Attachments:

v5-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch (application/octet-stream)
From b367546b5ad0a89bb0d11c95ba57bfab2ce3fdef Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 7 Aug 2019 09:46:27 +0900
Subject: [PATCH v5 1/4] Revise BeginDirectModify API to pass ResultRelInfo
 directly

For ExecInitForeignScan() to efficiently get the ResultRelInfo to
pass to BeginDirectModify(), add a field to ForeignScan that gives
the index of a given result relation in the query's list of result
relations.
---
 contrib/postgres_fdw/postgres_fdw.c     | 27 +++++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  5 ++++-
 src/backend/executor/nodeForeignscan.c  | 13 ++++++++++---
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  4 ++++
 10 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a205877d..a61c2d8b5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2336,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2350,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2426,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2478,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4048,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4195,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..123b02958a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> also contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..9824c16e09 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -223,10 +223,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		/* Perform initializations for a direct modification. */
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..ff78167f79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* this might be filled to a >= 0 value by set_plan_refs() */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..f18e94a879 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1225,6 +1232,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..dc2061cf89 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,10 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* For "direct modification" operations, this
+								 * contains the offset of result relation in
+								 * query-global list of result relations; -1
+								 * otherwise */
 } ForeignScan;
 
 /* ----------------
-- 
2.11.0

v5-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 7d51c519aa3b1232687acc79c00b89e5346181b2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v5 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3316c089e9..cbf3de6267 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -956,32 +953,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1028,6 +1023,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1181,119 +1323,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
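
For readers skimming the patch above, the calling convention of ExecCrossPartitionUpdate() can be illustrated in isolation. What follows is only a toy sketch in plain C, not PostgreSQL code: ToyRow, ToyPartition, toy_cross_partition_update() and toy_update() are hypothetical names, and a version counter stands in for concurrent-update detection. It assumes the same contract as the function comment: true means "moved, or nothing left to do", false means "the row changed under us, look at *retry_row and try again".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a tuple and a partition; not PostgreSQL types. */
typedef struct ToyRow { int key; int version; } ToyRow;
typedef struct ToyPartition { ToyRow row; bool has_row; } ToyPartition;

/*
 * Returns true if the row was moved, or if it was already gone so that the
 * caller has nothing more to do.  Returns false if the stored row no longer
 * matches the version the caller saw; the newer row is handed back in
 * *retry_row.
 */
static bool
toy_cross_partition_update(ToyPartition *src, ToyPartition *dst,
                           int seen_version, int new_key,
                           const ToyRow **retry_row,
                           const ToyRow **inserted_row)
{
    assert(src != NULL && dst != NULL);
    *retry_row = NULL;
    *inserted_row = NULL;

    /* Row already deleted: skip the insert so it cannot be resurrected. */
    if (!src->has_row)
        return true;

    /* Row was updated since the caller looked at it: ask for a retry. */
    if (src->row.version != seen_version)
    {
        *retry_row = &src->row;
        return false;
    }

    /* "Delete" from the source, then "insert" into the destination. */
    src->has_row = false;
    dst->row.key = new_key;
    dst->row.version = seen_version + 1;
    dst->has_row = true;
    *inserted_row = &dst->row;
    return true;
}

/* Caller side: loop until the helper stops asking for a retry. */
static const ToyRow *
toy_update(ToyPartition *src, ToyPartition *dst, int seen_version, int new_key)
{
    const ToyRow *retry_row;
    const ToyRow *inserted_row;

    while (!toy_cross_partition_update(src, dst, seen_version, new_key,
                                       &retry_row, &inserted_row))
        seen_version = retry_row->version;  /* re-check against newer row */

    return inserted_row;
}
```

In the patch itself the retry is a "goto lreplace" rather than a loop, so the refetched tuple goes through the partition constraint check again and may turn out not to need moving at all; the toy version skips that re-check and always moves the row.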

Attachment: v5-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 648438dfe8223ade371ca6e81e6da5162ac673b1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v5 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  21 +++-
 src/backend/executor/nodeModifyTable.c | 209 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 87 insertions(+), 212 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 967dba6fcf..5e7153bb8d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3113,32 +3113,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..a8faa5e1e4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..743f54926a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -935,10 +935,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cbf3de6267..d327153a6a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1053,9 +1046,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1131,41 +1122,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1872,28 +1838,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1917,6 +1861,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1931,37 +1876,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1977,59 +1902,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc,
-								   gettext_noop("could not convert row type"));
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2125,17 +1997,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2304,6 +2165,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2326,8 +2188,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2337,6 +2204,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2402,6 +2276,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2426,26 +2315,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b40d13bfa..9571bbe328 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1136,9 +1142,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.11.0
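
To make the intent of the 0004 patch concrete: the conversion map stops being a piece of shared state (tcs_map) that every caller has to set, save and restore, and becomes a property of the child relation that the consumer looks up on its own. Below is a toy sketch in plain C under that assumption; ToyMap, ToyRel and toy_capture() are hypothetical names, tuples are reduced to fixed-size int arrays, and nothing here is the actual PostgreSQL API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NATTS 3

/* Hypothetical stand-in for TupleConversionMap: for each root column,
 * the index of the child column that supplies its value. */
typedef struct ToyMap { int root_from_child[TOY_NATTS]; } ToyMap;

/* Hypothetical stand-in for ResultRelInfo: the map lives with the relation,
 * and is NULL when the child already has the root's column layout. */
typedef struct ToyRel { const ToyMap *child_to_root; } ToyRel;

/*
 * Store a child-format tuple in root format.  The map is taken from the
 * relation that is passed in, so no caller has to install it beforehand or
 * restore a previous value afterwards.
 */
static void
toy_capture(const ToyRel *rel, const int *child_tuple, int *root_tuple)
{
    assert(rel != NULL && child_tuple != NULL && root_tuple != NULL);

    for (int i = 0; i < TOY_NATTS; i++)
    {
        int src = rel->child_to_root ? rel->child_to_root->root_from_child[i] : i;

        root_tuple[i] = child_tuple[src];
    }
}
```

With this shape, switching to the next child relation needs no bookkeeping at all, which is roughly why the tcs_map save/restore code in ExecCrossPartitionUpdate() and the per-subplan updates in ExecModifyTable() can go away in the patch.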

Attachment: v5-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 903b9d66e992de889a1247c20581b4ca511d17dc Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v5 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   4 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 11 files changed, 128 insertions(+), 168 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30b28..967dba6fcf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2444,9 +2444,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2479,7 +2476,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2844,7 +2842,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3116,11 +3113,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3224,7 +3216,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3295,7 +3287,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..2e8802df07 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -858,9 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -904,7 +901,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d54e9..eaaf69bb93 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..3316c089e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1052,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1061,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1074,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1100,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1137,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1187,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1213,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1259,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1275,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1452,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1692,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1849,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1990,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,17 +2040,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2084,7 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2129,7 +2116,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2198,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2224,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2247,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2289,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2363,7 +2332,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2355,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4ec78491f6..0b40d13bfa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
-- 
2.11.0
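
The pattern this patch applies throughout (the caller passes the result relation explicitly rather than the callee reading a "currently active" pointer out of shared executor state) can be sketched with a toy model. This is only an illustration: the `Toy*` types and `toy_*` functions below are hypothetical stand-ins and are not PostgreSQL's real structs or functions.

```c
#include <assert.h>

/* Simplified stand-ins; these are NOT PostgreSQL's real definitions. */
typedef struct ToyResultRelInfo
{
	int			rangeTableIndex;	/* which relation this entry describes */
	int			numInserted;		/* rows inserted so far */
} ToyResultRelInfo;

typedef struct ToyEState
{
	ToyResultRelInfo *resultRelations;	/* array of result relations */
	int			numResultRelations; /* length of array */
} ToyEState;

/*
 * The callee is told which relation to touch; there is no "currently active"
 * field to set before the call or to restore afterwards.
 */
static void
toy_insert(ToyResultRelInfo *resultRelInfo, ToyEState *estate)
{
	/* The relation must be one of the entries owned by this estate. */
	assert(resultRelInfo >= estate->resultRelations &&
		   resultRelInfo < estate->resultRelations + estate->numResultRelations);
	resultRelInfo->numInserted++;
}

/* Insert one row into the first relation and two into the second. */
static int
toy_demo(void)
{
	ToyResultRelInfo rels[2] = {{1, 0}, {2, 0}};
	ToyEState	estate = {rels, 2};

	toy_insert(&rels[0], &estate);
	toy_insert(&rels[1], &estate);
	toy_insert(&rels[1], &estate);

	/* Encode both counters in one number: tens digit for rels[0]. */
	return rels[0].numInserted * 10 + rels[1].numInserted;
}
```

Calling `toy_demo()` returns 12, that is, one row counted against the first relation and two against the second, without any set/reset step between the calls.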

#35Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#34)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Wed, Aug 7, 2019 at 11:47 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Aug 7, 2019 at 11:30 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote:

* Regarding setting ForeignScan.resultRelIndex even for non-direct
modifications, maybe that's not a good idea anymore. A foreign table
result relation might be involved in a local join, which prevents it
from being directly-modifiable and also hides the ForeignScan node
from being easily modifiable in PlanForeignModify. Maybe, we should
just interpret resultRelIndex as being set only when
direct-modification is feasible.

Yeah, I think so; when using PlanForeignModify because for example,
the foreign table result relation is involved in a local join, as you
mentioned, ForeignScan.operation would be left unchanged (ie,
CMD_SELECT), so to me it's more understandable to not set
ForeignScan.resultRelIndex.

OK.

Should we rename the field
accordingly to be self-documenting?

I like the name resultRelIndex, but do you have a better idea?

On second thought, I'm fine with sticking to resultRelIndex. Trying
to make it self documenting might make the name very long.

OK

Here are the updated patches.

IIUC, I think we reached a consensus at least on the 0001 patch.
Andres, would you mind if I commit that patch?

Best regards,
Etsuro Fujita

#36Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#35)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Wed, Aug 7, 2019 at 12:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

IIUC, I think we reached a consensus at least on the 0001 patch.
Andres, would you mind if I commit that patch?

I just noticed obsolete references to es_result_relation_info that
0002 failed to remove. One of them is in fdwhandler.sgml:

<programlisting>
TupleTableSlot *
IterateDirectModify(ForeignScanState *node);
</programlisting>

... The data that was actually inserted, updated
or deleted must be stored in the
<literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
of the node's <structname>EState</structname>.

We will need to rewrite this without mentioning
es_result_relation_info. How about as follows:

-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the result relation's <structname>ResultRelInfo</structname> that has
+     been made available via node.

I've updated 0001 with the above change.

Also, I updated 0002 to remove other references.
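
As a rough illustration of what "made available via node" means for an FDW, here is a minimal sketch, assuming the FDW saves the ResultRelInfo it receives in its Begin callback and reads it back in its Iterate callback (which is what the postgres_fdw changes in 0001 do with dmstate->resultRelInfo). The `Toy*` types and `toy_*` functions are hypothetical stand-ins, not the real FDW API.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; these are NOT PostgreSQL's real definitions. */
typedef struct ToyResultRelInfo
{
	int			rangeTableIndex;
} ToyResultRelInfo;

typedef struct ToyDirectModifyState
{
	ToyResultRelInfo *resultRelInfo;	/* saved by the Begin callback */
} ToyDirectModifyState;

typedef struct ToyForeignScanState
{
	void	   *fdw_state;		/* FDW-private state, opaque to the executor */
} ToyForeignScanState;

/* Begin callback: remember which result relation this node modifies. */
static void
toy_begin_direct_modify(ToyForeignScanState *node, ToyResultRelInfo *rinfo)
{
	ToyDirectModifyState *dmstate = malloc(sizeof(ToyDirectModifyState));

	assert(dmstate != NULL);
	dmstate->resultRelInfo = rinfo;
	node->fdw_state = dmstate;
}

/* Iterate callback: reach the result relation through the node only. */
static int
toy_iterate_direct_modify(ToyForeignScanState *node)
{
	ToyDirectModifyState *dmstate = (ToyDirectModifyState *) node->fdw_state;

	return dmstate->resultRelInfo->rangeTableIndex;
}

/* End callback: release the private state. */
static void
toy_end_direct_modify(ToyForeignScanState *node)
{
	free(node->fdw_state);
	node->fdw_state = NULL;
}

/* Run two independent nodes and combine what each one sees. */
static int
toy_two_nodes_demo(void)
{
	ToyResultRelInfo rels[2] = {{5}, {6}};
	ToyForeignScanState a = {NULL};
	ToyForeignScanState b = {NULL};
	int			seen;

	toy_begin_direct_modify(&a, &rels[0]);
	toy_begin_direct_modify(&b, &rels[1]);

	/* Interleaving is harmless: neither node depends on shared state. */
	seen = toy_iterate_direct_modify(&b) * 10 + toy_iterate_direct_modify(&a);

	toy_end_direct_modify(&a);
	toy_end_direct_modify(&b);
	return seen;
}
```

Here `toy_two_nodes_demo()` returns 65: node b sees relation 6 and node a sees relation 5, regardless of the order in which they are iterated.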

Thanks,
Amit

Attachments:

v6-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch (application/octet-stream)
From a85cc6820a7193c4398e6337ed66943f4ac63837 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 7 Aug 2019 09:46:27 +0900
Subject: [PATCH v6 1/4] Revise BeginDirectModify API to pass ResultRelInfo
 directly

For ExecInitForeignScan() to efficiently get the ResultRelInfo to
pass to BeginDirectModify(), add a field to ForeignScan that gives
the index of a given result relation in the query's list of result
relations.
---
 contrib/postgres_fdw/postgres_fdw.c     | 27 +++++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            | 11 +++++++----
 src/backend/executor/nodeForeignscan.c  | 13 ++++++++++---
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  4 ++++
 10 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a205877d..a61c2d8b5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2336,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2350,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2426,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2478,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4048,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4195,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..919b88a945 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> also contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -915,9 +918,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
-     Return NULL if no more rows are available.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the result relation's <structname>ResultRelInfo</structname> that has
+     been made available via node.  Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
      <function>BeginDirectModify</function> if you need longer-lived storage, or use
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..9824c16e09 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -223,10 +223,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		/* Perform initializations for a direct modification. */
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..ff78167f79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* this might be filled to a >= 0 value by set_plan_refs() */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..f18e94a879 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1225,6 +1232,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..dc2061cf89 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,10 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* For "direct modification" operations, this
+								 * contains the offset of result relation in
+								 * query-global list of result relations; -1
+								 * otherwise */
 } ForeignScan;
 
 /* ----------------
-- 
2.11.0
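
The index arithmetic in the patch above (resultRelIndex starts at -1, is set to the subplan index when direct modification is planned, and is then shifted by the current length of the global result relation list before the enclosing ModifyTable's relations are appended) can be checked with a small toy model. This is a sketch under stated assumptions: the `Toy*` types, the `toy_*` functions, and the sample RT indexes are all hypothetical and only mimic the ordering described in the setrefs.c comments.

```c
#include <assert.h>

#define TOY_MAX_RESULT_RELS 8

/* Simplified stand-ins; these are NOT PostgreSQL's real planner structures. */
typedef struct ToyForeignScan
{
	int			resultRelIndex; /* -1 unless direct modification is planned */
} ToyForeignScan;

typedef struct ToyGlobalPlan
{
	int			resultRelations[TOY_MAX_RESULT_RELS];	/* RT indexes */
	int			numResultRelations;
} ToyGlobalPlan;

/* Mimics make_foreignscan() in the patch: start out as "not set". */
static void
toy_make_foreignscan(ToyForeignScan *fscan)
{
	fscan->resultRelIndex = -1;
}

/* Mimics postgresPlanDirectModify(): position within this ModifyTable. */
static void
toy_plan_direct_modify(ToyForeignScan *fscan, int subplan_index)
{
	fscan->resultRelIndex = subplan_index;
}

/*
 * Mimics set_foreignscan_references(): must run BEFORE the enclosing
 * ModifyTable's own result relations are appended to the global list.
 */
static void
toy_set_foreignscan_references(ToyForeignScan *fscan,
							   const ToyGlobalPlan *glob)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += glob->numResultRelations;
}

/* Append one ModifyTable's result relations to the global list. */
static void
toy_append_result_rels(ToyGlobalPlan *glob, const int *rtis, int n)
{
	assert(glob->numResultRelations + n <= TOY_MAX_RESULT_RELS);
	for (int i = 0; i < n; i++)
		glob->resultRelations[glob->numResultRelations++] = rtis[i];
}

/* Return the RT index the executor would find via resultRelIndex. */
static int
toy_resolve_direct_modify_rti(void)
{
	ToyGlobalPlan glob = {{0}, 0};
	ToyForeignScan fscan;
	const int	first_mt[] = {3, 4};	/* an earlier ModifyTable's rels */
	const int	second_mt[] = {7, 8, 9};	/* the ModifyTable owning fscan */

	toy_append_result_rels(&glob, first_mt, 2);

	toy_make_foreignscan(&fscan);
	toy_plan_direct_modify(&fscan, 1);	/* second subplan of its ModifyTable */
	toy_set_foreignscan_references(&fscan, &glob);	/* 1 + 2 = 3 */
	toy_append_result_rels(&glob, second_mt, 3);

	assert(fscan.resultRelIndex >= 0);
	return glob.resultRelations[fscan.resultRelIndex];
}
```

With these sample values `toy_resolve_direct_modify_rti()` returns 8, the second relation of the second ModifyTable. Swapping the last two calls (appending before adjusting) would shift the index by 5 instead of 2, which is the ordering hazard the "Caution" comment in set_plan_refs() warns about.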

v6-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 6cb135479a53bd88d3728193e35ec2cc8788a9ab Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v6 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 130 insertions(+), 175 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30b28..967dba6fcf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2444,9 +2444,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2479,7 +2476,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2844,7 +2842,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3116,11 +3113,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3224,7 +3216,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3295,7 +3287,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..d6c3a94522 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -858,9 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -904,7 +901,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2822,7 +2818,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 			estate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	estate->es_top_eflags = parentestate->es_top_eflags;
 	estate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d54e9..eaaf69bb93 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..3316c089e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1052,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1061,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1074,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1100,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1137,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1187,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1213,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1259,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1275,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1452,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1692,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1849,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1990,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,17 +2040,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2084,7 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2129,7 +2116,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2198,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2224,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2247,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2289,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2363,7 +2332,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2355,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot *slot,
+									  EState *estate, ItemPointer conflictTid,
+									  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+									   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4ec78491f6..0b40d13bfa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 75e25cdf48..b73e26fc69 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 23885f638c..e5a0a05d13 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.11.0
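
For readers skimming the patch above, the gist of the change can be shown with a toy model. This is only a sketch under made-up names (ToyRelInfo, ToyEState, toy_route and toy_insert are hypothetical and not part of PostgreSQL): the insert routine receives its target relation as an argument and does the routing itself, so there is no shared "currently active relation" pointer for callers to set and later revert.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the executor structs. */
typedef struct ToyRelInfo
{
	const char *relname;
	int			ninserted;		/* rows inserted into this relation */
} ToyRelInfo;

typedef struct ToyEState
{
	ToyRelInfo *rels;			/* rels[0] is the root, the rest are leaves */
	int			nrels;
} ToyEState;

/* Toy routing rule (an assumption): choose a leaf from the key. */
static ToyRelInfo *
toy_route(ToyEState *estate, unsigned int key)
{
	return &estate->rels[1 + (key % (unsigned int) (estate->nrels - 1))];
}

/*
 * The target relation is an explicit parameter.  If routing is needed it
 * happens here, and only the local variable 'rel' changes; nothing in
 * 'estate' has to be saved or restored by the caller.
 */
static ToyRelInfo *
toy_insert(ToyEState *estate, ToyRelInfo *rel, unsigned int key,
		   int is_partitioned)
{
	if (is_partitioned)
		rel = toy_route(estate, key);

	rel->ninserted++;
	return rel;
}
```

A caller simply passes the root (or a plain table) and looks at the returned relation if it cares which leaf was chosen; the caller's own pointer is never disturbed.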

Attachment: v6-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 2afa538badeb6beb0c87b91c23f8de494b71e61f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v6 4/4] Refactor transition tuple capture code a bit

In the case of inherited updates and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  21 +++-
 src/backend/executor/nodeModifyTable.c | 209 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 87 insertions(+), 212 deletions(-)
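
As a rough illustration of the arrangement described in the commit message, here is a toy sketch (all Toy* names are hypothetical, and the "conversion" is just an integer offset standing in for a real tuple conversion). The map is looked up from the relation being modified rather than from shared state, and the saved original tuple is read once and then cleared, which is roughly what the trigger.c hunk below does.

```c
#include <stddef.h>

/* Stand-in for TupleConversionMap: "converting" just adds an offset. */
typedef struct ToyMap
{
	int			offset;
} ToyMap;

typedef struct ToyRoutingInfo
{
	ToyMap	   *part_to_root;	/* map of a tuple-routing target */
} ToyRoutingInfo;

typedef struct ToyRelInfo
{
	ToyRoutingInfo *routing;	/* set only for tuple-routing targets */
	ToyMap	   *child_to_root;	/* set for UPDATE result relations */
} ToyRelInfo;

typedef struct ToyCapture
{
	const int  *original_insert_tuple;	/* root-format tuple, if saved */
} ToyCapture;

/* The map comes from the relation itself, not from shared mutable state. */
static ToyMap *
toy_get_map(const ToyRelInfo *rel)
{
	return rel->routing ? rel->routing->part_to_root : rel->child_to_root;
}

/* Fetch the saved original tuple and clear it so it is not reused. */
static const int *
toy_take_original(ToyCapture *tcs)
{
	const int  *tuple = tcs->original_insert_tuple;

	tcs->original_insert_tuple = NULL;
	return tuple;
}

/* Return the value that would be stored in the root-format tuplestore. */
static int
toy_capture(ToyCapture *tcs, const ToyRelInfo *rel, int child_value)
{
	const int  *orig = toy_take_original(tcs);
	ToyMap	   *map = toy_get_map(rel);

	if (orig != NULL)
		return *orig;			/* no conversion round trip needed */
	if (map != NULL)
		return child_value + map->offset;	/* convert child to root */
	return child_value;			/* formats already match */
}
```

Because nothing here lives in a per-statement "current map" field, there is nothing for callers to save before an insert and restore afterwards.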

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 967dba6fcf..5e7153bb8d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3113,32 +3113,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..a8faa5e1e4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..743f54926a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -935,10 +935,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cbf3de6267..d327153a6a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1053,9 +1046,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1131,41 +1122,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1872,28 +1838,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1917,6 +1861,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1931,37 +1876,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1977,59 +1902,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc,
-								   gettext_noop("could not convert row type"));
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2125,17 +1997,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2304,6 +2165,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2326,8 +2188,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2337,6 +2204,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2402,6 +2276,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2426,26 +2315,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b40d13bfa..9571bbe328 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1136,9 +1142,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.11.0

v6-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From a9e75e5c00d8cae7785f77c34b92461c8e029690 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v6 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3316c089e9..cbf3de6267 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -956,32 +953,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1028,6 +1023,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1181,119 +1323,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
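
For readers skimming the 0003 patch, the return-value contract of ExecCrossPartitionUpdate() can be sketched outside the executor. This is only a mock under stated assumptions: Slot, MoveState, mock_cross_partition_update() and mock_update() are hypothetical stand-ins invented for illustration, not PostgreSQL types or functions. It shows just the protocol from the commit message: true means "moved, or nothing left to do", false means "retry with *retry_slot".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple slot; not the real TupleTableSlot. */
typedef struct Slot
{
	int			key;
} Slot;

/* Hypothetical knobs for simulating concurrent activity on the row. */
typedef struct MoveState
{
	int			concurrent_updates_left;	/* times the DELETE step loses a race */
	bool		row_deleted_by_other;	/* row vanished, nothing to move */
	int			moves_done;		/* completed DELETE+INSERT pairs */
} MoveState;

/*
 * Mock of the contract only: returns true when the caller is done (row
 * moved, or concurrently deleted), false when the caller must retry with
 * the slot returned in *retry_slot.
 */
static bool
mock_cross_partition_update(MoveState *st, Slot *slot,
							Slot **retry_slot, Slot **inserted_tuple)
{
	*retry_slot = NULL;
	*inserted_tuple = NULL;

	if (st->row_deleted_by_other)
		return true;			/* skip the INSERT as well */

	if (st->concurrent_updates_left > 0)
	{
		st->concurrent_updates_left--;
		*retry_slot = slot;		/* pretend this is the re-fetched tuple */
		return false;
	}

	st->moves_done++;
	*inserted_tuple = slot;
	return true;
}

/*
 * Caller side.  In ExecUpdate() the retry is a "goto lreplace", which also
 * re-runs the constraint checks; this mock only loops.
 */
static Slot *
mock_update(MoveState *st, Slot *slot, int *attempts)
{
	Slot	   *retry_slot;
	Slot	   *inserted_tuple;

	*attempts = 0;
	for (;;)
	{
		(*attempts)++;
		if (mock_cross_partition_update(st, slot, &retry_slot, &inserted_tuple))
			return inserted_tuple;
		assert(retry_slot != NULL);
		slot = retry_slot;
	}
}
```

With two simulated concurrent updates, mock_update() calls the mock three times before the move succeeds; with a concurrently deleted row it returns NULL after one call, which corresponds to ExecUpdate() returning a NULL inserted_tuple.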

#37Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#36)
Re: partition routing layering in nodeModifyTable.c

Amit-san,

On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Aug 7, 2019 at 12:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > IIUC, I think we reached a consensus at least on the 0001 patch.
> > Andres, would you mind if I commit that patch?
>
> I just noticed obsolete references to es_result_relation_info that
> 0002 failed to remove. One of them is in fdwhandler.sgml:
>
> <programlisting>
> TupleTableSlot *
> IterateDirectModify(ForeignScanState *node);
> </programlisting>
>
> ... The data that was actually inserted, updated
> or deleted must be stored in the
> <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> of the node's <structname>EState</structname>.
>
> We will need to rewrite this without mentioning
> es_result_relation_info. How about as follows:
>
> -     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> -     of the node's <structname>EState</structname>.
> +     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> +     of the result relation's<structname>ResultRelInfo</structname> that has
> +     been made available via node.
>
> I've updated 0001 with the above change.

Good catch!

This would be nitpicking, but:

* IIUC, we don't use the term "result relation" in fdwhandler.sgml.
For consistency with your change to the doc for BeginDirectModify, how
about using the term "target foreign table" instead of "result
relation"?

* ISTM that "<structname>ResultRelInfo</structname> that has been made
available via node" would be a bit fuzzy to FDW authors. To be more
specific, how about changing it to
"<structname>ResultRelInfo</structname> passed to
<function>BeginDirectModify</function>" or something like that?

Best regards,
Etsuro Fujita

#38 Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#37)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

On Wed, Aug 7, 2019 at 6:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I just noticed obsolete references to es_result_relation_info that
> > 0002 failed to remove. One of them is in fdwhandler.sgml:
> >
> > <programlisting>
> > TupleTableSlot *
> > IterateDirectModify(ForeignScanState *node);
> > </programlisting>
> >
> > ... The data that was actually inserted, updated
> > or deleted must be stored in the
> > <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> > of the node's <structname>EState</structname>.
> >
> > We will need to rewrite this without mentioning
> > es_result_relation_info. How about as follows:
> >
> > -     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> > -     of the node's <structname>EState</structname>.
> > +     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
> > +     of the result relation's<structname>ResultRelInfo</structname> that has
> > +     been made available via node.
> >
> > I've updated 0001 with the above change.
>
> Good catch!

Thanks for the review.

> This would be nitpicking, but:
>
> * IIUC, we don't use the term "result relation" in fdwhandler.sgml.
> For consistency with your change to the doc for BeginDirectModify, how
> about using the term "target foreign table" instead of "result
> relation"?

Agreed, done.

> * ISTM that "<structname>ResultRelInfo</structname> that has been made
> available via node" would be a bit fuzzy to FDW authors. To be more
> specific, how about changing it to
> "<structname>ResultRelInfo</structname> passed to
> <function>BeginDirectModify</function>" or something like that?

That works for me, although an FDW author reading this still has got
to make the connection.
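
To illustrate the connection an FDW author has to make, here is a minimal sketch. It assumes heavily simplified stand-in structs instead of the real executor headers, and the names demoBeginDirectModify, demoIterateDirectModify and DemoDirectModifyState are hypothetical. The idea is only that the FDW remembers the ResultRelInfo it was handed in BeginDirectModify and later stores the modified row through it in IterateDirectModify.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for executor structs; NOT the real definitions. */
typedef struct TupleTableSlot { int value; } TupleTableSlot;
typedef struct ExprContext { TupleTableSlot *ecxt_scantuple; } ExprContext;
typedef struct ProjectionInfo { ExprContext *pi_exprContext; } ProjectionInfo;
typedef struct ResultRelInfo { ProjectionInfo *ri_projectReturning; } ResultRelInfo;
typedef struct ForeignScanState
{
	void	   *fdw_state;			/* FDW-private execution state */
	TupleTableSlot *ss_ScanTupleSlot;	/* flattened stand-in for ss.ss_ScanTupleSlot */
} ForeignScanState;

/* Hypothetical FDW-private state that remembers the target relation. */
typedef struct DemoDirectModifyState
{
	ResultRelInfo *resultRelInfo;	/* saved from BeginDirectModify */
	int			rows_left;			/* pretend the remote side returns one row */
} DemoDirectModifyState;

/* A real FDW would allocate this in a memory context; static for the demo. */
static DemoDirectModifyState demo_state;

static void
demoBeginDirectModify(ForeignScanState *node, ResultRelInfo *rinfo, int eflags)
{
	(void) eflags;

	/* Save the ResultRelInfo so that later callbacks can find it. */
	demo_state.resultRelInfo = rinfo;
	demo_state.rows_left = 1;
	node->fdw_state = &demo_state;
}

static TupleTableSlot *
demoIterateDirectModify(ForeignScanState *node)
{
	DemoDirectModifyState *dmstate = node->fdw_state;
	ResultRelInfo *rinfo = dmstate->resultRelInfo;

	if (dmstate->rows_left == 0)
		return NULL;			/* no more rows */
	dmstate->rows_left--;

	/* Make the modified row visible to RETURNING via the saved rinfo. */
	if (rinfo->ri_projectReturning != NULL)
		rinfo->ri_projectReturning->pi_exprContext->ecxt_scantuple =
			node->ss_ScanTupleSlot;

	return node->ss_ScanTupleSlot;
}
```

The sketch mirrors what 0001 does in postgres_fdw, where the pointer is kept in PgFdwDirectModifyState; everything else about a real callback (remote query execution, memory contexts) is left out.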

Attached updated patches; only 0001 changed in this version.

Thanks,
Amit

Attachments:

v7-0001-Revise-BeginDirectModify-API-to-pass-ResultRelInf.patch (application/octet-stream)
From 62ed7026523f19af95595e3cdbe51720b1bb266d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 7 Aug 2019 09:46:27 +0900
Subject: [PATCH v7 1/4] Revise BeginDirectModify API to pass ResultRelInfo
 directly

For ExecInitForeignScan() to efficiently get the ResultRelInfo to
pass to BeginDirectModify(), add a field to ForeignScan that gives
the index of a given result relation in the query's list of result
relations.
---
 contrib/postgres_fdw/postgres_fdw.c     | 27 +++++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            | 12 ++++++++----
 src/backend/executor/nodeForeignscan.c  | 13 ++++++++++---
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  4 ++++
 10 files changed, 64 insertions(+), 13 deletions(-)
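
The index bookkeeping described in the commit message can be sketched as follows. This is a toy model under stated assumptions, not the planner code: GlobalResultRels stands in for root->glob->resultRelations, DemoForeignScan for the new ForeignScan field, and all demo_* function names are hypothetical. It shows why the per-ModifyTable subplan index has to be offset by the current length of the global list before that ModifyTable's own result relations are appended.

```c
#include <assert.h>

#define MAX_RESULT_RELS 16

/* Hypothetical stand-in for the query-global list of result relations. */
typedef struct GlobalResultRels
{
	int			rtis[MAX_RESULT_RELS];	/* range table indexes */
	int			length;
} GlobalResultRels;

/* Stand-in carrying only the field added by this patch. */
typedef struct DemoForeignScan
{
	int			resultRelIndex; /* subplan index at first, -1 if not a direct modify */
} DemoForeignScan;

/* Turn a ModifyTable-local index into an offset in the global list. */
static void
demo_adjust_result_rel_index(DemoForeignScan *fscan, const GlobalResultRels *glob)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += glob->length;
}

/* Append one ModifyTable's result relations to the global list. */
static void
demo_append_result_rels(GlobalResultRels *glob, const int *rtis, int n)
{
	for (int i = 0; i < n; i++)
	{
		assert(glob->length < MAX_RESULT_RELS);
		glob->rtis[glob->length++] = rtis[i];
	}
}

/*
 * Process one ModifyTable: fix up the subplans first, then append the
 * relations.  Swapping these two steps would yield wrong offsets.
 */
static void
demo_set_modifytable_refs(GlobalResultRels *glob, DemoForeignScan *subplans,
						  const int *rtis, int n)
{
	for (int i = 0; i < n; i++)
		demo_adjust_result_rel_index(&subplans[i], glob);
	demo_append_result_rels(glob, rtis, n);
}
```

With two ModifyTable nodes in one query, the second node's subplan 0 ends up at the offset just past the first node's relations, which is the slot the executor would read from es_result_relations.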

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a205877d..a61c2d8b5f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -205,6 +205,9 @@ typedef struct PgFdwDirectModifyState
 	List	   *retrieved_attrs;	/* attr numbers retrieved by RETURNING */
 	bool		set_processed;	/* do we set the command es_processed? */
 
+	/* Information about the relation being modified */
+	ResultRelInfo *resultRelInfo;
+
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the update */
 	int			numParams;		/* number of parameters passed to query */
@@ -360,7 +363,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2336,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2350,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2380,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2414,6 +2426,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	dmstate->set_processed = intVal(list_nth(fsplan->fdw_private,
 											 FdwDirectModifyPrivateSetProcessed));
 
+	/* Save the ResultRelInfo of the relation being modified. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Create context for per-tuple temp workspace. */
 	dmstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
 											  "postgres_fdw temporary data",
@@ -2463,7 +2478,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4048,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4195,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..907ce46ee6 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -883,7 +884,9 @@ BeginDirectModify(ForeignScanState *node,
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     information provided by <function>PlanDirectModify</function>).  In
+     addition, <literal>rinfo</literal> also contains information describing
+     the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -915,9 +918,10 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
-     Return NULL if no more rows are available.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.  Return NULL if no more
+     rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
      <function>BeginDirectModify</function> if you need longer-lived storage, or use
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..9824c16e09 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -223,10 +223,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		/* Perform initializations for a direct modification. */
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..ff78167f79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* this might be filled to a >= 0 value by set_plan_refs() */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..f18e94a879 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1225,6 +1232,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..dc2061cf89 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,10 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* For "direct modification" operations, this
+								 * contains the offset of result relation in
+								 * query-global list of result relations; -1
+								 * otherwise */
 } ForeignScan;
 
 /* ----------------
-- 
2.11.0

v7-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 0485b1889cb4c4f1ef4097223f8f26c5e8c349b0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v7 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 130 insertions(+), 175 deletions(-)
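
The shape of the change described in the commit message can be sketched as below. The structs are reduced stand-ins (the index_inserts counter does not exist in the real ResultRelInfo) and the demo_* functions are hypothetical; the point is only that the callee receives its target relation as an argument, so routing a row to a partition needs no set-and-restore of shared executor state.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins; index_inserts is a demo-only counter. */
typedef struct ResultRelInfo
{
	int			ri_NumIndices;
	int			index_inserts;
} ResultRelInfo;

typedef struct EState
{
	ResultRelInfo *es_result_relations;
	int			es_num_result_relations;
	/* note: no "currently active" result relation pointer here */
} EState;

/* Callee is told explicitly which relation it works on. */
static void
demo_insert_index_tuples(ResultRelInfo *resultRelInfo, EState *estate)
{
	(void) estate;
	if (resultRelInfo->ri_NumIndices > 0)
		resultRelInfo->index_inserts++;
}

/* Route a row to a partition; nothing in estate is set or reset. */
static void
demo_routed_insert(EState *estate, int partition)
{
	ResultRelInfo *partRelInfo;

	assert(partition >= 0 && partition < estate->es_num_result_relations);
	partRelInfo = &estate->es_result_relations[partition];
	demo_insert_index_tuples(partRelInfo, estate);
}
```

Because no caller has to remember to restore a previous value, interleaving inserts into different partitions cannot leave a stale pointer behind.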

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30b28..967dba6fcf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2444,9 +2444,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2479,7 +2476,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2844,7 +2842,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3116,11 +3113,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3224,7 +3216,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3295,7 +3287,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fb2be10794..e63b25bf25 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1746,7 +1746,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1876,7 +1875,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dbd7dd9bcd..d6c3a94522 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -858,9 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -904,7 +901,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2822,7 +2818,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 			estate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	estate->es_top_eflags = parentestate->es_top_eflags;
 	estate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c1fc0d54e9..eaaf69bb93 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e0c8794c4..3316c089e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1037,6 +1052,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1045,12 +1061,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1060,10 +1074,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1090,7 +1100,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1127,7 +1137,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1177,6 +1187,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1202,9 +1213,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1245,16 +1259,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1271,18 +1275,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1448,7 +1452,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1687,7 +1692,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1844,41 +1849,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1989,10 +1990,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2041,17 +2040,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2084,7 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2129,7 +2116,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2212,25 +2198,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2242,15 +2224,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2271,7 +2247,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2314,14 +2289,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2363,7 +2332,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2387,8 +2355,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 43edfef089..10ef6af3e7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 1fb28b4596..3ecdcc3a34 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4ec78491f6..0b40d13bfa 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 75e25cdf48..b73e26fc69 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 23885f638c..e5a0a05d13 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.11.0
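
The core idea of the patch above is to pass the target ResultRelInfo explicitly instead of reading a mutable "currently active" pointer from EState. A minimal standalone sketch of that pattern follows. The types and functions here (RelInfo, ExecState, route_tuple, insert_tuple) are hypothetical stand-ins for illustration, not the real executor structs or APIs, and the modulo routing rule is only a placeholder for ExecFindPartition().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo: one target relation. */
typedef struct RelInfo
{
	int			relid;			/* made-up relation identifier */
	int			ninserted;		/* rows inserted into this relation */
} RelInfo;

/* Hypothetical stand-in for EState: rels[0] is the root, the rest partitions. */
typedef struct ExecState
{
	RelInfo    *rels;
	int			nrels;
} ExecState;

/* Toy router: choose a partition from the key (placeholder rule). */
static RelInfo *
route_tuple(ExecState *estate, RelInfo *root, unsigned key)
{
	unsigned	nparts = (unsigned) estate->nrels - 1;

	assert(root == &estate->rels[0] && nparts > 0);
	return &estate->rels[1 + (key % nparts)];
}

/*
 * The target relation is a parameter.  Routing only changes the local
 * variable, so the caller has nothing to save or restore afterwards.
 */
static int
insert_tuple(ExecState *estate, RelInfo *target, unsigned key, int do_route)
{
	if (do_route)
		target = route_tuple(estate, target, key);

	target->ninserted++;
	return target->relid;
}
```

Because insert_tuple() never writes to shared state to remember the chosen partition, a caller that passes the root relation still holds the root relation after the call, which is what makes the "revert the estate change" steps unnecessary.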

Attachment: v7-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From 1f05ac37b37f98e05a5cd468e2718ec4ab684c95 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v7 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  21 +++-
 src/backend/executor/nodeModifyTable.c | 209 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 87 insertions(+), 212 deletions(-)
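
The arrangement described in the commit message, where each child relation carries its own child-to-root conversion map, can be sketched in isolation as below. This is an illustration under assumptions: ConvMap, RelInfo, and capture_tuple are hypothetical names, tuples are modelled as plain int arrays, and the real TupleConversionMap is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define NATTS 2

/* Hypothetical map: attr_map[i] is the child column that feeds root column i. */
typedef struct ConvMap
{
	int			attr_map[NATTS];
} ConvMap;

/* Hypothetical per-relation info; the map is NULL when layouts already match. */
typedef struct RelInfo
{
	const ConvMap *child_to_root;
} RelInfo;

/*
 * Convert a child-format tuple into root format for transition capture.
 * The map is looked up on the relation itself, so there is no shared
 * "current map" that has to be switched when the child relation changes.
 */
static void
capture_tuple(const RelInfo *rel, const int *child_tuple, int *root_tuple)
{
	for (int i = 0; i < NATTS; i++)
	{
		int			src = rel->child_to_root ?
			rel->child_to_root->attr_map[i] : i;

		root_tuple[i] = child_tuple[src];
	}
}
```

With the map attached to the relation, code that moves between child relations (or calls an insert in the middle of an update) does not need to save and restore any map, since each call simply uses the map of the relation it was given.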

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 967dba6fcf..5e7153bb8d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3113,32 +3113,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 2d9a8e9d54..a8faa5e1e4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5750,14 +5749,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 729dc396a9..743f54926a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -935,10 +935,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot),
-								   gettext_noop("could not convert row type"));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot),
+									   gettext_noop("could not convert row type"));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cbf3de6267..d327153a6a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1053,9 +1046,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1131,41 +1122,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1872,28 +1838,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1917,6 +1861,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1931,37 +1876,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1977,59 +1902,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc,
-								   gettext_noop("could not convert row type"));
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2125,17 +1997,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2304,6 +2165,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2326,8 +2188,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2337,6 +2204,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2402,6 +2276,21 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc),
+									   gettext_noop("could not convert row type"));
 		resultRelInfo++;
 		i++;
 	}
@@ -2426,26 +2315,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0b40d13bfa..9571bbe328 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1136,9 +1142,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.11.0
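
To illustrate what the patch above does with ri_ChildToRootMap: the conversion map now lives on each result relation instead of in a parallel per-subplan array indexed by position. The following is only a minimal sketch of that idea. RelInfo, ConversionMap and convert_to_root are hypothetical simplified stand-ins, not the PostgreSQL definitions, and rows are plain int arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap: attr_map[i] is the
 * 1-based child column feeding root column i, or 0 if there is none. */
typedef struct ConversionMap
{
	const int  *attr_map;
	int			natts;
} ConversionMap;

/* Hypothetical stand-in for ResultRelInfo: the map is stored with the
 * relation itself, so no index into a separate array is needed. */
typedef struct RelInfo
{
	const char *name;
	const ConversionMap *child_to_root_map; /* NULL if layouts match */
} RelInfo;

/* Return the row in root layout; the input row is returned unchanged
 * when the relation needs no conversion. */
static const int *
convert_to_root(const RelInfo *rel, const int *child_row, int *root_row)
{
	const ConversionMap *map;

	assert(rel != NULL);
	map = rel->child_to_root_map;
	if (map == NULL)
		return child_row;

	for (int i = 0; i < map->natts; i++)
	{
		int			src = map->attr_map[i];

		root_row[i] = (src > 0) ? child_row[src - 1] : 0;
	}
	return root_row;
}
```

With the map reachable from the relation, a caller that holds only a RelInfo pointer can convert a row without first computing the relation's position in an array, which is what the removed map_index arithmetic was for.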

v7-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From c60bad85be3ace4b84f6df185385de4c849b7c5d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v7 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3316c089e9..cbf3de6267 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -956,32 +953,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1028,6 +1023,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1181,119 +1323,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
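
The return-value contract of ExecCrossPartitionUpdate() in the patch above (true when the move is done or the row is already gone, false when the row was concurrently updated and the caller must retry) can be sketched roughly as follows. This is a toy model under stated assumptions: Row, toy_delete, toy_cross_partition_update and toy_update are hypothetical names, a version counter stands in for concurrent updates, and the real caller also re-checks the partition constraint before retrying.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stored row; "version" changes on each concurrent update. */
typedef struct Row
{
	int			key;
	int			version;
	bool		deleted;
} Row;

typedef enum
{
	DELETE_OK,					/* row deleted by us */
	DELETE_GONE,				/* row was already deleted */
	DELETE_UPDATED				/* someone updated the row first */
} DeleteResult;

/* Toy delete: succeeds only if the caller saw the latest version. */
static DeleteResult
toy_delete(Row *stored, int seen_version)
{
	if (stored->deleted)
		return DELETE_GONE;
	if (stored->version != seen_version)
		return DELETE_UPDATED;
	stored->deleted = true;
	return DELETE_OK;
}

/* Returns true when there is nothing more to do (moved, or row gone).
 * Returns false with *retry_version set when the caller must retry. */
static bool
toy_cross_partition_update(Row *stored, int seen_version,
						   int *retry_version, bool *inserted)
{
	*inserted = false;
	*retry_version = -1;

	switch (toy_delete(stored, seen_version))
	{
		case DELETE_GONE:
			return true;		/* skip the insert as well */
		case DELETE_UPDATED:
			*retry_version = stored->version;
			return false;
		case DELETE_OK:
			break;
	}

	*inserted = true;			/* stands in for the insert via the root */
	return true;
}

/* Caller loop mirroring the "goto lreplace" retry; returns attempts made. */
static int
toy_update(Row *stored, int seen_version, bool *inserted)
{
	int			attempts = 0;
	int			retry_version;

	for (;;)
	{
		attempts++;
		if (toy_cross_partition_update(stored, seen_version,
									   &retry_version, inserted))
			return attempts;
		seen_version = retry_version;
	}
}
```

The point of the split is that the insert only happens when the delete really removed a row, so a retry or a concurrent delete can never turn one row into two.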

#39Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Amit Langote (#38)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Hi,

On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Aug 7, 2019 at 6:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote:

I just noticed obsolete references to es_result_relation_info that
0002 failed to remove. One of them is in fdwhandler.sgml:

<programlisting>
TupleTableSlot *
IterateDirectModify(ForeignScanState *node);
</programlisting>

... The data that was actually inserted, updated
or deleted must be stored in the
<literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
of the node's <structname>EState</structname>.

We will need to rewrite this without mentioning
es_result_relation_info. How about as follows:

-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the result relation's <structname>ResultRelInfo</structname> that has
+     been made available via node.

I've updated 0001 with the above change.

This would be nitpicking, but:

* IIUC, we don't use the term "result relation" in fdwhandler.sgml.
For consistency with your change to the doc for BeginDirectModify, how
about using the term "target foreign table" instead of "result
relation"?

Agreed, done.

* ISTM that "<structname>ResultRelInfo</structname> that has been made
available via node" would be a bit fuzzy to FDW authors. To be more
specific, how about changing it to
"<structname>ResultRelInfo</structname> passed to
<function>BeginDirectModify</function>" or something like that?

That works for me, although an FDW author reading this still has got
to make the connection.

Attached updated patches; only 0001 changed in this version.

Thanks for the updated version, Amit-san! I updated the 0001 patch a
bit further:

* Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c.
* Made cosmetic changes to postgres_fdw.c.
* Adjusted doc changes a bit, mainly not to produce unnecessary diff.
* Modified the commit message.

Attached is an updated version of the 0001 patch. Does that make sense?
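
For readers skimming the thread, the API change in 0001 can be sketched roughly like this: the plan node carries an index into the query's result relations, and the scan's init step looks the entry up and hands it to the begin-direct-modify callback explicitly, rather than the callback reading a global "current result relation". All types and names below (ToyEState, ToyForeignScan, result_rel_index, and so on) are hypothetical simplifications, not the actual structs or field names in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified stand-ins for the executor structures. */
typedef struct ToyResultRelInfo
{
	const char *relname;
} ToyResultRelInfo;

typedef struct ToyEState
{
	ToyResultRelInfo *result_relations;
	int			num_result_relations;
} ToyEState;

typedef struct ToyForeignScan
{
	int			result_rel_index;	/* assumed field; -1 if not a direct modify */
} ToyForeignScan;

typedef struct ToyScanState
{
	const ToyForeignScan *plan;
	ToyEState  *estate;
	ToyResultRelInfo *target;	/* remembered by the FDW callback */
} ToyScanState;

/* FDW-side callback: receives the target relation as an argument. */
static void
toy_begin_direct_modify(ToyScanState *node, ToyResultRelInfo *rinfo)
{
	node->target = rinfo;
}

/* Executor-side init: resolve the index once and pass the result down. */
static void
toy_init_foreign_scan(ToyScanState *node)
{
	int			idx = node->plan->result_rel_index;

	node->target = NULL;
	if (idx >= 0)
	{
		assert(idx < node->estate->num_result_relations);
		toy_begin_direct_modify(node, &node->estate->result_relations[idx]);
	}
}
```

Because the callback saves the pointer it was given, later steps of the direct modification can use it without consulting any executor-wide state.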

Best regards,
Etsuro Fujita

Attachments:

v7-0001-Remove-dependency-on-estate-es_result_relation_info-efujita.patch (application/octet-stream)
From c11220d6376aafc8695155b209560a0ba1a5142c Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH] Remove dependency on estate->es_result_relation_info from FDW
 APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: https://postgr.es/m/20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c     | 25 +++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  8 ++++++--
 src/backend/executor/nodeForeignscan.c  | 14 ++++++++++----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 06a205877d..56b4b03cb0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -217,6 +217,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -360,7 +361,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2334,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2348,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2378,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2401,6 +2411,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		dmstate->rel = NULL;
 	}
 
+	/* Save the ResultRelInfo for the target relation. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
 
@@ -2463,7 +2476,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4046,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4193,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 27b94fb611..cf57957ae3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -871,6 +871,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -884,6 +885,8 @@ BeginDirectModify(ForeignScanState *node,
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
      information provided by <function>PlanDirectModify</function>).
+     In addition, the <structname>ResultRelInfo</structname> struct also
+     contains information about the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -915,8 +918,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..84ef31ceef 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,12 +221,18 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a2617c7cfd..e981298a75 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2110..80eedc4a24 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2325694c5..8f228f2b7e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by PlanDirectModify/setrefs.c, if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 329ebd5f28..f18e94a879 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1225,6 +1232,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..fac1e6098a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for INSERT/UPDATE/DELETE;
+								 * -1 for SELECT */
 } ForeignScan;
 
 /* ----------------
-- 
2.19.2
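
For FDW authors following the v7-0001 change above, the adjustment could be sketched roughly as follows. This is a minimal, self-contained illustration and not postgres_fdw code: the struct definitions are simplified stand-ins for the real ResultRelInfo and ForeignScanState, and every Demo*/demo* name is hypothetical. It only shows the pattern the patch uses, namely saving the ResultRelInfo passed to BeginDirectModify in the FDW's private state so that later callbacks no longer need estate->es_result_relation_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins (assumptions), not the real PostgreSQL definitions. */
typedef struct ResultRelInfo
{
	unsigned int ri_RangeTableIndex;	/* RT index of the target relation */
} ResultRelInfo;

typedef struct ForeignScanState
{
	void	   *fdw_state;		/* FDW-private execution state */
} ForeignScanState;

/* Hypothetical FDW-private state, modeled on PgFdwDirectModifyState. */
typedef struct DemoDirectModifyState
{
	ResultRelInfo *resultRelInfo;	/* saved by the Begin callback */
	int			num_tuples;		/* -1 means not set yet */
} DemoDirectModifyState;

/* New-style Begin callback: the ResultRelInfo arrives as an argument. */
static void
demoBeginDirectModify(ForeignScanState *node, ResultRelInfo *rinfo, int eflags)
{
	DemoDirectModifyState *dmstate = calloc(1, sizeof(*dmstate));

	(void) eflags;				/* unused in this sketch */
	if (dmstate == NULL)
		abort();

	/* Save the ResultRelInfo so later callbacks can reach it. */
	dmstate->resultRelInfo = rinfo;
	dmstate->num_tuples = -1;
	node->fdw_state = dmstate;
}

/* A later callback reads the saved pointer instead of consulting the EState. */
static unsigned int
demoTargetRangeTableIndex(ForeignScanState *node)
{
	DemoDirectModifyState *dmstate = (DemoDirectModifyState *) node->fdw_state;

	return dmstate->resultRelInfo->ri_RangeTableIndex;
}

/* Release the private state. */
static void
demoEndDirectModify(ForeignScanState *node)
{
	free(node->fdw_state);
	node->fdw_state = NULL;
}
```

In the real patch the same idea appears as the new resultRelInfo field of PgFdwDirectModifyState, which postgresIterateDirectModify(), get_returning_data() and apply_returning_filter() read in place of the EState field.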

#40Amit Langote
amitlangote09@gmail.com
In reply to: Etsuro Fujita (#39)
Re: partition routing layering in nodeModifyTable.c

Fujita-san,

On Thu, Aug 8, 2019 at 9:49 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote:

Attached updated patches; only 0001 changed in this version.

Thanks for the updated version, Amit-san! I updated the 0001 patch a
bit further:

* Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c.
* Made cosmetic changes to postgres_fdw.c.
* Adjusted doc changes a bit, mainly not to produce unnecessary diff.
* Modified the commit message.

Attached is an updated version of the 0001 patch. Does that make sense?

Looks perfect, thank you.

Regards,
Amit

#41Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#40)
Re: partition routing layering in nodeModifyTable.c

On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote:

Fujita-san,

On Thu, Aug 8, 2019 at 9:49 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote:

Attached updated patches; only 0001 changed in this version.

Thanks for the updated version, Amit-san! I updated the 0001 patch a
bit further:

* Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c.
* Made cosmetic changes to postgres_fdw.c.
* Adjusted doc changes a bit, mainly not to produce unnecessary diff.
* Modified the commit message.

Attached is an updated version of the 0001 patch. Does that make sense?

Looks perfect, thank you.

To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2277/

Struggled a bit to give a title to the entry though.

Thanks,
Amit

#42Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#41)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Wed, Sep 4, 2019 at 10:45 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2277/

Struggled a bit to give a title to the entry though.

Noticed that one of the patches needed a rebase.

Attached updated patches. Note that v8-0001 is the same as the v7-0001
that Fujita-san posted on Aug 8, unchanged.

Thanks,
Amit

Attachments:

v8-0001-Remove-dependency-on-estate-es_result_relation_in.patch (application/octet-stream)
From 50e876daa386484889e30e3f754d89f56c66f867 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v8 1/4] Remove dependency on estate->es_result_relation_info
 from FDW APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: https://postgr.es/m/20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c     | 25 +++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  8 ++++++--
 src/backend/executor/nodeForeignscan.c  | 14 ++++++++++----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 82d8140ba2..7ee783ae47 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -217,6 +217,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -360,7 +361,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2331,6 +2334,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2340,7 +2348,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2368,7 +2378,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2401,6 +2411,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		dmstate->rel = NULL;
 	}
 
+	/* Save the ResultRelInfo for the target relation. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
 
@@ -2463,7 +2476,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4033,7 +4046,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4180,7 +4193,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..0aff958415 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -873,6 +873,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -886,6 +887,8 @@ BeginDirectModify(ForeignScanState *node,
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
      information provided by <function>PlanDirectModify</function>).
+     In addition, the <structname>ResultRelInfo</structname> struct also
+     contains information about the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -917,8 +920,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..84ef31ceef 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,12 +221,18 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3432bb921d..5f4cf0528a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02ff6..c7d63522c2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..92cc90c0f0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1983,6 +1983,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0c036209f0..756a04de79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5455,6 +5455,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by PlanDirectModify/setrefs.c, if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 566ee96da8..8799c8f572 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1225,6 +1232,14 @@ set_foreignscan_references(PlannerInfo *root,
 			tempset = bms_add_member(tempset, x + rtoffset);
 		fscan->fs_relids = tempset;
 	}
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e355..fac1e6098a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -616,6 +616,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for INSERT/UPDATE/DELETE;
+								 * -1 for SELECT */
 } ForeignScan;
 
 /* ----------------
-- 
2.11.0
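
The resultRelIndex adjustment in the setrefs.c hunk above depends on ordering: the subplans are processed before the ModifyTable's own result relations are appended to the global list, so the list's length at that moment is the offset of this ModifyTable's first result relation. The following is a rough, self-contained sketch of that arithmetic. The DemoGlobal and DemoForeignScan types and the demo_* helpers are hypothetical simplifications (a fixed-size array in place of a List), not planner code.

```c
#include <assert.h>

#define DEMO_MAX_RELS 16

/* Hypothetical stand-in for root->glob->resultRelations (a flat list). */
typedef struct DemoGlobal
{
	int			resultRelations[DEMO_MAX_RELS];	/* RT indexes */
	int			nrels;			/* current list length */
} DemoGlobal;

/* Hypothetical stand-in for the ForeignScan plan node. */
typedef struct DemoForeignScan
{
	int			resultRelIndex;	/* subplan-local index, or -1 for SELECT */
} DemoForeignScan;

/*
 * Mirrors the adjustment in set_foreignscan_references(): turn the
 * subplan-local index into an index into the global list.  Must run
 * before the owning ModifyTable's relations are appended.
 */
static void
demo_adjust_index(const DemoGlobal *glob, DemoForeignScan *fscan)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += glob->nrels;
}

/* Mirrors appending a ModifyTable's result relations to the global list. */
static void
demo_append_result_rels(DemoGlobal *glob, const int *rtis, int n)
{
	for (int i = 0; i < n; i++)
	{
		assert(glob->nrels < DEMO_MAX_RELS);
		glob->resultRelations[glob->nrels++] = rtis[i];
	}
}
```

For example, if an earlier ModifyTable has already contributed two result relations, a foreign scan that is subplan 1 of the next ModifyTable ends up with index 1 + 2 = 3, which is the slot its target relation occupies once that ModifyTable's relations are appended. Swapping the two steps would make the offset include the node's own relations, which is what the "Caution" comment in set_plan_refs() warns against.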

v8-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From f40fa0689657947a9d93e99c37041481fab11126 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v8 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  19 ++-
 src/backend/executor/nodeModifyTable.c | 207 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 85 insertions(+), 210 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 967dba6fcf..5e7153bb8d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3113,32 +3113,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index cdb1105b4a..17f60bfcd4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4643,9 +4644,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5749,14 +5748,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d23f292cb0..d7ed3fcdbe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -931,9 +931,22 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 956bab09af..1b89ed5c31 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1052,9 +1045,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1130,41 +1121,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1869,28 +1835,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1914,6 +1858,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1928,37 +1873,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1974,58 +1899,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2121,17 +1994,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2300,6 +2162,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2322,8 +2185,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2333,6 +2201,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2398,6 +2273,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
@@ -2422,26 +2311,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -65,14 +65,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 760e580281..175e4fb96f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -486,6 +486,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1187,9 +1193,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.11.0
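
The central change in the patch above is where the child-to-root conversion map lives: it moves from a per-subplan array in ModifyTableState, indexed by position, to a field in each ResultRelInfo. The following is only a rough sketch of that difference outside PostgreSQL. RelInfo, ConvMap, map_via_side_array and map_via_field are hypothetical stand-ins invented for illustration, not the executor's real types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap. */
typedef struct ConvMap
{
	int			id;
} ConvMap;

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct RelInfo
{
	const char *relname;
	ConvMap    *child_to_root_map;	/* NULL when no conversion is needed */
} RelInfo;

/*
 * Old shape: the maps live in an array parallel to the RelInfo array, so the
 * caller has to recover the index by pointer subtraction and needs access to
 * both arrays.
 */
static ConvMap *
map_via_side_array(RelInfo *rels, ConvMap **maps, int nrels, RelInfo *rel)
{
	ptrdiff_t	idx = rel - rels;

	assert(idx >= 0 && idx < nrels);
	return maps[idx];
}

/* New shape: the map travels with the RelInfo itself. */
static ConvMap *
map_via_field(const RelInfo *rel)
{
	return rel->child_to_root_map;
}
```

With the second shape, a function that is handed a single RelInfo can find its map without knowing which array the RelInfo came from, which is presumably why the index arithmetic and its Assert could be dropped from ExecCrossPartitionUpdate().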

v8-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 04eb853f2726e8bfbf1b97b784dff438e09bfbfe Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v8 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 130 insertions(+), 175 deletions(-)
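
As a rough illustration of the motivation given in the commit message, the sketch below contrasts a callee that reads its target relation from shared executor state with one that receives it as a parameter. It is not PostgreSQL code: RelInfo, ExecState, current_rel and the insert_*/route_* functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct RelInfo
{
	int			ninserted;
} RelInfo;

/* Hypothetical stand-in for EState. */
typedef struct ExecState
{
	RelInfo    *current_rel;	/* plays the role of es_result_relation_info */
} ExecState;

/* Old shape: the callee reads the target from shared executor state. */
static void
insert_implicit(ExecState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New shape: the caller names the target explicitly. */
static void
insert_explicit(RelInfo *rel)
{
	rel->ninserted++;
}

/*
 * Routing a row to a partition with the old shape: the caller must point the
 * shared field at the partition and remember to restore it afterwards.
 */
static void
route_implicit(ExecState *estate, RelInfo *partition)
{
	RelInfo    *saved = estate->current_rel;

	estate->current_rel = partition;
	insert_implicit(estate);
	estate->current_rel = saved;	/* forgetting this misdirects later rows */
}

/* With the new shape there is nothing to save or restore. */
static void
route_explicit(RelInfo *partition)
{
	insert_explicit(partition);
}
```

The save and restore pair in route_implicit() corresponds to the "set and reset" pattern that the commit message describes as a source of past bugs. Passing the relation explicitly removes that step.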

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aeef30b28..967dba6fcf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2444,9 +2444,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2479,7 +2476,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2844,7 +2842,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3116,11 +3113,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3224,7 +3216,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3295,7 +3287,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 05593f3316..706f3a258e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1745,7 +1745,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1875,7 +1874,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea4b586984..d3a8302010 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -857,9 +857,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -903,7 +900,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2823,7 +2819,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 			rcestate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ee0239b146..06bd715096 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,8 +124,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c9d024ead5..57b774335b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -428,7 +445,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -490,8 +507,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -551,7 +568,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -590,7 +608,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -676,6 +695,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -687,7 +707,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -697,10 +716,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1036,6 +1051,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1044,12 +1060,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1059,10 +1073,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1089,7 +1099,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1126,7 +1136,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1176,6 +1186,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1201,9 +1212,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1244,16 +1258,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1270,18 +1274,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1445,7 +1449,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1684,7 +1689,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1841,41 +1846,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -1985,10 +1986,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2037,17 +2036,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2080,7 +2068,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2125,7 +2112,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2208,25 +2194,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2238,15 +2220,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2267,7 +2243,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2310,14 +2285,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2359,7 +2328,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2383,8 +2351,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 11e6331f49..9a11cea7ce 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -193,7 +193,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -590,6 +590,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -603,13 +604,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -664,6 +665,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -697,6 +699,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -705,7 +708,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -747,7 +750,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -763,7 +767,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -786,6 +790,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -816,6 +821,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -824,7 +830,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -852,7 +858,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -864,7 +870,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6298c7c8ca..a12d02cf5e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44f76082e9..760e580281 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,7 +519,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 75e25cdf48..b73e26fc69 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 23885f638c..e5a0a05d13 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.11.0
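
For readers skimming the patch above: its central change is to stop
publishing the "currently active" result relation in executor-wide state
and to pass it as an explicit argument instead.  The following is only a
toy sketch of that idea.  ToyRel, ToyEState and the toy_* functions are
hypothetical stand-ins invented for illustration; they are not PostgreSQL
structures or APIs, and the modulo "routing" is an assumption made to keep
the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL structs. */
typedef struct ToyRel
{
	const char *name;
	int			ninserted;
} ToyRel;

typedef struct ToyEState
{
	ToyRel	   *rels;			/* array of result relations */
	unsigned int nrels;			/* length of array */
	ToyRel	   *current;		/* old style: "currently active" element */
} ToyEState;

/* Old style: the callee finds its target through mutable shared state. */
static void
toy_insert_implicit(ToyEState *estate)
{
	assert(estate->current != NULL);
	estate->current->ninserted++;
}

/* Old style routing: the caller must switch the state and revert it. */
static void
toy_route_implicit(ToyEState *estate, unsigned int key)
{
	ToyRel	   *saved = estate->current;

	estate->current = &estate->rels[key % estate->nrels];
	toy_insert_implicit(estate);
	estate->current = saved;	/* forgetting this breaks the next row */
}

/* New style: the target is an argument; nothing to save or restore. */
static void
toy_insert_explicit(ToyRel *rel)
{
	assert(rel != NULL);
	rel->ninserted++;
}

static void
toy_route_explicit(ToyEState *estate, unsigned int key)
{
	toy_insert_explicit(&estate->rels[key % estate->nrels]);
}
```

With the explicit form there is no "revert" step for a caller to forget,
which is roughly the class of mistake the removed comments in
ExecModifyTable() and ExecPrepareTupleRouting() were warning about.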

Attachment: v8-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (application/octet-stream)
From 2df9db3576f774b5653758d692d8cbc4e34efa31 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v8 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)
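
The calling convention described in the commit message (call again until
the move is done or cannot be done) might be pictured with the toy loop
below.  This is a sketch under stated assumptions, not the patch's code:
ToyTuple, ToyMoveState and the toy_* functions are hypothetical, concurrent
updates are simulated with a counter, and the "partition check" is a plain
bound comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, invented for illustration only. */
typedef struct ToyTuple
{
	int			key;
} ToyTuple;

typedef struct ToyMoveState
{
	int			pending_concurrent_updates; /* simulated concurrency */
	ToyTuple	latest;			/* row version left by a concurrent update */
	int			moves_done;
} ToyMoveState;

/*
 * Returns true when the move is finished (or nothing is left to do).
 * Returns false when the row was concurrently updated; *retry then points
 * at the newer version, which the caller has to re-check.
 */
static bool
toy_cross_partition_update(ToyMoveState *st, const ToyTuple *tup,
						   const ToyTuple **retry)
{
	(void) tup;
	*retry = NULL;
	if (st->pending_concurrent_updates > 0)
	{
		st->pending_concurrent_updates--;
		*retry = &st->latest;
		return false;
	}
	st->moves_done++;
	return true;
}

/* Toy partition constraint: the row fits while key < bound. */
static bool
toy_fits_partition(const ToyTuple *tup, int bound)
{
	return tup->key < bound;
}

/* Caller side: true if the row-movement path finished the update. */
static bool
toy_update(ToyMoveState *st, const ToyTuple *tup, int bound)
{
	while (!toy_fits_partition(tup, bound))
	{
		const ToyTuple *retry;

		if (toy_cross_partition_update(st, tup, &retry))
			return true;
		assert(retry != NULL);
		tup = retry;			/* re-check the newer row version */
	}
	return false;				/* fits after all: ordinary update path */
}
```

The point of the loop is that a false return does not imply another move:
the newer row version may satisfy the partition constraint again, in which
case the caller falls back to the ordinary in-place update.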

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 57b774335b..956bab09af 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -621,31 +620,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Fire AFTER ROW INSERT triggers with a NULL capture state, so
+		 * that the row is not captured in a transition table again.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -711,7 +709,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -955,32 +952,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Fire AFTER ROW DELETE triggers with a NULL capture state, so
+		 * that the row is not captured in a transition table again.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1027,6 +1022,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * If for some reason DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1180,119 +1322,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.11.0
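
For readers following the refactoring above, here is a minimal sketch of the calling contract that ExecCrossPartitionUpdate() appears to establish with ExecUpdate(): a true result means the caller is done and can return the inserted tuple (possibly NULL), and a false result means the caller must redo its checks with the re-fetched slot. This is not part of the posted patch. Slot, DeleteResult and cross_partition_update() are hypothetical stand-ins, not PostgreSQL types or functions, and initializing both output pointers up front is an assumption made here so the caller can always read them safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleTableSlot; not the PostgreSQL type. */
typedef struct Slot
{
	int			value;			/* payload of the toy tuple */
} Slot;

/* Outcome of the toy DELETE step, mirroring tuple_deleted and epqslot. */
typedef struct DeleteResult
{
	bool		tuple_deleted;	/* did the DELETE remove the row? */
	Slot	   *epqslot;		/* row re-fetched after a concurrent update,
								 * or NULL */
} DeleteResult;

/*
 * Toy version of the contract: return true when the caller is done (the
 * row was moved, or there is nothing left to move), false when the caller
 * must retry with *retry_slot.
 */
static bool
cross_partition_update(const DeleteResult *del, Slot *newrow,
					   Slot **retry_slot, Slot **inserted_tuple)
{
	assert(retry_slot != NULL && inserted_tuple != NULL);

	/* Assumption of this sketch: outputs are always initialized. */
	*retry_slot = NULL;
	*inserted_tuple = NULL;

	if (!del->tuple_deleted)
	{
		/* Row is gone or a trigger vetoed the DELETE: skip the INSERT. */
		if (del->epqslot == NULL)
			return true;

		/* Concurrent update: hand back the re-fetched row for a retry. */
		*retry_slot = del->epqslot;
		return false;
	}

	/* Stands in for ExecInsert() routed through the root table. */
	*inserted_tuple = newrow;
	return true;
}
```

A caller shaped like the new ExecUpdate() hunk would loop back to its lreplace label whenever the function returns false, and otherwise return *inserted_tuple.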

#43Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#42)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Thu, Sep 26, 2019 at 1:56 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Sep 4, 2019 at 10:45 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2277/

Struggled a bit to give a title to the entry though.

Noticed that one of the patches needed a rebase.

Attached updated patches. Note that v8-0001 is v7-0001 unchanged that
Fujita-san posted on Aug 8.

Rebased again.

Thanks,
Amit

Attachments:

v9-0001-Remove-dependency-on-estate-es_result_relation_in.patch (text/plain; charset=US-ASCII)
From a0f6939e5b2ee8f78c813545f45d59a346d22e5f Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v9 1/4] Remove dependency on estate->es_result_relation_info
 from FDW APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: https://postgr.es/m/20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c     | 25 +++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  8 ++++++--
 src/backend/executor/nodeForeignscan.c  | 14 ++++++++++----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index bdc21b36d1..642deaf7cb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -218,6 +218,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -361,7 +362,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2328,7 +2336,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2356,7 +2366,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2389,6 +2399,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		dmstate->rel = NULL;
 	}
 
+	/* Save the ResultRelInfo for the target relation. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
 
@@ -2451,7 +2464,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4087,7 +4100,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4234,7 +4247,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..0aff958415 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -873,6 +873,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -886,6 +887,8 @@ BeginDirectModify(ForeignScanState *node,
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
      information provided by <function>PlanDirectModify</function>).
+     In addition, the <structname>ResultRelInfo</structname> struct also
+     contains information about the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -917,8 +920,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 52af1dac5c..84ef31ceef 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,12 +221,18 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a9b8b84b8f..a1094defb7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -761,6 +761,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ac02e5ec8d..91af796a00 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -698,6 +698,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f9ebc9044..d5514398f4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2013,6 +2013,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8c8b4f8ed6..455740663b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5460,6 +5460,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by PlanDirectModify/setrefs.c, if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index f581c5f532..c16282b082 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -901,6 +901,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1240,6 +1247,14 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 822686033e..adf39bc618 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 477b4da192..f3f2699e90 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -620,6 +620,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for INSERT/UPDATE/DELETE;
+								 * -1 for SELECT */
 } ForeignScan;
 
 /* ----------------
-- 
2.16.5
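
The data flow introduced by the patch above can be sketched in three steps: the planner turns a per-ModifyTable subplan index into an index into the query-wide result relation list, the executor uses that index to find the ResultRelInfo, and the FDW stores the pointer it receives in its own state instead of reading estate->es_result_relation_info later. This is a sketch under stated assumptions, not code from the patch. All structs are pared-down hypothetical stand-ins for the PostgreSQL ones, and every function name here is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins; not the PostgreSQL structs. */
typedef struct ResultRelInfo
{
	int			ri_RangeTableIndex;
} ResultRelInfo;

typedef struct EState
{
	ResultRelInfo *es_result_relations;	/* flat array for the whole query */
	int			es_num_result_relations;
} EState;

typedef struct DirectModifyState
{
	ResultRelInfo *resultRelInfo;	/* saved at Begin, reused at Iterate */
} DirectModifyState;

/* Planner side: offset a subplan index by the rels already in the list. */
static int
global_result_rel_index(int subplan_index, int rels_already_in_global_list)
{
	if (subplan_index < 0)
		return -1;				/* SELECT: no result relation */
	return subplan_index + rels_already_in_global_list;
}

/* Executor side: look up the ResultRelInfo to hand to the FDW. */
static ResultRelInfo *
lookup_result_rel(EState *estate, int resultRelIndex)
{
	assert(resultRelIndex >= 0 &&
		   resultRelIndex < estate->es_num_result_relations);
	return &estate->es_result_relations[resultRelIndex];
}

/* FDW side: Begin stores the pointer it was given ... */
static void
begin_direct_modify(DirectModifyState *dmstate, ResultRelInfo *rinfo)
{
	dmstate->resultRelInfo = rinfo;
}

/* ... and later callbacks read it back instead of consulting EState. */
static int
iterate_rt_index(const DirectModifyState *dmstate)
{
	return dmstate->resultRelInfo->ri_RangeTableIndex;
}
```

The offset step is why the comment added to set_plan_refs() warns against reordering: the index is only correct if the subplans are processed before the ModifyTable's own result relations are appended to the global list.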

v9-0004-Refactor-transition-tuple-capture-code-a-bit.patch (text/plain; charset=US-ASCII)
From 82a386601239c90da2a4bba69762df28351053cb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v9 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  19 ++-
 src/backend/executor/nodeModifyTable.c | 207 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 85 insertions(+), 210 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 38cffde583..61d9a1d3cf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3110,32 +3110,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index faeea16d21..191f1418fe 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4652,9 +4653,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5748,13 +5747,23 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
+		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
 		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d23f292cb0..d7ed3fcdbe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -931,9 +931,22 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1362b2f2d1..9de5a5371c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -339,10 +336,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1051,9 +1044,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1129,41 +1120,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1868,28 +1834,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1913,6 +1857,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1927,37 +1872,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -1973,58 +1898,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2120,17 +1993,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2299,6 +2161,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2321,8 +2184,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2331,6 +2199,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
 	mtstate->fireBSTriggers = true;
 
+	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
 	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
@@ -2397,6 +2272,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
@@ -2421,26 +2310,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a46feeedb0..bb080980c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -64,14 +64,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_update_new_table;
 	bool		tcs_insert_new_table;
 
-	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
 	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 62e20fdfdc..62eed595ca 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -485,6 +485,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1185,9 +1191,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.16.5
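
For readers skimming the diff above: the hunk in ExecInitModifyTable() builds the child-to-root map per result relation, and only when update row movement is needed or when transition tuples are captured for a non-INSERT operation. The following is a toy sketch of that decision, not PostgreSQL code. Every name in it (ToyCmd, ToyMap, ToyRelInfo, toy_needs_child_to_root_map, toy_init_rel) is a hypothetical stand-in, and the "candidate" map stands for whatever convert_tuples_by_name() would return, which may itself be NULL when no conversion is necessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the PostgreSQL definitions. */
typedef enum ToyCmd
{
	TOY_INSERT,
	TOY_UPDATE,
	TOY_DELETE
} ToyCmd;

typedef struct ToyMap
{
	int			natts;
} ToyMap;

typedef struct ToyRelInfo
{
	/* plays the role of ri_ChildToRootMap; NULL means "no map set" */
	ToyMap	   *child_to_root;
} ToyRelInfo;

/*
 * Mirrors the condition in the patch: a map is wanted for update tuple
 * routing, or for transition capture unless the operation is INSERT.
 */
static bool
toy_needs_child_to_root_map(bool update_tuple_routing_needed,
							bool capturing_transitions,
							ToyCmd operation)
{
	return update_tuple_routing_needed ||
		(capturing_transitions && operation != TOY_INSERT);
}

/* Store the map on the relation itself rather than in a per-subplan array. */
static void
toy_init_rel(ToyRelInfo *rel, ToyMap *candidate,
			 bool update_tuple_routing_needed,
			 bool capturing_transitions,
			 ToyCmd operation)
{
	if (toy_needs_child_to_root_map(update_tuple_routing_needed,
									capturing_transitions, operation))
		rel->child_to_root = candidate;
	else
		rel->child_to_root = NULL;
}
```

Because the map lives on the relation, a caller that already holds the relation does not need a subplan index to find it, which seems to be what lets the patch drop tupconv_map_for_subplan().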

v9-0002-Remove-es_result_relation_info.patch (text/plain; charset=US-ASCII)
From d3a17640835035e95c56859b4eed652444ec5b86 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v9 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
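
The commit message above describes replacing a shared "current result relation" field with an explicit function parameter. As a minimal sketch of why that tends to be less error-prone, here is a toy version in C. All names (ToyRelInfo, ToyEState, toy_insert_implicit, toy_insert_explicit, toy_route_old, toy_route_new) are hypothetical and only loosely modeled on the executor; this is not how the patch itself is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the PostgreSQL structs. */
typedef struct ToyRelInfo
{
	int			relid;
	int			ninserted;
} ToyRelInfo;

typedef struct ToyEState
{
	/* plays the role of es_result_relation_info */
	ToyRelInfo *current_rel;
} ToyEState;

/* Old style: the callee finds its target through shared executor state. */
static void
toy_insert_implicit(ToyEState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New style: the caller hands the target relation over directly. */
static void
toy_insert_explicit(ToyRelInfo *rel)
{
	rel->ninserted++;
}

/*
 * Routing a row to a partition the old way: switch the shared field,
 * call the callee, then switch it back.  Forgetting the last step leaves
 * later operations pointing at the wrong relation.
 */
static void
toy_route_old(ToyEState *estate, ToyRelInfo *partition)
{
	ToyRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	toy_insert_implicit(estate);
	estate->current_rel = saved;
}

/* Routing the new way: nothing to save, nothing to restore. */
static void
toy_route_new(ToyRelInfo *partition)
{
	toy_insert_explicit(partition);
}
```

In the toy, toy_route_new() has no state to put back, which is the same property the patch is after when it drops the "Revert ExecPrepareTupleRouting's state change" steps from the callers.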
 src/backend/commands/copy.c              |  17 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  22 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 188 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  26 +++--
 src/include/executor/executor.h          |  19 +++-
 src/include/executor/nodeModifyTable.h   |   3 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 130 insertions(+), 175 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 42a147b67d..38cffde583 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2441,9 +2441,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2476,7 +2473,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2841,7 +2839,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3112,11 +3109,6 @@ CopyFrom(CopyState cstate)
 				prevResultRelInfo = resultRelInfo;
 			}
 
-			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
 			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
@@ -3221,7 +3213,7 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3292,7 +3284,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index daa80ec4aa..1495d2c60e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1749,7 +1749,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1879,7 +1878,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 40bd8049f0..357bf17e31 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c46eb8d646..879ac8fbcd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -857,9 +857,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -903,7 +900,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2814,7 +2810,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 			rcestate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 95e027c970..14d11e75c3 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -390,10 +390,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -416,7 +416,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -428,7 +428,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -452,11 +453,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -482,7 +483,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -494,7 +495,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -513,11 +515,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ee0239b146..06bd715096 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,8 +124,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9ba1d78344..b01601578a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,9 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -334,32 +335,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot)
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -392,7 +409,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -427,7 +444,7 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -489,8 +506,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -550,7 +567,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -589,7 +607,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -675,6 +694,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -686,7 +706,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -696,10 +715,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1035,6 +1050,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1043,12 +1059,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1058,10 +1072,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1088,7 +1098,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1125,7 +1135,7 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1175,6 +1185,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1200,9 +1211,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1242,16 +1256,6 @@ lreplace:;
 				}
 			}
 
-			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
@@ -1269,18 +1273,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1444,7 +1448,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1683,7 +1688,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1840,40 +1845,36 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
-	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
 	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
@@ -1984,10 +1985,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2035,17 +2034,6 @@ ExecModifyTable(PlanState *pstate)
 	subplanstate = node->mt_plans[node->mt_whichplan];
 	junkfilter = resultRelInfo->ri_junkFilter;
 
-	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
@@ -2079,7 +2067,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2124,7 +2111,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2207,25 +2193,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2237,15 +2219,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2266,7 +2242,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2309,14 +2284,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2358,7 +2327,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2382,8 +2350,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 4bf6f5e427..c3a6fc30d0 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -191,7 +191,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -581,6 +580,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -604,6 +604,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -617,13 +618,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -678,6 +679,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -711,6 +713,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -719,7 +722,7 @@ apply_handle_update(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -761,7 +764,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -777,7 +781,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -800,6 +804,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -830,6 +835,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -838,7 +844,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -866,7 +872,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -878,7 +884,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6298c7c8ca..a12d02cf5e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -567,10 +567,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -587,10 +591,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 891b119608..103d4cd6c3 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,8 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0c2a77aaf8..62e20fdfdc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -518,7 +518,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 75e25cdf48..b73e26fc69 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 23885f638c..e5a0a05d13 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.16.5
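
To illustrate the idea behind 0002 outside of the executor: the sketch
below passes the target relation explicitly instead of stashing it in
shared executor state, so nothing has to be saved and restored around
tuple routing.  It is only a toy and not code from the patch; ToyRelInfo,
ToyEState, toy_find_partition(), toy_insert_index_entries() and
toy_insert() are hypothetical names, and the even/odd routing rule is
made up for the example.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are not PostgreSQL's ResultRelInfo/EState. */
typedef struct ToyRelInfo
{
	const char *name;
	int			ninserted;		/* rows stored in this relation */
	int			nindexed;		/* index entries made for this relation */
} ToyRelInfo;

typedef struct ToyEState
{
	ToyRelInfo *rels;			/* rels[0] is the root, the rest partitions */
	int			nrels;
} ToyEState;

/* Made-up routing rule: even keys go to rels[1], odd keys to rels[2]. */
ToyRelInfo *
toy_find_partition(ToyEState *estate, int key)
{
	assert(estate->nrels >= 3 && key >= 0);
	return &estate->rels[1 + (key % 2)];
}

/* The relation to index is an argument, not something looked up in estate. */
void
toy_insert_index_entries(ToyRelInfo *rel)
{
	rel->nindexed++;
}

/*
 * Insert into 'target'.  If 'target' is the root, route to a partition
 * first.  Returns the relation that actually received the row.
 */
ToyRelInfo *
toy_insert(ToyEState *estate, ToyRelInfo *target, int key)
{
	ToyRelInfo *rel = target;

	if (target == &estate->rels[0])
		rel = toy_find_partition(estate, key);

	rel->ninserted++;
	toy_insert_index_entries(rel);
	return rel;
}
```

A caller that inserts via the root (rels[0]) gets back the partition that
received the row, and there is no "currently active relation" field left
to reset afterwards, which is roughly what removing
es_result_relation_info buys in the real code.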

v9-0003-Rearrange-partition-update-row-movement-code-a-bi.patch (text/plain; charset=US-ASCII)
From 1bf84886550b8d31e8f1f8f4a46c43b14db3dd6c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v9 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b01601578a..1362b2f2d1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -356,7 +356,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -620,31 +619,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -710,7 +708,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -954,32 +951,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1026,6 +1021,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1179,119 +1321,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
-
-			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
-			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (!tuple_deleted)
-			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
-			}
-
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.16.5
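
As an aside, the retry contract visible in the final hunk above (a boolean result from ExecCrossPartitionUpdate() plus the retry_slot and inserted_tuple output parameters) can be modelled in a few lines of standalone C. This is only a toy sketch under stated assumptions, not the patch's implementation: ToyRow, ToySlot, toy_cross_partition_update() and toy_update() are hypothetical names, a version counter stands in for concurrent-update detection, and re-routing through the root table is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not PostgreSQL types. */
typedef struct ToySlot { int key; int version; } ToySlot;
typedef struct ToyRow  { int key; int version; bool live; } ToyRow;

/*
 * Toy analogue of ExecCrossPartitionUpdate().
 *
 * Returns true when the caller is done (the row was moved, or there was
 * nothing to move).  Returns false when the row changed underneath us;
 * *retry_slot then holds the re-fetched version and the caller must redo
 * its checks with it.
 */
static bool
toy_cross_partition_update(ToyRow *row, const ToySlot *slot,
                           ToySlot *retry_slot, const ToySlot **inserted)
{
    assert(row->key == slot->key);
    *inserted = NULL;

    if (!row->live)
        return true;            /* already deleted: nothing to move */

    if (row->version != slot->version)
    {
        /* concurrently updated: hand back the current version */
        retry_slot->key = row->key;
        retry_slot->version = row->version;
        return false;
    }

    row->live = false;          /* the DELETE half */
    *inserted = slot;           /* the INSERT half (routing omitted) */
    return true;
}

/* Caller-side loop, mirroring "slot = retry_slot; goto lreplace;". */
static const ToySlot *
toy_update(ToyRow *row, ToySlot *slot, int *attempts)
{
    ToySlot         retry_slot;
    const ToySlot  *inserted;

    for (;;)
    {
        (*attempts)++;
        if (toy_cross_partition_update(row, slot, &retry_slot, &inserted))
            return inserted;
        *slot = retry_slot;     /* retry with the re-fetched tuple */
    }
}
```

With a row at version 2 and a caller holding a stale version-1 slot, toy_update() needs two attempts: the first call returns false and hands back the re-fetched slot, the second performs the move. A row that is already gone returns NULL on the first attempt.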

#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#43)
Re: partition routing layering in nodeModifyTable.c

Amit Langote <amitlangote09@gmail.com> writes:
> Rebased again.

Seems to need that again, according to cfbot :-(

regards, tom lane

#45Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#44)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Amit Langote <amitlangote09@gmail.com> writes:
>> Rebased again.
>
> Seems to need that again, according to cfbot :-(

Thank you, done.

Regards,
Amit

Attachments:

v10-0002-Remove-es_result_relation_info.patch (application/octet-stream)
From 40e2aa8468cd539c7b6dbc6406efc3e8abc30f7b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v10 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
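Not part of the commit message: a minimal standalone sketch of the pattern this patch applies, assuming nothing beyond what the diff shows. The routing helper reports the chosen partition through an output parameter (as ExecPrepareTupleRouting() does with partRelInfo after this patch) rather than by overwriting a shared "current relation" field that the caller must later restore. ToyRel, ToyState, toy_route() and toy_insert() are hypothetical names; unlike ExecFindPartition(), which raises an error, the toy returns NULL when no partition accepts the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo: one range partition. */
typedef struct ToyRel
{
    const char *name;
    int         lo;         /* inclusive lower bound */
    int         hi;         /* exclusive upper bound */
    int         ninserted;  /* rows inserted so far */
} ToyRel;

/* Hypothetical stand-in for EState; note: no "current relation" field. */
typedef struct ToyState
{
    ToyRel     *parts;
    int         nparts;
} ToyState;

/*
 * Find the partition for 'key' and return it in *part_rel.
 * Shared state is only read, never modified, so there is nothing for
 * the caller to revert afterwards.
 */
static bool
toy_route(const ToyState *st, int key, ToyRel **part_rel)
{
    for (int i = 0; i < st->nparts; i++)
    {
        if (key >= st->parts[i].lo && key < st->parts[i].hi)
        {
            *part_rel = &st->parts[i];
            return true;
        }
    }
    *part_rel = NULL;
    return false;
}

/*
 * Insert 'key', routing first.  The target relation is a local
 * variable here; anything called from this point on would receive it
 * as an explicit argument.
 */
static ToyRel *
toy_insert(ToyState *st, int key)
{
    ToyRel     *rel;

    if (!toy_route(st, key, &rel))
        return NULL;
    rel->ninserted++;
    return rel;
}
```

Because the target relation lives in a local variable, nested or interleaved operations cannot observe a stale value, which is the class of bug the save/restore dance around es_result_relation_info was prone to.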
 src/backend/commands/copy.c              |  18 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  24 +--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 193 ++++++++++-------------
 src/backend/replication/logical/worker.c |  26 +--
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 139 insertions(+), 175 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4cb8..7ba2f5b522 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2442,9 +2442,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2477,7 +2474,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2842,7 +2840,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3113,11 +3110,6 @@ CopyFrom(CopyState cstate)
 				prevResultRelInfo = resultRelInfo;
 			}
 
-			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
 			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
@@ -3222,7 +3214,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3293,7 +3286,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a32812a1f..00ae661e4d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1798,7 +1798,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1928,7 +1927,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af621b..e1d34be2e9 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7cb486b211..209b5d9c83 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -857,9 +857,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -903,7 +900,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2816,7 +2812,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 			rcestate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 7194becfd9..eb1cb621ca 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -393,10 +393,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -419,7 +419,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -431,7 +432,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -455,11 +457,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -485,7 +487,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -497,7 +500,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -516,11 +520,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc5177cc2b..26c7479dce 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,8 +124,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d71c0a4322..dcbbf4c3bc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -359,32 +361,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+	/*
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
+	 */
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -417,7 +435,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -452,7 +471,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -514,8 +534,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -575,7 +595,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -614,7 +635,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -700,6 +722,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -711,7 +734,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -721,10 +743,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1060,6 +1078,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1068,12 +1087,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1083,10 +1100,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1113,7 +1126,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1150,7 +1164,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1200,6 +1215,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1225,9 +1241,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1267,16 +1286,6 @@ lreplace:;
 				}
 			}
 
-			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
@@ -1294,18 +1303,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+			/* Tuple routing starts from the root table. */
 			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
-
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1469,7 +1478,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1708,7 +1718,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1865,40 +1875,36 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
- *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
-	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
 	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
@@ -2009,10 +2015,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2060,17 +2064,6 @@ ExecModifyTable(PlanState *pstate)
 	subplanstate = node->mt_plans[node->mt_whichplan];
 	junkfilter = resultRelInfo->ri_junkFilter;
 
-	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
@@ -2104,7 +2097,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2149,7 +2141,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2232,25 +2223,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2262,15 +2249,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2291,7 +2272,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2334,14 +2314,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2383,7 +2357,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2407,8 +2380,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index ad4a732fd2..643cf81e66 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -192,7 +192,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -585,6 +584,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -608,6 +608,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -621,13 +622,13 @@ apply_handle_insert(StringInfo s)
 	slot_fill_defaults(rel, estate, remoteslot);
 	MemoryContextSwitchTo(oldctx);
 
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(resultRelInfo, estate, remoteslot);
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -682,6 +683,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	Oid			idxoid;
@@ -716,6 +718,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -741,7 +744,7 @@ apply_handle_update(StringInfo s)
 	fill_extraUpdatedCols(target_rte, RelationGetDescr(rel->localrel));
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Build the search tuple. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -783,7 +786,8 @@ apply_handle_update(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(resultRelInfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -799,7 +803,7 @@ apply_handle_update(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
@@ -822,6 +826,7 @@ apply_handle_update(StringInfo s)
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -852,6 +857,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -860,7 +866,7 @@ apply_handle_delete(StringInfo s)
 	EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
 
 	PushActiveSnapshot(GetTransactionSnapshot());
-	ExecOpenIndices(estate->es_result_relation_info, false);
+	ExecOpenIndices(resultRelInfo, false);
 
 	/* Find the tuple using the replica identity index. */
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
@@ -888,7 +894,7 @@ apply_handle_delete(StringInfo s)
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(resultRelInfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -900,7 +906,7 @@ apply_handle_delete(StringInfo s)
 	}
 
 	/* Cleanup. */
-	ExecCloseIndices(estate->es_result_relation_info);
+	ExecCloseIndices(resultRelInfo);
 	PopActiveSnapshot();
 
 	/* Handle queued AFTER triggers. */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 81fdfa4add..dda3410362 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -572,10 +572,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -592,10 +596,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebdabc..2518fe4f64 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cd3ddf781f..f38ec22b97 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -522,7 +522,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 45d77ba3a5..60ae509a4e 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 23885f638c..e5a0a05d13 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.20.1 (Apple Git-117)
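
The signature changes in the patch above all follow one pattern: the target relation is handed to each function as an explicit argument instead of being read from a "currently active" field in the executor state. A minimal sketch of that pattern, using toy types only (ToyRelInfo, ToyEState and toy_simple_insert are hypothetical stand-ins, not PostgreSQL's structs or functions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ResultRelInfo and EState. */
typedef struct ToyRelInfo
{
	int			ninserted;		/* rows inserted into this relation */
} ToyRelInfo;

typedef struct ToyEState
{
	ToyRelInfo *result_relations;	/* array of target relations */
	int			num_result_relations;	/* length of array */
	/* note: no "currently active" pointer to set and reset */
} ToyEState;

/*
 * The caller names the target relation explicitly, so nothing has to be
 * saved before the call or restored after it.
 */
void
toy_simple_insert(ToyRelInfo *relinfo, ToyEState *estate, int value)
{
	assert(relinfo != NULL);
	assert(relinfo >= estate->result_relations &&
		   relinfo < estate->result_relations + estate->num_result_relations);
	(void) value;				/* the toy does not store the row itself */
	relinfo->ninserted++;
}
```

A caller would pick the relation the same way worker.c does in the patch, for example `toy_simple_insert(&estate.result_relations[0], &estate, 42)`.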

v10-0003-Rearrange-partition-update-row-movement-code-a-b.patch (application/octet-stream)
From 6246ba6121a5342d8976e5556e0d7238f024e02d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v10 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 ++++++++++++++-----------
 1 file changed, 199 insertions(+), 148 deletions(-)
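
The return-value contract described in the commit message can be modeled outside the executor. The sketch below is only a toy under stated assumptions: ToyTuple, ToyMoveState, toy_cross_partition_update and toy_update are hypothetical names, and the real ExecUpdate() re-checks the partition constraint after each retry, which the toy leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple slot. */
typedef struct ToyTuple
{
	int			version;		/* bumped by each simulated concurrent update */
} ToyTuple;

/* Hypothetical test knobs that simulate what the DELETE step runs into. */
typedef struct ToyMoveState
{
	int			concurrent_updates_left;
	bool		concurrently_deleted;
	int			inserts_done;
} ToyMoveState;

/*
 * Returns true when the caller has nothing more to do: the row was moved
 * (*inserted is set) or it was concurrently deleted (*inserted stays NULL).
 * Returns false when the row was concurrently updated; *retry_tuple then
 * holds the version the caller should look at again.
 */
bool
toy_cross_partition_update(ToyMoveState *st, ToyTuple *tuple,
						   ToyTuple **retry_tuple, ToyTuple **inserted)
{
	assert(st != NULL && tuple != NULL);
	*retry_tuple = NULL;
	*inserted = NULL;

	if (st->concurrently_deleted)
		return true;			/* DELETE found nothing, so skip the INSERT */

	if (st->concurrent_updates_left > 0)
	{
		st->concurrent_updates_left--;
		tuple->version++;
		*retry_tuple = tuple;
		return false;			/* caller must redo its checks */
	}

	st->inserts_done++;			/* DELETE succeeded, now the INSERT */
	*inserted = tuple;
	return true;
}

/* Caller side; the loop plays the role of "goto lreplace". */
ToyTuple *
toy_update(ToyMoveState *st, ToyTuple *tuple)
{
	for (;;)
	{
		ToyTuple   *retry_tuple;
		ToyTuple   *inserted;

		if (toy_cross_partition_update(st, tuple, &retry_tuple, &inserted))
			return inserted;
		tuple = retry_tuple;
	}
}
```

The point of the toy is that the INSERT happens at most once per row no matter how many retries are needed, and not at all when the DELETE finds nothing to delete.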

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dcbbf4c3bc..87fe1668c6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -382,7 +382,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -648,31 +647,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -738,7 +736,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -982,32 +979,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1054,6 +1049,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1209,119 +1351,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
-
-			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
-			 */
-			if (!tuple_deleted)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
-			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-			}
-
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
2.20.1 (Apple Git-117)

v10-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From a04de2ad3697ed6815546cbb479b8796a4a65879 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v10 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This makes it possible to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 +---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  19 ++-
 src/backend/executor/nodeModifyTable.c | 207 +++++--------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 85 insertions(+), 210 deletions(-)
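
The idea in the commit message, keeping the child-to-root conversion map with each child relation instead of in one shared field that is adjusted whenever the child changes, can be sketched with toy types. ToyMap, ToyChildRel and toy_capture_value are hypothetical; a real TupleConversionMap remaps attributes rather than adding an offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap. */
typedef struct ToyMap
{
	int			offset;
} ToyMap;

/* Hypothetical stand-in for a child ResultRelInfo. */
typedef struct ToyChildRel
{
	const ToyMap *child_to_root_map;	/* NULL if no conversion is needed */
} ToyChildRel;

/*
 * Convert a child-format value to root format for transition capture.
 * The map is looked up from the relation being processed, so there is no
 * shared "current map" for callers to save and restore.
 */
int
toy_capture_value(const ToyChildRel *rel, int child_value)
{
	const ToyMap *map;

	assert(rel != NULL);
	map = rel->child_to_root_map;
	return map ? child_value + map->offset : child_value;
}
```

Because the lookup depends only on the relation passed in, processing children in any interleaved order gives the same result for each of them.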

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7ba2f5b522..d5dc508274 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3111,32 +3111,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6e8b7223fe..acbe6966c3 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4646,9 +4647,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5742,13 +5741,23 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
+		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
 		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index a5542b92c7..76d5094bf8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -925,9 +925,22 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 87fe1668c6..29bebf9430 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -365,10 +362,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1079,9 +1072,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1157,41 +1148,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1898,28 +1864,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1943,6 +1887,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1957,37 +1902,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2003,58 +1928,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2150,17 +2023,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2329,6 +2191,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2351,8 +2214,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2361,6 +2229,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
 	mtstate->fireBSTriggers = true;
 
+	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
 	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
@@ -2427,6 +2302,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
@@ -2451,26 +2340,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 5d69192643..f1977cee49 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -45,7 +45,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -64,14 +64,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_update_new_table;
 	bool		tcs_insert_new_table;
 
-	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
 	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f38ec22b97..550064a817 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -489,6 +489,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1189,9 +1195,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
2.20.1 (Apple Git-117)

v10-0001-Remove-dependency-on-estate-es_result_relation_i.patch (application/octet-stream)
From 72ae48a08cb0513e71555a6290eb230de7ba207e Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v10 1/4] Remove dependency on estate->es_result_relation_info
 from FDW APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: https://postgr.es/m/20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c     | 25 +++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  8 ++++++--
 src/backend/executor/nodeForeignscan.c  | 14 ++++++++++----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2175dff824..75761007af 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -218,6 +218,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -361,7 +362,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2328,7 +2336,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2356,7 +2366,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2389,6 +2399,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		dmstate->rel = NULL;
 	}
 
+	/* Save the ResultRelInfo for the target relation. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
 
@@ -2451,7 +2464,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4087,7 +4100,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4234,7 +4247,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..0aff958415 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -873,6 +873,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -886,6 +887,8 @@ BeginDirectModify(ForeignScanState *node,
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
      information provided by <function>PlanDirectModify</function>).
+     In addition, the <structname>ResultRelInfo</structname> struct also
+     contains information about the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -917,8 +920,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..302125284e 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,12 +221,18 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e04c33e4ad..1580e2d27c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -761,6 +761,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e084c3f069..843ee5bdec 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -698,6 +698,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d5b23a3479..e6f2358dab 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2016,6 +2016,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fc25908dc6..beba1607a1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5462,6 +5462,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by PlanDirectModify/setrefs.c, if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 3dcded506b..fe6db3c7c9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -903,6 +903,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1242,6 +1249,14 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..5c605928b2 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4869fe7b6d..81449ab92e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -620,6 +620,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for INSERT/UPDATE/DELETE;
+								 * -1 for SELECT */
 } ForeignScan;
 
 /* ----------------
-- 
2.20.1 (Apple Git-117)

#46Daniel Gustafsson
daniel@yesql.se
In reply to: Amit Langote (#45)
Re: partition routing layering in nodeModifyTable.c

On 2 Mar 2020, at 06:08, Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Rebased again.

Seems to need that again, according to cfbot :-(

Thank you, done.

..and another one is needed as it no longer applies, please submit a rebased
version.

cheers ./daniel

#47Amit Langote
amitlangote09@gmail.com
In reply to: Daniel Gustafsson (#46)
4 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Wed, Jul 1, 2020 at 6:56 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 2 Mar 2020, at 06:08, Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Rebased again.

Seems to need that again, according to cfbot :-(

Thank you, done.

..and another one is needed as it no longer applies, please submit a rebased
version.

Sorry, it took me a while to get to this.

It's been over 11 months since there was any significant commentary on
the contents of the patches themselves, so perhaps I should reiterate
what the patches are about and why it might still be a good idea to
consider them.

The thread started with some very valid criticism of the way the
executor's partition tuple routing logic looks randomly sprinkled over
nodeModifyTable.c and execPartition.c.  In the process of making it
look less random, we decided to get rid of the global variable
es_result_relation_info, to avoid the complex maneuvers of
setting/resetting it correctly when performing partition tuple
routing; that causes some other churn besides the partitioning code.
The same goes for another global variable,
TransitionCaptureState.tcs_map.  So, the patches neither add any new
capabilities nor improve performance, but they do make the code in
this area a bit easier to follow.
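
To illustrate the set/reset pattern being removed, here is a minimal
standalone sketch.  It is not PostgreSQL code: DemoRelInfo, DemoEState
and the demo_* functions are hypothetical stand-ins for ResultRelInfo,
EState and the insert path.  It only contrasts a callee that reads its
target from shared state with one that receives it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL structs. */
typedef struct DemoRelInfo
{
	int			rt_index;		/* identifies the relation */
	int			ninserted;		/* rows "inserted" so far */
} DemoRelInfo;

typedef struct DemoEState
{
	DemoRelInfo *current_rel;	/* plays the role of es_result_relation_info */
} DemoEState;

/* Old style: the callee finds its target through shared executor state. */
void
demo_insert_global(DemoEState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/*
 * Old style caller: switch the shared pointer to the partition, insert,
 * then restore it.  Forgetting the restore would silently redirect every
 * later operation to the wrong relation.
 */
void
demo_route_global(DemoEState *estate, DemoRelInfo *partition)
{
	DemoRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	demo_insert_global(estate);
	estate->current_rel = saved;
}

/* New style: the target is an explicit argument; nothing to restore. */
void
demo_insert_explicit(DemoRelInfo *target)
{
	assert(target != NULL);
	target->ninserted++;
}
```

With the explicit variant, the caller that did the routing simply passes
the partition it found, and no other code can observe a half-switched
state.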

Actually, there is a problem that some of the changes here conflict
with patches being discussed on other threads ([1], [2]), so much so
that I decided to absorb some changes here into another "refactoring"
patch that I have posted at [2].

Attached rebased patches.

0001 contains preparatory FDW API changes to stop relying on
es_result_relation_info being set correctly.

0002 removes es_result_relation_info in favor of passing the active
result relation around as a parameter in the various functions that
need it.

0003 moves UPDATE tuple-routing logic into a new function.

0004 removes the global variable TransitionCaptureState.tcs_map, which
needed to be set/reset whenever the active result relation changes,
in favor of a new field in ResultRelInfo to store the same map.
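
As a rough illustration of the idea behind 0004, the sketch below keeps
the child-to-root conversion map on the relation itself, so converting a
tuple needs no shared "current map" that must be switched per relation.
It is a simplified model under stated assumptions, not the patch:
DemoRel, DEMO_NATTS and demo_to_root_format are hypothetical names, rows
are plain int arrays, and a NULL map means the child already has the
root's column layout.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NATTS 3			/* hypothetical fixed column count */

/* Hypothetical stand-in for a result relation carrying its own map. */
typedef struct DemoRel
{
	/*
	 * For root column i, child_to_root[i] is the index of the matching
	 * child column.  NULL means no conversion is necessary.
	 */
	const int  *child_to_root;
} DemoRel;

/* Convert a row from the child's column order to the root's. */
void
demo_to_root_format(const DemoRel *rel, const int *child_row, int *root_row)
{
	for (int i = 0; i < DEMO_NATTS; i++)
	{
		if (rel->child_to_root != NULL)
			root_row[i] = child_row[rel->child_to_root[i]];
		else
			root_row[i] = child_row[i];
	}
}
```

Because the map travels with the relation, a caller handling rows from
several partitions in turn just passes the right DemoRel each time,
with nothing to save or restore in between.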

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/28/2575/
[2]: https://commitfest.postgresql.org/28/2621/

Attachments:

v11-0001-Remove-dependency-on-estate-es_result_relation_i.patch (application/x-patch)
From db6849c6ef3c3329911bcba5d75a88d86164ca7b Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v11 1/4] Remove dependency on estate->es_result_relation_info
 from FDW APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: https://postgr.es/m/20190718010911.l6xcdv6birtxiei4@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c     | 25 +++++++++++++++++++------
 doc/src/sgml/fdwhandler.sgml            |  8 ++++++--
 src/backend/executor/nodeForeignscan.c  | 14 ++++++++++----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c    | 15 +++++++++++++++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53ca..3af5b24 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -218,6 +218,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -361,7 +362,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 			rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2328,7 +2336,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  *		Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2356,7 +2366,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = rinfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2389,6 +2399,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		dmstate->rel = NULL;
 	}
 
+	/* Save the ResultRelInfo for the target relation. */
+	dmstate->resultRelInfo = rinfo;
+
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
 
@@ -2451,7 +2464,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4087,7 +4100,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4234,7 +4247,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *relInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 7479303..d2cfc6c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -881,6 +881,7 @@ PlanDirectModify(PlannerInfo *root,
 <programlisting>
 void
 BeginDirectModify(ForeignScanState *node,
+                  ResultRelInfo *rinfo,
                   int eflags);
 </programlisting>
 
@@ -894,6 +895,8 @@ BeginDirectModify(ForeignScanState *node,
      <structname>ForeignScanState</structname> node (in particular, from the underlying
      <structname>ForeignScan</structname> plan node, which contains any FDW-private
      information provided by <function>PlanDirectModify</function>).
+     In addition, the <structname>ResultRelInfo</structname> struct passed as
+     <literal>rinfo</literal> contains information about the target foreign table.
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -925,8 +928,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471a..3021252 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,12 +221,18 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
+	else
+	{
+		ResultRelInfo *resultRelInfo;
+
+		Assert(node->resultRelIndex >= 0);
+		resultRelInfo = &estate->es_result_relations[node->resultRelIndex];
+		fdwroutine->BeginDirectModify(scanstate, resultRelInfo, eflags);
+	}
 
 	return scanstate;
 }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409d..2afb195 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -761,6 +761,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f1775..15fd85a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -698,6 +698,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab..4024a80 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2017,6 +2017,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eb9543f..362bc44 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5559,6 +5559,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by PlanDirectModify/setrefs.c, if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index baefe0e..f3d1a12 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					rc->rti += rtoffset;
 					rc->prti += rtoffset;
 				}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
 				foreach(l, splan->plans)
 				{
 					lfirst(l) = set_plan_refs(root,
@@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556df..5c60592 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -112,6 +112,7 @@ typedef bool (*PlanDirectModify_function) (PlannerInfo *root,
 										   int subplan_index);
 
 typedef void (*BeginDirectModify_function) (ForeignScanState *node,
+											ResultRelInfo *rinfo,
 											int eflags);
 
 typedef TupleTableSlot *(*IterateDirectModify_function) (ForeignScanState *node);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e0107..7314d2f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -620,6 +620,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for INSERT/UPDATE/DELETE;
+								 * -1 for SELECT */
 } ForeignScan;
 
 /* ----------------
-- 
1.8.3.1
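
To illustrate the API change in the patch above, here is a minimal, self-contained sketch. It is not PostgreSQL code: the structs are cut-down stand-ins that keep only the fields named in the patch, and the helper names (adjust_result_rel_index, lookup_result_rel, begin_direct_modify, iterate_direct_modify) are hypothetical. It shows the three steps the patch relies on: the planner offsets resultRelIndex into the global result relation list, the executor looks the ResultRelInfo up by that index, and the FDW saves the pointer it was handed instead of reading a shared EState field later.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the executor structs; NOT the real definitions. */
typedef struct ResultRelInfo
{
	int			ri_RangeTableIndex;
} ResultRelInfo;

typedef struct EState
{
	ResultRelInfo *es_result_relations; /* array of all result relations */
	int			es_num_result_relations;
} EState;

typedef struct ForeignScan
{
	int			resultRelIndex; /* -1 for SELECT */
} ForeignScan;

typedef struct DirectModifyState
{
	ResultRelInfo *resultRelInfo;	/* saved by the Begin callback */
} DirectModifyState;

/*
 * Planner side (hypothetical helper): turn a ModifyTable-local index into
 * an index into the global list by adding the number of result relations
 * that were already in that list.
 */
static void
adjust_result_rel_index(ForeignScan *fscan, int num_global_result_rels)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += num_global_result_rels;
}

/* Executor side (hypothetical helper): find the target by index. */
static ResultRelInfo *
lookup_result_rel(EState *estate, const ForeignScan *fscan)
{
	assert(fscan->resultRelIndex >= 0);
	assert(fscan->resultRelIndex < estate->es_num_result_relations);
	return &estate->es_result_relations[fscan->resultRelIndex];
}

/* FDW side: remember the ResultRelInfo that was passed in explicitly. */
static void
begin_direct_modify(DirectModifyState *dmstate, ResultRelInfo *rinfo)
{
	dmstate->resultRelInfo = rinfo;
}

/* FDW side: later callbacks use the saved pointer, not shared state. */
static int
iterate_direct_modify(const DirectModifyState *dmstate)
{
	return dmstate->resultRelInfo->ri_RangeTableIndex;
}
```

In this model a scan with resultRelIndex 0, planned when two result relations are already in the global list, ends up at index 2, while a SELECT scan keeps -1 and never reaches the lookup.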

Attachment: v11-0003-Rearrange-partition-update-row-movement-code-a-b.patch (application/x-patch)
From 8fc56638cbaece9ded770734e1369aefd1dd0e8d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v11 3/4] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 0c481cb..dd97ef5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -389,7 +389,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -655,31 +654,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -745,7 +743,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -989,32 +986,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1061,6 +1056,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1216,119 +1358,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
-
-			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
-			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (!tuple_deleted)
-			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
-			}
-
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
1.8.3.1
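
The return-value contract of ExecCrossPartitionUpdate() in the patch above can be modelled with a small, self-contained sketch. This is not the executor code: Slot, MockTable, DeleteOutcome and both mock_ functions are hypothetical stand-ins, and the partition constraint is reduced to "key below an upper bound". The point is the control flow only: true means the caller is finished (the row was moved, or it was already gone), false means the row was concurrently updated and the caller must re-check the re-fetched row, which may or may not still need to move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple slot. */
typedef struct Slot
{
	int			key;
} Slot;

/* What the delete half of the row movement found (mock only). */
typedef enum DeleteOutcome
{
	DELETE_DONE,				/* row deleted, go on to insert */
	DELETE_ROW_GONE,			/* row already deleted, nothing to do */
	DELETE_ROW_CHANGED			/* row concurrently updated, re-check */
} DeleteOutcome;

typedef struct MockTable
{
	DeleteOutcome next_delete;	/* outcome of the next delete attempt */
	Slot		refetched;		/* row version seen after a concurrent update */
	int			ninserted;		/* rows re-inserted via the root table */
	int			nupdated_in_place;	/* rows updated without moving */
} MockTable;

/* Mirrors the documented contract: true = done, false = retry. */
static bool
mock_cross_partition_update(MockTable *tab, Slot *slot,
							Slot **retry_slot, Slot **inserted)
{
	DeleteOutcome outcome = tab->next_delete;

	*retry_slot = NULL;
	*inserted = NULL;

	/* In this mock a concurrent change is reported at most once. */
	tab->next_delete = DELETE_DONE;

	if (outcome == DELETE_ROW_GONE)
		return true;			/* skip the insert as well */
	if (outcome == DELETE_ROW_CHANGED)
	{
		*retry_slot = &tab->refetched;
		return false;
	}

	tab->ninserted++;
	*inserted = slot;
	return true;
}

/* Caller side: the loop plays the role of "goto lreplace". */
static Slot *
mock_update(MockTable *tab, Slot *slot, int partition_upper_bound)
{
	for (;;)
	{
		Slot	   *retry_slot;
		Slot	   *inserted;

		/* Partition constraint still satisfied: ordinary update. */
		if (slot->key < partition_upper_bound)
		{
			tab->nupdated_in_place++;
			return slot;
		}

		if (mock_cross_partition_update(tab, slot, &retry_slot, &inserted))
			return inserted;

		slot = retry_slot;		/* re-check the re-fetched row */
	}
}
```

Note that the retry path goes back through the constraint check, so a row whose concurrently updated version fits the current partition again is updated in place rather than moved.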

Attachment: v11-0002-Remove-es_result_relation_info.patch (application/x-patch)
From b64d57f06a6074cbbd18c8587a0162b9a97dd8eb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v11 2/4] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  18 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  24 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 193 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  44 +++----
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 146 insertions(+), 186 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 99d1457..735c621 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2437,9 +2437,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2472,7 +2469,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2789,7 +2787,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3061,11 +3058,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3169,7 +3161,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3240,7 +3233,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ed553f7..f2258c4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1801,7 +1801,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1931,7 +1930,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af6..e1d34be 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4fdffad..190f979 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -857,9 +857,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -903,7 +900,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2816,7 +2812,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 			rcestate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 8f474fa..2e8d9c8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -404,10 +404,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -430,7 +430,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -442,7 +443,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -466,11 +468,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -496,7 +498,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -508,7 +511,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -527,11 +531,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d0e65b8..d8d7614 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c47..0c481cb 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -366,32 +368,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -424,7 +442,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -459,7 +478,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -521,8 +541,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -582,7 +602,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -621,7 +642,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -707,6 +729,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -718,7 +741,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -728,10 +750,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1067,6 +1085,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1075,12 +1094,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1090,10 +1107,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1120,7 +1133,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1157,7 +1171,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1207,6 +1222,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1232,9 +1248,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1275,16 +1294,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1301,18 +1310,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1476,7 +1485,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1715,7 +1725,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1872,41 +1882,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -2016,10 +2022,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2068,17 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2111,7 +2104,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2156,7 +2148,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2239,25 +2230,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2269,15 +2256,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2298,7 +2279,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2341,14 +2321,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2390,7 +2364,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2414,8 +2387,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index f90a896..fae1598 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -215,7 +215,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -609,6 +608,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -632,6 +632,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -647,11 +648,10 @@ apply_handle_insert(StringInfo s)
 
 	/* For a partitioned table, insert the tuple into a partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_INSERT);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_INSERT);
 	else
-		apply_handle_insert_internal(estate->es_result_relation_info, estate,
-									 remoteslot);
+		apply_handle_insert_internal(resultRelInfo, estate, remoteslot);
 
 	PopActiveSnapshot();
 
@@ -674,7 +674,7 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 	ExecOpenIndices(relinfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(relinfo, estate, remoteslot);
 
 	/* Cleanup. */
 	ExecCloseIndices(relinfo);
@@ -721,6 +721,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	EState	   *estate;
@@ -751,6 +752,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -782,11 +784,11 @@ apply_handle_update(StringInfo s)
 
 	/* For a partitioned table, apply update to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, &newtup, rel, CMD_UPDATE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, &newtup,
+								   rel, CMD_UPDATE);
 	else
-		apply_handle_update_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &newtup, rel);
+		apply_handle_update_internal(resultRelInfo, estate, remoteslot,
+									 &newtup, rel);
 
 	PopActiveSnapshot();
 
@@ -838,7 +840,8 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -866,6 +869,7 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -892,6 +896,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -905,11 +910,11 @@ apply_handle_delete(StringInfo s)
 
 	/* For a partitioned table, apply delete to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_DELETE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_DELETE);
 	else
-		apply_handle_delete_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &rel->remoterel);
+		apply_handle_delete_internal(resultRelInfo, estate, remoteslot,
+									 &rel->remoterel);
 
 	PopActiveSnapshot();
 
@@ -947,7 +952,7 @@ apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -1055,7 +1060,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	}
 	MemoryContextSwitchTo(oldctx);
 
-	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1136,8 +1140,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					ExecOpenIndices(partrelinfo, false);
 
 					EvalPlanQualSetSlot(&epqstate, remoteslot_part);
-					ExecSimpleRelationUpdate(estate, &epqstate, localslot,
-											 remoteslot_part);
+					ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
+											 localslot, remoteslot_part);
 					ExecCloseIndices(partrelinfo);
 					EvalPlanQualEnd(&epqstate);
 				}
@@ -1178,7 +1182,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					Assert(partrelinfo_new != partrelinfo);
 
 					/* DELETE old tuple found in the old partition. */
-					estate->es_result_relation_info = partrelinfo;
 					apply_handle_delete_internal(partrelinfo, estate,
 												 localslot,
 												 &relmapentry->remoterel);
@@ -1210,7 +1213,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 						slot_getallattrs(remoteslot);
 					}
 					MemoryContextSwitchTo(oldctx);
-					estate->es_result_relation_info = partrelinfo_new;
 					apply_handle_insert_internal(partrelinfo_new, estate,
 												 remoteslot_part);
 				}
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c7deeac..a4024b7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -573,10 +573,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+									  TupleTableSlot *slot,
+									  EState *estate, ItemPointer conflictTid,
+									  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -593,10 +597,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebd..2518fe4 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0187989..246a3f3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -524,7 +524,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45b..da50ee3 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aac..963faa1 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
1.8.3.1

Attachment: v11-0004-Refactor-transition-tuple-capture-code-a-bit.patch (application/x-patch)
From d1be649eb56d7e83c064c502542256988eae6740 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v11 4/4] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows us to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  19 ++-
 src/backend/executor/nodeModifyTable.c | 207 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 85 insertions(+), 210 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 735c621..891d935 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3058,32 +3058,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccf..86adbee 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4292,9 +4293,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5388,14 +5387,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fb6ce49..d31f786 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -926,9 +926,22 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dd97ef5..5b6c5d1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -372,10 +369,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1086,9 +1079,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1164,41 +1155,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1905,28 +1871,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1950,6 +1894,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1964,37 +1909,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2010,58 +1935,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2157,17 +2030,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2336,6 +2198,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2358,8 +2221,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2369,6 +2237,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2434,6 +2309,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
@@ -2458,26 +2347,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5..e38d732 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -46,7 +46,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -66,14 +66,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 246a3f3..41bf27c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -491,6 +491,12 @@ typedef struct ResultRelInfo
 
 	/* For use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1191,9 +1197,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
1.8.3.1

#48 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#47)
Re: partition routing layering in nodeModifyTable.c

On 13/07/2020 08:47, Amit Langote wrote:

It's been over 11 months since there was any significant commentary on
the contents of the patches themselves, so perhaps I should reiterate
what the patches are about and why it might still be a good idea to
consider them.

The thread started with some very valid criticism of the way the
executor's partition tuple routing logic looks randomly sprinkled
across nodeModifyTable.c and execPartition.c. In the process of making
it look less random, we decided to get rid of the global variable
es_result_relation_info to avoid complex maneuvers of
setting/resetting it correctly when performing partition tuple
routing, which caused some other churn besides the partitioning code.
The same goes for another global variable,
TransitionCaptureState.tcs_map. So, the patches neither add any new
capabilities nor improve performance, but they do make the code in
this area a bit easier to follow.

Actually, there is a problem that some of the changes here conflict
with patches being discussed on other threads ([1], [2]), so much so
that I decided to absorb some changes here into another "refactoring"
patch that I have posted at [2].

Thanks for the summary. It's been a bit hard to follow what depends on
what across these threads, and how they work together. It seems that
this patch set is the best place to start.

Attached rebased patches.

0001 contains preparatory FDW API changes to stop relying on
es_result_relation_info being set correctly.

Makes sense. The only thing I don't like about this is the way the
ForeignScan->resultRelIndex field is set. make_foreignscan() initializes
it to -1, and the FDW's PlanDirectModify() function is expected to set
it, like you did in postgres_fdw:

@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
rebuild_fdw_scan_tlist(fscan, returningList);
}

+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
table_close(rel, NoLock);
return true;
}

It has to be set to that value (subplan_index is an argument to
PlanDirectModify()), the FDW doesn't have any choice there, so this is
just additional boilerplate code that has to be copied to every FDW that
implements direct modify. Furthermore, if the FDW doesn't set it
correctly, you could have some very interesting results, like updating
the wrong table. It would be better to set it in make_modifytable(), just
after calling PlanDirectModify().
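
A minimal standalone sketch of that suggestion follows. It is not
PostgreSQL code: ToyForeignScan, ToyPlanDirectModify and
toy_make_modifytable_step() are hypothetical stand-ins for ForeignScan,
the FDW's PlanDirectModify() callback and the relevant step of
make_modifytable(). It only shows the planner setting the index in one
place after the callback succeeds, so that no FDW has to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for ForeignScan; only the field discussed here. */
typedef struct ToyForeignScan
{
	int			resultRelIndex; /* -1 means "not a direct-modify scan" */
} ToyForeignScan;

/* Hypothetical callback type: returns true if direct modify is possible. */
typedef bool (*ToyPlanDirectModify) (ToyForeignScan *fscan, int subplan_index);

/* An FDW callback that no longer has to touch resultRelIndex at all. */
static bool
toy_fdw_plan_direct_modify(ToyForeignScan *fscan, int subplan_index)
{
	(void) fscan;
	(void) subplan_index;
	return true;
}

/*
 * Stand-in for the relevant step of make_modifytable(): call the FDW, then
 * set the index centrally so that no FDW can get it wrong.
 */
static void
toy_make_modifytable_step(ToyForeignScan *fscan, int subplan_index,
						  ToyPlanDirectModify callback)
{
	/* make_foreignscan() is assumed to have initialized this to -1. */
	assert(fscan->resultRelIndex == -1);

	if (callback != NULL && callback(fscan, subplan_index))
		fscan->resultRelIndex = subplan_index;
}
```

With this shape, an FDW that does not implement direct modify (a NULL
callback here) simply leaves resultRelIndex at -1.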

I'm also a bit unhappy with the way it's updated in set_plan_refs():

--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
rc->rti += rtoffset;
rc->prti += rtoffset;
}
+				/*
+				 * Caution: Do not change the relative ordering of this loop
+				 * and the statement below that adds the result relations to
+				 * root->glob->resultRelations, because we need to use the
+				 * current value of list_length(root->glob->resultRelations)
+				 * in some plans.
+				 */
foreach(l, splan->plans)
{
lfirst(l) = set_plan_refs(root,
@@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root,
}
fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/*
+	 * Adjust resultRelIndex if it's valid (note that we are called before
+	 * adding the RT indexes of ModifyTable result relations to the global
+	 * list)
+	 */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += list_length(root->glob->resultRelations);
}

/*

That "Caution" comment is well deserved, but could we make this more
robust to begin with? The most straightforward solution would be to pass
down the "current resultRelIndex" as an extra parameter to
set_plan_refs(), similar to rtoffset. If we did that, we wouldn't
actually need to set it before setrefs.c processing at all.

I'm a bit wary of adding another argument to set_plan_refs() because
that's a lot of code churn, but it does seem like the most natural
solution to me. Maybe create a new context struct to hold the
PlannerInfo, rtoffset, and the new "currentResultRelIndex" value,
similar to fix_scan_expr_context, to avoid passing through so many
arguments.
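
As a rough illustration of the context-struct idea, here is a
standalone toy model. It is not setrefs.c code. ToyPlan,
toy_set_plan_refs_context and toy_set_plan_refs() are hypothetical
names, and the sketch keeps the patch's "add an offset" arithmetic,
only moving the running count of result relations from
list_length(root->glob->resultRelations) into an explicit context
field.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

typedef enum ToyNodeTag
{
	TOY_FOREIGN_SCAN,
	TOY_MODIFY_TABLE
} ToyNodeTag;

/* Simplified plan node; real Plan nodes carry far more state. */
typedef struct ToyPlan
{
	ToyNodeTag	tag;
	int			scanrelid;		/* ForeignScan: range table index */
	int			resultRelIndex; /* ForeignScan: -1 if unused */
	int			nresultrels;	/* ModifyTable: number of result rels */
	int			nchildren;		/* ModifyTable: number of subplans */
	struct ToyPlan *children[TOY_MAX_CHILDREN];
} ToyPlan;

/* Hypothetical context, analogous in spirit to fix_scan_expr_context. */
typedef struct toy_set_plan_refs_context
{
	int			rtoffset;
	int			currentResultRelIndex;
} toy_set_plan_refs_context;

static void
toy_set_plan_refs(toy_set_plan_refs_context *cxt, ToyPlan *plan)
{
	int			i;

	if (plan == NULL)
		return;

	switch (plan->tag)
	{
		case TOY_FOREIGN_SCAN:
			plan->scanrelid += cxt->rtoffset;
			/* Turn the subplan-local index into a global one. */
			if (plan->resultRelIndex >= 0)
				plan->resultRelIndex += cxt->currentResultRelIndex;
			break;

		case TOY_MODIFY_TABLE:
			assert(plan->nchildren <= TOY_MAX_CHILDREN);
			for (i = 0; i < plan->nchildren; i++)
				toy_set_plan_refs(cxt, plan->children[i]);
			/* Only now account for this node's own result relations. */
			cxt->currentResultRelIndex += plan->nresultrels;
			break;
	}
}
```

The ordering constraint from the "Caution" comment is still present in
this sketch, but it is confined to the ModifyTable case and visible in
the context struct rather than implied by a global list's length.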

Another idea is to merge "resultRelIndex" and a "range table index" into
one value. Range table entries that are updated would have a
ResultRelInfo, others would not. I'm not sure if that would end up being
cleaner or messier than what we have now, but might be worth trying.
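
To make that second idea concrete, here is a small standalone sketch
under the assumption that result relations are looked up directly by
range table index. ToyEState, ToyResultRelInfo and both helper
functions are hypothetical and only model the lookup, not the real
executor structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RTES 8

/* Simplified stand-in for ResultRelInfo. */
typedef struct ToyResultRelInfo
{
	int			rti;			/* 1-based range table index */
} ToyResultRelInfo;

/*
 * Simplified stand-in for EState: one slot per range table entry.  A slot
 * is NULL when that entry is not updated by the query.
 */
typedef struct ToyEState
{
	int			range_table_size;
	ToyResultRelInfo *result_rel_by_rti[TOY_MAX_RTES];
} ToyEState;

/* Record that the range table entry rri->rti is a result relation. */
static void
toy_register_result_rel(ToyEState *estate, ToyResultRelInfo *rri)
{
	assert(estate->range_table_size <= TOY_MAX_RTES);
	assert(rri->rti >= 1 && rri->rti <= estate->range_table_size);
	estate->result_rel_by_rti[rri->rti - 1] = rri;
}

/* Return the result relation for an RT index, or NULL if it has none. */
static ToyResultRelInfo *
toy_get_result_rel(const ToyEState *estate, int rti)
{
	if (rti < 1 || rti > estate->range_table_size)
		return NULL;
	return estate->result_rel_by_rti[rti - 1];
}
```

In this model a plan node would only need to carry the RT index it
already has, at the cost of a sparse array sized by the range table.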

0002 removes es_result_relation_info in favor of passing the active
result relation around as a parameter in the various functions that
need it

Looks good.

0003 Moves UPDATE tuple-routing logic into a new function

0004 removes the global variable TransitionCaptureState.tcs_map, which
needed to be set/reset whenever the active result relation changes,
in favor of a new field in ResultRelInfo to store the same map

I didn't look closely, but these make sense at a quick glance.
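
The idea behind 0004, keeping the child-to-root conversion map with
each result relation instead of in shared capture state, can be
sketched in a standalone toy form. Nothing below is PostgreSQL code:
ToyTuple, ToyConversionMap, ToyResultRelInfo and toy_to_root_format()
are hypothetical, and the "map" is just a column permutation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NATTS 2

/* A tuple with a fixed number of integer columns. */
typedef struct ToyTuple
{
	int			values[TOY_NATTS];
} ToyTuple;

/* For each root column, the index of the child column that feeds it. */
typedef struct ToyConversionMap
{
	int			child_attno[TOY_NATTS];
} ToyConversionMap;

/* Each result relation owns its map; NULL means no conversion is needed. */
typedef struct ToyResultRelInfo
{
	const char *relname;
	const ToyConversionMap *child_to_root_map;
} ToyResultRelInfo;

/*
 * Convert a child-format tuple to root format using the map stored in the
 * result relation itself.  No shared "current map" has to be saved, set or
 * restored when the active result relation changes.
 */
static ToyTuple
toy_to_root_format(const ToyResultRelInfo *relinfo, ToyTuple child)
{
	const ToyConversionMap *map = relinfo->child_to_root_map;
	ToyTuple	root;
	int			i;

	if (map == NULL)
		return child;

	for (i = 0; i < TOY_NATTS; i++)
	{
		assert(map->child_attno[i] >= 0 && map->child_attno[i] < TOY_NATTS);
		root.values[i] = child.values[map->child_attno[i]];
	}
	return root;
}
```

Because the caller passes the result relation it is working on, the
right map is picked up implicitly for every tuple.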

- Heikki

#49 Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#48)
5 attachment(s)
Re: partition routing layering in nodeModifyTable.c

Heikki,

Thanks a lot for the review!

On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 13/07/2020 08:47, Amit Langote wrote:

It's been over 11 months since there was any significant commentary on
the contents of the patches themselves, so perhaps I should reiterate
what the patches are about and why it might still be a good idea to
consider them.

The thread started with some very valid criticism of the way the
executor's partition tuple routing logic looks randomly sprinkled
across nodeModifyTable.c and execPartition.c. In the process of making
it look less random, we decided to get rid of the global variable
es_result_relation_info to avoid complex maneuvers of
setting/resetting it correctly when performing partition tuple
routing, which caused some other churn besides the partitioning code.
The same goes for another global variable,
TransitionCaptureState.tcs_map. So, the patches neither add any new
capabilities nor improve performance, but they do make the code in
this area a bit easier to follow.

Actually, there is a problem that some of the changes here conflict
with patches being discussed on other threads ([1], [2]), so much so
that I decided to absorb some changes here into another "refactoring"
patch that I have posted at [2].

Thanks for the summary. It's been a bit hard to follow what depends on
what across these threads, and how they work together. It seems that
this patch set is the best place to start.

Great. I'd be happy to have one less set of patches to keep at home. :-)

Attached rebased patches.

0001 contains preparatory FDW API changes to stop relying on
es_result_relation_info being set correctly.

Makes sense. The only thing I don't like about this is the way the
ForeignScan->resultRelIndex field is set. make_foreignscan() initializes
it to -1, and the FDW's PlanDirectModify() function is expected to set
it, like you did in postgres_fdw:

@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
rebuild_fdw_scan_tlist(fscan, returningList);
}

+     /*
+      * Set the index of the subplan result rel.
+      */
+     fscan->resultRelIndex = subplan_index;
+
table_close(rel, NoLock);
return true;
}

It has to be set to that value (subplan_index is an argument to
PlanDirectModify()), the FDW doesn't have any choice there, so this is
just additional boilerplate code that has to be copied to every FDW that
implements direct modify. Furthermore, if the FDW doesn't set it
correctly, you could have some very interesting results, like updating
the wrong table. It would be better to set it in make_modifytable(), just
after calling PlanDirectModify().

Actually, that's how it was done in earlier iterations, but I think I
decided to move it into the FDW's functions because of a concern with
one of the other patches that depended on this patch. Maybe it makes
sense to bring that back into make_modifytable() and worry about the
other patch later.
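
To make the make_modifytable() suggestion concrete, here is a minimal standalone sketch, not code from any of the patches. ToyForeignScan, ToyPlanDirectModify and the toy_* functions are hypothetical stand-ins for the real planner structures and the FDW callback; the only point is that core code records the index once the callback agrees, so an FDW cannot set it wrongly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ForeignScan; only the field under discussion. */
typedef struct ToyForeignScan
{
    int         resultRelIndex; /* -1 means "not a direct-modify scan" */
} ToyForeignScan;

/* Hypothetical stand-in for the FDW's PlanDirectModify() callback. */
typedef bool (*ToyPlanDirectModify) (ToyForeignScan *fscan, int subplan_index);

/* Mirrors make_foreignscan() initializing the field to -1. */
void
toy_init_foreignscan(ToyForeignScan *fscan)
{
    fscan->resultRelIndex = -1;
}

/*
 * Core code asks the FDW whether a direct modify is possible and, only if
 * so, records the subplan index itself.  The FDW never touches the field.
 */
bool
toy_try_direct_modify(ToyForeignScan *fscan, int subplan_index,
                      ToyPlanDirectModify callback)
{
    if (callback == NULL || !callback(fscan, subplan_index))
        return false;

    fscan->resultRelIndex = subplan_index;
    return true;
}

/* Two sample callbacks: one FDW that accepts, one that declines. */
bool
toy_fdw_accepts(ToyForeignScan *fscan, int subplan_index)
{
    (void) fscan;
    (void) subplan_index;
    return true;
}

bool
toy_fdw_declines(ToyForeignScan *fscan, int subplan_index)
{
    (void) fscan;
    (void) subplan_index;
    return false;
}
```

With this shape, a scan whose FDW declines keeps resultRelIndex at -1, and one whose FDW accepts gets exactly the subplan_index that the caller passed in.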

I'm also a bit unhappy with the way it's updated in set_plan_refs():

--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
rc->rti += rtoffset;
rc->prti += rtoffset;
}
+                             /*
+                              * Caution: Do not change the relative ordering of this loop
+                              * and the statement below that adds the result relations to
+                              * root->glob->resultRelations, because we need to use the
+                              * current value of list_length(root->glob->resultRelations)
+                              * in some plans.
+                              */
foreach(l, splan->plans)
{
lfirst(l) = set_plan_refs(root,
@@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root,
}
fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+     /*
+      * Adjust resultRelIndex if it's valid (note that we are called before
+      * adding the RT indexes of ModifyTable result relations to the global
+      * list)
+      */
+     if (fscan->resultRelIndex >= 0)
+             fscan->resultRelIndex += list_length(root->glob->resultRelations);
}

/*

That "Caution" comment is well deserved, but could we make this more
robust to begin with? The most straightforward solution would be to pass
down the "current resultRelIndex" as an extra parameter to
set_plan_refs(), similar to rtoffset. If we did that, we wouldn't
actually need to set it before setrefs.c processing at all.

Hmm, I don't think I understand the last sentence. A given
ForeignScan node's resultRelIndex will have to be set before getting
to set_plan_refs(). I mean we shouldn't be making it a job of
setrefs.c to figure out which ForeignScan nodes need to have their
resultRelIndex set to a valid value.

I'm a bit wary of adding another argument to set_plan_refs() because
that's a lot of code churn, but it does seem like the most natural
solution to me. Maybe create a new context struct to hold the
PlannerInfo, rtoffset, and the new "currentResultRelIndex" value,
similar to fix_scan_expr_context, to avoid passing through so many
arguments.

I like the idea of a context struct. I've implemented it as a
separate refactoring patch (0001) and 0002 (what was before 0001)
extends it for "current ResultRelIndex", although I used the name
rroffset for "current ResultRelIndex" to go along with rtoffset.
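
For readers following along, the context-struct idea can be sketched in isolation roughly as below. This is a toy model, not the patch itself: ToyPlan and toy_set_plan_refs_context are hypothetical, and the real set_plan_refs() handles many node types. It only shows how rtoffset and rroffset travel together through the recursion without widening every function signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified plan node. */
typedef struct ToyPlan
{
    int         scanrelid;      /* 0 if the node scans no relation */
    int         resultRelIndex; /* -1 if not a direct-modify scan */
    struct ToyPlan *lefttree;
    struct ToyPlan *righttree;
} ToyPlan;

/* Everything the recursion needs, bundled so new fields cause no churn. */
typedef struct toy_set_plan_refs_context
{
    int         rtoffset;       /* offset into the flattened range table */
    int         rroffset;       /* offset into the global result rel list */
} toy_set_plan_refs_context;

/* Adjust one node, then recurse into its children with the same context. */
ToyPlan *
toy_set_plan_refs(ToyPlan *plan, toy_set_plan_refs_context *context)
{
    if (plan == NULL)
        return NULL;

    if (plan->scanrelid > 0)
        plan->scanrelid += context->rtoffset;

    /* Only valid (already set) indexes are shifted. */
    if (plan->resultRelIndex >= 0)
        plan->resultRelIndex += context->rroffset;

    plan->lefttree = toy_set_plan_refs(plan->lefttree, context);
    plan->righttree = toy_set_plan_refs(plan->righttree, context);

    return plan;
}
```

Note that the sketch still expects resultRelIndex to be set to a valid value before the walk; the walk only offsets it, which matches the division of labor argued for above.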

Another idea is to merge "resultRelIndex" and a "range table index" into
one value. Range table entries that are updated would have a
ResultRelInfo, others would not. I'm not sure if that would end up being
cleaner or messier than what we have now, but might be worth trying.

I have thought about something like this before. An idea I had is to
make the es_result_relations array indexable by plain RT indexes, so
that we no longer need to maintain the separate indexes that we use
today for result relations.
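
A rough standalone sketch of what "indexable by plain RT indexes" could look like follows. It is purely illustrative: ToyEState, ToyResultRelInfo and the toy_* helpers are made-up names, and nothing here is taken from the attached patches. Entries for range table entries that are not result relations simply stay NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyResultRelInfo
{
    int         rangeTableIndex;    /* 1-based RT index of the relation */
} ToyResultRelInfo;

/* Hypothetical stand-in for the relevant part of EState. */
typedef struct ToyEState
{
    int         range_table_size;
    ToyResultRelInfo **result_relations;    /* slot (rti - 1) per RT entry */
} ToyEState;

/* Allocate one slot per range table entry; returns 0 on allocation failure. */
int
toy_estate_init(ToyEState *estate, int range_table_size)
{
    estate->range_table_size = range_table_size;
    estate->result_relations =
        calloc((size_t) range_table_size, sizeof(ToyResultRelInfo *));
    return estate->result_relations != NULL;
}

/* Record a result relation under its own RT index. */
void
toy_register_result_rel(ToyEState *estate, ToyResultRelInfo *rri)
{
    assert(rri->rangeTableIndex >= 1 &&
           rri->rangeTableIndex <= estate->range_table_size);
    estate->result_relations[rri->rangeTableIndex - 1] = rri;
}

/* Look up by RT index; NULL if out of range or not a result relation. */
ToyResultRelInfo *
toy_get_result_rel(const ToyEState *estate, int rti)
{
    if (rti < 1 || rti > estate->range_table_size)
        return NULL;
    return estate->result_relations[rti - 1];
}

void
toy_estate_free(ToyEState *estate)
{
    free(estate->result_relations);
    estate->result_relations = NULL;
}
```

The trade-off is a sparse array sized by the range table rather than a dense one sized by the number of result relations, in exchange for not having to keep a second index space in sync.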

0002 removes es_result_relation_info in favor of passing the active
result relation around as a parameter in the various functions that
need it

Looks good.

0003 Moves UPDATE tuple-routing logic into a new function

0004 removes the global variable TransitionCaptureState.tcs_map, which
needed to be set/reset whenever the active result relation
changes, in favor of a new field in ResultRelInfo to store the same map

I didn't look closely, but these make sense at a quick glance.
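
As an illustration of the tcs_map change, the sketch below keeps the conversion map with the relation rather than in shared state that must be set and reset. It is a simplified model under assumed names (ToyConversionMap, ToyResultRelInfo, toy_capture_tuple) with tuples reduced to int arrays; the real code uses TupleConversionMap and tuple slots.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NATTS 3

/* Hypothetical map: attrMap[i] is the child column feeding root column i. */
typedef struct ToyConversionMap
{
    int         attrMap[TOY_NATTS];
} ToyConversionMap;

/* Hypothetical stand-in for ResultRelInfo carrying its own map. */
typedef struct ToyResultRelInfo
{
    const ToyConversionMap *childToRootMap; /* NULL when formats match */
} ToyResultRelInfo;

/*
 * Convert a child-format tuple to root format using the map stored in the
 * relation's own info.  No global "current map" has to be saved or restored
 * when the active result relation changes.
 */
void
toy_capture_tuple(const ToyResultRelInfo *relinfo,
                  const int *child_tuple, int *root_tuple)
{
    const ToyConversionMap *map = relinfo->childToRootMap;
    int         i;

    for (i = 0; i < TOY_NATTS; i++)
        root_tuple[i] = child_tuple[map ? map->attrMap[i] : i];
}
```

Because the caller passes the relation it is working on, switching between child relations needs no bookkeeping beyond passing a different pointer.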

Updated patches attached.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v12-0001-Refactor-set_plan_refs.patch (application/octet-stream)
From ddccecf06b38ba950850c4528aa3d37fcf2b860f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 7 Oct 2020 17:34:49 +0900
Subject: [PATCH v12 1/5] Refactor set_plan_refs()

Pass the information needed by set_plan_refs() in a "context" struct
instead of directly as arguments.  This will allow to expand the set
of information that can be passed without much code churn.
---
 src/backend/optimizer/plan/setrefs.c | 83 ++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 42 deletions(-)

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd8e2e9..e647f2d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -71,6 +71,12 @@ typedef struct
 	double		num_exec;
 } fix_upper_expr_context;
 
+typedef struct
+{
+	PlannerInfo	   *root;
+	int				rtoffset;	/* offset in root->glob->finalrtable */
+} set_plan_refs_context;
+
 /*
  * Selecting the best alternative in an AlternativeSubPlan expression requires
  * estimating how many times that expression will be evaluated.  For an
@@ -108,7 +114,7 @@ static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
 static bool flatten_rtes_walker(Node *node, PlannerGlobal *glob);
 static void add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte);
-static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
+static Plan *set_plan_refs(Plan *plan, set_plan_refs_context *context);
 static Plan *set_indexonlyscan_references(PlannerInfo *root,
 										  IndexOnlyScan *plan,
 										  int rtoffset);
@@ -120,15 +126,12 @@ static Plan *clean_up_removed_plan_level(Plan *parent, Plan *child);
 static void set_foreignscan_references(PlannerInfo *root,
 									   ForeignScan *fscan,
 									   int rtoffset);
-static void set_customscan_references(PlannerInfo *root,
-									  CustomScan *cscan,
-									  int rtoffset);
-static Plan *set_append_references(PlannerInfo *root,
-								   Append *aplan,
-								   int rtoffset);
-static Plan *set_mergeappend_references(PlannerInfo *root,
-										MergeAppend *mplan,
-										int rtoffset);
+static void set_customscan_references(CustomScan *cscan,
+									  set_plan_refs_context *context);
+static Plan *set_append_references(Append *aplan,
+								   set_plan_refs_context *context);
+static Plan *set_mergeappend_references(MergeAppend *mplan,
+										set_plan_refs_context *context);
 static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
 static Relids offset_relid_set(Relids relids, int rtoffset);
 static Node *fix_scan_expr(PlannerInfo *root, Node *node,
@@ -252,6 +255,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 	PlannerGlobal *glob = root->glob;
 	int			rtoffset = list_length(glob->finalrtable);
 	ListCell   *lc;
+	set_plan_refs_context	context;
 
 	/*
 	 * Add all the query's RTEs to the flattened rangetable.  The live ones
@@ -302,7 +306,9 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 	}
 
 	/* Now fix the Plan tree */
-	return set_plan_refs(root, plan, rtoffset);
+	context.root = root;
+	context.rtoffset = rtoffset;
+	return set_plan_refs(plan, &context);
 }
 
 /*
@@ -496,8 +502,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
  * set_plan_refs: recurse through the Plan nodes of a single subquery level
  */
 static Plan *
-set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
+set_plan_refs(Plan *plan, set_plan_refs_context *context)
 {
+	PlannerInfo *root = context->root;
+	int			rtoffset = context->rtoffset;
 	ListCell   *l;
 
 	if (plan == NULL)
@@ -714,7 +722,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			set_foreignscan_references(root, (ForeignScan *) plan, rtoffset);
 			break;
 		case T_CustomScan:
-			set_customscan_references(root, (CustomScan *) plan, rtoffset);
+			set_customscan_references((CustomScan *) plan, context);
 			break;
 
 		case T_NestLoop:
@@ -968,9 +976,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				}
 				foreach(l, splan->plans)
 				{
-					lfirst(l) = set_plan_refs(root,
-											  (Plan *) lfirst(l),
-											  rtoffset);
+					lfirst(l) = set_plan_refs((Plan *) lfirst(l), context);
 				}
 
 				/*
@@ -1001,14 +1007,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			break;
 		case T_Append:
 			/* Needs special treatment, see comments below */
-			return set_append_references(root,
-										 (Append *) plan,
-										 rtoffset);
+			return set_append_references((Append *) plan, context);
 		case T_MergeAppend:
 			/* Needs special treatment, see comments below */
-			return set_mergeappend_references(root,
-											  (MergeAppend *) plan,
-											  rtoffset);
+			return set_mergeappend_references((MergeAppend *) plan, context);
 		case T_RecursiveUnion:
 			/* This doesn't evaluate targetlist or check quals either */
 			set_dummy_tlist_references(plan, rtoffset);
@@ -1023,9 +1025,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				Assert(splan->plan.qual == NIL);
 				foreach(l, splan->bitmapplans)
 				{
-					lfirst(l) = set_plan_refs(root,
-											  (Plan *) lfirst(l),
-											  rtoffset);
+					lfirst(l) = set_plan_refs((Plan *) lfirst(l), context);
 				}
 			}
 			break;
@@ -1038,9 +1038,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 				Assert(splan->plan.qual == NIL);
 				foreach(l, splan->bitmapplans)
 				{
-					lfirst(l) = set_plan_refs(root,
-											  (Plan *) lfirst(l),
-											  rtoffset);
+					lfirst(l) = set_plan_refs((Plan *) lfirst(l), context);
 				}
 			}
 			break;
@@ -1058,8 +1056,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 	 * reference-adjustments bottom-up, then we would fail to match this
 	 * plan's var nodes against the already-modified nodes of the children.
 	 */
-	plan->lefttree = set_plan_refs(root, plan->lefttree, rtoffset);
-	plan->righttree = set_plan_refs(root, plan->righttree, rtoffset);
+	plan->lefttree = set_plan_refs(plan->lefttree, context);
+	plan->righttree = set_plan_refs(plan->righttree, context);
 
 	return plan;
 }
@@ -1328,10 +1326,11 @@ set_foreignscan_references(PlannerInfo *root,
  *	   Do set_plan_references processing on a CustomScan
  */
 static void
-set_customscan_references(PlannerInfo *root,
-						  CustomScan *cscan,
-						  int rtoffset)
+set_customscan_references(CustomScan *cscan,
+						  set_plan_refs_context *context)
 {
+	PlannerInfo *root = context->root;
+	int			rtoffset = context->rtoffset;
 	ListCell   *lc;
 
 	/* Adjust scanrelid if it's valid */
@@ -1387,7 +1386,7 @@ set_customscan_references(PlannerInfo *root,
 	/* Adjust child plan-nodes recursively, if needed */
 	foreach(lc, cscan->custom_plans)
 	{
-		lfirst(lc) = set_plan_refs(root, (Plan *) lfirst(lc), rtoffset);
+		lfirst(lc) = set_plan_refs((Plan *) lfirst(lc), context);
 	}
 
 	cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
@@ -1401,10 +1400,10 @@ set_customscan_references(PlannerInfo *root,
  * to do the normal processing on it.
  */
 static Plan *
-set_append_references(PlannerInfo *root,
-					  Append *aplan,
-					  int rtoffset)
+set_append_references(Append *aplan,
+					  set_plan_refs_context *context)
 {
+	int			rtoffset = context->rtoffset;
 	ListCell   *l;
 
 	/*
@@ -1417,7 +1416,7 @@ set_append_references(PlannerInfo *root,
 	/* First, we gotta recurse on the children */
 	foreach(l, aplan->appendplans)
 	{
-		lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
+		lfirst(l) = set_plan_refs((Plan *) lfirst(l), context);
 	}
 
 	/* Now, if there's just one, forget the Append and return that child */
@@ -1465,10 +1464,10 @@ set_append_references(PlannerInfo *root,
  * to do the normal processing on it.
  */
 static Plan *
-set_mergeappend_references(PlannerInfo *root,
-						   MergeAppend *mplan,
-						   int rtoffset)
+set_mergeappend_references(MergeAppend *mplan,
+						   set_plan_refs_context *context)
 {
+	int			rtoffset = context->rtoffset;
 	ListCell   *l;
 
 	/*
@@ -1481,7 +1480,7 @@ set_mergeappend_references(PlannerInfo *root,
 	/* First, we gotta recurse on the children */
 	foreach(l, mplan->mergeplans)
 	{
-		lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
+		lfirst(l) = set_plan_refs((Plan *) lfirst(l), context);
 	}
 
 	/* Now, if there's just one, forget the MergeAppend and return that child */
-- 
1.8.3.1

v12-0005-Refactor-transition-tuple-capture-code-a-bit.patch (application/octet-stream)
From f92989ca32a771ac598381b657e6c3bb1bf2241e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v12 5/5] Refactor transition tuple capture code a bit

In the case of inherited update and partitioned table inserts,
a child tuple needs to be converted back into the root table format.
The tuple conversion map needed to do that was previously stored in
ModifyTableState and adjusted every time the child relation changed,
an arrangement which is a bit cumbersome to maintain.  Instead save
the map in the child result relation's ResultRelInfo.

This allows to get rid of a bunch of code that was needed to
manipulate tcs_map.
---
 src/backend/commands/copy.c            |  31 ++---
 src/backend/commands/trigger.c         |  19 ++-
 src/backend/executor/execPartition.c   |  19 ++-
 src/backend/executor/nodeModifyTable.c | 207 +++++++--------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |   9 +-
 6 files changed, 85 insertions(+), 210 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 092c4e4..56efa58 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3113,32 +3113,15 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition, we can just use the original
+			 * unconverted tuple instead of converting the tuple in partition
+			 * format back to root format.  We must do the conversion if such
+			 * triggers exist because they may change the tuple.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccf..86adbee 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4292,9 +4293,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * If there are no triggers in 'trigdesc' that request relevant transition
  * tables, then return NULL.
  *
- * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * The resulting object can be passed to the ExecAR* functions.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5388,14 +5387,24 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	 */
 	if (row_trigger && transition_capture != NULL)
 	{
-		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleTableSlot *original_insert_tuple;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ?
+								pinfo->pi_PartitionToRootMap :
+								relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
 		bool		insert_new_table = transition_capture->tcs_insert_new_table;
 
 		/*
+		 * Get the originally inserted tuple from TransitionCaptureState and
+		 * set the variable to NULL so that the same tuple is not read again.
+		 */
+		original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		transition_capture->tcs_original_insert_tuple = NULL;
+
+		/*
 		 * For INSERT events NEW should be non-NULL, for DELETE events OLD
 		 * should be non-NULL, whereas for UPDATE events normally both OLD and
 		 * NEW are non-NULL.  But for UPDATE events fired for capturing
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f..3069729 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -983,9 +983,22 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be an UPDATE result relation, the map
+		 * has already been initialized by ExecInitModifyTable(); use that one
+		 * instead of building one from scratch.  To distinguish UPDATE result
+		 * relations from tuple-routing result relations, we rely on the fact
+		 * that each of the former has a distinct RT index.
+		 */
+		if (node && node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap =
+				partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6fcc9bd..a92d2c9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -372,10 +369,6 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
- *
- *		This may change the currently active tuple conversion map in
- *		mtstate->mt_transition_capture, so the callers must take care to
- *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
@@ -1086,9 +1079,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
-	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
+	TupleConversionMap *tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1164,41 +1155,16 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 		}
 	}
 
-	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
-	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
-	if (mtstate->mt_transition_capture)
-	{
-		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
-
 	/* We're done moving. */
 	return true;
 }
@@ -1905,28 +1871,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1950,6 +1894,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1964,37 +1909,17 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE
+	 * triggers on the partition, we can just use the original
+	 * unconverted tuple instead of converting the tuple in partition
+	 * format back to root format.  We must do the conversion if such
+	 * triggers exist because they may change the tuple.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2010,58 +1935,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2157,17 +2030,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2336,6 +2198,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRel;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2358,8 +2221,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	/* If modifying a partitioned table, initialize the root table info */
 	if (node->rootResultRelIndex >= 0)
+	{
 		mtstate->rootResultRelInfo = estate->es_root_result_relations +
 			node->rootResultRelIndex;
+		rootResultRel = mtstate->rootResultRelInfo;
+	}
+	else
+		rootResultRel = mtstate->resultRelInfo;
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2369,6 +2237,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2434,6 +2309,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRel->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
@@ -2458,26 +2347,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5..e38d732 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -46,7 +46,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -66,14 +66,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8073a81..e5e0409 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -488,6 +488,12 @@ typedef struct ResultRelInfo
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child subplan tuples to root parent format, set only if
+	 * either update row movement or transition tuple capture is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1178,9 +1184,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
1.8.3.1
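
As a reading aid for the patch above: the per-relation ri_ChildToRootMap is used to turn a tuple in a child partition's column layout into the root table's layout. The following is only a minimal sketch of that idea, not the executor's code. SketchConvMap and sketch_child_to_root() are hypothetical names, rows are modeled as plain int arrays, and the "0 means no source column" convention is an assumption made for illustration. A NULL map stands for the case where no conversion is needed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a child-to-root conversion map: for each root
 * column, the 1-based child column that supplies its value, or 0 if the
 * child has no matching column.
 */
typedef struct SketchConvMap
{
	int			nroot;			/* number of columns in the root format */
	const int  *child_attno;	/* nroot entries; 1-based, 0 = no source */
} SketchConvMap;

/*
 * Convert a child-format row into the root format.  A NULL map means both
 * formats are identical, so the child row is returned unchanged.
 */
static const int *
sketch_child_to_root(const SketchConvMap *map, const int *child_row,
					 int *root_row, int null_value)
{
	if (map == NULL)
		return child_row;

	for (int i = 0; i < map->nroot; i++)
	{
		int			src = map->child_attno[i];

		root_row[i] = (src > 0) ? child_row[src - 1] : null_value;
	}
	return root_row;
}
```

Keeping the map in each ResultRelInfo, as the patch does, means a caller holding the ResultRelInfo can find the map directly instead of computing an index into a separate per-subplan array.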

Attachment: v12-0002-Include-result-relation-index-if-any-in-ForeignS.patch (application/octet-stream)
From 551c8b59925a09ea399c10d364fd52d978e2d456 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v12 2/5] Include result relation index if any in ForeignScan

FDWs that can perform an UPDATE/DELETE remotely using the "direct
modify" set of APIs need in some cases to access the result relation
properties for which they can currently look at
EState.es_result_relation_info, which the core executor laboriously
makes sure is set correctly.  An upcoming patch will remove that
field from EState.  So this commit installs a new field resultRelIndex
in ForeignScan node which will be set by the core planner for an FDW
to peruse during a "direct modification" operation.

Amit Langote, Etsuro Fujita
---
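
Note for readers: the life cycle of resultRelIndex described in the commit message can be sketched as below. This is a simplified model under stated assumptions, not the planner code. SketchForeignScan and the sketch_* functions are hypothetical stand-ins for ForeignScan, make_foreignscan(), make_modifytable() and set_foreignscan_references().

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the ForeignScan plan node. */
typedef struct SketchForeignScan
{
	int			resultRelIndex; /* -1 unless used for a direct modify */
} SketchForeignScan;

/* Node creation: default to "not a result relation". */
static void
sketch_make_foreignscan(SketchForeignScan *fscan)
{
	fscan->resultRelIndex = -1;
}

/*
 * ModifyTable creation: if the FDW chose direct modification, remember the
 * subplan's position within this ModifyTable's own result relations.
 */
static void
sketch_mark_direct_modify(SketchForeignScan *fscan, int subplan_index)
{
	fscan->resultRelIndex = subplan_index;
}

/*
 * Reference fix-up: shift the local index by the number of result relations
 * already present in the global list, leaving -1 untouched.
 */
static void
sketch_set_references(SketchForeignScan *fscan, int rroffset)
{
	if (fscan->resultRelIndex >= 0)
		fscan->resultRelIndex += rroffset;
}
```

After the fix-up, the index addresses the flat array of result relations, which is how postgresIterateDirectModify() in this patch reaches its ResultRelInfo through es_result_relations.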
 contrib/postgres_fdw/postgres_fdw.c     | 26 ++++++++++++++++++--------
 doc/src/sgml/fdwhandler.sgml            | 10 ++++++----
 src/backend/executor/nodeForeignscan.c  |  5 ++++-
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c | 13 +++++++++++++
 src/backend/optimizer/plan/setrefs.c    | 17 +++++++++++++----
 src/include/nodes/plannodes.h           |  8 ++++++++
 9 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce..13e256f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -200,6 +200,9 @@ typedef struct PgFdwDirectModifyState
 	Relation	rel;			/* relcache entry for the foreign table */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
+	int			resultRelIndex;	/* index of ResultRelInfo for the foreign table
+								 * in EState.es_result_relations */
+
 	/* extracted fdw_private data */
 	char	   *query;			/* text of UPDATE/DELETE command */
 	bool		has_returning;	/* is there a RETURNING clause? */
@@ -446,11 +449,12 @@ static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
 static void execute_dml_stmt(ForeignScanState *node);
-static TupleTableSlot *get_returning_data(ForeignScanState *node);
+static TupleTableSlot *get_returning_data(ForeignScanState *node, ResultRelInfo *resultRelInfo);
 static void init_returning_filter(PgFdwDirectModifyState *dmstate,
 								  List *fdw_scan_tlist,
 								  Index rtindex);
 static TupleTableSlot *apply_returning_filter(PgFdwDirectModifyState *dmstate,
+											  ResultRelInfo *relInfo,
 											  TupleTableSlot *slot,
 											  EState *estate);
 static void prepare_query_params(PlanState *node,
@@ -2331,6 +2335,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
+	List	   *resultRelations = estate->es_plannedstmt->resultRelations;
 	PgFdwDirectModifyState *dmstate;
 	Index		rtindex;
 	RangeTblEntry *rte;
@@ -2355,7 +2360,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	Assert(fsplan->resultRelIndex >= 0);
+	dmstate->resultRelIndex = fsplan->resultRelIndex;
+	rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex);
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2450,7 +2457,10 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = &estate->es_result_relations[dmstate->resultRelIndex];
+
+	/* The executor must have initialized the ResultRelInfo for us. */
+	Assert(resultRelInfo != NULL);
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -2482,7 +2492,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 	/*
 	 * Get the next RETURNING tuple.
 	 */
-	return get_returning_data(node);
+	return get_returning_data(node, resultRelInfo);
 }
 
 /*
@@ -4082,11 +4092,10 @@ execute_dml_stmt(ForeignScanState *node)
  * Get the result of a RETURNING clause.
  */
 static TupleTableSlot *
-get_returning_data(ForeignScanState *node)
+get_returning_data(ForeignScanState *node, ResultRelInfo *resultRelInfo)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4141,7 +4150,8 @@ get_returning_data(ForeignScanState *node)
 		if (dmstate->rel)
 			resultSlot = slot;
 		else
-			resultSlot = apply_returning_filter(dmstate, slot, estate);
+			resultSlot = apply_returning_filter(dmstate, resultRelInfo, slot,
+												estate);
 	}
 	dmstate->next_tuple++;
 
@@ -4230,10 +4240,10 @@ init_returning_filter(PgFdwDirectModifyState *dmstate,
  */
 static TupleTableSlot *
 apply_returning_filter(PgFdwDirectModifyState *dmstate,
+					   ResultRelInfo *relInfo,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127..1f6de9b 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -893,8 +893,9 @@ BeginDirectModify(ForeignScanState *node,
      its <structfield>fdw_state</structfield> field is still NULL.  Information about
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
-     <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     <structname>ForeignScan</structname> plan node, which contains an integer field
+     giving the table's index in the query's list of result relations along with any
+     FDW-private information provided by <function>PlanDirectModify</function>).
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -926,8 +927,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     obtained using the information passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471a..19433b3 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,10 +221,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		Assert(node->resultRelIndex >= 0);
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40..de57744 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -761,6 +761,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelIndex);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f038648..fe10e5d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -698,6 +698,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelIndex);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab..4024a80 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2017,6 +2017,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelIndex);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 3d7a4e3..b157848 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5541,6 +5541,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelIndex will be set by make_modifytable(), if needed */
+	node->resultRelIndex = -1;
 
 	return node;
 }
@@ -6900,7 +6902,18 @@ make_modifytable(PlannerInfo *root,
 			!has_stored_generated_columns(subroot, rti))
 			direct_modify = fdwroutine->PlanDirectModify(subroot, node, rti, i);
 		if (direct_modify)
+		{
+			ForeignScan   *fscan = (ForeignScan *) list_nth(node->plans, i);
+
+			/*
+			 * For result relations that will be modified directly, the FDW
+			 * needs to know where to find them.
+			 */
+			Assert(IsA(fscan, ForeignScan));
+			fscan->resultRelIndex = i;
+
 			direct_modify_plans = bms_add_member(direct_modify_plans, i);
+		}
 
 		if (!direct_modify &&
 			fdwroutine != NULL &&
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e647f2d..577a911 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -75,6 +75,7 @@ typedef struct
 {
 	PlannerInfo	   *root;
 	int				rtoffset;	/* offset in root->glob->finalrtable */
+	int				rroffset;	/* offset in root->glob->resultRelations */
 } set_plan_refs_context;
 
 /*
@@ -125,7 +126,7 @@ static bool trivial_subqueryscan(SubqueryScan *plan);
 static Plan *clean_up_removed_plan_level(Plan *parent, Plan *child);
 static void set_foreignscan_references(PlannerInfo *root,
 									   ForeignScan *fscan,
-									   int rtoffset);
+									   int rtoffset, int rroffset);
 static void set_customscan_references(CustomScan *cscan,
 									  set_plan_refs_context *context);
 static Plan *set_append_references(Append *aplan,
@@ -254,6 +255,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 {
 	PlannerGlobal *glob = root->glob;
 	int			rtoffset = list_length(glob->finalrtable);
+	int			rroffset = list_length(glob->resultRelations);
 	ListCell   *lc;
 	set_plan_refs_context	context;
 
@@ -308,6 +310,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 	/* Now fix the Plan tree */
 	context.root = root;
 	context.rtoffset = rtoffset;
+	context.rroffset = rroffset;
 	return set_plan_refs(plan, &context);
 }
 
@@ -506,6 +509,7 @@ set_plan_refs(Plan *plan, set_plan_refs_context *context)
 {
 	PlannerInfo *root = context->root;
 	int			rtoffset = context->rtoffset;
+	int			rroffset = context->rroffset;
 	ListCell   *l;
 
 	if (plan == NULL)
@@ -719,7 +723,8 @@ set_plan_refs(Plan *plan, set_plan_refs_context *context)
 			}
 			break;
 		case T_ForeignScan:
-			set_foreignscan_references(root, (ForeignScan *) plan, rtoffset);
+			set_foreignscan_references(root, (ForeignScan *) plan, rtoffset,
+									   rroffset);
 			break;
 		case T_CustomScan:
 			set_customscan_references((CustomScan *) plan, context);
@@ -985,7 +990,7 @@ set_plan_refs(Plan *plan, set_plan_refs_context *context)
 				 * resultRelIndex to reflect their starting position in the
 				 * global list.
 				 */
-				splan->resultRelIndex = list_length(root->glob->resultRelations);
+				splan->resultRelIndex = rroffset;
 				root->glob->resultRelations =
 					list_concat(root->glob->resultRelations,
 								splan->resultRelations);
@@ -1250,7 +1255,7 @@ clean_up_removed_plan_level(Plan *parent, Plan *child)
 static void
 set_foreignscan_references(PlannerInfo *root,
 						   ForeignScan *fscan,
-						   int rtoffset)
+						   int rtoffset, int rroffset)
 {
 	/* Adjust scanrelid if it's valid */
 	if (fscan->scan.scanrelid > 0)
@@ -1319,6 +1324,10 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/* Adjust resultRelIndex if needed */
+	if (fscan->resultRelIndex >= 0)
+		fscan->resultRelIndex += rroffset;
 }
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e0107..b0a3924 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -607,6 +607,11 @@ typedef struct WorkTableScan
  * When the plan node represents a foreign join, scan.scanrelid is zero and
  * fs_relids must be consulted to identify the join relation.  (fs_relids
  * is valid for simple scans as well, but will always match scan.scanrelid.)
+ *
+ * If an FDW's PlanDirectModify() callback decides to repurpose a ForeignScan
+ * node to store the information about an UPDATE or DELETE operation to
+ * perform on a given foreign table result relation, resultRelIndex is set
+ * to identify the result relation.
  * ----------------
  */
 typedef struct ForeignScan
@@ -620,6 +625,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	int			resultRelIndex;	/* index of foreign table in the list of query
+								 * result relations for UPDATE/DELETE; -1 for
+								 * other query types */
 } ForeignScan;
 
 /* ----------------
-- 
1.8.3.1

Attachment: v12-0004-Rearrange-partition-update-row-movement-code-a-b.patch (application/octet-stream)
From 9f8d9a8a851c383fa6656407a3b9719ffa48df0d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v12 4/5] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
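
Note for readers: the "must be retried" contract of ExecCrossPartitionUpdate() can be sketched as below. This is a toy model under stated assumptions, not the executor code. SketchMoveState, sketch_cross_partition_update() and sketch_update() are hypothetical, a row is modeled as an int, and the real caller also rechecks constraints on the re-fetched tuple after jumping back to lreplace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state driving the simulated outcome of the delete step. */
typedef struct SketchMoveState
{
	int			pending_conflicts;	/* concurrent updates still to be seen */
	bool		row_gone;			/* row was concurrently deleted */
	int			inserts;			/* number of inserts performed */
} SketchMoveState;

/*
 * Returns true if the row was moved, or if it is gone and there is nothing
 * more to do.  Returns false if the row was concurrently updated; the
 * re-fetched version is then handed back in *retry_row.
 */
static bool
sketch_cross_partition_update(SketchMoveState *st, int row,
							  int *retry_row, bool *inserted)
{
	*inserted = false;
	*retry_row = 0;

	if (st->row_gone)
		return true;			/* delete found nothing: skip the insert */

	if (st->pending_conflicts > 0)
	{
		st->pending_conflicts--;
		*retry_row = row + 1;	/* pretend this is the re-fetched version */
		return false;
	}

	st->inserts++;				/* delete succeeded, so insert via the root */
	*inserted = true;
	return true;
}

/* Caller side: loop instead of "slot = retry_slot; goto lreplace". */
static int
sketch_update(SketchMoveState *st, int row)
{
	int			attempts = 0;

	for (;;)
	{
		int			retry_row;
		bool		inserted;

		attempts++;
		if (sketch_cross_partition_update(st, row, &retry_row, &inserted))
			return attempts;
		row = retry_row;
	}
}
```

The point of the boolean result plus the two output parameters is that the insert is skipped whenever the delete did not happen, so a row can neither be duplicated nor resurrected by the move.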
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 221a82f..6fcc9bd 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -389,7 +389,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -655,31 +654,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -745,7 +743,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -989,32 +986,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1061,6 +1056,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1216,119 +1358,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
-
-			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
-			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (!tuple_deleted)
-			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
-			}
-
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
1.8.3.1

Attachment: v12-0003-Remove-es_result_relation_info.patch (application/octet-stream)
From 9573e3a38c43e2e2529e3b6300c8f654cae686ec Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v12 3/5] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
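Note for readers (not part of the commit message): the change described
above can be pictured with a small standalone sketch.  It is only an
illustration of the pattern, assuming made-up stand-ins: DemoRelInfo,
DemoEState, current_rel and the demo_* functions are hypothetical names
and are not PostgreSQL's ResultRelInfo, EState or executor functions.
The "implicit" variants read the target relation from shared state, so
routing has to save and restore it; the "explicit" variants take the
target as a parameter, so there is nothing to restore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-relation info struct. */
typedef struct DemoRelInfo
{
	int			relid;			/* identifies the relation */
	int			ninserted;		/* rows inserted so far */
} DemoRelInfo;

/* Hypothetical stand-in for executor-wide state. */
typedef struct DemoEState
{
	/* plays the role that es_result_relation_info played */
	DemoRelInfo *current_rel;
} DemoEState;

/* Old style: the callee finds its target in shared state. */
static void
demo_insert_implicit(DemoEState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New style: the caller names the target explicitly. */
static void
demo_insert_explicit(DemoRelInfo *rel, DemoEState *estate)
{
	(void) estate;				/* unused in this sketch */
	rel->ninserted++;
}

/*
 * Old-style routing: point the shared field at the partition, insert,
 * then put the previous value back.  Forgetting the last step leaves
 * later operations targeting the wrong relation.
 */
static void
demo_route_implicit(DemoEState *estate, DemoRelInfo *partition)
{
	DemoRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	demo_insert_implicit(estate);
	estate->current_rel = saved;
}

/* New-style routing: no shared field is touched, nothing to restore. */
static void
demo_route_explicit(DemoEState *estate, DemoRelInfo *partition)
{
	demo_insert_explicit(partition, estate);
}
```

In the sketch, calling either demo_route_* function with a partition
increments only that partition's counter; the explicit variant does so
without ever writing to the shared state, which is the property the
patch is after.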
 src/backend/commands/copy.c              |  18 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   5 -
 src/backend/executor/execReplication.c   |  24 ++--
 src/backend/executor/execUtils.c         |   2 -
 src/backend/executor/nodeModifyTable.c   | 193 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  44 +++----
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 146 insertions(+), 186 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3c7dbad..092c4e4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2489,9 +2489,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2524,7 +2521,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2844,7 +2842,6 @@ CopyFrom(CopyState cstate)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	ExecInitRangeTable(estate, cstate->range_table);
 
@@ -3116,11 +3113,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3224,7 +3216,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3295,7 +3288,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e0..6277adf 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1815,7 +1815,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1945,7 +1944,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af6..e1d34be 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26..eec5d6b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -857,9 +857,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		estate->es_result_relations = resultRelInfos;
 		estate->es_num_result_relations = numResultRelations;
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
 		/*
 		 * In the partitioned result relation case, also build ResultRelInfos
 		 * for all the partitioned table roots, because we will need them to
@@ -903,7 +900,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 */
 		estate->es_result_relations = NULL;
 		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
 		estate->es_root_result_relations = NULL;
 		estate->es_num_root_result_relations = 0;
 	}
@@ -2805,7 +2801,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 			rcestate->es_num_root_result_relations = numRootResultRels;
 		}
 	}
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index b29db7b..01d2688 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -404,10 +404,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -430,7 +430,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -442,7 +443,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -466,11 +468,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -496,7 +498,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -508,7 +511,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -527,11 +531,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d0e65b8..d8d7614 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,8 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_num_result_relations = 0;
-	estate->es_result_relation_info = NULL;
-
 	estate->es_root_result_relations = NULL;
 	estate->es_num_root_result_relations = 0;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9812089..221a82f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -366,32 +368,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -424,7 +442,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -459,7 +478,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -521,8 +541,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -582,7 +602,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -621,7 +642,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -707,6 +729,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -718,7 +741,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -728,10 +750,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1067,6 +1085,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1075,12 +1094,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1090,10 +1107,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1120,7 +1133,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1157,7 +1171,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1207,6 +1222,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1232,9 +1248,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1275,16 +1294,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1301,18 +1310,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1476,7 +1485,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1715,7 +1725,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1872,41 +1882,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -2016,10 +2022,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2068,17 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2111,7 +2104,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2156,7 +2148,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2239,25 +2230,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2269,15 +2256,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2298,7 +2279,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l;
@@ -2341,14 +2321,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	foreach(l, node->plans)
@@ -2390,7 +2364,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2414,8 +2387,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdee..ed3ba86 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -361,7 +361,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -1150,6 +1149,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -1176,6 +1176,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1191,11 +1192,10 @@ apply_handle_insert(StringInfo s)
 
 	/* For a partitioned table, insert the tuple into a partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_INSERT);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_INSERT);
 	else
-		apply_handle_insert_internal(estate->es_result_relation_info, estate,
-									 remoteslot);
+		apply_handle_insert_internal(resultRelInfo, estate, remoteslot);
 
 	PopActiveSnapshot();
 
@@ -1218,7 +1218,7 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 	ExecOpenIndices(relinfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(relinfo, estate, remoteslot);
 
 	/* Cleanup. */
 	ExecCloseIndices(relinfo);
@@ -1265,6 +1265,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	EState	   *estate;
@@ -1298,6 +1299,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1337,11 +1339,11 @@ apply_handle_update(StringInfo s)
 
 	/* For a partitioned table, apply update to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, &newtup, rel, CMD_UPDATE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, &newtup,
+								   rel, CMD_UPDATE);
 	else
-		apply_handle_update_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &newtup, rel);
+		apply_handle_update_internal(resultRelInfo, estate, remoteslot,
+									 &newtup, rel);
 
 	PopActiveSnapshot();
 
@@ -1392,7 +1394,8 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -1420,6 +1423,7 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -1449,6 +1453,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = &estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1462,11 +1467,11 @@ apply_handle_delete(StringInfo s)
 
 	/* For a partitioned table, apply delete to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_DELETE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_DELETE);
 	else
-		apply_handle_delete_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &rel->remoterel);
+		apply_handle_delete_internal(resultRelInfo, estate, remoteslot,
+									 &rel->remoterel);
 
 	PopActiveSnapshot();
 
@@ -1504,7 +1509,7 @@ apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -1612,7 +1617,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	}
 	MemoryContextSwitchTo(oldctx);
 
-	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1693,8 +1697,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					ExecOpenIndices(partrelinfo, false);
 
 					EvalPlanQualSetSlot(&epqstate, remoteslot_part);
-					ExecSimpleRelationUpdate(estate, &epqstate, localslot,
-											 remoteslot_part);
+					ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
+											 localslot, remoteslot_part);
 					ExecCloseIndices(partrelinfo);
 					EvalPlanQualEnd(&epqstate);
 				}
@@ -1735,7 +1739,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					Assert(partrelinfo_new != partrelinfo);
 
 					/* DELETE old tuple found in the old partition. */
-					estate->es_result_relation_info = partrelinfo;
 					apply_handle_delete_internal(partrelinfo, estate,
 												 localslot,
 												 &relmapentry->remoterel);
@@ -1767,7 +1770,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 						slot_getallattrs(remoteslot);
 					}
 					MemoryContextSwitchTo(oldctx);
-					estate->es_result_relation_info = partrelinfo_new;
 					apply_handle_insert_internal(partrelinfo_new, estate,
 												 remoteslot_part);
 				}
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117..719c66f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -573,10 +573,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -593,10 +597,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebd..2518fe4 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ef448d6..8073a81 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -521,7 +521,6 @@ typedef struct EState
 	/* Info about target table(s) for insert/update/delete queries: */
 	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
 	int			es_num_result_relations;	/* length of array */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	/*
 	 * Info about the partition root table(s) for insert/update/delete queries
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45b..da50ee3 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aac..963faa1 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
1.8.3.1

#50Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#49)
Re: partition routing layering in nodeModifyTable.c

On 07/10/2020 12:50, Amit Langote wrote:

On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It would be better to set it in make_modifytable(), just
after calling PlanDirectModify().

Actually, that's how it was done in earlier iterations but I think I
decided to move that into the FDW's functions due to a concern about
one of the other patches that depended on this patch. Maybe it makes
sense to bring that back into make_modifytable() and worry about the
other patch later.

On second thoughts, I take back my earlier comment. Setting it in
make_modifytable() relies on the assumption that the subplan is a single
ForeignScan node, on the target relation. The documentation for
PlanDirectModify says:

To execute the direct modification on the remote server, this
function must rewrite the target subplan with a ForeignScan plan node
that executes the direct modification on the remote server.

So I guess that assumption is safe. But I'd like to have some wiggle
room here. Wouldn't it be OK to have a Result node on top of the
ForeignScan, for example? If it really must be a simple ForeignScan
node, the PlanDirectModify API seems pretty strange.

I'm not entirely sure what I would like to do with this now. I could
live with either version, but I'm not totally happy with either. (I like
your suggestion below)

Looking at this block in postgresBeginDirectModify:

/*
* Identify which user to do the remote access as. This should match what
* ExecCheckRTEPerms() does.
*/
Assert(fsplan->resultRelIndex >= 0);
dmstate->resultRelIndex = fsplan->resultRelIndex;
rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex);
rte = exec_rt_fetch(rtindex, estate);
userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();

That's a complicated way of finding out the target table's RTI. We
should probably store the result RTI in the ForeignScan in the first place.
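
To illustrate the difference, here is a toy sketch (not PostgreSQL code) of
the two lookup paths. All of the Toy* types and toy_* functions are
hypothetical stand-ins that model only the fields used in the block above;
the real RangeTblEntry and ForeignScan structs are far larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a range table entry; only the field used here. */
typedef struct ToyRangeTblEntry
{
	unsigned int checkAsUser;	/* 0 means "use the current user" */
} ToyRangeTblEntry;

/*
 * Current scheme: the plan node stores an index into the list of result
 * relations, which must first be translated into a 1-based RT index.
 */
static const ToyRangeTblEntry *
toy_rte_via_result_rel_index(const ToyRangeTblEntry *rtable, int rtable_len,
							 const int *result_relations, int n_result_rels,
							 int result_rel_index)
{
	int			rti;

	assert(result_rel_index >= 0 && result_rel_index < n_result_rels);
	rti = result_relations[result_rel_index];
	assert(rti >= 1 && rti <= rtable_len);
	return &rtable[rti - 1];
}

/* Suggested scheme: the plan node stores the RT index itself. */
static const ToyRangeTblEntry *
toy_rte_via_rti(const ToyRangeTblEntry *rtable, int rtable_len, int rti)
{
	assert(rti >= 1 && rti <= rtable_len);
	return &rtable[rti - 1];
}

/* Mirrors the "checkAsUser ? checkAsUser : GetUserId()" expression. */
static unsigned int
toy_effective_user(const ToyRangeTblEntry *rte, unsigned int current_user)
{
	return rte->checkAsUser ? rte->checkAsUser : current_user;
}
```

Both paths end at the same range table entry; the second one simply skips
the intermediate list lookup.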

Another idea is to merge "resultRelIndex" and a "range table index" into
one value. Range table entries that are updated would have a
ResultRelInfo, others would not. I'm not sure if that would end up being
cleaner or messier than what we have now, but might be worth trying.

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

That sounds like a good idea. es_result_relations is currently an array
of ResultRelInfos, so that would leave a lot of unfilled structs in the
array. But in one of your other threads, you proposed turning
es_result_relations into an array of pointers anyway
(/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com).
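
As a rough sketch of why an array of pointers avoids the unfilled structs:
each range table entry costs one pointer slot, and a struct is allocated only
for entries that are actually result relations. The Toy* types and toy_*
functions below are hypothetical and use plain malloc/calloc rather than the
backend's memory contexts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyResultRelInfo
{
	int			rti;			/* 1-based range table index */
} ToyResultRelInfo;

/* Hypothetical stand-in for the relevant part of EState. */
typedef struct ToyEState
{
	int			range_table_size;
	ToyResultRelInfo **result_relations;	/* NULL slot = not a target */
} ToyEState;

/* Allocate one NULL pointer slot per range table entry; 1 on success. */
static int
toy_init_result_relations_array(ToyEState *estate, int range_table_size)
{
	assert(range_table_size > 0);
	estate->range_table_size = range_table_size;
	estate->result_relations =
		calloc((size_t) range_table_size, sizeof(ToyResultRelInfo *));
	return estate->result_relations != NULL;
}

/*
 * Look up the entry for an RT index.  With create != 0 the struct is
 * allocated on first use; otherwise NULL means "not a result relation".
 */
static ToyResultRelInfo *
toy_get_result_relation(ToyEState *estate, int rti, int create)
{
	ToyResultRelInfo *rri;

	assert(rti >= 1 && rti <= estate->range_table_size);
	rri = estate->result_relations[rti - 1];
	if (rri == NULL && create)
	{
		rri = malloc(sizeof(*rri));
		if (rri == NULL)
			return NULL;
		rri->rti = rti;
		estate->result_relations[rti - 1] = rri;
	}
	return rri;
}

/* Free every allocated entry and the pointer array itself. */
static void
toy_free_result_relations_array(ToyEState *estate)
{
	int			i;

	for (i = 0; i < estate->range_table_size; i++)
		free(estate->result_relations[i]);
	free(estate->result_relations);
	estate->result_relations = NULL;
}
```

With a range table of five entries and one target relation, this allocates
five pointers and a single struct instead of five structs.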

- Heikki

#51Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#50)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 07/10/2020 12:50, Amit Langote wrote:

On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It would be better to set it in make_modifytable(), just
after calling PlanDirectModify().

Actually, that's how it was done in earlier iterations but I think I
decided to move that into the FDW's functions due to a concern about
one of the other patches that depended on this patch. Maybe it makes
sense to bring that back into make_modifytable() and worry about the
other patch later.

On second thoughts, I take back my earlier comment. Setting it in
make_modifytable() relies on the assumption that the subplan is a single
ForeignScan node, on the target relation. The documentation for
PlanDirectModify says:

To execute the direct modification on the remote server, this
function must rewrite the target subplan with a ForeignScan plan node
that executes the direct modification on the remote server.

So I guess that assumption is safe. But I'd like to have some wiggle
room here. Wouldn't it be OK to have a Result node on top of the
ForeignScan, for example? If it really must be a simple ForeignScan
node, the PlanDirectModify API seems pretty strange.

I'm not entirely sure what I would like to do with this now. I could
live with either version, but I'm not totally happy with either. (I like
your suggestion below)

Assuming you mean the idea of using RT index to access ResultRelInfos
in es_result_relations, we would still need to store the index in the
ForeignScan node, so the question of whether to do it in
make_modifytable() or in PlanDirectModify() must still be answered.

Looking at this block in postgresBeginDirectModify:

/*
* Identify which user to do the remote access as. This should match what
* ExecCheckRTEPerms() does.
*/
Assert(fsplan->resultRelIndex >= 0);
dmstate->resultRelIndex = fsplan->resultRelIndex;
rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex);
rte = exec_rt_fetch(rtindex, estate);
userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();

That's a complicated way of finding out the target table's RTI. We
should probably store the result RTI in the ForeignScan in the first place.

Another idea is to merge "resultRelIndex" and a "range table index" into
one value. Range table entries that are updated would have a
ResultRelInfo, others would not. I'm not sure if that would end up being
cleaner or messier than what we have now, but might be worth trying.

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

That sounds like a good idea. es_result_relations is currently an array
of ResultRelInfos, so that would leave a lot of unfilled structs in the
array. But in one of your other threads, you proposed turning
es_result_relations into an array of pointers anyway
(/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com).

Okay, I am reorganizing the patches around that idea and will post an
update soon.

--
Amit Langote
EDB: http://www.enterprisedb.com

#52Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#51)
5 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 07/10/2020 12:50, Amit Langote wrote:

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

That sounds like a good idea. es_result_relations is currently an array
of ResultRelInfos, so that would leave a lot of unfilled structs in the
array. But in one of your other threads, you proposed turning
es_result_relations into an array of pointers anyway
(/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com).

Okay, I am reorganizing the patches around that idea and will post an
update soon.

Attached updated patches.

0001 makes es_result_relations an RTI-indexable array, which allows us to
get rid of all "result relation index" fields across the code.
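
For readers skimming the patch, the following is a simplified model of the
bookkeeping, assuming the sparse array is paired with a separate list of the
entries actually opened (the role es_opened_result_relations appears to play
in the diff). The Toy* names are hypothetical, and an intrusive linked list
stands in for the backend's List type.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RT_SIZE 6			/* hypothetical range table length */

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyRel
{
	int			rti;			/* 1-based range table index */
	int			indexes_open;	/* nonzero while "indexes" are open */
	struct ToyRel *next_opened; /* link in the opened list */
} ToyRel;

typedef struct ToyState
{
	ToyRel	   *by_rti[TOY_RT_SIZE];	/* sparse, indexed by rti - 1 */
	ToyRel	   *opened_head;			/* only the entries in use */
} ToyState;

/* Register a result relation under its RT index and remember it as opened. */
static void
toy_open(ToyState *st, ToyRel *rel, int rti)
{
	assert(rti >= 1 && rti <= TOY_RT_SIZE);
	assert(st->by_rti[rti - 1] == NULL);
	rel->rti = rti;
	rel->indexes_open = 1;
	rel->next_opened = st->opened_head;
	st->opened_head = rel;
	st->by_rti[rti - 1] = rel;
}

/*
 * Cleanup walks the opened list rather than scanning every slot of the
 * sparse array.  Returns the number of relations closed.
 */
static int
toy_close_all(ToyState *st)
{
	int			closed = 0;
	ToyRel	   *rel;

	for (rel = st->opened_head; rel != NULL; rel = rel->next_opened)
	{
		rel->indexes_open = 0;
		closed++;
	}
	return closed;
}
```

Lookups by RT index stay O(1) through the array, while cleanup and trigger
reporting only touch the relations that were really initialized.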

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v13-0001-Make-es_result_relations-array-indexable-by-RT-i.patch (application/octet-stream)
From 89db264f441062e6c938a44757d8faa4b977e0a6 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 9 Oct 2020 13:52:29 +0900
Subject: [PATCH v13 1/5] Make es_result_relations array indexable by RT index

This allows us to get rid of the need to keep around a separate set
of indexes for each result relation.
---
 src/backend/commands/copy.c              |  20 ++--
 src/backend/commands/explain.c           |  17 ++-
 src/backend/commands/tablecmds.c         |  12 +-
 src/backend/executor/execMain.c          | 189 ++++++++-----------------------
 src/backend/executor/execUtils.c         |  73 +++++++++---
 src/backend/executor/nodeModifyTable.c   |  24 ++--
 src/backend/nodes/copyfuncs.c            |   2 -
 src/backend/nodes/outfuncs.c             |   2 -
 src/backend/nodes/readfuncs.c            |   2 -
 src/backend/optimizer/plan/createplan.c  |   2 -
 src/backend/optimizer/plan/setrefs.c     |  10 +-
 src/backend/replication/logical/worker.c |   4 +-
 src/include/executor/executor.h          |   5 +
 src/include/nodes/execnodes.h            |  21 ++--
 src/include/nodes/plannodes.h            |   2 -
 15 files changed, 162 insertions(+), 223 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3c7dbad..6948214 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2829,25 +2829,18 @@ CopyFrom(CopyState cstate)
 	 * index-entry-making machinery.  (There used to be a huge amount of code
 	 * here that basically duplicated execUtils.c ...)
 	 */
-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo,
-					  cstate->rel,
-					  1,		/* must match rel's position in range_table */
-					  NULL,
-					  0);
-	target_resultRelInfo = resultRelInfo;
+	ExecInitRangeTable(estate, cstate->range_table);
+	ExecInitResultRelationsArray(estate);
+	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
+	ExecInitResultRelation(estate, resultRelInfo, 1);
 
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
-	ExecInitRangeTable(estate, cstate->range_table);
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -2856,7 +2849,7 @@ CopyFrom(CopyState cstate)
 	mtstate->ps.plan = NULL;
 	mtstate->ps.state = estate;
 	mtstate->operation = CMD_INSERT;
-	mtstate->resultRelInfo = estate->es_result_relations;
+	mtstate->resultRelInfo = resultRelInfo;
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
@@ -3359,7 +3352,8 @@ CopyFrom(CopyState cstate)
 	if (insertMethod != CIM_SINGLE)
 		CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
-	ExecCloseIndices(target_resultRelInfo);
+	ExecCloseResultRelations(estate);
+	ExecCloseRangeTableRelations(estate);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5..3210f90 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -769,13 +769,14 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 {
 	ResultRelInfo *rInfo;
 	bool		show_relname;
-	int			numrels = queryDesc->estate->es_num_result_relations;
-	int			numrootrels = queryDesc->estate->es_num_root_result_relations;
+	int			numrels = list_length(queryDesc->plannedstmt->resultRelations);
+	int			numrootrels = list_length(queryDesc->plannedstmt->rootResultRelations);
+	List	   *resultrels;
 	List	   *routerels;
 	List	   *targrels;
-	int			nr;
 	ListCell   *l;
 
+	resultrels = queryDesc->estate->es_opened_result_relations;
 	routerels = queryDesc->estate->es_tuple_routing_result_relations;
 	targrels = queryDesc->estate->es_trig_target_relations;
 
@@ -783,13 +784,11 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 
 	show_relname = (numrels > 1 || numrootrels > 0 ||
 					routerels != NIL || targrels != NIL);
-	rInfo = queryDesc->estate->es_result_relations;
-	for (nr = 0; nr < numrels; rInfo++, nr++)
-		report_triggers(rInfo, show_relname, es);
-
-	rInfo = queryDesc->estate->es_root_result_relations;
-	for (nr = 0; nr < numrootrels; rInfo++, nr++)
+	foreach(l, resultrels)
+	{
+		rInfo = lfirst(l);
 		report_triggers(rInfo, show_relname, es);
+	}
 
 	foreach(l, routerels)
 	{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e0..1cd8cfb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1693,6 +1693,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	SubTransactionId mySubid;
 	ListCell   *cell;
 	Oid		   *logrelids;
+	int			dummy_rti;
 
 	/*
 	 * Check the explicitly-specified relations.
@@ -1792,19 +1793,24 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfos = (ResultRelInfo *)
 		palloc(list_length(rels) * sizeof(ResultRelInfo));
 	resultRelInfo = resultRelInfos;
+	estate->es_result_relations = (ResultRelInfo **)
+		palloc(list_length(rels) * sizeof(ResultRelInfo *));
+	dummy_rti = 1;
 	foreach(cell, rels)
 	{
 		Relation	rel = (Relation) lfirst(cell);
 
 		InitResultRelInfo(resultRelInfo,
 						  rel,
-						  0,	/* dummy rangetable index */
+						  dummy_rti,
 						  NULL,
 						  0);
+		estate->es_result_relations[dummy_rti - 1] = resultRelInfo;
+		estate->es_opened_result_relations =
+			lappend(estate->es_opened_result_relations, resultRelInfo);
 		resultRelInfo++;
+		dummy_rti++;
 	}
-	estate->es_result_relations = resultRelInfos;
-	estate->es_num_result_relations = list_length(rels);
 
 	/*
 	 * Process all BEFORE STATEMENT TRUNCATE triggers before we begin
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26..f62fd8f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -827,85 +827,12 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	estate->es_plannedstmt = plannedstmt;
 
-	/*
-	 * Initialize ResultRelInfo data structures, and open the result rels.
-	 */
 	if (plannedstmt->resultRelations)
 	{
-		List	   *resultRelations = plannedstmt->resultRelations;
-		int			numResultRelations = list_length(resultRelations);
-		ResultRelInfo *resultRelInfos;
-		ResultRelInfo *resultRelInfo;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		resultRelInfo = resultRelInfos;
-		foreach(l, resultRelations)
-		{
-			Index		resultRelationIndex = lfirst_int(l);
-			Relation	resultRelation;
-
-			resultRelation = ExecGetRangeTableRelation(estate,
-													   resultRelationIndex);
-			InitResultRelInfo(resultRelInfo,
-							  resultRelation,
-							  resultRelationIndex,
-							  NULL,
-							  estate->es_instrument);
-			resultRelInfo++;
-		}
-		estate->es_result_relations = resultRelInfos;
-		estate->es_num_result_relations = numResultRelations;
+		ExecInitResultRelationsArray(estate);
 
 		/* es_result_relation_info is NULL except when within ModifyTable */
 		estate->es_result_relation_info = NULL;
-
-		/*
-		 * In the partitioned result relation case, also build ResultRelInfos
-		 * for all the partitioned table roots, because we will need them to
-		 * fire statement-level triggers, if any.
-		 */
-		if (plannedstmt->rootResultRelations)
-		{
-			int			num_roots = list_length(plannedstmt->rootResultRelations);
-
-			resultRelInfos = (ResultRelInfo *)
-				palloc(num_roots * sizeof(ResultRelInfo));
-			resultRelInfo = resultRelInfos;
-			foreach(l, plannedstmt->rootResultRelations)
-			{
-				Index		resultRelIndex = lfirst_int(l);
-				Relation	resultRelDesc;
-
-				resultRelDesc = ExecGetRangeTableRelation(estate,
-														  resultRelIndex);
-				InitResultRelInfo(resultRelInfo,
-								  resultRelDesc,
-								  resultRelIndex,
-								  NULL,
-								  estate->es_instrument);
-				resultRelInfo++;
-			}
-
-			estate->es_root_result_relations = resultRelInfos;
-			estate->es_num_root_result_relations = num_roots;
-		}
-		else
-		{
-			estate->es_root_result_relations = NULL;
-			estate->es_num_root_result_relations = 0;
-		}
-	}
-	else
-	{
-		/*
-		 * if no result relation, then set state appropriately
-		 */
-		estate->es_result_relations = NULL;
-		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
-		estate->es_root_result_relations = NULL;
-		estate->es_num_root_result_relations = 0;
 	}
 
 	/*
@@ -1334,8 +1261,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
  *
  * Most of the time, triggers are fired on one of the result relations of the
  * query, and so we can just return a member of the es_result_relations array,
- * or the es_root_result_relations array (if any), or the
- * es_tuple_routing_result_relations list (if any).  (Note: in self-join
+ * or the es_tuple_routing_result_relations list (if any). (Note: in self-join
  * situations there might be multiple members with the same OID; if so it
  * doesn't matter which one we pick.)
  *
@@ -1352,30 +1278,16 @@ ResultRelInfo *
 ExecGetTriggerResultRel(EState *estate, Oid relid)
 {
 	ResultRelInfo *rInfo;
-	int			nr;
 	ListCell   *l;
 	Relation	rel;
 	MemoryContext oldcontext;
 
 	/* First, search through the query result relations */
-	rInfo = estate->es_result_relations;
-	nr = estate->es_num_result_relations;
-	while (nr > 0)
-	{
-		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
-			return rInfo;
-		rInfo++;
-		nr--;
-	}
-	/* Second, search through the root result relations, if any */
-	rInfo = estate->es_root_result_relations;
-	nr = estate->es_num_root_result_relations;
-	while (nr > 0)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		rInfo = lfirst(l);
 		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
 			return rInfo;
-		rInfo++;
-		nr--;
 	}
 
 	/*
@@ -1512,9 +1424,6 @@ ExecPostprocessPlan(EState *estate)
 static void
 ExecEndPlan(PlanState *planstate, EState *estate)
 {
-	ResultRelInfo *resultRelInfo;
-	Index		num_relations;
-	Index		i;
 	ListCell   *l;
 
 	/*
@@ -1540,30 +1449,52 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
+	/* Close indexes of result relation(s), if any. */
+	ExecCloseResultRelations(estate);
+
+	/*
+	 * close whatever rangetable Relations have been opened.  We do not
+	 * release any locks we might hold on those rels.
+	 */
+	ExecCloseRangeTableRelations(estate);
+
+	/* likewise close any trigger target relations */
+	ExecCleanUpTriggerState(estate);
+}
+
+/*
+ * ExecCloseResultRelations
+ */
+void
+ExecCloseResultRelations(EState *estate)
+{
+	ListCell *l;
+
 	/*
 	 * close indexes of result relation(s) if any.  (Rels themselves get
 	 * closed next.)
 	 */
-	resultRelInfo = estate->es_result_relations;
-	for (i = estate->es_num_result_relations; i > 0; i--)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		ResultRelInfo *resultRelInfo = lfirst(l);
+
 		ExecCloseIndices(resultRelInfo);
-		resultRelInfo++;
 	}
+}
 
-	/*
-	 * close whatever rangetable Relations have been opened.  We do not
-	 * release any locks we might hold on those rels.
-	 */
-	num_relations = estate->es_range_table_size;
-	for (i = 0; i < num_relations; i++)
+/*
+ * ExecCloseRangeTableRelations
+ */
+void
+ExecCloseRangeTableRelations(EState *estate)
+{
+	int		i;
+
+	for (i = 0; i < estate->es_range_table_size; i++)
 	{
 		if (estate->es_relations[i])
 			table_close(estate->es_relations[i], NoLock);
 	}
-
-	/* likewise close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
 }
 
 /* ----------------------------------------------------------------
@@ -2758,17 +2689,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 
 	/*
 	 * Child EPQ EStates share the parent's copy of unchanging state such as
-	 * the snapshot, rangetable, result-rel info, and external Param info.
+	 * the snapshot, rangetable, and external Param info.
 	 * They need their own copies of local state, including a tuple table,
-	 * es_param_exec_vals, etc.
-	 *
-	 * The ResultRelInfo array management is trickier than it looks.  We
-	 * create fresh arrays for the child but copy all the content from the
-	 * parent.  This is because it's okay for the child to share any
-	 * per-relation state the parent has already created --- but if the child
-	 * sets up any ResultRelInfo fields, such as its own junkfilter, that
-	 * state must *not* propagate back to the parent.  (For one thing, the
-	 * pointed-to data is in a memory context that won't last long enough.)
+	 * es_param_exec_vals, result-rel info, etc.
 	 */
 	rcestate->es_direction = ForwardScanDirection;
 	rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2781,30 +2704,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
 	rcestate->es_output_cid = parentestate->es_output_cid;
-	if (parentestate->es_num_result_relations > 0)
-	{
-		int			numResultRelations = parentestate->es_num_result_relations;
-		int			numRootResultRels = parentestate->es_num_root_result_relations;
-		ResultRelInfo *resultRelInfos;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		memcpy(resultRelInfos, parentestate->es_result_relations,
-			   numResultRelations * sizeof(ResultRelInfo));
-		rcestate->es_result_relations = resultRelInfos;
-		rcestate->es_num_result_relations = numResultRelations;
-
-		/* Also transfer partitioned root result relations. */
-		if (numRootResultRels > 0)
-		{
-			resultRelInfos = (ResultRelInfo *)
-				palloc(numRootResultRels * sizeof(ResultRelInfo));
-			memcpy(resultRelInfos, parentestate->es_root_result_relations,
-				   numRootResultRels * sizeof(ResultRelInfo));
-			rcestate->es_root_result_relations = resultRelInfos;
-			rcestate->es_num_root_result_relations = numRootResultRels;
-		}
-	}
+	/*
+	 * ResultRelInfos needed by subplans are initialized from scratch when
+	 * the subplans themselves are initialized.
+	 */
+	if (parentestate->es_result_relations)
+		ExecInitResultRelationsArray(rcestate);
 	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
@@ -2954,6 +2859,8 @@ EvalPlanQualEnd(EPQState *epqstate)
 		ExecEndNode(subplanstate);
 	}
 
+	ExecCloseResultRelations(estate);
+
 	/* throw away the per-estate tuple table, some node may have used it */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d0e65b8..169dd92 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,14 +124,8 @@ CreateExecutorState(void)
 	estate->es_output_cid = (CommandId) 0;
 
 	estate->es_result_relations = NULL;
-	estate->es_num_result_relations = 0;
 	estate->es_result_relation_info = NULL;
-
-	estate->es_root_result_relations = NULL;
-	estate->es_num_root_result_relations = 0;
-
 	estate->es_tuple_routing_result_relations = NIL;
-
 	estate->es_trig_target_relations = NIL;
 
 	estate->es_param_list_info = NULL;
@@ -711,16 +705,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	ResultRelInfo *resultRelInfos;
-	int			i;
-
-	resultRelInfos = estate->es_result_relations;
-	for (i = 0; i < estate->es_num_result_relations; i++)
-	{
-		if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
-			return true;
-	}
-	return false;
+	return	list_member_int(estate->es_plannedstmt->resultRelations,
+							scanrelid) ||
+			list_member_int(estate->es_plannedstmt->rootResultRelations,
+							scanrelid);
 }
 
 /* ----------------------------------------------------------------
@@ -779,8 +767,8 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 		palloc0(estate->es_range_table_size * sizeof(Relation));
 
 	/*
-	 * es_rowmarks is also parallel to the es_range_table, but it's allocated
-	 * only if needed.
+	 * es_result_relations and es_rowmarks are also parallel to es_range_table,
+	 * but are only allocated if needed.
 	 */
 	estate->es_rowmarks = NULL;
 }
@@ -836,6 +824,55 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 }
 
 /*
+ * ExecInitResultRelationsArray
+ *		Allocate space to hold ResultRelInfo pointers of result relations
+ *
+ * Although not all relations in the range table may be result relations, we
+ * allocate that many pointers, because that allows accessing individual
+ * entries by RT index (minus 1 to be accurate), which is convenient.
+ */
+void
+ExecInitResultRelationsArray(EState *estate)
+{
+	/*
+	 * Individual pointers are assigned when ExecInitResultRelation() is
+	 * called one-by-one for each result relation.
+	 */
+	estate->es_result_relations = (ResultRelInfo **)
+		palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
+}
+
+/*
+ * ExecInitResultRelation
+ *		Open result relation given by the passed-in RT index and fill its
+ *		ResultRelInfo node
+ *
+ * Here, we also save the ResultRelInfo in estate->es_result_relations array
+ * such that it can be accessed later using the RT index.
+ */
+void
+ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+					   Index rti)
+{
+	Relation	resultRelationDesc;
+
+	resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+	InitResultRelInfo(resultRelInfo,
+					  resultRelationDesc,
+					  rti,
+					  NULL,
+					  estate->es_instrument);
+	estate->es_result_relations[rti - 1] = resultRelInfo;
+
+	/*
+	 * Saving it in the list avoids needlessly traversing the whole array
+	 * when only a few of its entries are possibly non-NULL.
+	 */
+	estate->es_opened_result_relations =
+		lappend(estate->es_opened_result_relations, resultRelInfo);
+}
+
+/*
  * UpdateChangedParamSet
  *		Add changed parameters to a plan node's chgParam set
  */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9812089..9b27d34 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2301,7 +2301,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	ListCell   *l;
+	ListCell   *l,
+			   *l1;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
@@ -2322,13 +2323,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_done = false;
 
 	mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
-	mtstate->resultRelInfo = estate->es_result_relations + node->resultRelIndex;
+	mtstate->resultRelInfo = (ResultRelInfo *)
+		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
 	/* If modifying a partitioned table, initialize the root table info */
-	if (node->rootResultRelIndex >= 0)
-		mtstate->rootResultRelInfo = estate->es_root_result_relations +
-			node->rootResultRelIndex;
+	if (node->rootRelation > 0)
+	{
+		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
+		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
+							   node->rootRelation);
+	}
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2351,9 +2356,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
-	foreach(l, node->plans)
+	forboth(l, node->resultRelations, l1, node->plans)
 	{
-		subplan = (Plan *) lfirst(l);
+		Index		resultRelation = lfirst_int(l);
+
+		subplan = (Plan *) lfirst(l1);
+
+		/* This opens the result relation and fills its ResultRelInfo. */
+		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40..7974fa0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -207,8 +207,6 @@ _copyModifyTable(const ModifyTable *from)
 	COPY_SCALAR_FIELD(rootRelation);
 	COPY_SCALAR_FIELD(partColsUpdated);
 	COPY_NODE_FIELD(resultRelations);
-	COPY_SCALAR_FIELD(resultRelIndex);
-	COPY_SCALAR_FIELD(rootResultRelIndex);
 	COPY_NODE_FIELD(plans);
 	COPY_NODE_FIELD(withCheckOptionLists);
 	COPY_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f038648..f35d2a0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -408,8 +408,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
 	WRITE_UINT_FIELD(rootRelation);
 	WRITE_BOOL_FIELD(partColsUpdated);
 	WRITE_NODE_FIELD(resultRelations);
-	WRITE_INT_FIELD(resultRelIndex);
-	WRITE_INT_FIELD(rootResultRelIndex);
 	WRITE_NODE_FIELD(plans);
 	WRITE_NODE_FIELD(withCheckOptionLists);
 	WRITE_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab..73b8d9d 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1639,8 +1639,6 @@ _readModifyTable(void)
 	READ_UINT_FIELD(rootRelation);
 	READ_BOOL_FIELD(partColsUpdated);
 	READ_NODE_FIELD(resultRelations);
-	READ_INT_FIELD(resultRelIndex);
-	READ_INT_FIELD(rootResultRelIndex);
 	READ_NODE_FIELD(plans);
 	READ_NODE_FIELD(withCheckOptionLists);
 	READ_NODE_FIELD(returningLists);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 3d7a4e3..881eaf4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6808,8 +6808,6 @@ make_modifytable(PlannerInfo *root,
 	node->rootRelation = rootRelation;
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
-	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
-	node->rootResultRelIndex = -1;	/* will be set correctly in setrefs.c */
 	node->plans = subplans;
 	if (!onconflict)
 	{
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd8e2e9..66372d3 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -975,24 +975,18 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 
 				/*
 				 * Append this ModifyTable node's final result relation RT
-				 * index(es) to the global list for the plan, and set its
-				 * resultRelIndex to reflect their starting position in the
-				 * global list.
+				 * index(es) to the global list for the plan.
 				 */
-				splan->resultRelIndex = list_length(root->glob->resultRelations);
 				root->glob->resultRelations =
 					list_concat(root->glob->resultRelations,
 								splan->resultRelations);
 
 				/*
 				 * If the main target relation is a partitioned table, also
-				 * add the partition root's RT index to rootResultRelations,
-				 * and remember its index in that list in rootResultRelIndex.
+				 * add the partition root's RT index to rootResultRelations.
 				 */
 				if (splan->rootRelation)
 				{
-					splan->rootResultRelIndex =
-						list_length(root->glob->rootResultRelations);
 					root->glob->rootResultRelations =
 						lappend_int(root->glob->rootResultRelations,
 									splan->rootRelation);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdee..77f71e5 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -355,12 +355,12 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->relkind = rel->localrel->rd_rel->relkind;
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
+	ExecInitResultRelationsArray(estate);
 
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
 
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
+	estate->es_result_relations[0] = resultRelInfo;
 	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117..54455e1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -538,6 +538,9 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable);
+extern void ExecInitResultRelationsArray(EState *estate);
+extern void ExecCloseRangeTableRelations(EState *estate);
+extern void ExecCloseResultRelations(EState *estate);
 
 static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
@@ -546,6 +549,8 @@ exec_rt_fetch(Index rti, EState *estate)
 }
 
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+					   Index rti);
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ef448d6..7eb5ca6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -508,6 +508,14 @@ typedef struct EState
 	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
 								 * pointers, or NULL if not yet opened */
+	ResultRelInfo **es_result_relations;	/* Array of per-range-table-entry
+											 * ResultRelInfo pointers, or
+											 * NULL if a given range table
+											 * relation is not a target
+											 * table */
+	List		*es_opened_result_relations;	/* List of non-NULL entries
+												 * in es_result_relations added
+												 * in no specific order */
 	struct ExecRowMark **es_rowmarks;	/* Array of per-range-table-entry
 										 * ExecRowMarks, or NULL if none */
 	PlannedStmt *es_plannedstmt;	/* link to top of plan tree */
@@ -518,24 +526,13 @@ typedef struct EState
 	/* If query can insert/delete tuples, the command ID to mark them with */
 	CommandId	es_output_cid;
 
-	/* Info about target table(s) for insert/update/delete queries: */
-	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
-	int			es_num_result_relations;	/* length of array */
 	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
-	/*
-	 * Info about the partition root table(s) for insert/update/delete queries
-	 * targeting partitioned tables.  Only leaf partitions are mentioned in
-	 * es_result_relations, but we need access to the roots for firing
-	 * triggers and for runtime tuple routing.
-	 */
-	ResultRelInfo *es_root_result_relations;	/* array of ResultRelInfos */
-	int			es_num_root_result_relations;	/* length of the array */
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
 	/*
 	 * The following list contains ResultRelInfos created by the tuple routing
-	 * code for partitions that don't already have one.
+	 * code for partitions that aren't found in the es_result_relations array.
 	 */
 	List	   *es_tuple_routing_result_relations;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e0107..0666546 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -224,8 +224,6 @@ typedef struct ModifyTable
 	Index		rootRelation;	/* Root RT index, if target is partitioned */
 	bool		partColsUpdated;	/* some part key in hierarchy updated */
 	List	   *resultRelations;	/* integer list of RT indexes */
-	int			resultRelIndex; /* index of first resultRel in plan's list */
-	int			rootResultRelIndex; /* index of the partitioned table root */
 	List	   *plans;			/* plan(s) producing source data */
 	List	   *withCheckOptionLists;	/* per-target-table WCO lists */
 	List	   *returningLists; /* per-target-table RETURNING tlists */
-- 
1.8.3.1
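
To make the data structure change in the patch above easier to follow, here is a minimal standalone sketch of the idea: an array with one pointer slot per range table entry, addressed by RT index minus 1, plus a dense list of the entries that were actually initialized. All Mini* types and mini_* functions are hypothetical stand-ins written for illustration; they are not the executor's real structs or functions, and plain calloc() is used here in place of palloc0().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Index;

/* Hypothetical, simplified stand-in for ResultRelInfo. */
typedef struct MiniResultRelInfo
{
	Index		rti;			/* 1-based range table index */
} MiniResultRelInfo;

/* Hypothetical, simplified stand-in for the relevant EState fields. */
typedef struct MiniEState
{
	Index		range_table_size;
	MiniResultRelInfo **result_relations;	/* sparse, indexed by rti - 1 */
	MiniResultRelInfo **opened;	/* dense, in order of initialization */
	size_t		num_opened;
} MiniEState;

/*
 * Analogue of ExecInitResultRelationsArray(): allocate one zeroed pointer
 * slot per range table entry.  Returns 1 on success, 0 on allocation failure.
 * Assumes range_table_size > 0.
 */
static int
mini_init_result_relations_array(MiniEState *estate, Index range_table_size)
{
	estate->range_table_size = range_table_size;
	estate->num_opened = 0;
	estate->result_relations = calloc(range_table_size,
									  sizeof(MiniResultRelInfo *));
	estate->opened = calloc(range_table_size, sizeof(MiniResultRelInfo *));
	return estate->result_relations != NULL && estate->opened != NULL;
}

/*
 * Analogue of ExecInitResultRelation(): record the entry under its RT index
 * and also remember it in the dense list of opened entries.
 */
static void
mini_init_result_relation(MiniEState *estate, MiniResultRelInfo *info,
						  Index rti)
{
	assert(rti >= 1 && rti <= estate->range_table_size);
	assert(estate->num_opened < estate->range_table_size);
	info->rti = rti;
	estate->result_relations[rti - 1] = info;
	estate->opened[estate->num_opened++] = info;
}

/* Look up an entry by RT index; NULL means "not a result relation". */
static MiniResultRelInfo *
mini_get_result_relation(const MiniEState *estate, Index rti)
{
	if (rti < 1 || rti > estate->range_table_size)
		return NULL;
	return estate->result_relations[rti - 1];
}

/* Release the two arrays; the entries themselves are owned by the caller. */
static void
mini_free(MiniEState *estate)
{
	free(estate->result_relations);
	free(estate->opened);
	estate->result_relations = NULL;
	estate->opened = NULL;
}
```

The sparse array trades a few unused pointer slots for constant-time lookup by RT index, while the dense list lets cleanup code visit only the entries that exist, which is the trade-off the patch's comments describe.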

Attachment: v13-0002-Include-result-relation-index-if-any-in-ForeignS.patch (application/octet-stream)
From b5a268bdb661f2fc4ecf327418d6cd7a03c98d69 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v13 2/5] Include result relation index if any in ForeignScan

FDWs that can perform an UPDATE/DELETE remotely using the "direct
modify" set of APIs sometimes need to access properties of the result
relation.  Currently they look at EState.es_result_relation_info,
which the core executor laboriously makes sure is set correctly.  An
upcoming patch will remove that field from EState.  So this commit
adds a new field, resultRelation, to the ForeignScan node.  The core
planner sets it for an FDW to use during a "direct modification"
operation; it gives the range table index of the target foreign table.

Amit Langote, Etsuro Fujita
---
 contrib/postgres_fdw/postgres_fdw.c     | 23 +++++++++++++++--------
 doc/src/sgml/fdwhandler.sgml            | 13 +++++++------
 src/backend/executor/nodeForeignscan.c  |  5 ++++-
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c | 13 +++++++++++++
 src/backend/optimizer/plan/setrefs.c    |  4 ++++
 src/include/nodes/plannodes.h           |  8 ++++++++
 9 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce..8eaec6c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -200,6 +200,8 @@ typedef struct PgFdwDirectModifyState
 	Relation	rel;			/* relcache entry for the foreign table */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
+	Index			resultRelation;	/* Target foreign table's RT index */
+
 	/* extracted fdw_private data */
 	char	   *query;			/* text of UPDATE/DELETE command */
 	bool		has_returning;	/* is there a RETURNING clause? */
@@ -446,11 +448,12 @@ static List *build_remote_returning(Index rtindex, Relation rel,
 									List *returningList);
 static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
 static void execute_dml_stmt(ForeignScanState *node);
-static TupleTableSlot *get_returning_data(ForeignScanState *node);
+static TupleTableSlot *get_returning_data(ForeignScanState *node, ResultRelInfo *resultRelInfo);
 static void init_returning_filter(PgFdwDirectModifyState *dmstate,
 								  List *fdw_scan_tlist,
 								  Index rtindex);
 static TupleTableSlot *apply_returning_filter(PgFdwDirectModifyState *dmstate,
+											  ResultRelInfo *relInfo,
 											  TupleTableSlot *slot,
 											  EState *estate);
 static void prepare_query_params(PlanState *node,
@@ -2355,7 +2358,8 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	Assert(fsplan->resultRelation > 0);
+	dmstate->resultRelation = rtindex = fsplan->resultRelation;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2450,7 +2454,10 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = estate->es_result_relations[dmstate->resultRelation - 1];
+
+	/* The executor must have initialized the ResultRelInfo for us. */
+	Assert(resultRelInfo != NULL);
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -2482,7 +2489,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 	/*
 	 * Get the next RETURNING tuple.
 	 */
-	return get_returning_data(node);
+	return get_returning_data(node, resultRelInfo);
 }
 
 /*
@@ -4082,11 +4089,10 @@ execute_dml_stmt(ForeignScanState *node)
  * Get the result of a RETURNING clause.
  */
 static TupleTableSlot *
-get_returning_data(ForeignScanState *node)
+get_returning_data(ForeignScanState *node, ResultRelInfo *resultRelInfo)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4141,7 +4147,8 @@ get_returning_data(ForeignScanState *node)
 		if (dmstate->rel)
 			resultSlot = slot;
 		else
-			resultSlot = apply_returning_filter(dmstate, slot, estate);
+			resultSlot = apply_returning_filter(dmstate, resultRelInfo, slot,
+												estate);
 	}
 	dmstate->next_tuple++;
 
@@ -4230,10 +4237,10 @@ init_returning_filter(PgFdwDirectModifyState *dmstate,
  */
 static TupleTableSlot *
 apply_returning_filter(PgFdwDirectModifyState *dmstate,
+					   ResultRelInfo *relInfo,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127..e995514 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -893,10 +893,10 @@ BeginDirectModify(ForeignScanState *node,
      its <structfield>fdw_state</structfield> field is still NULL.  Information about
      the table to modify is accessible through the
      <structname>ForeignScanState</structname> node (in particular, from the underlying
-     <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
-     <literal>eflags</literal> contains flag bits describing the executor's
-     operating mode for this plan node.
+     <structname>ForeignScan</structname> plan node, which contains the table's range
+     table index along with any FDW-private information provided by
+     <function>PlanDirectModify</function>.  <literal>eflags</literal> contains flag
+     bits describing the executor's operating mode for this plan node.
     </para>
 
     <para>
@@ -926,8 +926,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     obtained using the information passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471a..515860d 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -221,10 +221,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan or the direct modification.
 	 */
 	if (node->operation != CMD_SELECT)
+	{
+		Assert(node->resultRelation > 0);
 		fdwroutine->BeginDirectModify(scanstate, eflags);
+	}
 	else
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7974fa0..70476d8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -759,6 +759,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelation);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f35d2a0..2d6efb5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -696,6 +696,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelation);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 73b8d9d..1a9d01f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2015,6 +2015,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelation);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 881eaf4..fd13ac9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5541,6 +5541,8 @@ make_foreignscan(List *qptlist,
 	node->fs_relids = NULL;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
+	/* resultRelation will be set by make_modifytable(), if needed */
+	node->resultRelation = 0;
 
 	return node;
 }
@@ -6898,7 +6900,18 @@ make_modifytable(PlannerInfo *root,
 			!has_stored_generated_columns(subroot, rti))
 			direct_modify = fdwroutine->PlanDirectModify(subroot, node, rti, i);
 		if (direct_modify)
+		{
+			ForeignScan   *fscan = (ForeignScan *) list_nth(node->plans, i);
+
+			/*
+			 * For result relations that will be modified directly, the FDW
+			 * needs to know where to find them.
+			 */
+			Assert(IsA(fscan, ForeignScan));
+			fscan->resultRelation = rti;
+
 			direct_modify_plans = bms_add_member(direct_modify_plans, i);
+		}
 
 		if (!direct_modify &&
 			fdwroutine != NULL &&
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 66372d3..6db6e72 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1315,6 +1315,10 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/* Adjust resultRelation if needed */
+	if (fscan->resultRelation > 0)
+		fscan->resultRelation += rtoffset;
 }
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0666546..cadfdb9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -605,6 +605,11 @@ typedef struct WorkTableScan
  * When the plan node represents a foreign join, scan.scanrelid is zero and
  * fs_relids must be consulted to identify the join relation.  (fs_relids
  * is valid for simple scans as well, but will always match scan.scanrelid.)
+ *
+ * If an FDW's PlanDirectModify() callback decides to repurpose a ForeignScan
+ * node to store the information about an UPDATE or DELETE operation to
+ * perform on a given foreign table result relation, resultRelation is set
+ * to identify the relation.
  * ----------------
  */
 typedef struct ForeignScan
@@ -618,6 +623,9 @@ typedef struct ForeignScan
 	List	   *fdw_recheck_quals;	/* original quals not in scan.plan.qual */
 	Bitmapset  *fs_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
+	Index		resultRelation;	/* Target foreign table's RT index; valid for
+								 * result relations of UPDATE/DELETE; 0 for
+								 * other query types */
 } ForeignScan;
 
 /* ----------------
-- 
1.8.3.1
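
As a rough illustration of how the new ForeignScan.resultRelation field in the patch above is meant to be used, the sketch below models two steps: shifting the RT index by rtoffset when the plan is flattened (as set_foreignscan_references() does), and then using the index, minus 1, to find the target's entry in a per-range-table array. MiniForeignScan and the mini_* functions are hypothetical names invented for this example; the array of void pointers merely stands in for es_result_relations.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

/* Hypothetical stand-in for the one ForeignScan field of interest. */
typedef struct MiniForeignScan
{
	Index		resultRelation; /* target RT index, or 0 if not a target */
} MiniForeignScan;

/*
 * Mirrors the setrefs.c hunk: a zero value means "no result relation" and
 * must stay zero; any real RT index is shifted by rtoffset.
 */
static void
mini_adjust_result_relation(MiniForeignScan *fscan, int rtoffset)
{
	assert(rtoffset >= 0);
	if (fscan->resultRelation > 0)
		fscan->resultRelation += rtoffset;
}

/*
 * Mirrors the lookup in postgresIterateDirectModify(): index the array by
 * RT index minus 1.  Returns NULL when the scan has no result relation or
 * the index is out of range.
 */
static void *
mini_lookup_result_rel(void **result_relations, Index range_table_size,
					   const MiniForeignScan *fscan)
{
	Index		rti = fscan->resultRelation;

	if (rti == 0 || rti > range_table_size)
		return NULL;
	return result_relations[rti - 1];
}
```

Keeping 0 as the "not set" value is what allows the executor-side Assert(node->resultRelation > 0) in the patch to catch a direct-modify ForeignScan whose field the planner never filled in.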

Attachment: v13-0005-Revise-child-to-root-tuple-conversion-map-manage.patch (application/octet-stream)
From 3975f8f3f17c2808f6b056b0558758261a59e514 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v13 5/5] Revise child-to-root tuple conversion map management

Transition tuple capture requires converting child tuples to the
inheritance root table format, because that's the format in which the
transition tuplestore stores tuples.  For INSERTs into partitioned
tables, the conversion is handled by the tuple routing code, which
constructs the map for a given partition only if the partition is
targeted.  For UPDATE and DELETE, however, maps for all result
relations are made and stored in an array in ModifyTableState during
ExecInitModifyTable, which requires their ResultRelInfos to have
already been built.  During execution, the map for the currently
active result relation is set in TransitionCaptureState.tcs_map.

This commit removes TransitionCaptureState.tcs_map in favor of a new
map field in ResultRelInfo named ri_ChildToRootMap, which is
initialized when the ResultRelInfo for a given result relation is.
This is less confusing and less bug-prone than setting and resetting
tcs_map.  It will also allow us to delay creating the map for a given
result relation until that relation is actually processed during
execution.
---
 src/backend/commands/copy.c            |  30 +----
 src/backend/commands/trigger.c         |   9 +-
 src/backend/executor/execPartition.c   |  20 ++-
 src/backend/executor/nodeModifyTable.c | 224 ++++++++++-----------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/nodes/execnodes.h          |  11 +-
 6 files changed, 102 insertions(+), 202 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2d9e1f3..99b0f62 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3105,32 +3105,14 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition which may change the tuple, we can
+			 * just remember the original unconverted tuple to avoid a
+			 * needless round trip conversion.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccf..d1b5a03 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4293,8 +4294,8 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * tables, then return NULL.
  *
  * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * should set tcs_original_insert_tuple as appropriate when dealing with child
+ * tables.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5389,7 +5390,9 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	if (row_trigger && transition_capture != NULL)
 	{
 		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo;
+		TupleConversionMap *map = pinfo ? pinfo->pi_PartitionToRootMap :
+			relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f..1c20f6c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -983,9 +983,23 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	if (mtstate &&
 		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
 	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
+		ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+
+		/*
+		 * If the partition appears to be a reused UPDATE result relation, the
+		 * necessary map would already have been set in ri_ChildToRootMap by
+		 * ExecInitModifyTable(), so use that one instead of building one from
+		 * scratch.  One can tell if it's actually a reused UPDATE result
+		 * relation by looking at its ri_RangeTableIndex which must be
+		 * different from the root RT index.
+		 */
+		if (node && node->operation == CMD_UPDATE &&
+			node->rootRelation != partRelInfo->ri_RangeTableIndex)
+			partrouteinfo->pi_PartitionToRootMap = partRelInfo->ri_ChildToRootMap;
+		else
+			partrouteinfo->pi_PartitionToRootMap =
+				convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
+									   RelationGetDescr(partRelInfo->ri_PartitionRoot));
 	}
 	else
 		partrouteinfo->pi_PartitionToRootMap = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3405ef5..e5e2564 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -1086,9 +1083,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
 	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1165,39 +1160,29 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-	 * should convert the tuple into root's tuple descriptor, since
-	 * ExecInsert() starts the search from root.  The tuple conversion
-	 * map list is in the order of mtstate->resultRelInfo[], so to
-	 * retrieve the one for this resultRel, we need to know the
-	 * position of the resultRel in mtstate->resultRelInfo[].
+	 * resultRelInfo is one of the per-subplan resultRelInfos and the tuple
+	 * in the input slot is in its format.  Before we can call ExecInsert() to
+	 * route the tuple, which expects the root result relation to be passed,
+	 * we must convert the tuple into root relation's tuple descriptor, if at
+	 * all needed.
 	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-	 * so save the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
+	/*
+	 * Reset the transition state that may possibly have been written
+	 * by INSERT.
+	 */
 	if (mtstate->mt_transition_capture)
-	{
 		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
 
 	/* We're done moving. */
 	return true;
@@ -1905,28 +1890,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1950,6 +1913,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
@@ -1964,37 +1928,15 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE triggers
+	 * on the partition which may change the tuple, we can just remember the
+	 * original unconverted tuple to avoid a needless round trip conversion.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2010,58 +1952,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2157,17 +2047,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2337,6 +2216,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRelInfo;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2358,13 +2238,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
-	/* If modifying a partitioned table, initialize the root table info */
+	/*
+	 * Initialize the designated "root" result relation.  When modifying
+	 * partitioned tables, it's given by node->rootRelation, while in other
+	 * cases, it's the first relation in node->resultRelations.  We need to
+	 * initialize this one before any others, because
+	 * ExecSetupTransitionCaptureState() needs it.
+	 */
 	if (node->rootRelation > 0)
 	{
 		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
 		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
 							   node->rootRelation);
 	}
+	else
+		ExecInitResultRelation(estate, mtstate->resultRelInfo,
+							   linitial_int(node->resultRelations));
+
+	rootResultRelInfo = getTargetResultRelInfo(mtstate);
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2374,6 +2265,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2387,8 +2285,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		subplan = (Plan *) lfirst(l1);
 
-		/* This opens result relation and fills ResultRelInfo. */
-		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+		/*
+		 * This opens result relation and fills ResultRelInfo.
+		 * ("root" relation already opened.)
+		 */
+		if (resultRelInfo != rootResultRelInfo)
+			ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
@@ -2444,12 +2346,28 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.  During INSERT, partition tuples to
+		 * store into the transition tuple store are converted using
+		 * PartitionToRoot map in the partition's PartitionRoutingInfo.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRelInfo->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
 
 	/* Get the target relation */
-	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
+	rel = rootResultRelInfo->ri_RelationDesc;
 
 	/*
 	 * If it's not a partitioned table after all, UPDATE tuple routing should
@@ -2468,26 +2386,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5..e38d732 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -46,7 +46,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -66,14 +66,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 36a3318..e657493 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -488,6 +488,14 @@ typedef struct ResultRelInfo
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child result relation tuples to the format of the
+	 * table actually mentioned in the query (called "root").  Set only
+	 * if either transition tuple capture or update partition row
+	 * movement is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1174,9 +1182,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
1.8.3.1
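
A note for readers of the patch that ends above: its central move is to drop the per-subplan array of conversion maps kept on ModifyTableState (mt_per_subplan_tupconv_maps, looked up by array position) and to keep the map on each ResultRelInfo as ri_ChildToRootMap instead. What follows is only a minimal standalone sketch of that idea, not PostgreSQL code. DemoMap, DemoRelInfo and demo_to_root_format() are made-up names, tuples are plain int arrays, and a NULL map is assumed to mean that the child and root layouts already match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple conversion map (not TupleConversionMap). */
typedef struct DemoMap
{
	int			ncols;		/* number of columns in the root layout */
	const int  *src_attno;	/* for root column i, its index in the child */
} DemoMap;

/* Hypothetical stand-in for a per-relation struct (not ResultRelInfo). */
typedef struct DemoRelInfo
{
	int			rt_index;
	const DemoMap *child_to_root_map;	/* NULL if no conversion is needed */
} DemoRelInfo;

/*
 * Return the tuple in root layout.  If the relation carries no map, the
 * input is already in root layout and is returned unchanged; otherwise the
 * columns are copied into out[] in root order and out is returned.
 */
static const int *
demo_to_root_format(const DemoRelInfo *rel, const int *in, int *out)
{
	const DemoMap *map;

	assert(rel != NULL && in != NULL && out != NULL);

	map = rel->child_to_root_map;
	if (map == NULL)
		return in;

	for (int i = 0; i < map->ncols; i++)
		out[i] = in[map->src_attno[i]];

	return out;
}
```

With the map stored on the relation itself, a caller no longer has to compute the relation's position in an array (the removed `resultRelInfo - mtstate->resultRelInfo` arithmetic) to find the right entry; it reads the field directly, as the patched ExecCrossPartitionUpdate() does.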

Attachment: v13-0004-Rearrange-partition-update-row-movement-code-a-b.patch (application/octet-stream)
From 500e3459e84ed5291ec810580f28d65fa1eeb6eb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 16:24:38 +0900
Subject: [PATCH v13 4/5] Rearrange partition update row movement code a bit

The block of code that does the actual moving (DELETE+INSERT) has
been moved to a function named ExecCrossPartitionUpdate() which must
be retried until it says the movement has been done or can't be done.

This also rearranges the code in ExecDelete() and ExecInsert() around
executing AFTER ROW DELETE and AFTER ROW INSERT triggers, resp.  In
the case of an update row movement, such triggers should not see the
affected tuple in their OLD/NEW transition table.
---
 src/backend/executor/nodeModifyTable.c | 347 +++++++++++++++++++--------------
 1 file changed, 199 insertions(+), 148 deletions(-)
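
The calling convention described in the commit message above (the function reports either "done" or "retry with this newer tuple") can be illustrated with a small standalone sketch. This is not the executor code: DemoRow, DemoState, demo_cross_partition_update() and demo_update() are hypothetical names, concurrency is simulated with a counter, and the re-check of the partition constraint that the real caller performs before retrying is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row: a key plus a version bumped by each simulated update. */
typedef struct DemoRow
{
	int			key;
	int			version;
} DemoRow;

/* Hypothetical state that simulates what concurrent sessions did. */
typedef struct DemoState
{
	int			concurrent_updates_left;	/* concurrent UPDATEs yet to be seen */
	bool		row_gone;					/* row was concurrently deleted */
	DemoRow		latest;						/* re-fetched row handed back on retry */
	int			inserts_done;				/* number of completed moves */
} DemoState;

/*
 * Returns true if the row was moved or if there is nothing left to do
 * (row already gone).  Returns false if the row was concurrently updated;
 * *retry_row then points at the newer version the caller should retry with.
 */
static bool
demo_cross_partition_update(DemoState *st, const DemoRow *row,
							const DemoRow **retry_row,
							const DemoRow **inserted)
{
	*retry_row = NULL;
	*inserted = NULL;

	if (st->row_gone)
		return true;		/* skip the insert: must not resurrect the row */

	if (st->concurrent_updates_left > 0)
	{
		/* Simulated "delete did not happen, here is the newer tuple". */
		st->concurrent_updates_left--;
		st->latest.key = row->key;
		st->latest.version = row->version + 1;
		*retry_row = &st->latest;
		return false;
	}

	/* Delete succeeded, so perform the insert half of the move. */
	st->inserts_done++;
	*inserted = row;
	return true;
}

/* Caller side: repeat until the helper says no further retry is needed. */
static const DemoRow *
demo_update(DemoState *st, const DemoRow *row)
{
	const DemoRow *retry_row;
	const DemoRow *inserted;

	while (!demo_cross_partition_update(st, row, &retry_row, &inserted))
		row = retry_row;

	return inserted;
}
```

Returning a bool and passing the retry tuple through an out parameter keeps the looping decision in the caller, which is what allows the moved block to live outside ExecUpdate() without needing its lreplace label.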

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c51320..3405ef5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -389,7 +389,6 @@ ExecInsert(ModifyTableState *mtstate,
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
-	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
@@ -655,31 +654,30 @@ ExecInsert(ModifyTableState *mtstate,
 	}
 
 	/*
-	 * If this insert is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition NEW TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the insert is a part of update row movement, put this row into the
+	 * UPDATE trigger's NEW TABLE (transition table) instead of that of an
+	 * INSERT trigger.
 	 */
-	ar_insert_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_new_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_new_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo, NULL,
-							 NULL,
-							 slot,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot,
+							 NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * INSERT trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW INSERT Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_insert_trig_tcs = NULL;
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW INSERT Triggers */
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
 
 	list_free(recheckIndexes);
 
@@ -745,7 +743,6 @@ ExecDelete(ModifyTableState *mtstate,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
-	TransitionCaptureState *ar_delete_trig_tcs;
 
 	if (tupleDeleted)
 		*tupleDeleted = false;
@@ -989,32 +986,30 @@ ldelete:;
 		*tupleDeleted = true;
 
 	/*
-	 * If this delete is the result of a partition key update that moved the
-	 * tuple to a new partition, put this row into the transition OLD TABLE,
-	 * if there is one. We need to do this separately for DELETE and INSERT
-	 * because they happen on different tables.
+	 * If the delete is a part of update row movement, put this row into the
+	 * UPDATE trigger's OLD TABLE (transition table) instead of that of a
+	 * DELETE trigger.
 	 */
-	ar_delete_trig_tcs = mtstate->mt_transition_capture;
-	if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture
-		&& mtstate->mt_transition_capture->tcs_update_old_table)
+	if (mtstate->operation == CMD_UPDATE &&
+		mtstate->mt_transition_capture &&
+		mtstate->mt_transition_capture->tcs_update_old_table)
 	{
-		ExecARUpdateTriggers(estate, resultRelInfo,
-							 tupleid,
-							 oldtuple,
-							 NULL,
-							 NULL,
-							 mtstate->mt_transition_capture);
+		ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL, NIL, mtstate->mt_transition_capture);
 
 		/*
-		 * We've already captured the NEW TABLE row, so make sure any AR
-		 * DELETE trigger fired below doesn't capture it again.
+		 * Execute AFTER ROW DELETE Triggers, but such that the row is not
+		 * captured again in the transition table if any.
 		 */
-		ar_delete_trig_tcs = NULL;
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 NULL);
+	}
+	else
+	{
+		/* AFTER ROW DELETE Triggers */
+		ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
+							 mtstate->mt_transition_capture);
 	}
-
-	/* AFTER ROW DELETE Triggers */
-	ExecARDeleteTriggers(estate, resultRelInfo, tupleid, oldtuple,
-						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
 	if (processReturning && resultRelInfo->ri_projectReturning)
@@ -1061,6 +1056,153 @@ ldelete:;
 	return NULL;
 }
 
+/*
+ *	ExecCrossPartitionUpdate
+ *		Move an updated tuple from a given partition to the correct partition
+ *		of its root parent table
+ *
+ *	This works by first deleting the tuple from the current partition,
+ *	followed by inserting it into the root parent table, that is,
+ *	mtstate->rootResultRelInfo, from where it's re-routed to the correct
+ *	partition.
+ *
+ *	Returns true if the tuple has been successfully moved or if it's found
+ *	that the tuple was concurrently deleted so there's nothing more to do
+ *	for the caller.
+ *
+ *	False is returned if the tuple we're trying to move is found to have been
+ *	concurrently updated.  Caller should check if the updated tuple that's
+ *	returned in *retry_slot still needs to be re-routed and call this function
+ *	again if needed.
+ */
+static bool
+ExecCrossPartitionUpdate(ModifyTableState *mtstate,
+						 ResultRelInfo *resultRelInfo,
+						 ItemPointer tupleid, HeapTuple oldtuple,
+						 TupleTableSlot *slot, TupleTableSlot *planSlot,
+						 EPQState *epqstate, bool canSetTag,
+						 TupleTableSlot **retry_slot,
+						 TupleTableSlot **inserted_tuple)
+{
+	EState	   *estate = mtstate->ps.state;
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	int			map_index;
+	TupleConversionMap *tupconv_map;
+	TupleConversionMap *saved_tcs_map = NULL;
+	bool		tuple_deleted;
+	TupleTableSlot *epqslot = NULL;
+
+	*inserted_tuple = NULL;
+	*retry_slot = NULL;
+
+	/*
+	 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
+	 * original row to migrate to a different partition.  Maybe this
+	 * can be implemented some day, but it seems a fringe feature with
+	 * little redeeming value.
+	 */
+	if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("invalid ON UPDATE specification"),
+				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
+
+	/*
+	 * When an UPDATE is run on a leaf partition, we will not have
+	 * partition tuple routing set up. In that case, fail with
+	 * partition constraint violation error.
+	 */
+	if (proute == NULL)
+		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+	/*
+	 * Row movement, part 1.  Delete the tuple, but skip RETURNING
+	 * processing. We want to return rows from INSERT.
+	 */
+	ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+			   epqstate, estate,
+			   false,	/* processReturning */
+			   false,	/* canSetTag */
+			   true,	/* changingPart */
+			   &tuple_deleted, &epqslot);
+
+	/*
+	 * For some reason if DELETE didn't happen (e.g. trigger prevented
+	 * it, or it was already deleted by self, or it was concurrently
+	 * deleted by another transaction), then we should skip the insert
+	 * as well; otherwise, an UPDATE could cause an increase in the
+	 * total number of rows across all partitions, which is clearly
+	 * wrong.
+	 *
+	 * For a normal UPDATE, the case where the tuple has been the
+	 * subject of a concurrent UPDATE or DELETE would be handled by
+	 * the EvalPlanQual machinery, but for an UPDATE that we've
+	 * translated into a DELETE from this partition and an INSERT into
+	 * some other partition, that's not available, because CTID chains
+	 * can't span relation boundaries.  We mimic the semantics to a
+	 * limited extent by skipping the INSERT if the DELETE fails to
+	 * find a tuple. This ensures that two concurrent attempts to
+	 * UPDATE the same tuple at the same time can't turn one tuple
+	 * into two, and that an UPDATE of a just-deleted tuple can't
+	 * resurrect it.
+	 */
+	if (!tuple_deleted)
+	{
+		/*
+		 * epqslot will be typically NULL.  But when ExecDelete()
+		 * finds that another transaction has concurrently updated the
+		 * same row, it re-fetches the row, skips the delete, and
+		 * epqslot is set to the re-fetched tuple slot. In that case,
+		 * we need to do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return true;
+		else
+		{
+			*retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			return false;
+		}
+	}
+
+	/*
+	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
+	 * should convert the tuple into root's tuple descriptor, since
+	 * ExecInsert() starts the search from root.  The tuple conversion
+	 * map list is in the order of mtstate->resultRelInfo[], so to
+	 * retrieve the one for this resultRel, we need to know the
+	 * position of the resultRel in mtstate->resultRelInfo[].
+	 */
+	map_index = resultRelInfo - mtstate->resultRelInfo;
+	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
+	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	if (tupconv_map != NULL)
+		slot = execute_attr_map_slot(tupconv_map->attrMap,
+									 slot,
+									 mtstate->mt_root_tuple_slot);
+
+	/*
+	 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+	 * so save the currently active map.
+	 */
+	if (mtstate->mt_transition_capture)
+		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
+
+	/* Tuple routing starts from the root table. */
+	Assert(mtstate->rootResultRelInfo != NULL);
+	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								 planSlot, estate, canSetTag);
+
+	/* Clear the INSERT's tuple and restore the saved map. */
+	if (mtstate->mt_transition_capture)
+	{
+		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+	}
+
+	/* We're done moving. */
+	return true;
+}
+
 /* ----------------------------------------------------------------
  *		ExecUpdate
  *
@@ -1216,119 +1358,28 @@ lreplace:;
 		 */
 		if (partition_constraint_failed)
 		{
-			bool		tuple_deleted;
-			TupleTableSlot *ret_slot;
-			TupleTableSlot *epqslot = NULL;
-			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-			int			map_index;
-			TupleConversionMap *tupconv_map;
-			TupleConversionMap *saved_tcs_map = NULL;
-
-			/*
-			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
-			 * original row to migrate to a different partition.  Maybe this
-			 * can be implemented some day, but it seems a fringe feature with
-			 * little redeeming value.
-			 */
-			if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("invalid ON UPDATE specification"),
-						 errdetail("The result tuple would appear in a different partition than the original tuple.")));
-
-			/*
-			 * When an UPDATE is run on a leaf partition, we will not have
-			 * partition tuple routing set up. In that case, fail with
-			 * partition constraint violation error.
-			 */
-			if (proute == NULL)
-				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
-
-			/*
-			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
-			 * processing. We want to return rows from INSERT.
-			 */
-			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
-					   epqstate, estate,
-					   false,	/* processReturning */
-					   false,	/* canSetTag */
-					   true,	/* changingPart */
-					   &tuple_deleted, &epqslot);
+			TupleTableSlot *inserted_tuple,
+						   *retry_slot;
+			bool			retry;
 
 			/*
-			 * For some reason if DELETE didn't happen (e.g. trigger prevented
-			 * it, or it was already deleted by self, or it was concurrently
-			 * deleted by another transaction), then we should skip the insert
-			 * as well; otherwise, an UPDATE could cause an increase in the
-			 * total number of rows across all partitions, which is clearly
-			 * wrong.
-			 *
-			 * For a normal UPDATE, the case where the tuple has been the
-			 * subject of a concurrent UPDATE or DELETE would be handled by
-			 * the EvalPlanQual machinery, but for an UPDATE that we've
-			 * translated into a DELETE from this partition and an INSERT into
-			 * some other partition, that's not available, because CTID chains
-			 * can't span relation boundaries.  We mimic the semantics to a
-			 * limited extent by skipping the INSERT if the DELETE fails to
-			 * find a tuple. This ensures that two concurrent attempts to
-			 * UPDATE the same tuple at the same time can't turn one tuple
-			 * into two, and that an UPDATE of a just-deleted tuple can't
-			 * resurrect it.
+			 * ExecCrossPartitionUpdate will first DELETE the row from the
+			 * partition it's currently in and then insert it back into the
+			 * root table, which will re-route it to the correct partition.
+			 * The first part may have to be repeated if it is detected that
+			 * the tuple we're trying to move has been concurrently updated.
 			 */
-			if (!tuple_deleted)
-			{
-				/*
-				 * epqslot will be typically NULL.  But when ExecDelete()
-				 * finds that another transaction has concurrently updated the
-				 * same row, it re-fetches the row, skips the delete, and
-				 * epqslot is set to the re-fetched tuple slot. In that case,
-				 * we need to do all the checks again.
-				 */
-				if (TupIsNull(epqslot))
-					return NULL;
-				else
-				{
-					slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
-					goto lreplace;
-				}
-			}
-
-			/*
-			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
-			 * should convert the tuple into root's tuple descriptor, since
-			 * ExecInsert() starts the search from root.  The tuple conversion
-			 * map list is in the order of mtstate->resultRelInfo[], so to
-			 * retrieve the one for this resultRel, we need to know the
-			 * position of the resultRel in mtstate->resultRelInfo[].
-			 */
-			map_index = resultRelInfo - mtstate->resultRelInfo;
-			Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-			tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
-			if (tupconv_map != NULL)
-				slot = execute_attr_map_slot(tupconv_map->attrMap,
-											 slot,
-											 mtstate->mt_root_tuple_slot);
-
-			/*
-			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
-			 * so save the currently active map.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/* Tuple routing starts from the root table. */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
-								  planSlot, estate, canSetTag);
-
-			/* Clear the INSERT's tuple and restore the saved map. */
-			if (mtstate->mt_transition_capture)
+			retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
+											  oldtuple, slot, planSlot,
+											  epqstate, canSetTag,
+											  &retry_slot, &inserted_tuple);
+			if (retry)
 			{
-				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-				mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
+				slot = retry_slot;
+				goto lreplace;
 			}
 
-			return ret_slot;
+			return inserted_tuple;
 		}
 
 		/*
-- 
1.8.3.1
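
The comment added in the hunk above says that ExecCrossPartitionUpdate deletes the row from its current partition and re-inserts it through the root, and that the first part may have to be repeated if the row was concurrently updated. A minimal sketch of that caller-side retry shape follows. Everything in it (DemoRow, DemoTable, demo_cross_partition_update, demo_update) is a hypothetical stand-in and not PostgreSQL code. It also simplifies: the real caller jumps back to lreplace and redoes all the checks, including the partition constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tuple; not a PostgreSQL type. */
typedef struct DemoRow
{
	int			key;		/* partition key */
	int			version;	/* bumped by each simulated concurrent update */
} DemoRow;

/* Hypothetical toy table holding a single row. */
typedef struct DemoTable
{
	DemoRow		current;	/* latest version of the row */
	int			pending_concurrent_updates;	/* simulated interference */
	int			inserted_key;	/* key passed to the last "INSERT" */
} DemoTable;

/*
 * Try to move 'row' to another partition.  Returns true once the
 * delete+insert pair has been done.  Returns false if the row changed
 * underneath us; *retry_row then holds the refreshed row.
 */
static bool
demo_cross_partition_update(DemoTable *tab, const DemoRow *row,
							DemoRow *retry_row)
{
	assert(tab != NULL && row != NULL && retry_row != NULL);

	if (tab->pending_concurrent_updates > 0)
	{
		/* Somebody updated the row first: hand back the newer version. */
		tab->pending_concurrent_updates--;
		tab->current.version++;
		*retry_row = tab->current;
		return false;
	}

	/* "DELETE" from the old partition, then "INSERT" via the root. */
	tab->inserted_key = row->key;
	return true;
}

/* Caller-side loop; returns the number of attempts that were needed. */
static int
demo_update(DemoTable *tab, int new_key)
{
	DemoRow		row = tab->current;
	DemoRow		retry_row;
	int			attempts = 0;

	for (;;)
	{
		attempts++;
		row.key = new_key;	/* re-apply the update to the latest version */
		if (demo_cross_partition_update(tab, &row, &retry_row))
			return attempts;
		row = retry_row;	/* start over with the refreshed row */
	}
}
```

The point of the shape is that the callee only reports "done" or "retry with this slot", so the caller does not need to know about tuple_deleted or epqslot.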

Attachment: v13-0003-Remove-es_result_relation_info.patch (application/octet-stream)
From e75bae4f6ad8f26faae0bef502af8a09ddbc4dfe Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 19 Jul 2019 14:53:20 +0900
Subject: [PATCH v13 3/5] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.
---
 src/backend/commands/copy.c              |  19 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |  12 +-
 src/backend/executor/execMain.c          |   6 -
 src/backend/executor/execReplication.c   |  24 ++--
 src/backend/executor/execUtils.c         |   1 -
 src/backend/executor/nodeModifyTable.c   | 193 +++++++++++++------------------
 src/backend/replication/logical/worker.c |  44 +++----
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   2 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 146 insertions(+), 188 deletions(-)
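
The commit message above describes replacing reads of es_result_relation_info with a ResultRelInfo passed as a function parameter. A small sketch of why that removes the set/reset burden is below. The Demo* types and demo_* functions are hypothetical illustrations and do not correspond to the real executor structs or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the PostgreSQL structs. */
typedef struct DemoRelInfo
{
	const char *relname;
	int			ninserted;
} DemoRelInfo;

typedef struct DemoEState
{
	DemoRelInfo *current_rel;	/* plays the role of es_result_relation_info */
} DemoEState;

/* Old style: the callee picks its target out of shared executor state. */
static void
demo_insert_via_state(DemoEState *estate)
{
	assert(estate != NULL && estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New style: the target relation is an explicit argument. */
static void
demo_insert_explicit(DemoRelInfo *rel)
{
	assert(rel != NULL);
	rel->ninserted++;
}

/* Old-style routing: set the shared field, call, then restore it. */
static void
demo_route_old(DemoEState *estate, DemoRelInfo *partition)
{
	DemoRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	demo_insert_via_state(estate);
	estate->current_rel = saved;	/* omitting this misdirects later rows */
}

/* New-style routing: nothing to save or restore. */
static void
demo_route_new(DemoRelInfo *partition)
{
	demo_insert_explicit(partition);
}
```

With the explicit form there is no restore step to forget, which is the class of bug the commit message refers to.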

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6948214..2d9e1f3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2489,9 +2489,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2524,7 +2521,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2839,8 +2837,6 @@ CopyFrom(CopyState cstate)
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -3109,11 +3105,6 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
-			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
 			 */
@@ -3217,7 +3208,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3288,7 +3280,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1cd8cfb..feec994 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1821,7 +1821,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1951,7 +1950,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af6..e1d34be 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -498,10 +496,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	ItemPointerSetInvalid(conflictTid);
 	ItemPointerSetInvalid(&invalidItemPtr);
 
-	/*
-	 * Get information from the result relation info structure.
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f62fd8f..425d018 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -828,13 +828,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	estate->es_plannedstmt = plannedstmt;
 
 	if (plannedstmt->resultRelations)
-	{
 		ExecInitResultRelationsArray(estate);
 
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-	}
-
 	/*
 	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
 	 */
@@ -2710,7 +2705,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	if (parentestate->es_result_relations)
 		ExecInitResultRelationsArray(rcestate);
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index b29db7b..01d2688 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -404,10 +404,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -430,7 +430,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -442,7 +443,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -466,11 +468,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -496,7 +498,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -508,7 +511,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -527,11 +531,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 169dd92..d96093f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,7 +124,6 @@ CreateExecutorState(void)
 	estate->es_output_cid = (CommandId) 0;
 
 	estate->es_result_relations = NULL;
-	estate->es_result_relation_info = NULL;
 	estate->es_tuple_routing_result_relations = NIL;
 	estate->es_trig_target_relations = NIL;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9b27d34..9c51320 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -366,32 +368,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -424,7 +442,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -459,7 +478,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -521,8 +541,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -582,7 +602,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -621,7 +642,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -707,6 +729,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -718,7 +741,6 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
@@ -728,10 +750,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW DELETE Triggers */
@@ -1067,6 +1085,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1075,12 +1094,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1090,10 +1107,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1120,7 +1133,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1157,7 +1171,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1207,6 +1222,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1232,9 +1248,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1275,16 +1294,6 @@ lreplace:;
 			}
 
 			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
-			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
 			 * ExecInsert() starts the search from root.  The tuple conversion
@@ -1301,18 +1310,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1476,7 +1485,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1715,7 +1725,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1872,41 +1882,37 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
-	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
 	 */
@@ -2016,10 +2022,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2068,17 +2072,6 @@ ExecModifyTable(PlanState *pstate)
 	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
-	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
 	 */
@@ -2111,7 +2104,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2156,7 +2148,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2239,25 +2230,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2269,15 +2256,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2298,7 +2279,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l,
@@ -2346,14 +2326,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	forboth(l, node->resultRelations, l1, node->plans)
@@ -2400,7 +2374,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2424,8 +2397,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 77f71e5..d92b90f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -361,7 +361,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
 
 	estate->es_result_relations[0] = resultRelInfo;
-	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
 
@@ -1150,6 +1149,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -1176,6 +1176,7 @@ apply_handle_insert(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1191,11 +1192,10 @@ apply_handle_insert(StringInfo s)
 
 	/* For a partitioned table, insert the tuple into a partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_INSERT);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_INSERT);
 	else
-		apply_handle_insert_internal(estate->es_result_relation_info, estate,
-									 remoteslot);
+		apply_handle_insert_internal(resultRelInfo, estate, remoteslot);
 
 	PopActiveSnapshot();
 
@@ -1218,7 +1218,7 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 	ExecOpenIndices(relinfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(relinfo, estate, remoteslot);
 
 	/* Cleanup. */
 	ExecCloseIndices(relinfo);
@@ -1265,6 +1265,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	EState	   *estate;
@@ -1298,6 +1299,7 @@ apply_handle_update(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1337,11 +1339,11 @@ apply_handle_update(StringInfo s)
 
 	/* For a partitioned table, apply update to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, &newtup, rel, CMD_UPDATE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, &newtup,
+								   rel, CMD_UPDATE);
 	else
-		apply_handle_update_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &newtup, rel);
+		apply_handle_update_internal(resultRelInfo, estate, remoteslot,
+									 &newtup, rel);
 
 	PopActiveSnapshot();
 
@@ -1392,7 +1394,8 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -1420,6 +1423,7 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -1449,6 +1453,7 @@ apply_handle_delete(StringInfo s)
 
 	/* Initialize the executor state. */
 	estate = create_estate_for_relation(rel);
+	resultRelInfo = estate->es_result_relations[0];
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
@@ -1462,11 +1467,11 @@ apply_handle_delete(StringInfo s)
 
 	/* For a partitioned table, apply delete to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
-								   remoteslot, NULL, rel, CMD_DELETE);
+		apply_handle_tuple_routing(resultRelInfo, estate, remoteslot, NULL,
+								   rel, CMD_DELETE);
 	else
-		apply_handle_delete_internal(estate->es_result_relation_info, estate,
-									 remoteslot, &rel->remoterel);
+		apply_handle_delete_internal(resultRelInfo, estate, remoteslot,
+									 &rel->remoterel);
 
 	PopActiveSnapshot();
 
@@ -1504,7 +1509,7 @@ apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -1612,7 +1617,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	}
 	MemoryContextSwitchTo(oldctx);
 
-	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1693,8 +1697,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					ExecOpenIndices(partrelinfo, false);
 
 					EvalPlanQualSetSlot(&epqstate, remoteslot_part);
-					ExecSimpleRelationUpdate(estate, &epqstate, localslot,
-											 remoteslot_part);
+					ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
+											 localslot, remoteslot_part);
 					ExecCloseIndices(partrelinfo);
 					EvalPlanQualEnd(&epqstate);
 				}
@@ -1735,7 +1739,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					Assert(partrelinfo_new != partrelinfo);
 
 					/* DELETE old tuple found in the old partition. */
-					estate->es_result_relation_info = partrelinfo;
 					apply_handle_delete_internal(partrelinfo, estate,
 												 localslot,
 												 &relmapentry->remoterel);
@@ -1767,7 +1770,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 						slot_getallattrs(remoteslot);
 					}
 					MemoryContextSwitchTo(oldctx);
-					estate->es_result_relation_info = partrelinfo_new;
 					apply_handle_insert_internal(partrelinfo_new, estate,
 												 remoteslot_part);
 				}
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 54455e1..610169f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -578,10 +578,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -598,10 +602,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebd..2518fe4 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7eb5ca6..36a3318 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -526,8 +526,6 @@ typedef struct EState
 	/* If query can insert/delete tuples, the command ID to mark them with */
 	CommandId	es_output_cid;
 
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
-
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
 	/*
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45b..da50ee3 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aac..963faa1 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
1.8.3.1

#53 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#52)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On 09/10/2020 11:01, Amit Langote wrote:

On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 07/10/2020 12:50, Amit Langote wrote:

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

That sounds like a good idea. es_result_relations is currently an array
of ResultRelInfos, so that would leave a lot of unfilled structs in the
array. But in one of your other threads, you proposed turning
es_result_relations into an array of pointers anyway
(/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com).

Okay, I am reorganizing the patches around that idea and will post an
update soon.

Attached updated patches.

0001 makes es_result_relations an RTI-indexable array, which allows us to
get rid of all "result relation index" fields across the code.
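
To make the indexing scheme concrete, here is a minimal standalone sketch of a pointer array addressable by RT index, together with a side list of the entries that were actually initialized. It is a toy model and not the patch itself: ToyEState, ToyResultRelInfo and the toy_* functions are hypothetical names, and calloc stands in for palloc0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration; these are NOT the PostgreSQL structs. */
typedef struct ToyResultRelInfo
{
    size_t      rti;            /* 1-based range table index */
} ToyResultRelInfo;

typedef struct ToyEState
{
    size_t      range_table_size;
    ToyResultRelInfo **result_relations;    /* slot rti - 1; NULL if unused */
    ToyResultRelInfo **opened;  /* dense list of initialized entries */
    size_t      n_opened;
} ToyEState;

/* Allocate one zeroed pointer slot per range table entry. */
static void
toy_init_result_relations_array(ToyEState *estate, size_t range_table_size)
{
    assert(range_table_size > 0);
    estate->range_table_size = range_table_size;
    estate->result_relations = calloc(range_table_size,
                                      sizeof(ToyResultRelInfo *));
    estate->opened = calloc(range_table_size, sizeof(ToyResultRelInfo *));
    estate->n_opened = 0;
    assert(estate->result_relations != NULL && estate->opened != NULL);
}

/* Register a result relation under its RT index and remember it in the list. */
static void
toy_init_result_relation(ToyEState *estate, ToyResultRelInfo *rri, size_t rti)
{
    assert(rti >= 1 && rti <= estate->range_table_size);
    assert(estate->n_opened < estate->range_table_size);
    rri->rti = rti;
    estate->result_relations[rti - 1] = rri;
    estate->opened[estate->n_opened++] = rri;
}

/* Look up by RT index; returns NULL for a relation that is not a result rel. */
static ToyResultRelInfo *
toy_get_result_relation(const ToyEState *estate, size_t rti)
{
    assert(rti >= 1 && rti <= estate->range_table_size);
    return estate->result_relations[rti - 1];
}
```

The array is sparse, so code that has to visit every initialized entry walks the dense list instead, while code that already knows an RT index gets a constant-time lookup without a separate "result relation index".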

Thanks! A couple of small things I wanted to check with you before committing:

1. We have many different cleanup/close routines now:
ExecCloseResultRelations, ExecCloseRangeTableRelations, and
ExecCleanUpTriggerState. Do we need them all? It seems to me that we
could merge ExecCloseResultRelations() and
ExecCleanUpTriggerState(); they seem to do roughly the same thing: close
relations that were opened for ResultRelInfos. They are always called
together, except in afterTriggerInvokeEvents(). And in
afterTriggerInvokeEvents() too, there would be no harm in doing both,
even though we know there aren't any entries in the es_result_relations
array at that point.
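
As a rough illustration of why doing both in one routine is harmless, here is a standalone toy model of a merged cleanup function. It is only a sketch under my own assumptions: ToyRel, ToyState and toy_close_result_relations() are hypothetical names, and the two flags stand in for "indexes open" and "relation open".

```c
#include <assert.h>
#include <stddef.h>

/* Toy relation handle; NOT a PostgreSQL ResultRelInfo. */
typedef struct ToyRel
{
    int         indexes_open;   /* nonzero while indexes are open */
    int         rel_open;       /* nonzero while the relation is open */
} ToyRel;

typedef struct ToyState
{
    ToyRel    **opened_result_rels; /* rels opened for the query itself */
    size_t      n_opened;
    ToyRel    **trig_target_rels;   /* rels opened only to fire triggers */
    size_t      n_trig;
} ToyState;

/*
 * Merged cleanup: close the indexes of the query's result relations, then
 * close the relations that were opened only as trigger targets.  Returns the
 * number of entries visited.  Both lists are emptied, so a second call is a
 * no-op.
 */
static size_t
toy_close_result_relations(ToyState *st)
{
    size_t      visited = 0;

    for (size_t i = 0; i < st->n_opened; i++)
    {
        st->opened_result_rels[i]->indexes_open = 0;
        visited++;
    }
    st->n_opened = 0;

    for (size_t i = 0; i < st->n_trig; i++)
    {
        st->trig_target_rels[i]->rel_open = 0;
        visited++;
    }
    st->n_trig = 0;

    return visited;
}
```

A caller that has only trigger target relations, which is the afterTriggerInvokeEvents() situation described above, simply runs the first loop zero times.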

2. The way this is handled in worker.c is a bit funny. In
create_estate_for_relation(), you create a ResultRelInfo, but you
*don't* put it in the es_opened_result_relations list. That's
surprising, but I'm also surprised there are no
ExecCloseResultRelations() calls before the FreeExecutorState() calls in
worker.c. It's not needed because the
apply_handle_insert/update/delete_internal() functions call
ExecCloseIndices() directly, so they don't rely on the
ExecCloseResultRelations() function for cleanup. That works too, but
it's a bit surprising because it's different from how it's done in
copy.c and nodeModifyTable.c. It would feel natural to rely on
ExecCloseResultRelations() in worker.c as well, but on the other hand,
it also calls ExecOpenIndices() in a more lazy fashion, and it makes
sense to call ExecCloseIndices() in the same functions where
ExecOpenIndices() is called. So I'm not sure if changing that would be
an improvement overall. What do you think? Did you consider doing that?
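
For reference, the pattern I mean in worker.c looks roughly like the following standalone toy. It is a sketch, not the real code: ToyRelInfo and the toy_* functions are hypothetical, and the assertions stand in for the invariant that indexes are never opened twice or left open.

```c
#include <assert.h>

/* Toy relation handle; NOT a PostgreSQL ResultRelInfo. */
typedef struct ToyRelInfo
{
    int         indexes_open;   /* 1 while indexes are open */
    int         rows;           /* rows inserted so far */
} ToyRelInfo;

static void
toy_open_indices(ToyRelInfo *relinfo)
{
    assert(!relinfo->indexes_open);
    relinfo->indexes_open = 1;
}

static void
toy_close_indices(ToyRelInfo *relinfo)
{
    assert(relinfo->indexes_open);
    relinfo->indexes_open = 0;
}

/* Open lazily, do the work, and close again in the same function. */
static void
toy_insert_internal(ToyRelInfo *relinfo)
{
    toy_open_indices(relinfo);
    relinfo->rows++;            /* stands in for the actual insert */
    toy_close_indices(relinfo);
}
```

Because open and close are paired inside one function, nothing is left for a central cleanup routine to do afterwards, which is why the missing ExecCloseResultRelations() calls are not a bug there.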

Attached is your original patch v13, and a patch on top of it that
merges ExecCloseResultRelations() and ExecCleanUpTriggerState(), and
makes some minor comment changes. I didn't do anything about the
worker.c business, aside from adding a comment about it.

- Heikki

Attachments:

0001-Make-es_result_relations-array-indexable-by-RT-index.patch (text/x-patch; charset=UTF-8)
From 17e22c13509941c561346f386d53fd4f4ff16ef8 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 9 Oct 2020 13:52:29 +0900
Subject: [PATCH 1/2] Make es_result_relations array indexable by RT index

This allows us to get rid of the need to keep around a separate set
of indexes for each result relation.
---
 src/backend/commands/copy.c              |  20 +--
 src/backend/commands/explain.c           |  17 +-
 src/backend/commands/tablecmds.c         |  12 +-
 src/backend/executor/execMain.c          | 189 ++++++-----------------
 src/backend/executor/execUtils.c         |  73 ++++++---
 src/backend/executor/nodeModifyTable.c   |  24 ++-
 src/backend/nodes/copyfuncs.c            |   2 -
 src/backend/nodes/outfuncs.c             |   2 -
 src/backend/nodes/readfuncs.c            |   2 -
 src/backend/optimizer/plan/createplan.c  |   2 -
 src/backend/optimizer/plan/setrefs.c     |  10 +-
 src/backend/replication/logical/worker.c |   4 +-
 src/include/executor/executor.h          |   5 +
 src/include/nodes/execnodes.h            |  21 ++-
 src/include/nodes/plannodes.h            |   2 -
 15 files changed, 162 insertions(+), 223 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3c7dbad27a..6948214334 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2829,25 +2829,18 @@ CopyFrom(CopyState cstate)
 	 * index-entry-making machinery.  (There used to be a huge amount of code
 	 * here that basically duplicated execUtils.c ...)
 	 */
-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo,
-					  cstate->rel,
-					  1,		/* must match rel's position in range_table */
-					  NULL,
-					  0);
-	target_resultRelInfo = resultRelInfo;
+	ExecInitRangeTable(estate, cstate->range_table);
+	ExecInitResultRelationsArray(estate);
+	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
+	ExecInitResultRelation(estate, resultRelInfo, 1);
 
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
-	ExecInitRangeTable(estate, cstate->range_table);
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -2856,7 +2849,7 @@ CopyFrom(CopyState cstate)
 	mtstate->ps.plan = NULL;
 	mtstate->ps.state = estate;
 	mtstate->operation = CMD_INSERT;
-	mtstate->resultRelInfo = estate->es_result_relations;
+	mtstate->resultRelInfo = resultRelInfo;
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
@@ -3359,7 +3352,8 @@ CopyFrom(CopyState cstate)
 	if (insertMethod != CIM_SINGLE)
 		CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
-	ExecCloseIndices(target_resultRelInfo);
+	ExecCloseResultRelations(estate);
+	ExecCloseRangeTableRelations(estate);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5547..3210f90bd1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -769,13 +769,14 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 {
 	ResultRelInfo *rInfo;
 	bool		show_relname;
-	int			numrels = queryDesc->estate->es_num_result_relations;
-	int			numrootrels = queryDesc->estate->es_num_root_result_relations;
+	int			numrels = list_length(queryDesc->plannedstmt->resultRelations);
+	int			numrootrels = list_length(queryDesc->plannedstmt->rootResultRelations);
+	List	   *resultrels;
 	List	   *routerels;
 	List	   *targrels;
-	int			nr;
 	ListCell   *l;
 
+	resultrels = queryDesc->estate->es_opened_result_relations;
 	routerels = queryDesc->estate->es_tuple_routing_result_relations;
 	targrels = queryDesc->estate->es_trig_target_relations;
 
@@ -783,13 +784,11 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 
 	show_relname = (numrels > 1 || numrootrels > 0 ||
 					routerels != NIL || targrels != NIL);
-	rInfo = queryDesc->estate->es_result_relations;
-	for (nr = 0; nr < numrels; rInfo++, nr++)
-		report_triggers(rInfo, show_relname, es);
-
-	rInfo = queryDesc->estate->es_root_result_relations;
-	for (nr = 0; nr < numrootrels; rInfo++, nr++)
+	foreach(l, resultrels)
+	{
+		rInfo = lfirst(l);
 		report_triggers(rInfo, show_relname, es);
+	}
 
 	foreach(l, routerels)
 	{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e05e5..1cd8cfb41d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1693,6 +1693,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	SubTransactionId mySubid;
 	ListCell   *cell;
 	Oid		   *logrelids;
+	int			dummy_rti;
 
 	/*
 	 * Check the explicitly-specified relations.
@@ -1792,19 +1793,24 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfos = (ResultRelInfo *)
 		palloc(list_length(rels) * sizeof(ResultRelInfo));
 	resultRelInfo = resultRelInfos;
+	estate->es_result_relations = (ResultRelInfo **)
+		palloc(list_length(rels) * sizeof(ResultRelInfo *));
+	dummy_rti = 1;
 	foreach(cell, rels)
 	{
 		Relation	rel = (Relation) lfirst(cell);
 
 		InitResultRelInfo(resultRelInfo,
 						  rel,
-						  0,	/* dummy rangetable index */
+						  dummy_rti,
 						  NULL,
 						  0);
+		estate->es_result_relations[dummy_rti - 1] = resultRelInfo;
+		estate->es_opened_result_relations =
+			lappend(estate->es_opened_result_relations, resultRelInfo);
 		resultRelInfo++;
+		dummy_rti++;
 	}
-	estate->es_result_relations = resultRelInfos;
-	estate->es_num_result_relations = list_length(rels);
 
 	/*
 	 * Process all BEFORE STATEMENT TRUNCATE triggers before we begin
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26ba4..f62fd8f7aa 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -827,85 +827,12 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	estate->es_plannedstmt = plannedstmt;
 
-	/*
-	 * Initialize ResultRelInfo data structures, and open the result rels.
-	 */
 	if (plannedstmt->resultRelations)
 	{
-		List	   *resultRelations = plannedstmt->resultRelations;
-		int			numResultRelations = list_length(resultRelations);
-		ResultRelInfo *resultRelInfos;
-		ResultRelInfo *resultRelInfo;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		resultRelInfo = resultRelInfos;
-		foreach(l, resultRelations)
-		{
-			Index		resultRelationIndex = lfirst_int(l);
-			Relation	resultRelation;
-
-			resultRelation = ExecGetRangeTableRelation(estate,
-													   resultRelationIndex);
-			InitResultRelInfo(resultRelInfo,
-							  resultRelation,
-							  resultRelationIndex,
-							  NULL,
-							  estate->es_instrument);
-			resultRelInfo++;
-		}
-		estate->es_result_relations = resultRelInfos;
-		estate->es_num_result_relations = numResultRelations;
+		ExecInitResultRelationsArray(estate);
 
 		/* es_result_relation_info is NULL except when within ModifyTable */
 		estate->es_result_relation_info = NULL;
-
-		/*
-		 * In the partitioned result relation case, also build ResultRelInfos
-		 * for all the partitioned table roots, because we will need them to
-		 * fire statement-level triggers, if any.
-		 */
-		if (plannedstmt->rootResultRelations)
-		{
-			int			num_roots = list_length(plannedstmt->rootResultRelations);
-
-			resultRelInfos = (ResultRelInfo *)
-				palloc(num_roots * sizeof(ResultRelInfo));
-			resultRelInfo = resultRelInfos;
-			foreach(l, plannedstmt->rootResultRelations)
-			{
-				Index		resultRelIndex = lfirst_int(l);
-				Relation	resultRelDesc;
-
-				resultRelDesc = ExecGetRangeTableRelation(estate,
-														  resultRelIndex);
-				InitResultRelInfo(resultRelInfo,
-								  resultRelDesc,
-								  resultRelIndex,
-								  NULL,
-								  estate->es_instrument);
-				resultRelInfo++;
-			}
-
-			estate->es_root_result_relations = resultRelInfos;
-			estate->es_num_root_result_relations = num_roots;
-		}
-		else
-		{
-			estate->es_root_result_relations = NULL;
-			estate->es_num_root_result_relations = 0;
-		}
-	}
-	else
-	{
-		/*
-		 * if no result relation, then set state appropriately
-		 */
-		estate->es_result_relations = NULL;
-		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
-		estate->es_root_result_relations = NULL;
-		estate->es_num_root_result_relations = 0;
 	}
 
 	/*
@@ -1334,8 +1261,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
  *
  * Most of the time, triggers are fired on one of the result relations of the
  * query, and so we can just return a member of the es_result_relations array,
- * or the es_root_result_relations array (if any), or the
- * es_tuple_routing_result_relations list (if any).  (Note: in self-join
+ * or the es_tuple_routing_result_relations list (if any). (Note: in self-join
  * situations there might be multiple members with the same OID; if so it
  * doesn't matter which one we pick.)
  *
@@ -1352,30 +1278,16 @@ ResultRelInfo *
 ExecGetTriggerResultRel(EState *estate, Oid relid)
 {
 	ResultRelInfo *rInfo;
-	int			nr;
 	ListCell   *l;
 	Relation	rel;
 	MemoryContext oldcontext;
 
 	/* First, search through the query result relations */
-	rInfo = estate->es_result_relations;
-	nr = estate->es_num_result_relations;
-	while (nr > 0)
-	{
-		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
-			return rInfo;
-		rInfo++;
-		nr--;
-	}
-	/* Second, search through the root result relations, if any */
-	rInfo = estate->es_root_result_relations;
-	nr = estate->es_num_root_result_relations;
-	while (nr > 0)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		rInfo = lfirst(l);
 		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
 			return rInfo;
-		rInfo++;
-		nr--;
 	}
 
 	/*
@@ -1512,9 +1424,6 @@ ExecPostprocessPlan(EState *estate)
 static void
 ExecEndPlan(PlanState *planstate, EState *estate)
 {
-	ResultRelInfo *resultRelInfo;
-	Index		num_relations;
-	Index		i;
 	ListCell   *l;
 
 	/*
@@ -1540,30 +1449,52 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
+	/* Close indexes of result relation(s), if any. */
+	ExecCloseResultRelations(estate);
+
+	/*
+	 * close whatever rangetable Relations have been opened.  We do not
+	 * release any locks we might hold on those rels.
+	 */
+	ExecCloseRangeTableRelations(estate);
+
+	/* likewise close any trigger target relations */
+	ExecCleanUpTriggerState(estate);
+}
+
+/*
+ * ExecCloseResultRelations
+ */
+void
+ExecCloseResultRelations(EState *estate)
+{
+	ListCell *l;
+
 	/*
 	 * close indexes of result relation(s) if any.  (Rels themselves get
 	 * closed next.)
 	 */
-	resultRelInfo = estate->es_result_relations;
-	for (i = estate->es_num_result_relations; i > 0; i--)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		ResultRelInfo *resultRelInfo = lfirst(l);
+
 		ExecCloseIndices(resultRelInfo);
-		resultRelInfo++;
 	}
+}
 
-	/*
-	 * close whatever rangetable Relations have been opened.  We do not
-	 * release any locks we might hold on those rels.
-	 */
-	num_relations = estate->es_range_table_size;
-	for (i = 0; i < num_relations; i++)
+/*
+ * ExecCloseRangeTableRelations
+ */
+void
+ExecCloseRangeTableRelations(EState *estate)
+{
+	int		i;
+
+	for (i = 0; i < estate->es_range_table_size; i++)
 	{
 		if (estate->es_relations[i])
 			table_close(estate->es_relations[i], NoLock);
 	}
-
-	/* likewise close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
 }
 
 /* ----------------------------------------------------------------
@@ -2758,17 +2689,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 
 	/*
 	 * Child EPQ EStates share the parent's copy of unchanging state such as
-	 * the snapshot, rangetable, result-rel info, and external Param info.
+	 * the snapshot, rangetable, and external Param info.
 	 * They need their own copies of local state, including a tuple table,
-	 * es_param_exec_vals, etc.
-	 *
-	 * The ResultRelInfo array management is trickier than it looks.  We
-	 * create fresh arrays for the child but copy all the content from the
-	 * parent.  This is because it's okay for the child to share any
-	 * per-relation state the parent has already created --- but if the child
-	 * sets up any ResultRelInfo fields, such as its own junkfilter, that
-	 * state must *not* propagate back to the parent.  (For one thing, the
-	 * pointed-to data is in a memory context that won't last long enough.)
+	 * es_param_exec_vals, result-rel info, etc.
 	 */
 	rcestate->es_direction = ForwardScanDirection;
 	rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2781,30 +2704,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
 	rcestate->es_output_cid = parentestate->es_output_cid;
-	if (parentestate->es_num_result_relations > 0)
-	{
-		int			numResultRelations = parentestate->es_num_result_relations;
-		int			numRootResultRels = parentestate->es_num_root_result_relations;
-		ResultRelInfo *resultRelInfos;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		memcpy(resultRelInfos, parentestate->es_result_relations,
-			   numResultRelations * sizeof(ResultRelInfo));
-		rcestate->es_result_relations = resultRelInfos;
-		rcestate->es_num_result_relations = numResultRelations;
-
-		/* Also transfer partitioned root result relations. */
-		if (numRootResultRels > 0)
-		{
-			resultRelInfos = (ResultRelInfo *)
-				palloc(numRootResultRels * sizeof(ResultRelInfo));
-			memcpy(resultRelInfos, parentestate->es_root_result_relations,
-				   numRootResultRels * sizeof(ResultRelInfo));
-			rcestate->es_root_result_relations = resultRelInfos;
-			rcestate->es_num_root_result_relations = numRootResultRels;
-		}
-	}
+	/*
+	 * ResultRelInfos needed by subplans are initialized from scratch when
+	 * the subplans themselves are initialized.
+	 */
+	if (parentestate->es_result_relations)
+		ExecInitResultRelationsArray(rcestate);
 	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
@@ -2954,6 +2859,8 @@ EvalPlanQualEnd(EPQState *epqstate)
 		ExecEndNode(subplanstate);
 	}
 
+	ExecCloseResultRelations(estate);
+
 	/* throw away the per-estate tuple table, some node may have used it */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d0e65b8647..169dd925ad 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,14 +124,8 @@ CreateExecutorState(void)
 	estate->es_output_cid = (CommandId) 0;
 
 	estate->es_result_relations = NULL;
-	estate->es_num_result_relations = 0;
 	estate->es_result_relation_info = NULL;
-
-	estate->es_root_result_relations = NULL;
-	estate->es_num_root_result_relations = 0;
-
 	estate->es_tuple_routing_result_relations = NIL;
-
 	estate->es_trig_target_relations = NIL;
 
 	estate->es_param_list_info = NULL;
@@ -711,16 +705,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	ResultRelInfo *resultRelInfos;
-	int			i;
-
-	resultRelInfos = estate->es_result_relations;
-	for (i = 0; i < estate->es_num_result_relations; i++)
-	{
-		if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
-			return true;
-	}
-	return false;
+	return	list_member_int(estate->es_plannedstmt->resultRelations,
+							scanrelid) ||
+			list_member_int(estate->es_plannedstmt->rootResultRelations,
+							scanrelid);
 }
 
 /* ----------------------------------------------------------------
@@ -779,8 +767,8 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 		palloc0(estate->es_range_table_size * sizeof(Relation));
 
 	/*
-	 * es_rowmarks is also parallel to the es_range_table, but it's allocated
-	 * only if needed.
+	 * es_result_relations and es_rowmarks are also parallel to es_range_table,
+	 * but are only allocated if needed.
 	 */
 	estate->es_rowmarks = NULL;
 }
@@ -835,6 +823,55 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 	return rel;
 }
 
+/*
+ * ExecInitResultRelationsArray
+ *		Allocate space to hold ResultRelInfo pointers of result relations
+ *
+ * Although not all relations in the range table may be result relations, we
+ * allocate that many pointers, because that allows accessing individual
+ * entries by RT index (minus 1 to be accurate), which is convenient.
+ */
+void
+ExecInitResultRelationsArray(EState *estate)
+{
+	/*
+	 * Individual pointers are assigned when ExecInitResultRelation() is
+	 * called one-by-one for each result relation.
+	 */
+	estate->es_result_relations = (ResultRelInfo **)
+		palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
+}
+
+/*
+ * ExecInitResultRelation
+ *		Open result relation given by the passed-in RT index and fill its
+ *		ResultRelInfo node
+ *
+ * Here, we also save the ResultRelInfo in estate->es_result_relations array
+ * such that it can be accessed later using the RT index.
+ */
+void
+ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+					   Index rti)
+{
+	Relation	resultRelationDesc;
+
+	resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+	InitResultRelInfo(resultRelInfo,
+					  resultRelationDesc,
+					  rti,
+					  NULL,
+					  estate->es_instrument);
+	estate->es_result_relations[rti - 1] = resultRelInfo;
+
+	/*
+	 * Saving in the list allows to avoid needlessly traversing the whole
+	 * array when only a few of its entries are possibly non-NULL.
+	 */
+	estate->es_opened_result_relations =
+		lappend(estate->es_opened_result_relations, resultRelInfo);
+}
+
 /*
  * UpdateChangedParamSet
  *		Add changed parameters to a plan node's chgParam set
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9812089161..9b27d34ba5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2301,7 +2301,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	ListCell   *l;
+	ListCell   *l,
+			   *l1;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
@@ -2322,13 +2323,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_done = false;
 
 	mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
-	mtstate->resultRelInfo = estate->es_result_relations + node->resultRelIndex;
+	mtstate->resultRelInfo = (ResultRelInfo *)
+		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
 	/* If modifying a partitioned table, initialize the root table info */
-	if (node->rootResultRelIndex >= 0)
-		mtstate->rootResultRelInfo = estate->es_root_result_relations +
-			node->rootResultRelIndex;
+	if (node->rootRelation > 0)
+	{
+		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
+		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
+							   node->rootRelation);
+	}
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2351,9 +2356,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
-	foreach(l, node->plans)
+	forboth(l, node->resultRelations, l1, node->plans)
 	{
-		subplan = (Plan *) lfirst(l);
+		Index		resultRelation = lfirst_int(l);
+
+		subplan = (Plan *) lfirst(l1);
+
+		/* This opens result relation and fills ResultRelInfo. */
+		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40b82..7974fa01e6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -207,8 +207,6 @@ _copyModifyTable(const ModifyTable *from)
 	COPY_SCALAR_FIELD(rootRelation);
 	COPY_SCALAR_FIELD(partColsUpdated);
 	COPY_NODE_FIELD(resultRelations);
-	COPY_SCALAR_FIELD(resultRelIndex);
-	COPY_SCALAR_FIELD(rootResultRelIndex);
 	COPY_NODE_FIELD(plans);
 	COPY_NODE_FIELD(withCheckOptionLists);
 	COPY_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f0386480ab..f35d2a04a7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -408,8 +408,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
 	WRITE_UINT_FIELD(rootRelation);
 	WRITE_BOOL_FIELD(partColsUpdated);
 	WRITE_NODE_FIELD(resultRelations);
-	WRITE_INT_FIELD(resultRelIndex);
-	WRITE_INT_FIELD(rootResultRelIndex);
 	WRITE_NODE_FIELD(plans);
 	WRITE_NODE_FIELD(withCheckOptionLists);
 	WRITE_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..73b8d9d7f6 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1639,8 +1639,6 @@ _readModifyTable(void)
 	READ_UINT_FIELD(rootRelation);
 	READ_BOOL_FIELD(partColsUpdated);
 	READ_NODE_FIELD(resultRelations);
-	READ_INT_FIELD(resultRelIndex);
-	READ_INT_FIELD(rootResultRelIndex);
 	READ_NODE_FIELD(plans);
 	READ_NODE_FIELD(withCheckOptionLists);
 	READ_NODE_FIELD(returningLists);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 3d7a4e373f..881eaf4813 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6808,8 +6808,6 @@ make_modifytable(PlannerInfo *root,
 	node->rootRelation = rootRelation;
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
-	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
-	node->rootResultRelIndex = -1;	/* will be set correctly in setrefs.c */
 	node->plans = subplans;
 	if (!onconflict)
 	{
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd8e2e966d..66372d30fe 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -975,24 +975,18 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 
 				/*
 				 * Append this ModifyTable node's final result relation RT
-				 * index(es) to the global list for the plan, and set its
-				 * resultRelIndex to reflect their starting position in the
-				 * global list.
+				 * index(es) to the global list for the plan.
 				 */
-				splan->resultRelIndex = list_length(root->glob->resultRelations);
 				root->glob->resultRelations =
 					list_concat(root->glob->resultRelations,
 								splan->resultRelations);
 
 				/*
 				 * If the main target relation is a partitioned table, also
-				 * add the partition root's RT index to rootResultRelations,
-				 * and remember its index in that list in rootResultRelIndex.
+				 * add the partition root's RT index to rootResultRelations.
 				 */
 				if (splan->rootRelation)
 				{
-					splan->rootResultRelIndex =
-						list_length(root->glob->rootResultRelations);
 					root->glob->rootResultRelations =
 						lappend_int(root->glob->rootResultRelations,
 									splan->rootRelation);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdeeb56..77f71e52d4 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -355,12 +355,12 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->relkind = rel->localrel->rd_rel->relkind;
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
+	ExecInitResultRelationsArray(estate);
 
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
 
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
+	estate->es_result_relations[0] = resultRelInfo;
 	estate->es_result_relation_info = resultRelInfo;
 
 	estate->es_output_cid = GetCurrentCommandId(true);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..54455e1446 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -538,6 +538,9 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable);
+extern void ExecInitResultRelationsArray(EState *estate);
+extern void ExecCloseRangeTableRelations(EState *estate);
+extern void ExecCloseResultRelations(EState *estate);
 
 static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
@@ -546,6 +549,8 @@ exec_rt_fetch(Index rti, EState *estate)
 }
 
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+					   Index rti);
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ef448d67c7..7eb5ca6a04 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -508,6 +508,14 @@ typedef struct EState
 	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
 								 * pointers, or NULL if not yet opened */
+	ResultRelInfo **es_result_relations;	/* Array of Per-range-table-entry
+											 * ResultRelInfo pointers, or
+											 * NULL if a given range table
+											 * relation not a target
+											 * table */
+	List		*es_opened_result_relations;	/* List of non-NULL entries
+												 * in es_result_relations added
+												 * in no specific order */
 	struct ExecRowMark **es_rowmarks;	/* Array of per-range-table-entry
 										 * ExecRowMarks, or NULL if none */
 	PlannedStmt *es_plannedstmt;	/* link to top of plan tree */
@@ -518,24 +526,13 @@ typedef struct EState
 	/* If query can insert/delete tuples, the command ID to mark them with */
 	CommandId	es_output_cid;
 
-	/* Info about target table(s) for insert/update/delete queries: */
-	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
-	int			es_num_result_relations;	/* length of array */
 	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
-	/*
-	 * Info about the partition root table(s) for insert/update/delete queries
-	 * targeting partitioned tables.  Only leaf partitions are mentioned in
-	 * es_result_relations, but we need access to the roots for firing
-	 * triggers and for runtime tuple routing.
-	 */
-	ResultRelInfo *es_root_result_relations;	/* array of ResultRelInfos */
-	int			es_num_root_result_relations;	/* length of the array */
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
 	/*
 	 * The following list contains ResultRelInfos created by the tuple routing
-	 * code for partitions that don't already have one.
+	 * code for partitions that aren't found in es_result_relations_array.
 	 */
 	List	   *es_tuple_routing_result_relations;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..06665468a5 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -224,8 +224,6 @@ typedef struct ModifyTable
 	Index		rootRelation;	/* Root RT index, if target is partitioned */
 	bool		partColsUpdated;	/* some part key in hierarchy updated */
 	List	   *resultRelations;	/* integer list of RT indexes */
-	int			resultRelIndex; /* index of first resultRel in plan's list */
-	int			rootResultRelIndex; /* index of the partitioned table root */
 	List	   *plans;			/* plan(s) producing source data */
 	List	   *withCheckOptionLists;	/* per-target-table WCO lists */
 	List	   *returningLists; /* per-target-table RETURNING tlists */
-- 
2.20.1

0002-Merge-ExecCleanUpTriggerState-and-ExecCloseResultRel.patch (text/x-patch; charset=UTF-8)
From 807dde3d299a2d4ae89d96ea7422d0feb9c58e4b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 9 Oct 2020 22:09:08 +0300
Subject: [PATCH 2/2] Merge ExecCleanUpTriggerState() and
 ExecCloseResultRelations().

And some other kibitzing.
---
 src/backend/commands/copy.c              |  9 ++-
 src/backend/commands/explain.c           |  2 +-
 src/backend/commands/trigger.c           |  2 +-
 src/backend/executor/execMain.c          | 83 ++++++++++--------------
 src/backend/executor/execUtils.c         |  7 +-
 src/backend/replication/logical/worker.c |  6 +-
 src/include/executor/executor.h          |  1 -
 src/include/nodes/execnodes.h            | 17 ++---
 8 files changed, 59 insertions(+), 68 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6948214334..bfdd366139 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2727,6 +2727,7 @@ CopyFrom(CopyState cstate)
 	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
+	Assert(list_length(cstate->range_table) == 1);
 
 	/*
 	 * The target must be a plain, foreign, or partitioned relation, or have
@@ -3352,15 +3353,13 @@ CopyFrom(CopyState cstate)
 	if (insertMethod != CIM_SINGLE)
 		CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
-	ExecCloseResultRelations(estate);
-	ExecCloseRangeTableRelations(estate);
-
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
 		ExecCleanupTupleRouting(mtstate, proute);
 
-	/* Close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
+	/* Close the result relations */
+	ExecCloseResultRelations(estate);
+	ExecCloseRangeTableRelations(estate);
 
 	FreeExecutorState(estate);
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 3210f90bd1..1cc47b0e77 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -786,7 +786,7 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 					routerels != NIL || targrels != NIL);
 	foreach(l, resultrels)
 	{
-		rInfo = lfirst(l);
+		rInfo = (ResultRelInfo *) lfirst(l);
 		report_triggers(rInfo, show_relname, es);
 	}
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccff5b..3b4fbdadf4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -4227,7 +4227,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 
 	if (local_estate)
 	{
-		ExecCleanUpTriggerState(estate);
+		ExecCloseResultRelations(estate);
 		ExecResetTupleTable(estate->es_tupleTable, false);
 		FreeExecutorState(estate);
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f62fd8f7aa..403d62f2b6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1340,35 +1340,6 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 	return rInfo;
 }
 
-/*
- * Close any relations that have been opened by ExecGetTriggerResultRel().
- */
-void
-ExecCleanUpTriggerState(EState *estate)
-{
-	ListCell   *l;
-
-	foreach(l, estate->es_trig_target_relations)
-	{
-		ResultRelInfo *resultRelInfo = (ResultRelInfo *) lfirst(l);
-
-		/*
-		 * Assert this is a "dummy" ResultRelInfo, see above.  Otherwise we
-		 * might be issuing a duplicate close against a Relation opened by
-		 * ExecGetRangeTableRelation.
-		 */
-		Assert(resultRelInfo->ri_RangeTableIndex == 0);
-
-		/*
-		 * Since ExecGetTriggerResultRel doesn't call ExecOpenIndices for
-		 * these rels, we needn't call ExecCloseIndices either.
-		 */
-		Assert(resultRelInfo->ri_NumIndices == 0);
-
-		table_close(resultRelInfo->ri_RelationDesc, NoLock);
-	}
-}
-
 /* ----------------------------------------------------------------
  *		ExecPostprocessPlan
  *
@@ -1449,7 +1420,7 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
-	/* Close indexes of result relation(s), if any. */
+	/* Close result relation(s), if any. */
 	ExecCloseResultRelations(estate);
 
 	/*
@@ -1457,22 +1428,19 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 * release any locks we might hold on those rels.
 	 */
 	ExecCloseRangeTableRelations(estate);
-
-	/* likewise close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
 }
 
 /*
- * ExecCloseResultRelations
+ * Close any relations that have been opened for ResultRelInfos.
  */
 void
 ExecCloseResultRelations(EState *estate)
 {
-	ListCell *l;
+	ListCell   *l;
 
 	/*
-	 * close indexes of result relation(s) if any.  (Rels themselves get
-	 * closed next.)
+	 * close indexes of result relation(s) if any.  (Rels themselves are
+	 * closed in ExecCloseRangeTableRelations())
 	 */
 	foreach(l, estate->es_opened_result_relations)
 	{
@@ -1480,15 +1448,36 @@ ExecCloseResultRelations(EState *estate)
 
 		ExecCloseIndices(resultRelInfo);
 	}
+
+	/* Close any relations that have been opened by ExecGetTriggerResultRel(). */
+	foreach(l, estate->es_trig_target_relations)
+	{
+		ResultRelInfo *resultRelInfo = (ResultRelInfo *) lfirst(l);
+
+		/*
+		 * Assert this is a "dummy" ResultRelInfo, see above.  Otherwise we
+		 * might be issuing a duplicate close against a Relation opened by
+		 * ExecGetRangeTableRelation.
+		 */
+		Assert(resultRelInfo->ri_RangeTableIndex == 0);
+
+		/*
+		 * Since ExecGetTriggerResultRel doesn't call ExecOpenIndices for
+		 * these rels, we needn't call ExecCloseIndices either.
+		 */
+		Assert(resultRelInfo->ri_NumIndices == 0);
+
+		table_close(resultRelInfo->ri_RelationDesc, NoLock);
+	}
 }
 
 /*
- * ExecCloseRangeTableRelations
+ * Close all relations opened by ExecGetRangeTableRelation()
  */
 void
 ExecCloseRangeTableRelations(EState *estate)
 {
-	int		i;
+	int			i;
 
 	for (i = 0; i < estate->es_range_table_size; i++)
 	{
@@ -2689,9 +2678,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 
 	/*
 	 * Child EPQ EStates share the parent's copy of unchanging state such as
-	 * the snapshot, rangetable, and external Param info.
-	 * They need their own copies of local state, including a tuple table,
-	 * es_param_exec_vals, result-rel info, etc.
+	 * the snapshot, rangetable, and external Param info.  They need their own
+	 * copies of local state, including a tuple table, es_param_exec_vals,
+	 * result-rel info, etc.
 	 */
 	rcestate->es_direction = ForwardScanDirection;
 	rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2704,9 +2693,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
 	rcestate->es_output_cid = parentestate->es_output_cid;
+
 	/*
-	 * ResultRelInfos needed by subplans are initialized from scratch when
-	 * the subplans themselves are initialized.
+	 * ResultRelInfos needed by subplans are initialized from scratch when the
+	 * subplans themselves are initialized.
 	 */
 	if (parentestate->es_result_relations)
 		ExecInitResultRelationsArray(rcestate);
@@ -2859,13 +2849,10 @@ EvalPlanQualEnd(EPQState *epqstate)
 		ExecEndNode(subplanstate);
 	}
 
-	ExecCloseResultRelations(estate);
-
 	/* throw away the per-estate tuple table, some node may have used it */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
-	/* close any trigger target relations attached to this EState */
-	ExecCleanUpTriggerState(estate);
+	ExecCloseResultRelations(estate);
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 169dd925ad..eda84bca45 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -770,6 +770,7 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 	 * es_result_relations and es_rowmarks are also parallel to es_range_table,
 	 * but are only allocated if needed.
 	 */
+	estate->es_result_relations = NULL;
 	estate->es_rowmarks = NULL;
 }
 
@@ -827,9 +828,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
  * ExecInitResultRelationsArray
  *		Allocate space to hold ResultRelInfo pointers of result relations
  *
- * Although not relations in the range table may be result relations, we
- * allocate that many pointers, because that allows to access individual
- * entries by RT index (minus 1 to be accurate), which is convenient.
+ * Usually, only some relations in the range table are result relations, but
+ * we allocate an array with the same size as the range table, so that we
+ * can index it by the RT index (minus 1 to be accurate).
  */
 void
 ExecInitResultRelationsArray(EState *estate)
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 77f71e52d4..07889f7a7b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -357,9 +357,13 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	ExecInitRangeTable(estate, list_make1(rte));
 	ExecInitResultRelationsArray(estate);
 
+	/*
+	 * Initialize a ResultRelInfo for the target relation.  Note that we
+	 * intentionally don't add it to the es_opened_result_relations list,
+	 * because we do our own cleanup and don't use ExecCloseResultRelations().
+	 */
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-
 	estate->es_result_relations[0] = resultRelInfo;
 	estate->es_result_relation_info = resultRelInfo;
 
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 54455e1446..76ef5cd91c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -191,7 +191,6 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation partition_root,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
-extern void ExecCleanUpTriggerState(EState *estate);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
 extern bool ExecPartitionCheck(ResultRelInfo *resultRelInfo,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7eb5ca6a04..b6500abc24 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -508,14 +508,6 @@ typedef struct EState
 	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
 								 * pointers, or NULL if not yet opened */
-	ResultRelInfo **es_result_relations;	/* Array of Per-range-table-entry
-											 * ResultRelInfo pointers, or
-											 * NULL if a given range table
-											 * relation not a target
-											 * table */
-	List		*es_opened_result_relations;	/* List of non-NULL entries
-												 * in es_result_relations added
-												 * in no specific order */
 	struct ExecRowMark **es_rowmarks;	/* Array of per-range-table-entry
 										 * ExecRowMarks, or NULL if none */
 	PlannedStmt *es_plannedstmt;	/* link to top of plan tree */
@@ -526,6 +518,15 @@ typedef struct EState
 	/* If query can insert/delete tuples, the command ID to mark them with */
 	CommandId	es_output_cid;
 
+	/* Info about target table(s) for insert/update/delete queries: */
+	ResultRelInfo **es_result_relations;	/* Array of Per-range-table-entry
+											 * ResultRelInfo pointers, or
+											 * NULL if a given range table
+											 * relation not a target
+											 * table */
+	List		*es_opened_result_relations;	/* List of non-NULL entries
+												 * in es_result_relations added
+												 * in no specific order */
 	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
-- 
2.20.1

#54Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#53)
Re: partition routing layering in nodeModifyTable.c

On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 09/10/2020 11:01, Amit Langote wrote:

On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 07/10/2020 12:50, Amit Langote wrote:

I have thought about something like this before. An idea I had is to
make es_result_relations array indexable by plain RT indexes, then we
don't need to maintain separate indexes that we do today for result
relations.

That sounds like a good idea. es_result_relations is currently an array
of ResultRelInfos, so that would leave a lot of unfilled structs in the
array. But in one of your other threads, you proposed turning
es_result_relations into an array of pointers anyway
(/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com).

Okay, I am reorganizing the patches around that idea and will post an
update soon.

Attached updated patches.

0001 makes es_result_relations an RTI-indexable array, which allows us to
get rid of all "result relation index" fields across the code.
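
To make the RTI-indexable array idea concrete, here is a minimal standalone sketch. It is a toy model, not PostgreSQL code: ToyEState, ToyResultRelInfo and the toy_* functions are hypothetical stand-ins for EState, ResultRelInfo and the ExecInitResultRelationsArray()/ExecInitResultRelation() functions in the patch. It only shows why one pointer slot per range table entry makes a separate "result relation index" unnecessary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PostgreSQL's ResultRelInfo and EState. */
typedef struct ToyResultRelInfo
{
    unsigned    rti;            /* 1-based range table index */
} ToyResultRelInfo;

typedef struct ToyEState
{
    unsigned    range_table_size;
    ToyResultRelInfo **result_relations;    /* one slot per RT entry */
} ToyEState;

/* Allocate one NULL-initialized pointer slot per range table entry. */
void
toy_init_result_relations_array(ToyEState *estate)
{
    estate->result_relations =
        calloc(estate->range_table_size, sizeof(ToyResultRelInfo *));
    assert(estate->result_relations != NULL);
}

/* Register a result relation under its RT index (stored at rti - 1). */
void
toy_init_result_relation(ToyEState *estate, ToyResultRelInfo *rri,
                         unsigned rti)
{
    assert(rti >= 1 && rti <= estate->range_table_size);
    rri->rti = rti;
    estate->result_relations[rti - 1] = rri;
}

/* Look up by RT index; NULL means "not a result relation". */
ToyResultRelInfo *
toy_get_result_relation(const ToyEState *estate, unsigned rti)
{
    assert(rti >= 1 && rti <= estate->range_table_size);
    return estate->result_relations[rti - 1];
}
```

The cost is a few unused pointer slots for range table entries that are not result relations, which is much cheaper than leaving whole unfilled structs in the array.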

Thanks! A couple small things I wanted to check with you before committing:

Thanks for checking.

1. We have many different cleanup/close routines now:
ExecCloseResultRelations, ExecCloseRangeTableRelations, and
ExecCleanUpTriggerState. Do we need them all? It seems to me that we
could merge ExecCloseResultRelations() and
ExecCleanUpTriggerState(), they seem to do roughly the same thing: close
relations that were opened for ResultRelInfos. They are always called
together, except in afterTriggerInvokeEvents(). And in
afterTriggerInvokeEvents() too, there would be no harm in doing both,
even though we know there aren't any entries in the es_result_relations
array at that point.

Hmm, I find trigger result relations to behave differently enough to
deserve a separate function. For example, unlike plan-specified
result relations, they don't point to range table relations and don't
open indices. Maybe the name could be revisited, say,
ExecCloseTriggerResultRelations(). Also, maybe call the other
functions:

ExecInitPlanResultRelationsArray()
ExecInitPlanResultRelation()
ExecClosePlanResultRelations()

Thoughts?

2. The way this is handled in worker.c is a bit funny. In
create_estate_for_relation(), you create a ResultRelInfo, but you
*don't* put it in the es_opened_result_relations list. That's
surprising, but I'm also surprised there are no
ExecCloseResultRelations() calls before the FreeExecutorState() calls in
worker.c. It's not needed because the
apply_handle_insert/update/delete_internal() functions call
ExecCloseIndices() directly, so they don't rely on the
ExecCloseResultRelations() function for cleanup. That works too, but
it's a bit surprising because it's different from how it's done in
copy.c and nodeModifyTable.c. It would feel natural to rely on
ExecCloseResultRelations() in worker.c as well, but on the other hand,
it also calls ExecOpenIndices() in a more lazy fashion, and it makes
sense to call ExecCloseIndices() in the same functions where
ExecOpenIndices() is called. So I'm not sure if changing that would be
an improvement overall. What do you think? Did you consider doing that?

Yeah, that did bother me too a bit. I'm okay either way but it does
look a bit inconsistent.

Actually, maybe we don't need to be so paranoid about setting up
es_result_relations in worker.c, because none of the downstream
functionality invoked seems to rely on it, that is, no need to call
ExecInitResultRelationsArray() and ExecInitResultRelation().
ExecSimpleRelation* and downstream functionality assume a
single-relation operation and the ResultRelInfo is explicitly passed.

Attached is your original patch v13, and a patch on top of it that
merges ExecCloseResultRelations() and ExecCleanUpTriggerState(), and
makes some minor comment changes. I didn't do anything about the
worker.c business, aside from adding a comment about it.

Thanks for the cleanup.

I had noticed there was some funny capitalization in my patch:

+ ResultRelInfo **es_result_relations; /* Array of Per-range-table-entry

s/Per-/per-

Also, I think a comma may be needed in the parenthetical below:

+ * can index it by the RT index (minus 1 to be accurate).

...(minus 1, to be accurate)

--
Amit Langote
EDB: http://www.enterprisedb.com

#55Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#54)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On 12/10/2020 16:47, Amit Langote wrote:

On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

1. We have many different cleanup/close routines now:
ExecCloseResultRelations, ExecCloseRangeTableRelations, and
ExecCleanUpTriggerState. Do we need them all? It seems to me that we
could merge ExecCloseResultRelations() and
ExecCleanUpTriggerState(), they seem to do roughly the same thing: close
relations that were opened for ResultRelInfos. They are always called
together, except in afterTriggerInvokeEvents(). And in
afterTriggerInvokeEvents() too, there would be no harm in doing both,
even though we know there aren't any entries in the es_result_relations
array at that point.

Hmm, I find trigger result relations to behave differently enough to
deserve a separate function. For example, unlike plan-specified
result relations, they don't point to range table relations and don't
open indices. Maybe the name could be revisited, say,
ExecCloseTriggerResultRelations().

Matter of perception I guess. I still prefer to club them together into
one Close call. It's true that they're slightly different, but they're
also pretty similar. And IMHO they're more similar than different.
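
As a rough illustration of what a single merged Close call does, here is a standalone toy model. It is not the executor code: the Toy* types and fields are hypothetical, and "closing" is reduced to flipping flags. It mirrors the two loops of the merged function: plan result relations only get their indexes closed (the relations themselves are closed along with the range table), while trigger target relations are "dummy" entries with no RT index and no open indexes, so only the relation is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyResultRelInfo
{
    unsigned    rti;            /* 0 marks a trigger-only "dummy" entry */
    int         num_indexes;    /* number of indexes currently open */
    bool        rel_open;       /* is the relation itself open? */
} ToyResultRelInfo;

/* Hypothetical stand-in for the two EState lists involved. */
typedef struct ToyEState
{
    ToyResultRelInfo **opened_result_relations;
    size_t      n_opened;
    ToyResultRelInfo **trig_target_relations;
    size_t      n_trig;
} ToyEState;

/* One cleanup entry point covering both kinds of ResultRelInfo. */
void
toy_close_result_relations(ToyEState *estate)
{
    /* Plan result rels: close indexes only; rels are closed elsewhere. */
    for (size_t i = 0; i < estate->n_opened; i++)
        estate->opened_result_relations[i]->num_indexes = 0;

    /* Trigger target rels: no indexes were opened, close the rel. */
    for (size_t i = 0; i < estate->n_trig; i++)
    {
        ToyResultRelInfo *rri = estate->trig_target_relations[i];

        assert(rri->rti == 0);          /* must be a dummy entry */
        assert(rri->num_indexes == 0);  /* indexes were never opened */
        rri->rel_open = false;
    }
}
```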

Also, maybe call the other functions:

ExecInitPlanResultRelationsArray()
ExecInitPlanResultRelation()
ExecClosePlanResultRelations()

Thoughts?

Hmm. How about initializing the array lazily, on the first
ExecInitPlanResultRelation() call? It's not performance critical, and
that way there's one fewer initialization function that you need to
remember to call.
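
Roughly what I mean by lazy initialization, as a standalone sketch rather than actual executor code (ToyEState, ToyResultRelInfo and toy_init_result_relation are hypothetical names, and calloc stands in for palloc0): the array is allocated inside the per-relation init function the first time it is needed, so callers have nothing extra to remember.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ResultRelInfo and EState. */
typedef struct ToyResultRelInfo
{
    unsigned    rti;            /* 1-based range table index */
} ToyResultRelInfo;

typedef struct ToyEState
{
    unsigned    range_table_size;
    ToyResultRelInfo **result_relations;    /* NULL until first use */
} ToyEState;

/* Register a result relation, allocating the array on the first call. */
void
toy_init_result_relation(ToyEState *estate, ToyResultRelInfo *rri,
                         unsigned rti)
{
    assert(rti >= 1 && rti <= estate->range_table_size);

    /* Lazy allocation: only the first call pays for it. */
    if (estate->result_relations == NULL)
    {
        estate->result_relations =
            calloc(estate->range_table_size, sizeof(ToyResultRelInfo *));
        assert(estate->result_relations != NULL);
    }

    rri->rti = rti;
    estate->result_relations[rti - 1] = rri;
}
```

A side effect is that a NULL array pointer now means "no result relation has been initialized yet", which other code can test for.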

It occurred to me that if we do that (initialize the array lazily),
there's very little need for the PlannedStmt->resultRelations list
anymore. It's only used in ExecRelationIsTargetRelation(), but if we
assume that ExecRelationIsTargetRelation() is only called after InitPlan
has initialized the result relation for the relation, it can easily
check es_result_relations instead. I think that's a safe assumption.
ExecRelationIsTargetRelation() is only used in FDWs, and I believe the
FDW's initialization routine can only be called after ExecInitModifyTable
has been called on the relation.
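
In toy form, the check would look something like the following. This is a standalone sketch with hypothetical Toy* names, not the real ExecRelationIsTargetRelation(); it relies on the assumption stated above, namely that the result relation has already been initialized by the time anyone asks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ResultRelInfo and EState. */
typedef struct ToyResultRelInfo
{
    unsigned    rti;            /* 1-based range table index */
} ToyResultRelInfo;

typedef struct ToyEState
{
    unsigned    range_table_size;
    ToyResultRelInfo **result_relations;    /* NULL if never allocated */
} ToyEState;

/* Is the relation at RT index scanrelid a target of the query? */
bool
toy_relation_is_target(const ToyEState *estate, unsigned scanrelid)
{
    assert(estate != NULL);

    /* Array never allocated: no result relation was initialized. */
    if (estate->result_relations == NULL)
        return false;
    if (scanrelid < 1 || scanrelid > estate->range_table_size)
        return false;

    /* A non-NULL slot means the relation was set up as a result rel. */
    return estate->result_relations[scanrelid - 1] != NULL;
}
```

Unlike a lookup in a planner-produced list, this answers "has it been initialized as a result relation", so it gives the right answer only once initialization of the relevant ModifyTable node has happened.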

The PlannedStmt->rootResultRelations field is even more useless.

Actually, maybe we don't need to be so paranoid about setting up
es_result_relations in worker.c, because none of the downstream
functionality invoked seems to rely on it, that is, no need to call
ExecInitResultRelationsArray() and ExecInitResultRelation().
ExecSimpleRelation* and downstream functionality assume a
single-relation operation and the ResultRelInfo is explicitly passed.

Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't
actually any need to put the ResultRelInfos in the es_result_relations
array.

Putting all this together, I ended up with the attached. It doesn't
include the subsequent commits in this patch set yet, for removal of
es_result_relation_info et al.

- Heikki

Attachments:

v15-0001-Make-es_result_relations-array-indexable-by-RT-i.patch (text/x-patch; charset=UTF-8)
From c3f24ca219aa3d535df48c7add376ee3f8f4bc1e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Oct 2020 19:43:55 +0300
Subject: [PATCH v15 1/1] Make es_result_relations array indexable by RT index.

Create each ResultRelInfo on demand in ExecInitModifyTable, instead of
initializing the whole array at once in InitPlan. And instead of having
a separate concept of a "result rel index", access the array using the
range table indexes. This allows several simplifications. The arrays of
regular result rels and root result are merged into one array. And we
don't need the global resultRelations list in PlannedStmt anymore.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
---
 src/backend/commands/copy.c              |  24 +--
 src/backend/commands/explain.c           |  17 +-
 src/backend/commands/tablecmds.c         |  11 +-
 src/backend/commands/trigger.c           |   2 +-
 src/backend/executor/execMain.c          | 256 +++++++----------------
 src/backend/executor/execParallel.c      |   2 -
 src/backend/executor/execUtils.c         |  58 +++--
 src/backend/executor/nodeModifyTable.c   |  24 ++-
 src/backend/nodes/copyfuncs.c            |   4 -
 src/backend/nodes/outfuncs.c             |   6 -
 src/backend/nodes/readfuncs.c            |   4 -
 src/backend/optimizer/plan/createplan.c  |   2 -
 src/backend/optimizer/plan/planner.c     |   6 -
 src/backend/optimizer/plan/setrefs.c     |  30 +--
 src/backend/replication/logical/worker.c |  32 +--
 src/include/executor/executor.h          |   5 +-
 src/include/nodes/execnodes.h            |  19 +-
 src/include/nodes/pathnodes.h            |   4 -
 src/include/nodes/plannodes.h            |  11 -
 19 files changed, 188 insertions(+), 329 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3c7dbad27a..ca05702139 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2727,6 +2727,7 @@ CopyFrom(CopyState cstate)
 	bool		leafpart_use_multi_insert = false;
 
 	Assert(cstate->rel);
+	Assert(list_length(cstate->range_table) == 1);
 
 	/*
 	 * The target must be a plain, foreign, or partitioned relation, or have
@@ -2829,25 +2830,17 @@ CopyFrom(CopyState cstate)
 	 * index-entry-making machinery.  (There used to be a huge amount of code
 	 * here that basically duplicated execUtils.c ...)
 	 */
-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo,
-					  cstate->rel,
-					  1,		/* must match rel's position in range_table */
-					  NULL,
-					  0);
-	target_resultRelInfo = resultRelInfo;
+	ExecInitRangeTable(estate, cstate->range_table);
+	resultRelInfo = target_resultRelInfo = makeNode(ResultRelInfo);
+	ExecInitResultRelation(estate, resultRelInfo, 1);
 
 	/* Verify the named relation is a valid target for INSERT */
 	CheckValidResultRel(resultRelInfo, CMD_INSERT);
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
-	ExecInitRangeTable(estate, cstate->range_table);
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -2856,7 +2849,7 @@ CopyFrom(CopyState cstate)
 	mtstate->ps.plan = NULL;
 	mtstate->ps.state = estate;
 	mtstate->operation = CMD_INSERT;
-	mtstate->resultRelInfo = estate->es_result_relations;
+	mtstate->resultRelInfo = resultRelInfo;
 
 	if (resultRelInfo->ri_FdwRoutine != NULL &&
 		resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
@@ -3359,14 +3352,13 @@ CopyFrom(CopyState cstate)
 	if (insertMethod != CIM_SINGLE)
 		CopyMultiInsertInfoCleanup(&multiInsertInfo);
 
-	ExecCloseIndices(target_resultRelInfo);
-
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
 		ExecCleanupTupleRouting(mtstate, proute);
 
-	/* Close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
+	/* Close the result relations */
+	ExecCloseResultRelations(estate);
+	ExecCloseRangeTableRelations(estate);
 
 	FreeExecutorState(estate);
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5547..c8e292adfa 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -769,27 +769,24 @@ ExplainPrintTriggers(ExplainState *es, QueryDesc *queryDesc)
 {
 	ResultRelInfo *rInfo;
 	bool		show_relname;
-	int			numrels = queryDesc->estate->es_num_result_relations;
-	int			numrootrels = queryDesc->estate->es_num_root_result_relations;
+	List	   *resultrels;
 	List	   *routerels;
 	List	   *targrels;
-	int			nr;
 	ListCell   *l;
 
+	resultrels = queryDesc->estate->es_opened_result_relations;
 	routerels = queryDesc->estate->es_tuple_routing_result_relations;
 	targrels = queryDesc->estate->es_trig_target_relations;
 
 	ExplainOpenGroup("Triggers", "Triggers", false, es);
 
-	show_relname = (numrels > 1 || numrootrels > 0 ||
+	show_relname = (list_length(resultrels) > 1 ||
 					routerels != NIL || targrels != NIL);
-	rInfo = queryDesc->estate->es_result_relations;
-	for (nr = 0; nr < numrels; rInfo++, nr++)
-		report_triggers(rInfo, show_relname, es);
-
-	rInfo = queryDesc->estate->es_root_result_relations;
-	for (nr = 0; nr < numrootrels; rInfo++, nr++)
+	foreach(l, resultrels)
+	{
+		rInfo = (ResultRelInfo *) lfirst(l);
 		report_triggers(rInfo, show_relname, es);
+	}
 
 	foreach(l, routerels)
 	{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e05e5..1f0b0f3ef9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1787,11 +1787,18 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	/*
 	 * To fire triggers, we'll need an EState as well as a ResultRelInfo for
 	 * each relation.  We don't need to call ExecOpenIndices, though.
+	 *
+	 * We put the ResultRelInfos in the es_opened_result_relations list, even
+	 * though we don't have a range table and don't populate the
+	 * es_result_relations array.  That's a bit bogus, but it's enough to make
+	 * ExecGetTriggerResultRel() find them.
 	 */
 	estate = CreateExecutorState();
 	resultRelInfos = (ResultRelInfo *)
 		palloc(list_length(rels) * sizeof(ResultRelInfo));
 	resultRelInfo = resultRelInfos;
+	estate->es_result_relations = (ResultRelInfo **)
+		palloc(list_length(rels) * sizeof(ResultRelInfo *));
 	foreach(cell, rels)
 	{
 		Relation	rel = (Relation) lfirst(cell);
@@ -1801,10 +1808,10 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 						  0,	/* dummy rangetable index */
 						  NULL,
 						  0);
+		estate->es_opened_result_relations =
+			lappend(estate->es_opened_result_relations, resultRelInfo);
 		resultRelInfo++;
 	}
-	estate->es_result_relations = resultRelInfos;
-	estate->es_num_result_relations = list_length(rels);
 
 	/*
 	 * Process all BEFORE STATEMENT TRUNCATE triggers before we begin
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccff5b..3b4fbdadf4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -4227,7 +4227,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 
 	if (local_estate)
 	{
-		ExecCleanUpTriggerState(estate);
+		ExecCloseResultRelations(estate);
 		ExecResetTupleTable(estate->es_tupleTable, false);
 		FreeExecutorState(estate);
 	}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2e27e26ba4..e32f5984e5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -827,86 +827,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	estate->es_plannedstmt = plannedstmt;
 
-	/*
-	 * Initialize ResultRelInfo data structures, and open the result rels.
-	 */
-	if (plannedstmt->resultRelations)
-	{
-		List	   *resultRelations = plannedstmt->resultRelations;
-		int			numResultRelations = list_length(resultRelations);
-		ResultRelInfo *resultRelInfos;
-		ResultRelInfo *resultRelInfo;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		resultRelInfo = resultRelInfos;
-		foreach(l, resultRelations)
-		{
-			Index		resultRelationIndex = lfirst_int(l);
-			Relation	resultRelation;
-
-			resultRelation = ExecGetRangeTableRelation(estate,
-													   resultRelationIndex);
-			InitResultRelInfo(resultRelInfo,
-							  resultRelation,
-							  resultRelationIndex,
-							  NULL,
-							  estate->es_instrument);
-			resultRelInfo++;
-		}
-		estate->es_result_relations = resultRelInfos;
-		estate->es_num_result_relations = numResultRelations;
-
-		/* es_result_relation_info is NULL except when within ModifyTable */
-		estate->es_result_relation_info = NULL;
-
-		/*
-		 * In the partitioned result relation case, also build ResultRelInfos
-		 * for all the partitioned table roots, because we will need them to
-		 * fire statement-level triggers, if any.
-		 */
-		if (plannedstmt->rootResultRelations)
-		{
-			int			num_roots = list_length(plannedstmt->rootResultRelations);
-
-			resultRelInfos = (ResultRelInfo *)
-				palloc(num_roots * sizeof(ResultRelInfo));
-			resultRelInfo = resultRelInfos;
-			foreach(l, plannedstmt->rootResultRelations)
-			{
-				Index		resultRelIndex = lfirst_int(l);
-				Relation	resultRelDesc;
-
-				resultRelDesc = ExecGetRangeTableRelation(estate,
-														  resultRelIndex);
-				InitResultRelInfo(resultRelInfo,
-								  resultRelDesc,
-								  resultRelIndex,
-								  NULL,
-								  estate->es_instrument);
-				resultRelInfo++;
-			}
-
-			estate->es_root_result_relations = resultRelInfos;
-			estate->es_num_root_result_relations = num_roots;
-		}
-		else
-		{
-			estate->es_root_result_relations = NULL;
-			estate->es_num_root_result_relations = 0;
-		}
-	}
-	else
-	{
-		/*
-		 * if no result relation, then set state appropriately
-		 */
-		estate->es_result_relations = NULL;
-		estate->es_num_result_relations = 0;
-		estate->es_result_relation_info = NULL;
-		estate->es_root_result_relations = NULL;
-		estate->es_num_root_result_relations = 0;
-	}
+	/* es_result_relation_info is NULL except when within ModifyTable */
+	estate->es_result_relation_info = NULL;
 
 	/*
 	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -1334,8 +1256,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
  *
  * Most of the time, triggers are fired on one of the result relations of the
  * query, and so we can just return a member of the es_result_relations array,
- * or the es_root_result_relations array (if any), or the
- * es_tuple_routing_result_relations list (if any).  (Note: in self-join
+ * or the es_tuple_routing_result_relations list (if any). (Note: in self-join
  * situations there might be multiple members with the same OID; if so it
  * doesn't matter which one we pick.)
  *
@@ -1352,30 +1273,16 @@ ResultRelInfo *
 ExecGetTriggerResultRel(EState *estate, Oid relid)
 {
 	ResultRelInfo *rInfo;
-	int			nr;
 	ListCell   *l;
 	Relation	rel;
 	MemoryContext oldcontext;
 
 	/* First, search through the query result relations */
-	rInfo = estate->es_result_relations;
-	nr = estate->es_num_result_relations;
-	while (nr > 0)
-	{
-		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
-			return rInfo;
-		rInfo++;
-		nr--;
-	}
-	/* Second, search through the root result relations, if any */
-	rInfo = estate->es_root_result_relations;
-	nr = estate->es_num_root_result_relations;
-	while (nr > 0)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		rInfo = lfirst(l);
 		if (RelationGetRelid(rInfo->ri_RelationDesc) == relid)
 			return rInfo;
-		rInfo++;
-		nr--;
 	}
 
 	/*
@@ -1428,35 +1335,6 @@ ExecGetTriggerResultRel(EState *estate, Oid relid)
 	return rInfo;
 }
 
-/*
- * Close any relations that have been opened by ExecGetTriggerResultRel().
- */
-void
-ExecCleanUpTriggerState(EState *estate)
-{
-	ListCell   *l;
-
-	foreach(l, estate->es_trig_target_relations)
-	{
-		ResultRelInfo *resultRelInfo = (ResultRelInfo *) lfirst(l);
-
-		/*
-		 * Assert this is a "dummy" ResultRelInfo, see above.  Otherwise we
-		 * might be issuing a duplicate close against a Relation opened by
-		 * ExecGetRangeTableRelation.
-		 */
-		Assert(resultRelInfo->ri_RangeTableIndex == 0);
-
-		/*
-		 * Since ExecGetTriggerResultRel doesn't call ExecOpenIndices for
-		 * these rels, we needn't call ExecCloseIndices either.
-		 */
-		Assert(resultRelInfo->ri_NumIndices == 0);
-
-		table_close(resultRelInfo->ri_RelationDesc, NoLock);
-	}
-}
-
 /* ----------------------------------------------------------------
  *		ExecPostprocessPlan
  *
@@ -1512,9 +1390,6 @@ ExecPostprocessPlan(EState *estate)
 static void
 ExecEndPlan(PlanState *planstate, EState *estate)
 {
-	ResultRelInfo *resultRelInfo;
-	Index		num_relations;
-	Index		i;
 	ListCell   *l;
 
 	/*
@@ -1540,30 +1415,70 @@ ExecEndPlan(PlanState *planstate, EState *estate)
 	 */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
+	/* Close result relation(s), if any. */
+	ExecCloseResultRelations(estate);
+
+	/*
+	 * close whatever rangetable Relations have been opened.  We do not
+	 * release any locks we might hold on those rels.
+	 */
+	ExecCloseRangeTableRelations(estate);
+}
+
+/*
+ * Close any relations that have been opened for ResultRelInfos.
+ */
+void
+ExecCloseResultRelations(EState *estate)
+{
+	ListCell   *l;
+
 	/*
-	 * close indexes of result relation(s) if any.  (Rels themselves get
-	 * closed next.)
+	 * close indexes of result relation(s) if any.  (Rels themselves are
+	 * closed in ExecCloseRangeTableRelations())
 	 */
-	resultRelInfo = estate->es_result_relations;
-	for (i = estate->es_num_result_relations; i > 0; i--)
+	foreach(l, estate->es_opened_result_relations)
 	{
+		ResultRelInfo *resultRelInfo = lfirst(l);
+
 		ExecCloseIndices(resultRelInfo);
-		resultRelInfo++;
 	}
 
-	/*
-	 * close whatever rangetable Relations have been opened.  We do not
-	 * release any locks we might hold on those rels.
-	 */
-	num_relations = estate->es_range_table_size;
-	for (i = 0; i < num_relations; i++)
+	/* Close any relations that have been opened by ExecGetTriggerResultRel(). */
+	foreach(l, estate->es_trig_target_relations)
+	{
+		ResultRelInfo *resultRelInfo = (ResultRelInfo *) lfirst(l);
+
+		/*
+		 * Assert this is a "dummy" ResultRelInfo, see above.  Otherwise we
+		 * might be issuing a duplicate close against a Relation opened by
+		 * ExecGetRangeTableRelation.
+		 */
+		Assert(resultRelInfo->ri_RangeTableIndex == 0);
+
+		/*
+		 * Since ExecGetTriggerResultRel doesn't call ExecOpenIndices for
+		 * these rels, we needn't call ExecCloseIndices either.
+		 */
+		Assert(resultRelInfo->ri_NumIndices == 0);
+
+		table_close(resultRelInfo->ri_RelationDesc, NoLock);
+	}
+}
+
+/*
+ * Close all relations opened by ExecGetRangeTableRelation()
+ */
+void
+ExecCloseRangeTableRelations(EState *estate)
+{
+	int			i;
+
+	for (i = 0; i < estate->es_range_table_size; i++)
 	{
 		if (estate->es_relations[i])
 			table_close(estate->es_relations[i], NoLock);
 	}
-
-	/* likewise close any trigger target relations */
-	ExecCleanUpTriggerState(estate);
 }
 
 /* ----------------------------------------------------------------
@@ -2758,17 +2673,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 
 	/*
 	 * Child EPQ EStates share the parent's copy of unchanging state such as
-	 * the snapshot, rangetable, result-rel info, and external Param info.
-	 * They need their own copies of local state, including a tuple table,
-	 * es_param_exec_vals, etc.
-	 *
-	 * The ResultRelInfo array management is trickier than it looks.  We
-	 * create fresh arrays for the child but copy all the content from the
-	 * parent.  This is because it's okay for the child to share any
-	 * per-relation state the parent has already created --- but if the child
-	 * sets up any ResultRelInfo fields, such as its own junkfilter, that
-	 * state must *not* propagate back to the parent.  (For one thing, the
-	 * pointed-to data is in a memory context that won't last long enough.)
+	 * the snapshot, rangetable, and external Param info.  They need their own
+	 * copies of local state, including a tuple table, es_param_exec_vals,
+	 * result-rel info, etc.
 	 */
 	rcestate->es_direction = ForwardScanDirection;
 	rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2781,30 +2688,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
 	rcestate->es_output_cid = parentestate->es_output_cid;
-	if (parentestate->es_num_result_relations > 0)
-	{
-		int			numResultRelations = parentestate->es_num_result_relations;
-		int			numRootResultRels = parentestate->es_num_root_result_relations;
-		ResultRelInfo *resultRelInfos;
-
-		resultRelInfos = (ResultRelInfo *)
-			palloc(numResultRelations * sizeof(ResultRelInfo));
-		memcpy(resultRelInfos, parentestate->es_result_relations,
-			   numResultRelations * sizeof(ResultRelInfo));
-		rcestate->es_result_relations = resultRelInfos;
-		rcestate->es_num_result_relations = numResultRelations;
-
-		/* Also transfer partitioned root result relations. */
-		if (numRootResultRels > 0)
-		{
-			resultRelInfos = (ResultRelInfo *)
-				palloc(numRootResultRels * sizeof(ResultRelInfo));
-			memcpy(resultRelInfos, parentestate->es_root_result_relations,
-				   numRootResultRels * sizeof(ResultRelInfo));
-			rcestate->es_root_result_relations = resultRelInfos;
-			rcestate->es_num_root_result_relations = numRootResultRels;
-		}
-	}
+
+	/*
+	 * ResultRelInfos needed by subplans are initialized from scratch when the
+	 * subplans themselves are initialized.
+	 */
+	rcestate->es_result_relations = NULL;
 	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
@@ -2914,8 +2803,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
  * This is a cut-down version of ExecutorEnd(); basically we want to do most
  * of the normal cleanup, but *not* close result relations (which we are
  * just sharing from the outer query).  We do, however, have to close any
- * trigger target relations that got opened, since those are not shared.
- * (There probably shouldn't be any of the latter, but just in case...)
+ * result and trigger target relations that got opened, since those are not
+ * shared.  (There probably shouldn't be any of the latter, but just in
+ * case...)
  */
 void
 EvalPlanQualEnd(EPQState *epqstate)
@@ -2957,8 +2847,8 @@ EvalPlanQualEnd(EPQState *epqstate)
 	/* throw away the per-estate tuple table, some node may have used it */
 	ExecResetTupleTable(estate->es_tupleTable, false);
 
-	/* close any trigger target relations attached to this EState */
-	ExecCleanUpTriggerState(estate);
+	/* Close any result and trigger target relations attached to this EState */
+	ExecCloseResultRelations(estate);
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 382e78fb7f..d85d0fdb64 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,8 +183,6 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->parallelModeNeeded = false;
 	pstmt->planTree = plan;
 	pstmt->rtable = estate->es_range_table;
-	pstmt->resultRelations = NIL;
-	pstmt->rootResultRelations = NIL;
 	pstmt->appendRelations = NIL;
 
 	/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d0e65b8647..c41cdc79e4 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -124,14 +124,9 @@ CreateExecutorState(void)
 	estate->es_output_cid = (CommandId) 0;
 
 	estate->es_result_relations = NULL;
-	estate->es_num_result_relations = 0;
+	estate->es_opened_result_relations = NIL;
 	estate->es_result_relation_info = NULL;
-
-	estate->es_root_result_relations = NULL;
-	estate->es_num_root_result_relations = 0;
-
 	estate->es_tuple_routing_result_relations = NIL;
-
 	estate->es_trig_target_relations = NIL;
 
 	estate->es_param_list_info = NULL;
@@ -711,16 +706,10 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	ResultRelInfo *resultRelInfos;
-	int			i;
+	if (!estate->es_result_relations)
+		return false;
 
-	resultRelInfos = estate->es_result_relations;
-	for (i = 0; i < estate->es_num_result_relations; i++)
-	{
-		if (resultRelInfos[i].ri_RangeTableIndex == scanrelid)
-			return true;
-	}
-	return false;
+	return estate->es_result_relations[scanrelid - 1] != NULL;
 }
 
 /* ----------------------------------------------------------------
@@ -779,9 +768,10 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 		palloc0(estate->es_range_table_size * sizeof(Relation));
 
 	/*
-	 * es_rowmarks is also parallel to the es_range_table, but it's allocated
-	 * only if needed.
+	 * es_result_relations and es_rowmarks are also parallel to es_range_table,
+	 * but are allocated only if needed.
 	 */
+	estate->es_result_relations = NULL;
 	estate->es_rowmarks = NULL;
 }
 
@@ -835,6 +825,40 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 	return rel;
 }
 
+/*
+ * ExecInitResultRelation
+ *		Open result relation given by the passed-in RT index and fill its
+ *		ResultRelInfo node
+ *
+ * Here, we also save the ResultRelInfo in estate->es_result_relations array
+ * such that it can be accessed later using the RT index.
+ */
+void
+ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+					   Index rti)
+{
+	Relation	resultRelationDesc;
+
+	resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+	InitResultRelInfo(resultRelInfo,
+					  resultRelationDesc,
+					  rti,
+					  NULL,
+					  estate->es_instrument);
+
+	if (estate->es_result_relations == NULL)
+		estate->es_result_relations = (ResultRelInfo **)
+			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
+	estate->es_result_relations[rti - 1] = resultRelInfo;
+
+	/*
+	 * Saving it in the list as well lets us avoid needlessly traversing the
+	 * whole array when only a few of its entries are non-NULL.
+	 */
+	estate->es_opened_result_relations =
+		lappend(estate->es_opened_result_relations, resultRelInfo);
+}
+
 /*
  * UpdateChangedParamSet
  *		Add changed parameters to a plan node's chgParam set
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9812089161..9b27d34ba5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2301,7 +2301,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	ListCell   *l;
+	ListCell   *l,
+			   *l1;
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
@@ -2322,13 +2323,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_done = false;
 
 	mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
-	mtstate->resultRelInfo = estate->es_result_relations + node->resultRelIndex;
+	mtstate->resultRelInfo = (ResultRelInfo *)
+		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
 	/* If modifying a partitioned table, initialize the root table info */
-	if (node->rootResultRelIndex >= 0)
-		mtstate->rootResultRelInfo = estate->es_root_result_relations +
-			node->rootResultRelIndex;
+	if (node->rootRelation > 0)
+	{
+		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
+		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
+							   node->rootRelation);
+	}
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2351,9 +2356,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
-	foreach(l, node->plans)
+	forboth(l, node->resultRelations, l1, node->plans)
 	{
-		subplan = (Plan *) lfirst(l);
+		Index		resultRelation = lfirst_int(l);
+
+		subplan = (Plan *) lfirst(l1);
+
+		/* This opens the result relation and fills in its ResultRelInfo. */
+		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40b82..e6b48c73ff 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -90,8 +90,6 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_SCALAR_FIELD(jitFlags);
 	COPY_NODE_FIELD(planTree);
 	COPY_NODE_FIELD(rtable);
-	COPY_NODE_FIELD(resultRelations);
-	COPY_NODE_FIELD(rootResultRelations);
 	COPY_NODE_FIELD(appendRelations);
 	COPY_NODE_FIELD(subplans);
 	COPY_BITMAPSET_FIELD(rewindPlanIDs);
@@ -207,8 +205,6 @@ _copyModifyTable(const ModifyTable *from)
 	COPY_SCALAR_FIELD(rootRelation);
 	COPY_SCALAR_FIELD(partColsUpdated);
 	COPY_NODE_FIELD(resultRelations);
-	COPY_SCALAR_FIELD(resultRelIndex);
-	COPY_SCALAR_FIELD(rootResultRelIndex);
 	COPY_NODE_FIELD(plans);
 	COPY_NODE_FIELD(withCheckOptionLists);
 	COPY_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f0386480ab..fae01357d8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -308,8 +308,6 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_INT_FIELD(jitFlags);
 	WRITE_NODE_FIELD(planTree);
 	WRITE_NODE_FIELD(rtable);
-	WRITE_NODE_FIELD(resultRelations);
-	WRITE_NODE_FIELD(rootResultRelations);
 	WRITE_NODE_FIELD(appendRelations);
 	WRITE_NODE_FIELD(subplans);
 	WRITE_BITMAPSET_FIELD(rewindPlanIDs);
@@ -408,8 +406,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
 	WRITE_UINT_FIELD(rootRelation);
 	WRITE_BOOL_FIELD(partColsUpdated);
 	WRITE_NODE_FIELD(resultRelations);
-	WRITE_INT_FIELD(resultRelIndex);
-	WRITE_INT_FIELD(rootResultRelIndex);
 	WRITE_NODE_FIELD(plans);
 	WRITE_NODE_FIELD(withCheckOptionLists);
 	WRITE_NODE_FIELD(returningLists);
@@ -2193,8 +2189,6 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
 	WRITE_BITMAPSET_FIELD(rewindPlanIDs);
 	WRITE_NODE_FIELD(finalrtable);
 	WRITE_NODE_FIELD(finalrowmarks);
-	WRITE_NODE_FIELD(resultRelations);
-	WRITE_NODE_FIELD(rootResultRelations);
 	WRITE_NODE_FIELD(appendRelations);
 	WRITE_NODE_FIELD(relationOids);
 	WRITE_NODE_FIELD(invalItems);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..867d1360b8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1541,8 +1541,6 @@ _readPlannedStmt(void)
 	READ_INT_FIELD(jitFlags);
 	READ_NODE_FIELD(planTree);
 	READ_NODE_FIELD(rtable);
-	READ_NODE_FIELD(resultRelations);
-	READ_NODE_FIELD(rootResultRelations);
 	READ_NODE_FIELD(appendRelations);
 	READ_NODE_FIELD(subplans);
 	READ_BITMAPSET_FIELD(rewindPlanIDs);
@@ -1639,8 +1637,6 @@ _readModifyTable(void)
 	READ_UINT_FIELD(rootRelation);
 	READ_BOOL_FIELD(partColsUpdated);
 	READ_NODE_FIELD(resultRelations);
-	READ_INT_FIELD(resultRelIndex);
-	READ_INT_FIELD(rootResultRelIndex);
 	READ_NODE_FIELD(plans);
 	READ_NODE_FIELD(withCheckOptionLists);
 	READ_NODE_FIELD(returningLists);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 3d7a4e373f..881eaf4813 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6808,8 +6808,6 @@ make_modifytable(PlannerInfo *root,
 	node->rootRelation = rootRelation;
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
-	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
-	node->rootResultRelIndex = -1;	/* will be set correctly in setrefs.c */
 	node->plans = subplans;
 	if (!onconflict)
 	{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f331f82a6c..3e4de96b8e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -304,8 +304,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->rewindPlanIDs = NULL;
 	glob->finalrtable = NIL;
 	glob->finalrowmarks = NIL;
-	glob->resultRelations = NIL;
-	glob->rootResultRelations = NIL;
 	glob->appendRelations = NIL;
 	glob->relationOids = NIL;
 	glob->invalItems = NIL;
@@ -492,8 +490,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	/* final cleanup of the plan */
 	Assert(glob->finalrtable == NIL);
 	Assert(glob->finalrowmarks == NIL);
-	Assert(glob->resultRelations == NIL);
-	Assert(glob->rootResultRelations == NIL);
 	Assert(glob->appendRelations == NIL);
 	top_plan = set_plan_references(root, top_plan);
 	/* ... and the subplans (both regular subplans and initplans) */
@@ -519,8 +515,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->parallelModeNeeded = glob->parallelModeNeeded;
 	result->planTree = top_plan;
 	result->rtable = glob->finalrtable;
-	result->resultRelations = glob->resultRelations;
-	result->rootResultRelations = glob->rootResultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd8e2e966d..be151416af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -235,9 +235,8 @@ static List *set_returning_clause_references(PlannerInfo *root,
  * different when the passed-in Plan is a node we decide isn't needed.
  *
  * The flattened rangetable entries are appended to root->glob->finalrtable.
- * Also, rowmarks entries are appended to root->glob->finalrowmarks, and the
- * RT indexes of ModifyTable result relations to root->glob->resultRelations,
- * and flattened AppendRelInfos are appended to root->glob->appendRelations.
+ * Also, rowmarks entries are appended to root->glob->finalrowmarks, and
+ * flattened AppendRelInfos are appended to root->glob->appendRelations.
  * Plan dependencies are appended to root->glob->relationOids (for relations)
  * and root->glob->invalItems (for everything else).
  *
@@ -972,31 +971,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 											  (Plan *) lfirst(l),
 											  rtoffset);
 				}
-
-				/*
-				 * Append this ModifyTable node's final result relation RT
-				 * index(es) to the global list for the plan, and set its
-				 * resultRelIndex to reflect their starting position in the
-				 * global list.
-				 */
-				splan->resultRelIndex = list_length(root->glob->resultRelations);
-				root->glob->resultRelations =
-					list_concat(root->glob->resultRelations,
-								splan->resultRelations);
-
-				/*
-				 * If the main target relation is a partitioned table, also
-				 * add the partition root's RT index to rootResultRelations,
-				 * and remember its index in that list in rootResultRelIndex.
-				 */
-				if (splan->rootRelation)
-				{
-					splan->rootResultRelIndex =
-						list_length(root->glob->rootResultRelations);
-					root->glob->rootResultRelations =
-						lappend_int(root->glob->rootResultRelations,
-									splan->rootRelation);
-				}
 			}
 			break;
 		case T_Append:
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 9c6fdeeb56..8d5d9e05b3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -344,7 +344,6 @@ static EState *
 create_estate_for_relation(LogicalRepRelMapEntry *rel)
 {
 	EState	   *estate;
-	ResultRelInfo *resultRelInfo;
 	RangeTblEntry *rte;
 
 	estate = CreateExecutorState();
@@ -356,13 +355,6 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
-	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-
-	estate->es_result_relations = resultRelInfo;
-	estate->es_num_result_relations = 1;
-	estate->es_result_relation_info = resultRelInfo;
-
 	estate->es_output_cid = GetCurrentCommandId(true);
 
 	/* Prepare to catch AFTER triggers. */
@@ -1150,6 +1142,7 @@ GetRelationIdentityOrPK(Relation rel)
 static void
 apply_handle_insert(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
 	LogicalRepRelId relid;
@@ -1179,6 +1172,9 @@ apply_handle_insert(StringInfo s)
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
+	resultRelInfo = makeNode(ResultRelInfo);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	estate->es_result_relation_info = resultRelInfo;
 
 	/* Input functions may need an active snapshot, so get one */
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -1191,10 +1187,10 @@ apply_handle_insert(StringInfo s)
 
 	/* For a partitioned table, insert the tuple into a partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
+		apply_handle_tuple_routing(resultRelInfo, estate,
 								   remoteslot, NULL, rel, CMD_INSERT);
 	else
-		apply_handle_insert_internal(estate->es_result_relation_info, estate,
+		apply_handle_insert_internal(resultRelInfo, estate,
 									 remoteslot);
 
 	PopActiveSnapshot();
@@ -1265,6 +1261,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
 static void
 apply_handle_update(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepRelId relid;
 	EState	   *estate;
@@ -1301,6 +1298,9 @@ apply_handle_update(StringInfo s)
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
+	resultRelInfo = makeNode(ResultRelInfo);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	estate->es_result_relation_info = resultRelInfo;
 
 	/*
 	 * Populate updatedCols so that per-column triggers can fire.  This could
@@ -1337,10 +1337,10 @@ apply_handle_update(StringInfo s)
 
 	/* For a partitioned table, apply update to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
+		apply_handle_tuple_routing(resultRelInfo, estate,
 								   remoteslot, &newtup, rel, CMD_UPDATE);
 	else
-		apply_handle_update_internal(estate->es_result_relation_info, estate,
+		apply_handle_update_internal(resultRelInfo, estate,
 									 remoteslot, &newtup, rel);
 
 	PopActiveSnapshot();
@@ -1420,6 +1420,7 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 static void
 apply_handle_delete(StringInfo s)
 {
+	ResultRelInfo *resultRelInfo;
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData oldtup;
 	LogicalRepRelId relid;
@@ -1452,6 +1453,9 @@ apply_handle_delete(StringInfo s)
 	remoteslot = ExecInitExtraTupleSlot(estate,
 										RelationGetDescr(rel->localrel),
 										&TTSOpsVirtual);
+	resultRelInfo = makeNode(ResultRelInfo);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	estate->es_result_relation_info = resultRelInfo;
 
 	PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -1462,10 +1466,10 @@ apply_handle_delete(StringInfo s)
 
 	/* For a partitioned table, apply delete to correct partition. */
 	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		apply_handle_tuple_routing(estate->es_result_relation_info, estate,
+		apply_handle_tuple_routing(resultRelInfo, estate,
 								   remoteslot, NULL, rel, CMD_DELETE);
 	else
-		apply_handle_delete_internal(estate->es_result_relation_info, estate,
+		apply_handle_delete_internal(resultRelInfo, estate,
 									 remoteslot, &rel->remoterel);
 
 	PopActiveSnapshot();
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 415e117407..c283bf1454 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -191,7 +191,6 @@ extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 							  Relation partition_root,
 							  int instrument_options);
 extern ResultRelInfo *ExecGetTriggerResultRel(EState *estate, Oid relid);
-extern void ExecCleanUpTriggerState(EState *estate);
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
 							TupleTableSlot *slot, EState *estate);
 extern bool ExecPartitionCheck(ResultRelInfo *resultRelInfo,
@@ -538,6 +537,8 @@ extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable);
+extern void ExecCloseRangeTableRelations(EState *estate);
+extern void ExecCloseResultRelations(EState *estate);
 
 static inline RangeTblEntry *
 exec_rt_fetch(Index rti, EState *estate)
@@ -546,6 +547,8 @@ exec_rt_fetch(Index rti, EState *estate)
 }
 
 extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
+								   Index rti);
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ef448d67c7..8683484050 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -519,23 +519,20 @@ typedef struct EState
 	CommandId	es_output_cid;
 
 	/* Info about target table(s) for insert/update/delete queries: */
-	ResultRelInfo *es_result_relations; /* array of ResultRelInfos */
-	int			es_num_result_relations;	/* length of array */
+	ResultRelInfo **es_result_relations;	/* Array of per-range-table-entry
+											 * ResultRelInfo pointers, or NULL
+											 * if a given range table relation
+											 * not a target table */
+	List	   *es_opened_result_relations; /* List of non-NULL entries in
+											 * es_result_relations added in no
+											 * specific order */
 	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
-	/*
-	 * Info about the partition root table(s) for insert/update/delete queries
-	 * targeting partitioned tables.  Only leaf partitions are mentioned in
-	 * es_result_relations, but we need access to the roots for firing
-	 * triggers and for runtime tuple routing.
-	 */
-	ResultRelInfo *es_root_result_relations;	/* array of ResultRelInfos */
-	int			es_num_root_result_relations;	/* length of the array */
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
 	/*
 	 * The following list contains ResultRelInfos created by the tuple routing
-	 * code for partitions that don't already have one.
+	 * code for partitions that aren't found in the es_result_relations array.
 	 */
 	List	   *es_tuple_routing_result_relations;
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbe86e7af6..c3ed0ee8d4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -118,10 +118,6 @@ typedef struct PlannerGlobal
 
 	List	   *finalrowmarks;	/* "flat" list of PlanRowMarks */
 
-	List	   *resultRelations;	/* "flat" list of integer RT indexes */
-
-	List	   *rootResultRelations;	/* "flat" list of integer RT indexes */
-
 	List	   *appendRelations;	/* "flat" list of AppendRelInfos */
 
 	List	   *relationOids;	/* OIDs of relations the plan depends on */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..1943ec4f48 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -65,15 +65,6 @@ typedef struct PlannedStmt
 
 	List	   *rtable;			/* list of RangeTblEntry nodes */
 
-	/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
-	List	   *resultRelations;	/* integer list of RT indexes, or NIL */
-
-	/*
-	 * rtable indexes of partitioned table roots that are UPDATE/DELETE
-	 * targets; needed for trigger firing.
-	 */
-	List	   *rootResultRelations;
-
 	List	   *appendRelations;	/* list of AppendRelInfo nodes */
 
 	List	   *subplans;		/* Plan trees for SubPlan expressions; note
@@ -224,8 +215,6 @@ typedef struct ModifyTable
 	Index		rootRelation;	/* Root RT index, if target is partitioned */
 	bool		partColsUpdated;	/* some part key in hierarchy updated */
 	List	   *resultRelations;	/* integer list of RT indexes */
-	int			resultRelIndex; /* index of first resultRel in plan's list */
-	int			rootResultRelIndex; /* index of the partitioned table root */
 	List	   *plans;			/* plan(s) producing source data */
 	List	   *withCheckOptionLists;	/* per-target-table WCO lists */
 	List	   *returningLists; /* per-target-table RETURNING tlists */
-- 
2.20.1

#56Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#55)
Re: partition routing layering in nodeModifyTable.c

On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 12/10/2020 16:47, Amit Langote wrote:

On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

1. We have many different cleanup/close routines now:
ExecCloseResultRelations, ExecCloseRangeTableRelations, and
ExecCleanUpTriggerState. Do we need them all? It seems to me that we
could merge ExecCloseRangeTableRelations() and
ExecCleanUpTriggerState(), they seem to do roughly the same thing: close
relations that were opened for ResultRelInfos. They are always called
together, except in afterTriggerInvokeEvents(). And in
afterTriggerInvokeEvents() too, there would be no harm in doing both,
even though we know there aren't any entries in the es_result_relations
array at that point.

Hmm, I find trigger result relations to behave differently enough to
deserve a separate function. For example, unlike plan-specified
result relations, they don't point to range table relations and don't
open indices. Maybe the name could be revisited, say,
ExecCloseTriggerResultRelations().

Matter of perception I guess. I still prefer to club them together into
one Close call. It's true that they're slightly different, but they're
also pretty similar. And IMHO they're more similar than different.

Okay, fine with me.

Also, maybe call the other functions:

ExecInitPlanResultRelationsArray()
ExecInitPlanResultRelation()
ExecClosePlanResultRelations()

Thoughts?

Hmm. How about initializing the array lazily, on the first
ExecInitPlanResultRelation() call? It's not performance critical, and
that way there's one fewer initialization function that you need to
remember to call.

Agree that's better.
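
A minimal sketch of the lazy-allocation idea, for illustration only. It uses toy stand-ins rather than the real EState and ResultRelInfo, and the names ToyEState, ToyResultRelInfo and toy_init_result_relation are hypothetical, not taken from the patch:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; the real structs live in execnodes.h and hold far more. */
typedef struct ToyResultRelInfo
{
	unsigned	rti;			/* 1-based range table index */
} ToyResultRelInfo;

typedef struct ToyEState
{
	unsigned	range_table_size;
	ToyResultRelInfo **result_relations;	/* NULL until first use */
} ToyEState;

/*
 * Register rri as the result relation for range table index rti,
 * allocating the per-range-table-entry pointer array on the first call.
 */
static void
toy_init_result_relation(ToyEState *estate, ToyResultRelInfo *rri,
						 unsigned rti)
{
	assert(rti >= 1 && rti <= estate->range_table_size);

	if (estate->result_relations == NULL)
	{
		/* calloc zero-fills, so untouched entries read as NULL */
		estate->result_relations =
			calloc(estate->range_table_size, sizeof(ToyResultRelInfo *));
		assert(estate->result_relations != NULL);
	}

	rri->rti = rti;
	estate->result_relations[rti - 1] = rri;
}
```

With this shape there is no separate array-initialization call for callers to forget; the cost is one extra NULL test per registration.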

It occurred to me that if we do that (initialize the array lazily),
there's very little need for the PlannedStmt->resultRelations list
anymore. It's only used in ExecRelationIsTargetRelation(), but if we
assume that ExecRelationIsTargetRelation() is only called after InitPlan
has initialized the result relation for the relation, it can easily
check es_result_relations instead. I think that's a safe assumption.
ExecRelationIsTargetRelation() is only used in FDWs, and I believe the
FDWs initialization routine can only be called after ExecInitModifyTable
has been called on the relation.

The PlannedStmt->rootResultRelations field is even more useless.

I am very much tempted to remove those fields from PlannedStmt,
although I am concerned that the following now assumes that *all*
result relations are initialized in the executor initialization phase:

bool
ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
{
    if (!estate->es_result_relations)
        return false;

    return estate->es_result_relations[scanrelid - 1] != NULL;
}

In the other thread [1], I am proposing that we initialize result
relations lazily, but the above will be a blocker to that.
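
To make the concern concrete, here is a toy model (not executor code; all Toy* names and fields are hypothetical). It contrasts an array-based check like the one above with a check against the RT indexes named by the plan. If result relations were opened lazily, the array-based check would report false for a target relation that simply has not been opened yet:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyEState
{
	size_t		range_table_size;
	void	  **result_relations;	/* per-RTE pointers; NULL if unopened */
	const unsigned *plan_result_rtis;	/* RT indexes named by the plan */
	size_t		n_plan_result_rtis;
} ToyEState;

/* Array-based check: depends on what has been opened so far. */
static bool
toy_is_target_by_array(const ToyEState *estate, unsigned scanrelid)
{
	assert(scanrelid >= 1 && scanrelid <= estate->range_table_size);
	if (!estate->result_relations)
		return false;
	return estate->result_relations[scanrelid - 1] != NULL;
}

/* Plan-based check: independent of executor initialization order. */
static bool
toy_is_target_by_plan(const ToyEState *estate, unsigned scanrelid)
{
	assert(scanrelid >= 1 && scanrelid <= estate->range_table_size);
	for (size_t i = 0; i < estate->n_plan_result_rtis; i++)
	{
		if (estate->plan_result_rtis[i] == scanrelid)
			return true;
	}
	return false;
}
```

The two checks agree only once every target relation has been opened, which is exactly the assumption that lazy initialization would break.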

Actually, maybe we don't need to be so paranoid about setting up
es_result_relations in worker.c, because none of the downstream
functionality invoked seems to rely on it, that is, no need to call
ExecInitResultRelationsArray() and ExecInitResultRelation().
ExecSimpleRelation* and downstream functionality assume a
single-relation operation and the ResultRelInfo is explicitly passed.

Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't
actually any need to put the ResultRelInfos in the es_result_relations
array.

Putting all this together, I ended up with the attached. It doesn't
include the subsequent commits in this patch set yet, for removal of
es_result_relation_info et al.

Thanks.

+    * We put the ResultRelInfos in the es_opened_result_relations list, even
+    * though we don't have a range table and don't populate the
+    * es_result_relations array.  That's a bit bogus, but it's enough to make
+    * ExecGetTriggerResultRel() find them.
     */
    estate = CreateExecutorState();
    resultRelInfos = (ResultRelInfo *)
        palloc(list_length(rels) * sizeof(ResultRelInfo));
    resultRelInfo = resultRelInfos;
+   estate->es_result_relations = (ResultRelInfo **)
+       palloc(list_length(rels) * sizeof(ResultRelInfo *));

Maybe don't allocate es_result_relations here?

+/*
+ * Close all relations opened by ExecGetRangeTableRelation()
+ */
+void
+ExecCloseRangeTableRelations(EState *estate)
+{
+   int         i;
+
+   for (i = 0; i < estate->es_range_table_size; i++)
    {
        if (estate->es_relations[i])
            table_close(estate->es_relations[i], NoLock);
    }

I think we have an optimization opportunity here (maybe as a separate
patch). Why don't we introduce es_opened_relations? That way, if
only a single or few of potentially 1000s relations in the range table
is/are opened, we don't needlessly loop over *all* relations here.
That can happen, for example, with a query where no partitions could
be pruned at planning time, so the range table contains all
partitions, but only one or few are accessed during execution and the
rest run-time pruned. Although, in the workloads where it would
matter, other overheads easily mask the overhead of this loop; see the
first message at the linked thread [1], so it is hard to show an
immediate benefit from this.
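
Roughly what I have in mind, as a toy sketch only: the struct and function names below are made up for illustration, and a fixed-size array stands in for what would presumably be a List in the executor. The point is just that closing walks the opened entries rather than the whole range table:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RELS 1024

typedef struct ToyEState
{
	int			relations[TOY_MAX_RELS];	/* 1 if open, 0 otherwise */
	size_t		opened[TOY_MAX_RELS];		/* indexes opened so far */
	size_t		n_opened;
} ToyEState;

/* Open a relation once, remembering its index for cleanup. */
static void
toy_open_relation(ToyEState *estate, size_t idx)
{
	assert(idx < TOY_MAX_RELS);
	if (!estate->relations[idx])
	{
		estate->relations[idx] = 1;
		estate->opened[estate->n_opened++] = idx;
	}
}

/*
 * Close only the relations that were actually opened.  Returns the number
 * of entries visited, which is n_opened rather than TOY_MAX_RELS.
 */
static size_t
toy_close_opened_relations(ToyEState *estate)
{
	size_t		visited = 0;

	for (size_t i = 0; i < estate->n_opened; i++)
	{
		estate->relations[estate->opened[i]] = 0;
		visited++;
	}
	estate->n_opened = 0;
	return visited;
}
```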

Anyway, other than my concern about ExecRelationIsTargetRelation()
mentioned above, I think the patch looks good.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/30/2621/

#57Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#56)
Re: partition routing layering in nodeModifyTable.c

On 13/10/2020 07:32, Amit Langote wrote:

On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It occurred to me that if we do that (initialize the array lazily),
there's very little need for the PlannedStmt->resultRelations list
anymore. It's only used in ExecRelationIsTargetRelation(), but if we
assume that ExecRelationIsTargetRelation() is only called after InitPlan
has initialized the result relation for the relation, it can easily
check es_result_relations instead. I think that's a safe assumption.
ExecRelationIsTargetRelation() is only used in FDWs, and I believe the
FDWs initialization routine can only be called after ExecInitModifyTable
has been called on the relation.

The PlannedStmt->rootResultRelations field is even more useless.

I am very much tempted to remove those fields from PlannedStmt,
although I am concerned that the following now assumes that *all*
result relations are initialized in the executor initialization phase:

bool
ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
{
    if (!estate->es_result_relations)
        return false;

    return estate->es_result_relations[scanrelid - 1] != NULL;
}

In the other thread [1], I am proposing that we initialize result
relations lazily, but the above will be a blocker to that.

Ok, I'll leave it alone then. But I'll still merge resultRelations and
rootResultRelations into one list. I don't see any point in keeping them
separate.

I'm tempted to remove ExecRelationIsTargetRelation() altogether, but
keeping the resultRelations list isn't really a big deal, so I'll leave
that for another discussion.

Actually, maybe we don't need to be so paranoid about setting up
es_result_relations in worker.c, because none of the downstream
functionality invoked seems to rely on it, that is, no need to call
ExecInitResultRelationsArray() and ExecInitResultRelation().
ExecSimpleRelation* and downstream functionality assume a
single-relation operation and the ResultRelInfo is explicitly passed.

Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't
actually any need to put the ResultRelInfos in the es_result_relations
array.

Putting all this together, I ended up with the attached. It doesn't
include the subsequent commits in this patch set yet, for removal of
es_result_relation_info et al.

Thanks.

+    * We put the ResultRelInfos in the es_opened_result_relations list, even
+    * though we don't have a range table and don't populate the
+    * es_result_relations array.  That's a bit bogus, but it's enough to make
+    * ExecGetTriggerResultRel() find them.
     */
    estate = CreateExecutorState();
    resultRelInfos = (ResultRelInfo *)
        palloc(list_length(rels) * sizeof(ResultRelInfo));
    resultRelInfo = resultRelInfos;
+   estate->es_result_relations = (ResultRelInfo **)
+       palloc(list_length(rels) * sizeof(ResultRelInfo *));

Maybe don't allocate es_result_relations here?

Fixed.

Anyway, other than my concern about ExecRelationIsTargetRelation()
mentioned above, I think the patch looks good.

Ok, committed. I'll continue to look at the rest of the patches in this
patch series now.

- Heikki

#58Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#57)
Re: partition routing layering in nodeModifyTable.c

On Tue, Oct 13, 2020 at 7:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 13/10/2020 07:32, Amit Langote wrote:

On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It occurred to me that if we do that (initialize the array lazily),
there's very little need for the PlannedStmt->resultRelations list
anymore. It's only used in ExecRelationIsTargetRelation(), but if we
assume that ExecRelationIsTargetRelation() is only called after InitPlan
has initialized the result relation for the relation, it can easily
check es_result_relations instead. I think that's a safe assumption.
ExecRelationIsTargetRelation() is only used in FDWs, and I believe the
FDWs initialization routine can only be called after ExecInitModifyTable
has been called on the relation.

The PlannedStmt->rootResultRelations field is even more useless.

I am very much tempted to remove those fields from PlannedStmt,
although I am concerned that the following now assumes that *all*
result relations are initialized in the executor initialization phase:

bool
ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
{
    if (!estate->es_result_relations)
        return false;

    return estate->es_result_relations[scanrelid - 1] != NULL;
}

In the other thread [1], I am proposing that we initialize result
relations lazily, but the above will be a blocker to that.

Ok, I'll leave it alone then. But I'll still merge resultRelations and
rootResultRelations into one list. I don't see any point in keeping them
separate.

Should be fine. As you said in the commit message, it should probably
have been that way to begin with, but I don't recall why I didn't make
it so.

I'm tempted to remove ExecRelationIsTargetRelation() altogether, but
keeping the resultRelations list isn't really a big deal, so I'll leave
that for another discussion.

Yeah, makes sense.

Anyway, other than my concern about ExecRelationIsTargetRelation()
mentioned above, I think the patch looks good.

Ok, committed. I'll continue to look at the rest of the patches in this
patch series now.

Thanks.

BTW, you mentioned the lazy ResultRelInfo optimization bit in the
commit message, so does that mean you intend to take a look at the
other thread [1] too? Or should I post a rebased version of the lazy
ResultRelInfo initialization patch here in this thread? That patch is
just a bunch of refactoring too.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/30/2621/

#59Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#58)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On 13/10/2020 15:03, Amit Langote wrote:

On Tue, Oct 13, 2020 at 7:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Ok, committed. I'll continue to look at the rest of the patches in this
patch series now.

I've reviewed the next two patches in the series, they are pretty much
ready for commit now. I made just a few minor changes, notably:

- I moved the responsibility to set ForeignTable->resultRelation to the
FDWs, like you had in the original patch version. Sorry for
flip-flopping on that.

- In postgres_fdw.c, I changed it to store the ResultRelInfo pointer in
PgFdwDirectModifyState, instead of storing the RT index and looking it
up in the BeginDirectModify and IterateDirectModify. I think you did it
that way in the earlier patch versions, too.

- Some minor comment and docs kibitzing.

One little idea I had:

I think all FDWs that support direct modify will have to carry the
resultRelation index or the ResultRelInfo pointer from BeginDirectModify
to IterateDirectModify in the FDW's private struct. It's not
complicated, but should we make life easier for FDWs by storing the
ResultRelInfo pointer in the ForeignScanState struct in the core code?
The doc now says:

The data that was actually inserted, updated or deleted must be
stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of
the target foreign table's ResultRelInfo obtained using the
information passed to BeginDirectModify. Return NULL if no more rows
are available.

That "ResultRelInfo obtained using the information passed to
BeginDirectModify" part is pretty vague. We could expand it, but if we
stored the ResultRelInfo in the ForeignScanState, we could explain it
succinctly.
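
To illustrate the two options with a toy model (none of these structs are the real ones, and the core_rri field in particular is hypothetical; it does not exist in ForeignScanState today): the first pair of functions carries the pointer in FDW-private state, the last one lets the core code fill it in before the FDW callbacks run.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyResultRelInfo
{
	int			rti;
} ToyResultRelInfo;

typedef struct ToyScanState
{
	ToyResultRelInfo **result_relations;	/* indexed by rti - 1 */
	int			result_relation;	/* RT index from the plan node */
	void	   *fdw_state;			/* FDW-private state */
	ToyResultRelInfo *core_rri;		/* hypothetical core-managed field */
} ToyScanState;

typedef struct ToyDirectModifyState
{
	ToyResultRelInfo *rri;
} ToyDirectModifyState;

/* Option 1: the FDW looks the pointer up in Begin and stashes it. */
static void
toy_begin_direct_modify(ToyScanState *node, ToyDirectModifyState *dm)
{
	assert(node->result_relation > 0);
	dm->rri = node->result_relations[node->result_relation - 1];
	assert(dm->rri != NULL);
	node->fdw_state = dm;
}

/* ... and reads it back from its private state in Iterate. */
static ToyResultRelInfo *
toy_iterate_direct_modify(ToyScanState *node)
{
	ToyDirectModifyState *dm = node->fdw_state;

	return dm->rri;
}

/* Option 2: the core code sets the field once, for every FDW. */
static void
toy_core_init_scan(ToyScanState *node)
{
	if (node->result_relation > 0)
		node->core_rri = node->result_relations[node->result_relation - 1];
}
```

With option 2 the docs could simply point at that one field instead of describing how to derive the ResultRelInfo.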

BTW, you mentioned the lazy ResultRelInfo optimization bit in the
commit message, so does that mean you intend to take a look at the
other thread [1] too? Or should I post a rebased version of the lazy
ResultRelInfo initialization patch here in this thread? That patch is
just a bunch of refactoring too.

No promises, but yeah, now that I'm knee-deep in this ResultRelInfo
business, I'll try to take a look at that too :-).

- Heikki

Attachments:

v16-0001-Include-result-relation-index-in-ForeignScan-for.patch (text/x-patch; charset=UTF-8)
From fbf1940d054bdafdac3795b62abe1ad17eaa3dce Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Oct 2020 18:28:43 +0300
Subject: [PATCH v16 1/2] Include result relation index in ForeignScan for
 direct modify plans.

FDWs that can perform an UPDATE/DELETE remotely using the "direct
modify" set of APIs need in some cases to access the result relation
properties for which they can currently look at
EState.es_result_relation_info, which the core executor laboriously
makes sure is set correctly.  An upcoming patch will remove that field
from EState.

This commit adds a new resultRelation field in ForeignScan, to store
the target relation's RT index. The FDW's PlanDirectModify callback is
expected to set it along with 'operation'. The core code doesn't need
it for anything, but the FDW's Begin- and IterateDirectModify
callbacks can use it to get the target relation's ResultRelInfo.

Amit Langote, Etsuro Fujita
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
---
 contrib/postgres_fdw/postgres_fdw.c     | 41 +++++++++++++++++--------
 doc/src/sgml/fdwhandler.sgml            | 22 ++++++++-----
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  4 +++
 src/backend/optimizer/plan/setrefs.c    |  4 +++
 src/include/nodes/plannodes.h           |  8 +++++
 8 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..bfd73b40f2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -218,6 +218,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -2287,9 +2288,10 @@ postgresPlanDirectModify(PlannerInfo *root,
 	}
 
 	/*
-	 * Update the operation info.
+	 * Update the operation and target relation info.
 	 */
 	fscan->operation = operation;
+	fscan->resultRelation = resultRelation;
 
 	/*
 	 * Update the fdw_exprs list that will be available to the executor.
@@ -2333,6 +2335,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	EState	   *estate = node->ss.ps.state;
 	PgFdwDirectModifyState *dmstate;
 	Index		rtindex;
+	Relation	rel;
 	RangeTblEntry *rte;
 	Oid			userid;
 	ForeignTable *table;
@@ -2355,18 +2358,31 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	Assert(fsplan->resultRelation > 0);
+	rtindex = fsplan->resultRelation;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
-	/* Get info about foreign table. */
+	/*
+	 * Get info about target table.  For a simple scan on a single foreign
+	 * table, the target table is the table being scanned.  For a join, it's
+	 * one of the tables being joined.
+	 */
 	if (fsplan->scan.scanrelid == 0)
-		dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+		rel = ExecOpenScanRelation(estate, rtindex, eflags);
 	else
-		dmstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(dmstate->rel));
+	{
+		Assert(rtindex == fsplan->scan.scanrelid);
+		rel = node->ss.ss_currentRelation;
+	}
+	table = GetForeignTable(RelationGetRelid(rel));
 	user = GetUserMapping(userid, table->serverid);
 
+	dmstate->resultRelInfo = estate->es_result_relations[rtindex - 1];
+	/* the executor must have initialized the ResultRelInfo for us. */
+	Assert(dmstate->resultRelInfo != NULL);
+	dmstate->resultRel = rel;
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -2376,9 +2392,6 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	/* Update the foreign-join-related fields. */
 	if (fsplan->scan.scanrelid == 0)
 	{
-		/* Save info about foreign table. */
-		dmstate->resultRel = dmstate->rel;
-
 		/*
 		 * Set dmstate->rel to NULL to teach get_returning_data() and
 		 * make_tuple_from_result_row() that columns fetched from the remote
@@ -2387,6 +2400,8 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 		 */
 		dmstate->rel = NULL;
 	}
+	else
+		dmstate->rel = rel;
 
 	/* Initialize state variable */
 	dmstate->num_tuples = -1;	/* -1 means not set yet */
@@ -2450,7 +2465,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4085,8 +4100,8 @@ static TupleTableSlot *
 get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4233,7 +4248,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = dmstate->resultRelInfo;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
@@ -4245,7 +4260,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 	/*
 	 * Use the return tuple slot as a place to store the result tuple.
 	 */
-	resultSlot = ExecGetReturningSlot(estate, relInfo);
+	resultSlot = ExecGetReturningSlot(estate, resultRelInfo);
 
 	/*
 	 * Extract all the values of the scan tuple.
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..43f287a29c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -861,11 +861,15 @@ PlanDirectModify(PlannerInfo *root,
      To execute the direct modification on the remote server, this function
      must rewrite the target subplan with a <structname>ForeignScan</structname> plan
      node that executes the direct modification on the remote server.  The
-     <structfield>operation</structfield> field of the <structname>ForeignScan</structname> must
-     be set to the <literal>CmdType</literal> enumeration appropriately; that is,
+     <structfield>operation</structfield> and <structfield>resultRelation</structfield> fields
+     of the <structname>ForeignScan</structname> must be set appropriately.
+     <structfield>operation</structfield> must be set to the <literal>CmdType</literal>
+     enumeration corresponding to the statement kind (that is,
      <literal>CMD_UPDATE</literal> for <command>UPDATE</command>,
      <literal>CMD_INSERT</literal> for <command>INSERT</command>, and
-     <literal>CMD_DELETE</literal> for <command>DELETE</command>.
+     <literal>CMD_DELETE</literal> for <command>DELETE</command>), and the
+     <literal>resultRelation</literal> argument must be copied to the
+     <structfield>resultRelation</structfield> field.
     </para>
 
     <para>
@@ -892,9 +896,10 @@ BeginDirectModify(ForeignScanState *node,
      The <structname>ForeignScanState</structname> node has already been created, but
      its <structfield>fdw_state</structfield> field is still NULL.  Information about
      the table to modify is accessible through the
-     <structname>ForeignScanState</structname> node (in particular, from the underlying
-     <structname>ForeignScan</structname> plan node, which contains any FDW-private
-     information provided by <function>PlanDirectModify</function>).
+     <structname>ForeignScanState</structname> node (in particular, from the
+     underlying <structname>ForeignScan</structname> plan node, which contains
+     the target table's range table index and any FDW-private information
+     provided by <function>PlanDirectModify</function>).
      <literal>eflags</literal> contains flag bits describing the executor's
      operating mode for this plan node.
     </para>
@@ -926,8 +931,9 @@ IterateDirectModify(ForeignScanState *node);
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
      or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     <literal>ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
+     of the target foreign table's <structname>ResultRelInfo</structname>
+     obtained using the information passed to <function>BeginDirectModify</function>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4d79f70950..2b4d7654cc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelation);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f441ae3c51..08a049232e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelation);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3a54765f5c..ab7b535caa 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2014,6 +2014,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelation);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 881eaf4813..94280a730c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5530,7 +5530,11 @@ make_foreignscan(List *qptlist,
 	plan->lefttree = outer_plan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
+
+	/* these may be overridden by the FDW's PlanDirectModify callback. */
 	node->operation = CMD_SELECT;
+	node->resultRelation = 0;
+
 	/* fs_server will be filled in by create_foreignscan_plan */
 	node->fs_server = InvalidOid;
 	node->fdw_exprs = fdw_exprs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6847ff6f44..8b43371425 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1310,6 +1310,10 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/* Adjust resultRelation if it's valid */
+	if (fscan->resultRelation > 0)
+		fscan->resultRelation += rtoffset;
 }
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a7bdf3497e..7e6b10f86b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -599,12 +599,20 @@ typedef struct WorkTableScan
  * When the plan node represents a foreign join, scan.scanrelid is zero and
  * fs_relids must be consulted to identify the join relation.  (fs_relids
  * is valid for simple scans as well, but will always match scan.scanrelid.)
+ *
+ * If the FDW's PlanDirectModify() callback decides to repurpose a ForeignScan
+ * node to perform the UPDATE or DELETE operation directly in the remote
+ * server, it sets 'operation' and 'resultRelation' to identify the operation
+ * type and target relation.  Note that these fields are only set if the
+ * modification is performed *fully* remotely; otherwise, the modification is
+ * driven by a local ModifyTable node and 'operation' is left to CMD_SELECT.
  * ----------------
  */
 typedef struct ForeignScan
 {
 	Scan		scan;
 	CmdType		operation;		/* SELECT/INSERT/UPDATE/DELETE */
+	Index		resultRelation; /* direct modification target's RT index */
 	Oid			fs_server;		/* OID of foreign server */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
 	List	   *fdw_private;	/* private data for FDW */
-- 
2.20.1

Attachment: v16-0002-Remove-es_result_relation_info.patch (text/x-patch; charset=UTF-8)
From f02ec7543e4d17075b62734f3c1938f307c5b07c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Oct 2020 18:37:57 +0300
Subject: [PATCH v16 2/2] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
---
 src/backend/commands/copy.c              |  19 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |   9 +-
 src/backend/executor/execMain.c          |   4 -
 src/backend/executor/execReplication.c   |  24 +--
 src/backend/executor/execUtils.c         |   1 -
 src/backend/executor/nodeModifyTable.c   | 201 ++++++++++-------------
 src/backend/replication/logical/worker.c |  17 +-
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 131 insertions(+), 178 deletions(-)
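
Editorial note (not part of the patch): the commit message above describes replacing a shared "current result relation" pointer with an explicit function parameter. The sketch below is a minimal illustration of that general pattern only. Every name in it (DemoResultRel, DemoEState, demo_insert_implicit, demo_insert_explicit, demo_route_implicit, demo_route_explicit) is hypothetical and deliberately simplified; none of it is PostgreSQL's actual executor code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-relation result struct. */
typedef struct DemoResultRel
{
	const char *relname;
	int			ninserted;		/* rows inserted into this relation */
} DemoResultRel;

/* Hypothetical stand-in for executor-wide state. */
typedef struct DemoEState
{
	/* plays the role that es_result_relation_info played */
	DemoResultRel *current_rel;
} DemoEState;

/* Old style: the callee finds its target through shared state. */
void
demo_insert_implicit(DemoEState *estate)
{
	assert(estate->current_rel != NULL);
	estate->current_rel->ninserted++;
}

/* New style: the target is passed in, so nothing needs to be set up. */
void
demo_insert_explicit(DemoResultRel *rel)
{
	assert(rel != NULL);
	rel->ninserted++;
}

/*
 * Routing a row to a partition, old style.  The caller has to switch the
 * shared pointer and remember to switch it back afterwards.
 */
void
demo_route_implicit(DemoEState *estate, DemoResultRel *partition)
{
	DemoResultRel *saved = estate->current_rel;

	estate->current_rel = partition;
	demo_insert_implicit(estate);
	estate->current_rel = saved;	/* omitting this misdirects later rows */
}

/* Routing a row to a partition, new style: no state to save or restore. */
void
demo_route_explicit(DemoResultRel *partition)
{
	demo_insert_explicit(partition);
}
```

In the implicit variant, correctness depends on every caller restoring the pointer. The explicit variant has no such step to forget, which appears to be the motivation given in the commit message.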

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 71d48d4574..531bd7c73a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2489,9 +2489,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2524,7 +2521,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2839,8 +2837,6 @@ CopyFrom(CopyState cstate)
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -3108,11 +3104,6 @@ CopyFrom(CopyState cstate)
 				prevResultRelInfo = resultRelInfo;
 			}
 
-			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
 			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
@@ -3217,7 +3208,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3288,7 +3280,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 80fedad5e0..511f015a86 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1820,7 +1820,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1950,7 +1949,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af621b..c6b5bcba7b 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -501,7 +499,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 783eecbc13..293f53d07c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -827,9 +827,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	estate->es_plannedstmt = plannedstmt;
 
-	/* es_result_relation_info is NULL except when within ModifyTable */
-	estate->es_result_relation_info = NULL;
-
 	/*
 	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
 	 */
@@ -2694,7 +2691,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 * subplans themselves are initialized.
 	 */
 	parentestate->es_result_relations = NULL;
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index b29db7bf4f..01d26881e7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -404,10 +404,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -430,7 +430,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -442,7 +443,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -466,11 +468,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -496,7 +498,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -508,7 +511,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -527,11 +531,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6d8c112e2f..071a0007eb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,7 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_opened_result_relations = NIL;
-	estate->es_result_relation_info = NULL;
 	estate->es_tuple_routing_result_relations = NIL;
 	estate->es_trig_target_relations = NIL;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b3f7012e38..ad9920883b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -366,32 +368,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -424,7 +442,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -459,7 +478,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -521,8 +541,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -582,7 +602,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -621,7 +642,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -707,6 +729,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -718,8 +741,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
-	Relation	resultRelationDesc;
+	Relation	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
@@ -728,12 +750,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
-	resultRelationDesc = resultRelInfo->ri_RelationDesc;
-
 	/* BEFORE ROW DELETE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
 		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
@@ -1067,6 +1083,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1075,12 +1092,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
-	Relation	resultRelationDesc;
+	Relation	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1090,12 +1105,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
-	resultRelationDesc = resultRelInfo->ri_RelationDesc;
-
 	/* BEFORE ROW UPDATE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
 		resultRelInfo->ri_TrigDesc->trig_update_before_row)
@@ -1120,7 +1129,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1157,7 +1167,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1207,6 +1218,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1232,9 +1244,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1274,16 +1289,6 @@ lreplace:;
 				}
 			}
 
-			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
@@ -1301,18 +1306,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1476,7 +1481,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1715,7 +1721,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1872,40 +1878,36 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
-	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
 	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
@@ -2016,10 +2018,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2067,17 +2067,6 @@ ExecModifyTable(PlanState *pstate)
 	subplanstate = node->mt_plans[node->mt_whichplan];
 	junkfilter = resultRelInfo->ri_junkFilter;
 
-	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
@@ -2111,7 +2100,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2156,7 +2144,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2239,25 +2226,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2269,15 +2252,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2298,7 +2275,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l,
@@ -2346,14 +2322,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	forboth(l, node->resultRelations, l1, node->plans)
@@ -2400,7 +2370,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2424,8 +2393,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8d5d9e05b3..4f32dc74c8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1174,7 +1174,6 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	/* Input functions may need an active snapshot, so get one */
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -1214,7 +1213,7 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 	ExecOpenIndices(relinfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(relinfo, estate, remoteslot);
 
 	/* Cleanup. */
 	ExecCloseIndices(relinfo);
@@ -1300,7 +1299,6 @@ apply_handle_update(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	/*
 	 * Populate updatedCols so that per-column triggers can fire.  This could
@@ -1392,7 +1390,8 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -1455,7 +1454,6 @@ apply_handle_delete(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -1508,7 +1506,7 @@ apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -1616,7 +1614,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	}
 	MemoryContextSwitchTo(oldctx);
 
-	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1697,8 +1694,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					ExecOpenIndices(partrelinfo, false);
 
 					EvalPlanQualSetSlot(&epqstate, remoteslot_part);
-					ExecSimpleRelationUpdate(estate, &epqstate, localslot,
-											 remoteslot_part);
+					ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
+											 localslot, remoteslot_part);
 					ExecCloseIndices(partrelinfo);
 					EvalPlanQualEnd(&epqstate);
 				}
@@ -1739,7 +1736,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					Assert(partrelinfo_new != partrelinfo);
 
 					/* DELETE old tuple found in the old partition. */
-					estate->es_result_relation_info = partrelinfo;
 					apply_handle_delete_internal(partrelinfo, estate,
 												 localslot,
 												 &relmapentry->remoterel);
@@ -1771,7 +1767,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 						slot_getallattrs(remoteslot);
 					}
 					MemoryContextSwitchTo(oldctx);
-					estate->es_result_relation_info = partrelinfo_new;
 					apply_handle_insert_internal(partrelinfo_new, estate,
 												 remoteslot_part);
 				}
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c283bf1454..2f221d7114 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -576,10 +576,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -596,10 +600,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebdabc..2518fe4f64 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a926ff1711..0310e640fe 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -525,7 +525,6 @@ typedef struct EState
 	List	   *es_opened_result_relations; /* List of non-NULL entries in
 											 * es_result_relations in no
 											 * specific order */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45be5e..da50ee3b67 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aacbc4..963faa1614 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.20.1

#60Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#59)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On 13/10/2020 19:09, Heikki Linnakangas wrote:

One little idea I had:

I think all FDWs that support direct modify will have to carry the
resultRelation index or the ResultRelInfo pointer from BeginDirectModify
to IterateDirectModify in the FDW's private struct. It's not
complicated, but should we make life easier for FDWs by storing the
ResultRelInfo pointer in the ForeignScanState struct in the core code?
The doc now says:

The data that was actually inserted, updated or deleted must be
stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of
the target foreign table's ResultRelInfo obtained using the
information passed to BeginDirectModify. Return NULL if no more rows
are available.

That "ResultRelInfo obtained using the information passed to
BeginDirectModify" part is pretty vague. We could expand it, but if we
stored the ResultRelInfo in the ForeignScanState, we could explain it
succinctly.

I tried that approach, see attached. Yeah, this feels better to me.

- Heikki
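
For readers following along without the tree at hand, the lookup discussed above can be sketched outside the server roughly as follows. This is only an illustration: the structs are cut-down stand-ins that reuse the field names from the patch, not the real executor definitions, and init_foreign_scan_state(), direct_modify_rtindex() and es_range_table_size as used here are hypothetical helpers for the sketch. It assumes es_result_relations is indexed by RT index minus one, as the ExecInitForeignScan() hunk in the attached patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the executor structs; NOT the real definitions. */
typedef struct ResultRelInfo
{
	unsigned int ri_RangeTableIndex;	/* 1-based range table index */
} ResultRelInfo;

typedef struct EState
{
	ResultRelInfo **es_result_relations;	/* indexed by RT index - 1 */
	unsigned int es_range_table_size;	/* number of entries in the array */
} EState;

typedef struct ForeignScan
{
	unsigned int resultRelation;	/* target RT index, or 0 if not a direct modify */
} ForeignScan;

typedef struct ForeignScanState
{
	ResultRelInfo *resultRelInfo;	/* NULL unless this is a direct modify */
} ForeignScanState;

/*
 * Hypothetical helper mirroring the lookup the patch adds to
 * ExecInitForeignScan(): resolve the plan's RT index once, at init time.
 */
static void
init_foreign_scan_state(ForeignScanState *scanstate,
						const ForeignScan *node, EState *estate)
{
	scanstate->resultRelInfo = NULL;
	if (node->resultRelation > 0)
	{
		assert(node->resultRelation <= estate->es_range_table_size);
		scanstate->resultRelInfo =
			estate->es_result_relations[node->resultRelation - 1];
	}
}

/*
 * What an FDW callback would then do: read the pointer straight off the
 * node instead of consulting shared state in the EState.
 */
static unsigned int
direct_modify_rtindex(const ForeignScanState *node)
{
	assert(node->resultRelInfo != NULL);
	return node->resultRelInfo->ri_RangeTableIndex;
}
```

The point of the design is that the FDW no longer has to carry the index or pointer in its private struct, nor depend on the core code setting a "current" result relation before each callback.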

Attachments:

v17-0001-Include-result-relation-index-in-ForeignScan-for.patch (text/x-patch; charset=UTF-8)
From 0206a1429171949de1141c0e9a99a244f7bbc115 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Oct 2020 19:28:00 +0300
Subject: [PATCH v17 1/2] Include result relation index in ForeignScan for
 direct modify plans.

FDWs that can perform an UPDATE/DELETE remotely using the "direct
modify" set of APIs need in some cases to access the result relation
properties for which they can currently look at
EState.es_result_relation_info, which the core executor laboriously
makes sure is set correctly.  An upcoming patch will remove that field
from EState.

This commit adds a new resultRelation field in ForeignScan, to store
the target relation's RT index. The FDW's PlanDirectModify callback is
expected to set it along with 'operation'. The core code doesn't need
it for anything, but the FDW's Begin- and IterateDirectModify
callbacks can use it to get the target relation's ResultRelInfo.

Amit Langote, Etsuro Fujita
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
---
 contrib/postgres_fdw/postgres_fdw.c     | 16 +++++++++-------
 doc/src/sgml/fdwhandler.sgml            | 15 +++++++++------
 src/backend/executor/nodeForeignscan.c  |  7 +++++++
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c |  4 ++++
 src/backend/optimizer/plan/setrefs.c    |  4 ++++
 src/include/nodes/execnodes.h           |  2 ++
 src/include/nodes/plannodes.h           |  8 ++++++++
 10 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..78facb8ebf 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -451,6 +451,7 @@ static void init_returning_filter(PgFdwDirectModifyState *dmstate,
 								  List *fdw_scan_tlist,
 								  Index rtindex);
 static TupleTableSlot *apply_returning_filter(PgFdwDirectModifyState *dmstate,
+											  ResultRelInfo *resultRelInfo,
 											  TupleTableSlot *slot,
 											  EState *estate);
 static void prepare_query_params(PlanState *node,
@@ -2287,9 +2288,10 @@ postgresPlanDirectModify(PlannerInfo *root,
 	}
 
 	/*
-	 * Update the operation info.
+	 * Update the operation and target relation info.
 	 */
 	fscan->operation = operation;
+	fscan->resultRelation = resultRelation;
 
 	/*
 	 * Update the fdw_exprs list that will be available to the executor.
@@ -2355,7 +2357,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
+	rtindex = node->resultRelInfo->ri_RangeTableIndex;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
@@ -2450,7 +2452,7 @@ postgresIterateDirectModify(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = node->resultRelInfo;
 
 	/*
 	 * If this is the first call after Begin, execute the statement.
@@ -4086,7 +4088,7 @@ get_returning_data(ForeignScanState *node)
 {
 	PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
 	EState	   *estate = node->ss.ps.state;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+	ResultRelInfo *resultRelInfo = node->resultRelInfo;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
 	TupleTableSlot *resultSlot;
 
@@ -4141,7 +4143,7 @@ get_returning_data(ForeignScanState *node)
 		if (dmstate->rel)
 			resultSlot = slot;
 		else
-			resultSlot = apply_returning_filter(dmstate, slot, estate);
+			resultSlot = apply_returning_filter(dmstate, resultRelInfo, slot, estate);
 	}
 	dmstate->next_tuple++;
 
@@ -4230,10 +4232,10 @@ init_returning_filter(PgFdwDirectModifyState *dmstate,
  */
 static TupleTableSlot *
 apply_returning_filter(PgFdwDirectModifyState *dmstate,
+					   ResultRelInfo *resultRelInfo,
 					   TupleTableSlot *slot,
 					   EState *estate)
 {
-	ResultRelInfo *relInfo = estate->es_result_relation_info;
 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
 	TupleTableSlot *resultSlot;
 	Datum	   *values;
@@ -4245,7 +4247,7 @@ apply_returning_filter(PgFdwDirectModifyState *dmstate,
 	/*
 	 * Use the return tuple slot as a place to store the result tuple.
 	 */
-	resultSlot = ExecGetReturningSlot(estate, relInfo);
+	resultSlot = ExecGetReturningSlot(estate, resultRelInfo);
 
 	/*
 	 * Extract all the values of the scan tuple.
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 72fa127212..9c9293414c 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -861,11 +861,15 @@ PlanDirectModify(PlannerInfo *root,
      To execute the direct modification on the remote server, this function
      must rewrite the target subplan with a <structname>ForeignScan</structname> plan
      node that executes the direct modification on the remote server.  The
-     <structfield>operation</structfield> field of the <structname>ForeignScan</structname> must
-     be set to the <literal>CmdType</literal> enumeration appropriately; that is,
+     <structfield>operation</structfield> and <structfield>resultRelation</structfield> fields
+     of the <structname>ForeignScan</structname> must be set appropriately.
+     <structfield>operation</structfield> must be set to the <literal>CmdType</literal>
+     enumeration corresponding to the statement kind (that is,
      <literal>CMD_UPDATE</literal> for <command>UPDATE</command>,
      <literal>CMD_INSERT</literal> for <command>INSERT</command>, and
-     <literal>CMD_DELETE</literal> for <command>DELETE</command>.
+     <literal>CMD_DELETE</literal> for <command>DELETE</command>), and the
+     <literal>resultRelation</literal> argument must be copied to the
+     <structfield>resultRelation</structfield> field.
     </para>
 
     <para>
@@ -925,9 +929,8 @@ IterateDirectModify(ForeignScanState *node);
      needed for the <literal>RETURNING</literal> calculation, returning it in a
      tuple table slot (the node's <structfield>ScanTupleSlot</structfield> should be
      used for this purpose).  The data that was actually inserted, updated
-     or deleted must be stored in the
-     <literal>es_result_relation_info-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>
-     of the node's <structname>EState</structname>.
+     or deleted must be stored in
+     <literal>node-&gt;resultRelInfo-&gt;ri_projectReturning-&gt;pi_exprContext-&gt;ecxt_scantuple</literal>.
      Return NULL if no more rows are available.
      Note that this is called in a short-lived memory context that will be
      reset between invocations.  Create a memory context in
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 513471ab9b..0b20f94035 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -215,6 +215,13 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
+	/*
+	 * For the FDW's convenience, look up the modification target relation's
+	 * ResultRelInfo.
+	 */
+	if (node->resultRelation > 0)
+		scanstate->resultRelInfo = estate->es_result_relations[node->resultRelation - 1];
+
 	/* Initialize any outer plan. */
 	if (outerPlan(node))
 		outerPlanState(scanstate) =
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 4d79f70950..2b4d7654cc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -758,6 +758,7 @@ _copyForeignScan(const ForeignScan *from)
 	COPY_NODE_FIELD(fdw_recheck_quals);
 	COPY_BITMAPSET_FIELD(fs_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
+	COPY_SCALAR_FIELD(resultRelation);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f441ae3c51..08a049232e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -695,6 +695,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_NODE_FIELD(fdw_recheck_quals);
 	WRITE_BITMAPSET_FIELD(fs_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
+	WRITE_INT_FIELD(resultRelation);
 }
 
 static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3a54765f5c..ab7b535caa 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2014,6 +2014,7 @@ _readForeignScan(void)
 	READ_NODE_FIELD(fdw_recheck_quals);
 	READ_BITMAPSET_FIELD(fs_relids);
 	READ_BOOL_FIELD(fsSystemCol);
+	READ_INT_FIELD(resultRelation);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 881eaf4813..94280a730c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5530,7 +5530,11 @@ make_foreignscan(List *qptlist,
 	plan->lefttree = outer_plan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
+
+	/* these may be overridden by the FDW's PlanDirectModify callback. */
 	node->operation = CMD_SELECT;
+	node->resultRelation = 0;
+
 	/* fs_server will be filled in by create_foreignscan_plan */
 	node->fs_server = InvalidOid;
 	node->fdw_exprs = fdw_exprs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6847ff6f44..8b43371425 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1310,6 +1310,10 @@ set_foreignscan_references(PlannerInfo *root,
 	}
 
 	fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset);
+
+	/* Adjust resultRelation if it's valid */
+	if (fscan->resultRelation > 0)
+		fscan->resultRelation += rtoffset;
 }
 
 /*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a926ff1711..da1f3f269a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1780,6 +1780,8 @@ typedef struct ForeignScanState
 	/* use struct pointer to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
+
+	ResultRelInfo *resultRelInfo;	/* result rel info, if UPDATE or DELETE */
 } ForeignScanState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a7bdf3497e..7e6b10f86b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -599,12 +599,20 @@ typedef struct WorkTableScan
  * When the plan node represents a foreign join, scan.scanrelid is zero and
  * fs_relids must be consulted to identify the join relation.  (fs_relids
  * is valid for simple scans as well, but will always match scan.scanrelid.)
+ *
+ * If the FDW's PlanDirectModify() callback decides to repurpose a ForeignScan
+ * node to perform the UPDATE or DELETE operation directly in the remote
+ * server, it sets 'operation' and 'resultRelation' to identify the operation
+ * type and target relation.  Note that these fields are only set if the
+ * modification is performed *fully* remotely; otherwise, the modification is
+ * driven by a local ModifyTable node and 'operation' is left to CMD_SELECT.
  * ----------------
  */
 typedef struct ForeignScan
 {
 	Scan		scan;
 	CmdType		operation;		/* SELECT/INSERT/UPDATE/DELETE */
+	Index		resultRelation; /* direct modification target's RT index */
 	Oid			fs_server;		/* OID of foreign server */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
 	List	   *fdw_private;	/* private data for FDW */
-- 
2.20.1

v17-0002-Remove-es_result_relation_info.patch (text/x-patch; charset=UTF-8)
From 4e7aa355561303ca9998993964d5cf393eff50f3 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 13 Oct 2020 18:37:57 +0300
Subject: [PATCH v17 2/2] Remove es_result_relation_info

This changes many places that access the currently active result
relation via es_result_relation_info to instead receive it directly
via function parameters.  Maintaining that state in
es_result_relation_info has become cumbersome, especially with
partitioning where each partition gets its own result relation info.
Having to set and reset it across arbitrary operations has caused
bugs in the past.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
---
 src/backend/commands/copy.c              |  19 +--
 src/backend/commands/tablecmds.c         |   2 -
 src/backend/executor/execIndexing.c      |   9 +-
 src/backend/executor/execMain.c          |   4 -
 src/backend/executor/execReplication.c   |  24 +--
 src/backend/executor/execUtils.c         |   1 -
 src/backend/executor/nodeModifyTable.c   | 201 ++++++++++-------------
 src/backend/replication/logical/worker.c |  17 +-
 src/include/executor/executor.h          |  19 ++-
 src/include/executor/nodeModifyTable.h   |   4 +-
 src/include/nodes/execnodes.h            |   1 -
 src/test/regress/expected/insert.out     |   4 +-
 src/test/regress/sql/insert.sql          |   4 +-
 13 files changed, 131 insertions(+), 178 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 71d48d4574..531bd7c73a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2489,9 +2489,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
 	TupleTableSlot **slots = buffer->slots;
 
-	/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Print error context information correctly, if one of the operations
 	 * below fail.
@@ -2524,7 +2521,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
+				ExecInsertIndexTuples(resultRelInfo,
+									  buffer->slots[i], estate, false, NULL,
 									  NIL);
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], recheckIndexes,
@@ -2839,8 +2837,6 @@ CopyFrom(CopyState cstate)
 
 	ExecOpenIndices(resultRelInfo, false);
 
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Set up a ModifyTableState so we can let FDW(s) init themselves for
 	 * foreign-table result relation(s).
@@ -3108,11 +3104,6 @@ CopyFrom(CopyState cstate)
 				prevResultRelInfo = resultRelInfo;
 			}
 
-			/*
-			 * For ExecInsertIndexTuples() to work on the partition's indexes
-			 */
-			estate->es_result_relation_info = resultRelInfo;
-
 			/*
 			 * If we're capturing transition tuples, we might need to convert
 			 * from the partition rowtype to root rowtype.
@@ -3217,7 +3208,8 @@ CopyFrom(CopyState cstate)
 				/* Compute stored generated columns */
 				if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 					resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
-					ExecComputeStoredGenerated(estate, myslot, CMD_INSERT);
+					ExecComputeStoredGenerated(resultRelInfo, estate, myslot,
+											   CMD_INSERT);
 
 				/*
 				 * If the target is a plain table, check the constraints of
@@ -3288,7 +3280,8 @@ CopyFrom(CopyState cstate)
 										   myslot, mycid, ti_options, bistate);
 
 						if (resultRelInfo->ri_NumIndices > 0)
-							recheckIndexes = ExecInsertIndexTuples(myslot,
+							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+																   myslot,
 																   estate,
 																   false,
 																   NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 80fedad5e0..511f015a86 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1820,7 +1820,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecBSTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
@@ -1950,7 +1949,6 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 	resultRelInfo = resultRelInfos;
 	foreach(cell, rels)
 	{
-		estate->es_result_relation_info = resultRelInfo;
 		ExecASTruncateTriggers(estate, resultRelInfo);
 		resultRelInfo++;
 	}
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 1862af621b..c6b5bcba7b 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -270,7 +270,8 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  * ----------------------------------------------------------------
  */
 List *
-ExecInsertIndexTuples(TupleTableSlot *slot,
+ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+					  TupleTableSlot *slot,
 					  EState *estate,
 					  bool noDupErr,
 					  bool *specConflict,
@@ -278,7 +279,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 {
 	ItemPointer tupleid = &slot->tts_tid;
 	List	   *result = NIL;
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -293,7 +293,6 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
@@ -479,11 +478,10 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  * ----------------------------------------------------------------
  */
 bool
-ExecCheckIndexConstraints(TupleTableSlot *slot,
+ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 						  EState *estate, ItemPointer conflictTid,
 						  List *arbiterIndexes)
 {
-	ResultRelInfo *resultRelInfo;
 	int			i;
 	int			numIndices;
 	RelationPtr relationDescs;
@@ -501,7 +499,6 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 	/*
 	 * Get information from the result relation info structure.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
 	numIndices = resultRelInfo->ri_NumIndices;
 	relationDescs = resultRelInfo->ri_IndexRelationDescs;
 	indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 783eecbc13..293f53d07c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -827,9 +827,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	estate->es_plannedstmt = plannedstmt;
 
-	/* es_result_relation_info is NULL except when within ModifyTable */
-	estate->es_result_relation_info = NULL;
-
 	/*
 	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
 	 */
@@ -2694,7 +2691,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 * subplans themselves are initialized.
 	 */
 	parentestate->es_result_relations = NULL;
-	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index b29db7bf4f..01d26881e7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -404,10 +404,10 @@ retry:
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
+ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+						 EState *estate, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 
 	/* For now we support only tables. */
@@ -430,7 +430,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -442,7 +443,8 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		simple_table_tuple_insert(resultRelInfo->ri_RelationDesc, slot);
 
 		if (resultRelInfo->ri_NumIndices > 0)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW INSERT Triggers */
@@ -466,11 +468,11 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot, TupleTableSlot *slot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &(searchslot->tts_tid);
 
@@ -496,7 +498,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		/* Compute stored generated columns */
 		if (rel->rd_att->constr &&
 			rel->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/* Check the constraints of the tuple */
 		if (rel->rd_att->constr)
@@ -508,7 +511,8 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 								  &update_indexes);
 
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL,
 												   NIL);
 
 		/* AFTER ROW UPDATE Triggers */
@@ -527,11 +531,11 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
  * Caller is responsible for opening the indexes.
  */
 void
-ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+						 EState *estate, EPQState *epqstate,
 						 TupleTableSlot *searchslot)
 {
 	bool		skip_tuple = false;
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	ItemPointer tid = &searchslot->tts_tid;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6d8c112e2f..071a0007eb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,7 +125,6 @@ CreateExecutorState(void)
 
 	estate->es_result_relations = NULL;
 	estate->es_opened_result_relations = NIL;
-	estate->es_result_relation_info = NULL;
 	estate->es_tuple_routing_result_relations = NIL;
 	estate->es_trig_target_relations = NIL;
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b3f7012e38..ad9920883b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -70,7 +70,8 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   EState *estate,
 											   PartitionTupleRouting *proute,
 											   ResultRelInfo *targetRelInfo,
-											   TupleTableSlot *slot);
+											   TupleTableSlot *slot,
+											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
 static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
 static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
@@ -246,9 +247,10 @@ ExecCheckTIDVisible(EState *estate,
  * Compute stored generated columns for a tuple
  */
 void
-ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype)
+ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype)
 {
-	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
 	Relation	rel = resultRelInfo->ri_RelationDesc;
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	int			natts = tupdesc->natts;
@@ -366,32 +368,48 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *		ExecInsert
  *
  *		For INSERT, we have to insert the tuple into the target relation
- *		and insert appropriate tuples into the index relations.
+ *		(or partition thereof) and insert appropriate tuples into the index
+ *		relations.
  *
  *		Returns RETURNING result if any, otherwise NULL.
+ *
+ *		This may change the currently active tuple conversion map in
+ *		mtstate->mt_transition_capture, so the callers must take care to
+ *		save the previous value to avoid losing track of it.
  * ----------------------------------------------------------------
  */
 static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   TupleTableSlot *slot,
 		   TupleTableSlot *planSlot,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
 	Relation	resultRelationDesc;
 	List	   *recheckIndexes = NIL;
 	TupleTableSlot *result = NULL;
 	TransitionCaptureState *ar_insert_trig_tcs;
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	OnConflictAction onconflict = node->onConflictAction;
-
-	ExecMaterializeSlot(slot);
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 
 	/*
-	 * get information on the (current) result relation
+	 * If the input result relation is a partitioned table, find the leaf
+	 * partition to insert the tuple into.
 	 */
-	resultRelInfo = estate->es_result_relation_info;
+	if (proute)
+	{
+		ResultRelInfo *partRelInfo;
+
+		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
+									   resultRelInfo, slot,
+									   &partRelInfo);
+		resultRelInfo = partRelInfo;
+	}
+
+	ExecMaterializeSlot(slot);
+
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
 	/*
@@ -424,7 +442,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * insert into foreign table: let the FDW do it
@@ -459,7 +478,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_INSERT);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
 
 		/*
 		 * Check any RLS WITH CHECK policies.
@@ -521,8 +541,8 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 	vlock:
 			specConflict = false;
-			if (!ExecCheckIndexConstraints(slot, estate, &conflictTid,
-										   arbiterIndexes))
+			if (!ExecCheckIndexConstraints(resultRelInfo, slot, estate,
+										   &conflictTid, arbiterIndexes))
 			{
 				/* committed conflict tuple found */
 				if (onconflict == ONCONFLICT_UPDATE)
@@ -582,7 +602,8 @@ ExecInsert(ModifyTableState *mtstate,
 										   specToken);
 
 			/* insert index entries for tuple */
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, true,
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, true,
 												   &specConflict,
 												   arbiterIndexes);
 
@@ -621,7 +642,8 @@ ExecInsert(ModifyTableState *mtstate,
 
 			/* insert index entries for tuple */
 			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL,
+				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+													   slot, estate, false, NULL,
 													   NIL);
 		}
 	}
@@ -707,6 +729,7 @@ ExecInsert(ModifyTableState *mtstate,
  */
 static TupleTableSlot *
 ExecDelete(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *planSlot,
@@ -718,8 +741,7 @@ ExecDelete(ModifyTableState *mtstate,
 		   bool *tupleDeleted,
 		   TupleTableSlot **epqreturnslot)
 {
-	ResultRelInfo *resultRelInfo;
-	Relation	resultRelationDesc;
+	Relation	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	TupleTableSlot *slot = NULL;
@@ -728,12 +750,6 @@ ExecDelete(ModifyTableState *mtstate,
 	if (tupleDeleted)
 		*tupleDeleted = false;
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
-	resultRelationDesc = resultRelInfo->ri_RelationDesc;
-
 	/* BEFORE ROW DELETE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
 		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
@@ -1067,6 +1083,7 @@ ldelete:;
  */
 static TupleTableSlot *
 ExecUpdate(ModifyTableState *mtstate,
+		   ResultRelInfo *resultRelInfo,
 		   ItemPointer tupleid,
 		   HeapTuple oldtuple,
 		   TupleTableSlot *slot,
@@ -1075,12 +1092,10 @@ ExecUpdate(ModifyTableState *mtstate,
 		   EState *estate,
 		   bool canSetTag)
 {
-	ResultRelInfo *resultRelInfo;
-	Relation	resultRelationDesc;
+	Relation	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 	TM_Result	result;
 	TM_FailureData tmfd;
 	List	   *recheckIndexes = NIL;
-	TupleConversionMap *saved_tcs_map = NULL;
 
 	/*
 	 * abort the operation if not running transactions
@@ -1090,12 +1105,6 @@ ExecUpdate(ModifyTableState *mtstate,
 
 	ExecMaterializeSlot(slot);
 
-	/*
-	 * get information on the (current) result relation
-	 */
-	resultRelInfo = estate->es_result_relation_info;
-	resultRelationDesc = resultRelInfo->ri_RelationDesc;
-
 	/* BEFORE ROW UPDATE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
 		resultRelInfo->ri_TrigDesc->trig_update_before_row)
@@ -1120,7 +1129,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * update in foreign table: let the FDW do it
@@ -1157,7 +1167,8 @@ ExecUpdate(ModifyTableState *mtstate,
 		 */
 		if (resultRelationDesc->rd_att->constr &&
 			resultRelationDesc->rd_att->constr->has_generated_stored)
-			ExecComputeStoredGenerated(estate, slot, CMD_UPDATE);
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_UPDATE);
 
 		/*
 		 * Check any RLS UPDATE WITH CHECK policies
@@ -1207,6 +1218,7 @@ lreplace:;
 			PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 			int			map_index;
 			TupleConversionMap *tupconv_map;
+			TupleConversionMap *saved_tcs_map = NULL;
 
 			/*
 			 * Disallow an INSERT ON CONFLICT DO UPDATE that causes the
@@ -1232,9 +1244,12 @@ lreplace:;
 			 * Row movement, part 1.  Delete the tuple, but skip RETURNING
 			 * processing. We want to return rows from INSERT.
 			 */
-			ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate,
-					   estate, false, false /* canSetTag */ ,
-					   true /* changingPart */ , &tuple_deleted, &epqslot);
+			ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot,
+					   epqstate, estate,
+					   false,	/* processReturning */
+					   false,	/* canSetTag */
+					   true,	/* changingPart */
+					   &tuple_deleted, &epqslot);
 
 			/*
 			 * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1274,16 +1289,6 @@ lreplace:;
 				}
 			}
 
-			/*
-			 * Updates set the transition capture map only when a new subplan
-			 * is chosen.  But for inserts, it is set for each row. So after
-			 * INSERT, we need to revert back to the map created for UPDATE;
-			 * otherwise the next UPDATE will incorrectly use the one created
-			 * for INSERT.  So first save the one created for UPDATE.
-			 */
-			if (mtstate->mt_transition_capture)
-				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 			/*
 			 * resultRelInfo is one of the per-subplan resultRelInfos.  So we
 			 * should convert the tuple into root's tuple descriptor, since
@@ -1301,18 +1306,18 @@ lreplace:;
 											 mtstate->mt_root_tuple_slot);
 
 			/*
-			 * Prepare for tuple routing, making it look like we're inserting
-			 * into the root.
+			 * ExecInsert() may scribble on mtstate->mt_transition_capture,
+			 * so save the currently active map.
 			 */
-			Assert(mtstate->rootResultRelInfo != NULL);
-			slot = ExecPrepareTupleRouting(mtstate, estate, proute,
-										   mtstate->rootResultRelInfo, slot);
+			if (mtstate->mt_transition_capture)
+				saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
 
-			ret_slot = ExecInsert(mtstate, slot, planSlot,
-								  estate, canSetTag);
+			/* Tuple routing starts from the root table. */
+			Assert(mtstate->rootResultRelInfo != NULL);
+			ret_slot = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
+								  planSlot, estate, canSetTag);
 
-			/* Revert ExecPrepareTupleRouting's node change. */
-			estate->es_result_relation_info = resultRelInfo;
+			/* Clear the INSERT's tuple and restore the saved map. */
 			if (mtstate->mt_transition_capture)
 			{
 				mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
@@ -1476,7 +1481,8 @@ lreplace:;
 
 		/* insert index entries for tuple if necessary */
 		if (resultRelInfo->ri_NumIndices > 0 && update_indexes)
-			recheckIndexes = ExecInsertIndexTuples(slot, estate, false, NULL, NIL);
+			recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+												   slot, estate, false, NULL, NIL);
 	}
 
 	if (canSetTag)
@@ -1715,7 +1721,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	 */
 
 	/* Execute UPDATE with projection */
-	*returning = ExecUpdate(mtstate, conflictTid, NULL,
+	*returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
 							resultRelInfo->ri_onConflict->oc_ProjSlot,
 							planSlot,
 							&mtstate->mt_epqstate, mtstate->ps.state,
@@ -1872,40 +1878,36 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
  * ExecPrepareTupleRouting --- prepare for routing one tuple
  *
  * Determine the partition in which the tuple in slot is to be inserted,
- * and modify mtstate and estate to prepare for it.
- *
- * Caller must revert the estate changes after executing the insertion!
- * In mtstate, transition capture changes may also need to be reverted.
+ * and return its ResultRelInfo in *partRelInfo.  The returned value is
+ * a slot holding the tuple of the partition rowtype.
  *
- * Returns a slot holding the tuple of the partition rowtype.
+ * This also sets the transition table information in mtstate based on the
+ * selected partition.
  */
 static TupleTableSlot *
 ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						EState *estate,
 						PartitionTupleRouting *proute,
 						ResultRelInfo *targetRelInfo,
-						TupleTableSlot *slot)
+						TupleTableSlot *slot,
+						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 
 	/*
-	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
-	 * not find a valid partition for the tuple in 'slot' then an error is
+	 * Look up the target partition's ResultRelInfo.  If ExecFindPartition
+	 * doesn't find a valid partition for the tuple in 'slot' then an error is
 	 * raised.  An error may also be raised if the found partition is not a
 	 * valid target for INSERTs.  This is required since a partitioned table
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
+	*partRelInfo = partrel;
 	partrouteinfo = partrel->ri_PartitionInfo;
 	Assert(partrouteinfo != NULL);
 
-	/*
-	 * Make it look like we are inserting into the partition.
-	 */
-	estate->es_result_relation_info = partrel;
-
 	/*
 	 * If we're capturing transition tuples, we might need to convert from the
 	 * partition rowtype to root partitioned table's rowtype.
@@ -2016,10 +2018,8 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
-	PartitionTupleRouting *proute = node->mt_partition_tuple_routing;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	PlanState  *subplanstate;
 	JunkFilter *junkfilter;
@@ -2067,17 +2067,6 @@ ExecModifyTable(PlanState *pstate)
 	subplanstate = node->mt_plans[node->mt_whichplan];
 	junkfilter = resultRelInfo->ri_junkFilter;
 
-	/*
-	 * es_result_relation_info must point to the currently active result
-	 * relation while we are within this ModifyTable node.  Even though
-	 * ModifyTable nodes can't be nested statically, they can be nested
-	 * dynamically (since our subplan could include a reference to a modifying
-	 * CTE).  So we have to save and restore the caller's value.
-	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
-	estate->es_result_relation_info = resultRelInfo;
-
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
 	 * for each row.
@@ -2111,7 +2100,6 @@ ExecModifyTable(PlanState *pstate)
 				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
 				junkfilter = resultRelInfo->ri_junkFilter;
-				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				/* Prepare to convert transition tuples from this child. */
@@ -2156,7 +2144,6 @@ ExecModifyTable(PlanState *pstate)
 			 */
 			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
 
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
 		}
 
@@ -2239,25 +2226,21 @@ ExecModifyTable(PlanState *pstate)
 		switch (operation)
 		{
 			case CMD_INSERT:
-				/* Prepare for tuple routing if needed. */
-				if (proute)
-					slot = ExecPrepareTupleRouting(node, estate, proute,
-												   resultRelInfo, slot);
-				slot = ExecInsert(node, slot, planSlot,
+				slot = ExecInsert(node, resultRelInfo, slot, planSlot,
 								  estate, node->canSetTag);
-				/* Revert ExecPrepareTupleRouting's state change. */
-				if (proute)
-					estate->es_result_relation_info = resultRelInfo;
 				break;
 			case CMD_UPDATE:
-				slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot,
-								  &node->mt_epqstate, estate, node->canSetTag);
+				slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+								  planSlot, &node->mt_epqstate, estate,
+								  node->canSetTag);
 				break;
 			case CMD_DELETE:
-				slot = ExecDelete(node, tupleid, oldtuple, planSlot,
-								  &node->mt_epqstate, estate,
-								  true, node->canSetTag,
-								  false /* changingPart */ , NULL, NULL);
+				slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
+								  planSlot, &node->mt_epqstate, estate,
+								  true,		/* processReturning */
+								  node->canSetTag,
+								  false,	/* changingPart */
+								  NULL, NULL);
 				break;
 			default:
 				elog(ERROR, "unknown operation");
@@ -2269,15 +2252,9 @@ ExecModifyTable(PlanState *pstate)
 		 * the work on next call.
 		 */
 		if (slot)
-		{
-			estate->es_result_relation_info = saved_resultRelInfo;
 			return slot;
-		}
 	}
 
-	/* Restore es_result_relation_info before exiting */
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/*
 	 * We're done, but fire AFTER STATEMENT triggers before exiting.
 	 */
@@ -2298,7 +2275,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *saved_resultRelInfo;
 	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
 	ListCell   *l,
@@ -2346,14 +2322,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.  Note we *must* set
-	 * estate->es_result_relation_info correctly while we initialize each
-	 * sub-plan; external modules such as FDWs may depend on that (see
-	 * contrib/postgres_fdw/postgres_fdw.c: postgresBeginDirectModify() as one
-	 * example).
+	 * indexes for insertion of new index entries.
 	 */
-	saved_resultRelInfo = estate->es_result_relation_info;
-
 	resultRelInfo = mtstate->resultRelInfo;
 	i = 0;
 	forboth(l, node->resultRelations, l1, node->plans)
@@ -2400,7 +2370,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			update_tuple_routing_needed = true;
 
 		/* Now init the plan for this result rel */
-		estate->es_result_relation_info = resultRelInfo;
 		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
 		mtstate->mt_scans[i] =
 			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
@@ -2424,8 +2393,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		i++;
 	}
 
-	estate->es_result_relation_info = saved_resultRelInfo;
-
 	/* Get the target relation */
 	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8d5d9e05b3..4f32dc74c8 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1174,7 +1174,6 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	/* Input functions may need an active snapshot, so get one */
 	PushActiveSnapshot(GetTransactionSnapshot());
@@ -1214,7 +1213,7 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 	ExecOpenIndices(relinfo, false);
 
 	/* Do the insert. */
-	ExecSimpleRelationInsert(estate, remoteslot);
+	ExecSimpleRelationInsert(relinfo, estate, remoteslot);
 
 	/* Cleanup. */
 	ExecCloseIndices(relinfo);
@@ -1300,7 +1299,6 @@ apply_handle_update(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	/*
 	 * Populate updatedCols so that per-column triggers can fire.  This could
@@ -1392,7 +1390,8 @@ apply_handle_update_internal(ResultRelInfo *relinfo,
 		EvalPlanQualSetSlot(&epqstate, remoteslot);
 
 		/* Do the actual update. */
-		ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
+		ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
+								 remoteslot);
 	}
 	else
 	{
@@ -1455,7 +1454,6 @@ apply_handle_delete(StringInfo s)
 										&TTSOpsVirtual);
 	resultRelInfo = makeNode(ResultRelInfo);
 	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
-	estate->es_result_relation_info = resultRelInfo;
 
 	PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -1508,7 +1506,7 @@ apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
 		EvalPlanQualSetSlot(&epqstate, localslot);
 
 		/* Do the actual delete. */
-		ExecSimpleRelationDelete(estate, &epqstate, localslot);
+		ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
 	}
 	else
 	{
@@ -1616,7 +1614,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	}
 	MemoryContextSwitchTo(oldctx);
 
-	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1697,8 +1694,8 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					ExecOpenIndices(partrelinfo, false);
 
 					EvalPlanQualSetSlot(&epqstate, remoteslot_part);
-					ExecSimpleRelationUpdate(estate, &epqstate, localslot,
-											 remoteslot_part);
+					ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
+											 localslot, remoteslot_part);
 					ExecCloseIndices(partrelinfo);
 					EvalPlanQualEnd(&epqstate);
 				}
@@ -1739,7 +1736,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					Assert(partrelinfo_new != partrelinfo);
 
 					/* DELETE old tuple found in the old partition. */
-					estate->es_result_relation_info = partrelinfo;
 					apply_handle_delete_internal(partrelinfo, estate,
 												 localslot,
 												 &relmapentry->remoterel);
@@ -1771,7 +1767,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 						slot_getallattrs(remoteslot);
 					}
 					MemoryContextSwitchTo(oldctx);
-					estate->es_result_relation_info = partrelinfo_new;
 					apply_handle_insert_internal(partrelinfo_new, estate,
 												 remoteslot_part);
 				}
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c283bf1454..2f221d7114 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -576,10 +576,14 @@ extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relIn
  */
 extern void ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative);
 extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
-extern List *ExecInsertIndexTuples(TupleTableSlot *slot, EState *estate, bool noDupErr,
+extern List *ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
+								   TupleTableSlot *slot, EState *estate,
+								   bool noDupErr,
 								   bool *specConflict, List *arbiterIndexes);
-extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-									  ItemPointer conflictTid, List *arbiterIndexes);
+extern bool ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo,
+						  TupleTableSlot *slot,
+						  EState *estate, ItemPointer conflictTid,
+						  List *arbiterIndexes);
 extern void check_exclusion_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
@@ -596,10 +600,13 @@ extern bool RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 									 TupleTableSlot *searchslot, TupleTableSlot *outslot);
 
-extern void ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot);
-extern void ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
+									 EState *estate, TupleTableSlot *slot);
+extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
-extern void ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
+extern void ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
+									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot);
 extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 4ec4ebdabc..2518fe4f64 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -15,7 +15,9 @@
 
 #include "nodes/execnodes.h"
 
-extern void ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype);
+extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
+						   EState *estate, TupleTableSlot *slot,
+						   CmdType cmdtype);
 
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da1f3f269a..c1e41c77c7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -525,7 +525,6 @@ typedef struct EState
 	List	   *es_opened_result_relations; /* List of non-NULL entries in
 											 * es_result_relations in no
 											 * specific order */
-	ResultRelInfo *es_result_relation_info; /* currently active array elt */
 
 	PartitionDirectory es_partition_directory;	/* for PartitionDesc lookup */
 
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45be5e..da50ee3b67 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -818,9 +818,7 @@ drop role regress_coldesc_role;
 drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aacbc4..963faa1614 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -542,9 +542,7 @@ drop table inserttest3;
 drop table brtrigpartcon;
 drop function brtrigpartcon1trigf();
 
--- check that "do nothing" BR triggers work with tuple-routing (this checks
--- that estate->es_result_relation_info is appropriately set/reset for each
--- routed tuple)
+-- check that "do nothing" BR triggers work with tuple-routing
 create table donothingbrtrig_test (a int, b text) partition by list (a);
 create table donothingbrtrig_test1 (b text, a int);
 create table donothingbrtrig_test2 (c text, b text, a int);
-- 
2.20.1
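
For readers skimming the patch above, its core change can be sketched in isolation: functions stop reading the target relation from mutable executor-wide state (the removed es_result_relation_info) and take it as an explicit argument instead, so tuple routing no longer needs a save/switch/revert dance. This is only a rough model under stated assumptions. `DemoRelInfo`, `DemoEState`, `insert_via_global`, `route_and_insert_old` and `insert_explicit` are hypothetical stand-ins, not PostgreSQL definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for ResultRelInfo and EState; not the real structs. */
typedef struct DemoRelInfo
{
	int			ninserted;		/* rows inserted into this relation */
} DemoRelInfo;

typedef struct DemoEState
{
	DemoRelInfo *current_rel;	/* models the removed es_result_relation_info */
} DemoEState;

/* Old style: the target comes from mutable executor-wide state. */
static void
insert_via_global(DemoEState *estate)
{
	estate->current_rel->ninserted++;
}

/* Old-style routing: switch the global, insert, then remember to revert. */
static void
route_and_insert_old(DemoEState *estate, DemoRelInfo *partition)
{
	DemoRelInfo *saved = estate->current_rel;

	estate->current_rel = partition;
	insert_via_global(estate);
	estate->current_rel = saved;	/* forgetting this misdirects later rows */
}

/* New style: the caller names the target; nothing to save or restore. */
static void
insert_explicit(DemoRelInfo *rel)
{
	rel->ninserted++;
}
```

In the explicit style the caller of a routed insert simply passes the partition it found, which is roughly what ExecInsert() does with the ResultRelInfo returned through ExecPrepareTupleRouting()'s new output parameter.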

#61Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#60)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 14, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 13/10/2020 19:09, Heikki Linnakangas wrote:

One little idea I had:

I think all FDWs that support direct modify will have to carry the
resultRelation index or the ResultRelInfo pointer from BeginDirectModify
to IterateDirectModify in the FDW's private struct. It's not
complicated, but should we make life easier for FDWs by storing the
ResultRelInfo pointer in the ForeignScanState struct in the core code?
The doc now says:

The data that was actually inserted, updated or deleted must be
stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of
the target foreign table's ResultRelInfo obtained using the
information passed to BeginDirectModify. Return NULL if no more rows
are available.

That "ResultRelInfo obtained using the information passed to
BeginDirectModify" part is pretty vague. We could expand it, but if we
stored the ResultRelInfo in the ForeignScanState, we could explain it
succinctly.

I tried that approach, see attached. Yeah, this feels better to me.

I like the idea of storing the ResultRelInfo in ForeignScanState, but
it would be better if we can document the fact that an FDW may not
reliably access it until IterateDirectModify(). That's because setting
it in ExecInitForeignScan() will mean *all* result relations must be
initialized during ExecInitModifyTable(), which defeats my
lazy-ResultRelInfo-initialization proposal. As to why I'm
pushing that proposal, consider that when we get the ability to use
run-time pruning for UPDATE/DELETE with [1], initializing all result
relations before initializing the plan tree will mean most of those
ResultRelInfos will be unused, because run-time pruning that occurs
when the plan tree is initialized (and/or when it is executed) may
eliminate all but a few result relations.

I've attached a diff to v17-0001 to show one way of delaying setting
ForeignScanState.resultRelInfo.
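
As a rough standalone illustration of the delayed lookup (a sketch only; the attached delta is the actual proposal), the scan node leaves its ResultRelInfo pointer unset at init time and fills it in on the first iterate call. `DemoRelInfo`, `DemoEState`, `DemoScanState`, `demo_init_scan` and `demo_iterate` are hypothetical names standing in for ResultRelInfo, EState, ForeignScanState, ExecInitForeignScan() and ForeignNext().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct DemoRelInfo
{
	int			rtindex;
} DemoRelInfo;

/* Hypothetical stand-in for EState. */
typedef struct DemoEState
{
	DemoRelInfo **result_relations; /* indexed by range table index - 1 */
} DemoEState;

/* Hypothetical stand-in for ForeignScanState. */
typedef struct DemoScanState
{
	DemoEState *estate;
	int			resultRelation; /* 1-based index; 0 for a plain scan */
	DemoRelInfo *resultRelInfo; /* NULL until first needed */
} DemoScanState;

/* Init time: record the index only; the target may not be set up yet. */
static void
demo_init_scan(DemoScanState *node, DemoEState *estate, int resultRelation)
{
	node->estate = estate;
	node->resultRelation = resultRelation;
	node->resultRelInfo = NULL; /* deliberately not looked up here */
}

/* Execution time: look the target up once, on first use. */
static DemoRelInfo *
demo_iterate(DemoScanState *node)
{
	assert(node->resultRelation > 0);
	if (node->resultRelInfo == NULL)
		node->resultRelInfo =
			node->estate->result_relations[node->resultRelation - 1];
	return node->resultRelInfo;
}
```

The trade-off is that code running at init time (the BeginDirectModify stage in the real executor) cannot rely on the pointer and has to work from the index, which is why the delta switches postgresBeginDirectModify() to fsplan->resultRelation.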

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/30/2575/

Attachments:

v17-0001-delta.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 78facb8..e2b4b56 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2357,7 +2357,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = node->resultRelInfo->ri_RangeTableIndex;
+	rtindex = fsplan->resultRelation;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 0b20f94..ce1ff17 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -49,7 +49,18 @@ ForeignNext(ForeignScanState *node)
 	/* Call the Iterate function in short-lived context */
 	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 	if (plan->operation != CMD_SELECT)
+	{
+		EState   *estate = node->ss.ps.state;
+
+		/*
+		 * For the FDW's convenience, look up the modification target
+		 * relation's ResultRelInfo.
+		 */
+		Assert(plan->resultRelation > 0);
+		if (node->resultRelInfo == NULL)
+			node->resultRelInfo = estate->es_result_relations[plan->resultRelation - 1];
 		slot = node->fdwroutine->IterateDirectModify(node);
+	}
 	else
 		slot = node->fdwroutine->IterateForeignScan(node);
 	MemoryContextSwitchTo(oldcontext);
@@ -215,13 +226,6 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
-	/*
-	 * For the FDW's convenience, look up the modification target relation's.
-	 * ResultRelInfo.
-	 */
-	if (node->resultRelation > 0)
-		scanstate->resultRelInfo = estate->es_result_relations[node->resultRelation - 1];
-
 	/* Initialize any outer plan. */
 	if (outerPlan(node))
 		outerPlanState(scanstate) =
#62Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#61)
Re: partition routing layering in nodeModifyTable.c

On 14/10/2020 09:44, Amit Langote wrote:

I like the idea of storing the ResultRelInfo in ForeignScanState, but
it would be better if we can document the fact that an FDW may not
reliably access it until IterateDirectModify(). That's because setting
it in ExecInitForeignScan() will mean *all* result relations must be
initialized during ExecInitModifyTable(), which defeats my
lazy-ResultRelInfo-initialization proposal. As to why I'm
pushing that proposal, consider that when we get the ability to use
run-time pruning for UPDATE/DELETE with [1], initializing all result
relations before initializing the plan tree will mean most of those
ResultRelInfos will be unused, because run-time pruning that occurs
when the plan tree is initialized (and/or when it is executed) may
eliminate all but a few result relations.

I've attached a diff to v17-0001 to show one way of delaying setting
ForeignScanState.resultRelInfo.

The BeginDirectModify function does a lot of expensive things, like
opening a connection to the remote server. If we want to optimize
run-time pruning, I think we need to avoid calling BeginDirectModify for
pruned partitions altogether.

I pushed this without those delay-setting-resultRelInfo changes. But we
can revisit those changes with the run-time pruning optimization patch.

I'll continue with the last couple of patches in this thread.

- Heikki

#63Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#62)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 14/10/2020 09:44, Amit Langote wrote:

I like the idea of storing the ResultRelInfo in ForeignScanState, but
it would be better if we can document the fact that an FDW may not
reliably access it until IterateDirectModify(). That's because setting
it in ExecInitForeignScan() will mean *all* result relations must be
initialized during ExecInitModifyTable(), which defeats my
lazy-ResultRelInfo-initialization proposal. As to why I'm
pushing that proposal, consider that when we get the ability to use
run-time pruning for UPDATE/DELETE with [1], initializing all result
relations before initializing the plan tree will mean most of those
ResultRelInfos will be unused, because run-time pruning that occurs
when the plan tree is initialized (and/or when it is executed) may
eliminate all but a few result relations.

I've attached a diff to v17-0001 to show one way of delaying setting
ForeignScanState.resultRelInfo.

The BeginDirectModify function does a lot of expensive things, like
opening a connection to the remote server. If we want to optimize
run-time pruning, I think we need to avoid calling BeginDirectModify for
pruned partitions altogether.

Note that if foreign partitions get pruned during the so-called "init"
run-time pruning (that is, in the ExecInitNode() phase),
BeginDirectModify() won't get called on them. Your concern does apply,
though, if there is only going to be "exec" run-time pruning and no
"init" pruning.

For me, the former case is a bit more interesting, as it occurs with
prepared statements using a generic plan (update parted_table set ...
where partkey = $1).
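
A toy model of that distinction, as a sketch only: ToyForeignPartition and the toy_* functions are hypothetical names made up for this illustration, and begin_called merely stands in for the expensive work BeginDirectModify() does. When the parameter value is known at init time, pruned partitions are skipped before any per-partition setup; when pruning can only happen at exec time, every partition has already been set up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one foreign partition's scan state. */
typedef struct ToyForeignPartition
{
    int  key;           /* partition accepts rows with partkey == key */
    bool begin_called;  /* stands in for BeginDirectModify() having run */
} ToyForeignPartition;

/*
 * Models the init phase.  With "init" pruning the parameter is known, so
 * non-matching partitions are skipped before any setup.  Without it, all
 * partitions are set up and pruning can only happen later, at exec time.
 * Returns the number of partitions that were set up.
 */
static int
toy_exec_init(ToyForeignPartition *parts, int nparts,
              bool can_prune_at_init, int param)
{
    int nbegun = 0;

    for (int i = 0; i < nparts; i++)
    {
        if (can_prune_at_init && parts[i].key != param)
            continue;           /* pruned at init: never set up */
        parts[i].begin_called = true;
        nbegun++;
    }
    return nbegun;
}

/* Three partitions, query parameter selects the one with key 2. */
static int
toy_demo_begin_calls(bool can_prune_at_init)
{
    ToyForeignPartition parts[3] = {{1, false}, {2, false}, {3, false}};

    return toy_exec_init(parts, 3, can_prune_at_init, 2);
}
```

Here toy_demo_begin_calls(true) sets up one partition, while toy_demo_begin_calls(false) sets up all three, which is the case Heikki's concern is about.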

I pushed this without those delay-setting-resultRelInfo changes. But we
can revisit those changes with the run-time pruning optimization patch.

Sure, that makes sense.

I'll continue with the last couple of patches in this thread.

Okay, thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

#64Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#63)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I'll continue with the last couple of patches in this thread.

I committed the move of the cross-partition logic to a new
ExecCrossPartitionUpdate() function, with just minor comment editing and
pgindent. I left out the refactoring around the calls to AFTER ROW
INSERT/DELETE triggers. I stared at the change for a while, and wasn't
sure if I liked the patched or the unpatched version better, so I
left it alone.

Looking at the last patch, "Revise child-to-root tuple conversion map
management", that's certainly an improvement. However, I find it
confusing that sometimes the mapping from child to root is in
relinfo->ri_ChildToRootMap, and sometimes in
relinfo->ri_PartitionInfo->pi_PartitionToRootMap. When is each of those
filled in? If both are set, is it well defined which one is initialized
first?

In general, I'm pretty confused by the initialization of
ri_PartitionInfo. Where is it initialized, and when? In execnodes.h, the
definition of ResultRelInfo says:

/* info for partition tuple routing (NULL if not set up yet) */
struct PartitionRoutingInfo *ri_PartitionInfo;

That implies that the field is initialized lazily. But in
ExecFindPartition, we have this:

if (partidx == partdesc->boundinfo->default_index)
{
PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo;

/*
* The tuple must match the partition's layout for the constraint
* expression to be evaluated successfully. If the partition is
* sub-partitioned, that would already be the case due to the code
* above, but for a leaf partition the tuple still matches the
* parent's layout.
*
* Note that we have a map to convert from root to current
* partition, but not from immediate parent to current partition.
* So if we have to convert, do it from the root slot; if not, use
* the root slot as-is.
*/
if (partrouteinfo)
{
TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap;

if (map)
slot = execute_attr_map_slot(map->attrMap, rootslot,
partrouteinfo->pi_PartitionTupleSlot);
else
slot = rootslot;
}

ExecPartitionCheck(rri, slot, estate, true);
}

That check implies that it's not just lazily initialized; the code will
behave differently depending on whether ri_PartitionInfo is set.

I think all this would be more clear if ri_PartitionInfo and
ri_ChildToRootMap were both truly lazily initialized, the first time
they're needed. And if we removed
ri_PartitionInfo->pi_PartitionToRootMap, and always used
ri_ChildToRootMap for it.
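
As a sketch of what "truly lazily initialized" with a single field could look like: the code below is a toy model, and ToyMap, ToyRelInfo, map_valid and toy_get_child_to_root_map() are all hypothetical names chosen for illustration, not anything in the tree. Because a NULL map is a legitimate result (no conversion needed), the sketch assumes a separate flag to record that the map has been computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap. */
typedef struct ToyMap
{
    int from_layout;
    int to_layout;
} ToyMap;

/* Hypothetical stand-in for ResultRelInfo, reduced to what the sketch needs. */
typedef struct ToyRelInfo
{
    int     child_layout;       /* identifies the child's column layout */
    int     root_layout;        /* identifies the root's column layout */
    bool    map_valid;          /* assumption: "map has been computed" flag */
    ToyMap *child_to_root_map;  /* NULL means no conversion is needed */
    ToyMap  map_storage;
    int     build_count;        /* demo instrumentation only */
} ToyRelInfo;

/* Single access point: build the map on first use, then return the cache. */
static ToyMap *
toy_get_child_to_root_map(ToyRelInfo *rri)
{
    if (!rri->map_valid)
    {
        if (rri->child_layout != rri->root_layout)
        {
            rri->map_storage.from_layout = rri->child_layout;
            rri->map_storage.to_layout = rri->root_layout;
            rri->child_to_root_map = &rri->map_storage;
        }
        else
            rri->child_to_root_map = NULL;
        rri->map_valid = true;
        rri->build_count++;
    }
    return rri->child_to_root_map;
}

/* Returns how many maps were computed across three relations. */
static int
toy_demo_map_builds(void)
{
    ToyRelInfo same = {.child_layout = 1, .root_layout = 1};
    ToyRelInfo differs = {.child_layout = 2, .root_layout = 1};
    ToyRelInfo untouched = {.child_layout = 3, .root_layout = 1};

    /* Two lookups each; the second is served from the cache, even for NULL. */
    assert(toy_get_child_to_root_map(&same) == NULL);
    assert(toy_get_child_to_root_map(&same) == NULL);
    assert(toy_get_child_to_root_map(&differs) != NULL);
    assert(toy_get_child_to_root_map(&differs)->to_layout == 1);

    return same.build_count + differs.build_count + untouched.build_count;
}
```

With every caller going through one accessor, there is no question of which field was filled in first, and a relation that never needs the map (untouched above) never pays for building it.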

Maybe remove PartitionRoutingInfo struct altogether, and just move its
fields directly to ResultRelInfo.

- Heikki

#65Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#64)
1 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I'll continue with the last couple of patches in this thread.

I committed the move of the cross-partition logic to a new
ExecCrossPartitionUpdate() function, with just minor comment editing and
pgindent. I left out the refactoring around the calls to AFTER ROW
INSERT/DELETE triggers. I stared at the change for a while, and wasn't
sure if I liked the patched or the unpatched version better, so I
left it alone.

Okay, thanks for committing that.

Looking at the last patch, "Revise child-to-root tuple conversion map
management", that's certainly an improvement. However, I find it
confusing that sometimes the mapping from child to root is in
relinfo->ri_ChildToRootMap, and sometimes in
relinfo->ri_PartitionInfo->pi_PartitionToRootMap. When is each of those
filled in? If both are set, is it well defined which one is initialized
first?

It is ri_ChildToRootMap that is set first, because it's only set in
child UPDATE target relations, which are initialized in
ExecInitModifyTable(), that is, well before partition tuple routing
comes into the picture.

ri_PartitionInfo, and hence pi_PartitionToRootMap, is set in the
ResultRelInfos of tuple routing target partitions, which are lazily
initialized when tuples land in them.

With the patch, if a tuple routing target partition happens to also be
an UPDATE target relation and we need the partition-to-root map (which
for a tuple routing target partition is saved in
pi_PartitionToRootMap), we try to reuse ri_ChildToRootMap, because it
would already be initialized.

But as you say below, maybe we don't need to have two fields for the
same thing, which I agree with. Having only ri_ChildToRootMap as you
suggest below suffices.

In general, I'm pretty confused by the initialization of
ri_PartitionInfo. Where is it initialized, and when? In execnodes.h, the
definition of ResultRelInfo says:

/* info for partition tuple routing (NULL if not set up yet) */
struct PartitionRoutingInfo *ri_PartitionInfo;

That implies that the field is initialized lazily. But in
ExecFindPartition, we have this:

if (partidx == partdesc->boundinfo->default_index)
{
PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo;

/*
* The tuple must match the partition's layout for the constraint
* expression to be evaluated successfully. If the partition is
* sub-partitioned, that would already be the case due to the code
* above, but for a leaf partition the tuple still matches the
* parent's layout.
*
* Note that we have a map to convert from root to current
* partition, but not from immediate parent to current partition.
* So if we have to convert, do it from the root slot; if not, use
* the root slot as-is.
*/
if (partrouteinfo)
{
TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap;

if (map)
slot = execute_attr_map_slot(map->attrMap, rootslot,
partrouteinfo->pi_PartitionTupleSlot);
else
slot = rootslot;
}

ExecPartitionCheck(rri, slot, estate, true);
}

That check implies that it's not just lazily initialized; the code will
behave differently depending on whether ri_PartitionInfo is set.

I think all this would be more clear if ri_PartitionInfo and
ri_ChildToRootMap were both truly lazily initialized, the first time
they're needed.

So, we initialize these maps when we initialize a partition's
ResultRelInfo. I mean, if the partition has a different tuple
descriptor than the root, we know we are going to need to convert tuples
between them (in either direction), so we might as well initialize the
maps when the ResultRelInfo is built, which we do lazily for tuple
routing target relations at least. In that sense, at least the
root-to-partition maps are initialized lazily, that is, only when a
partition receives a tuple via routing.

Initialization of partition-to-root maps, though, is not always lazy,
because they are also needed by UPDATE target relations, whose
ResultRelInfos are initialized in ExecInitModifyTable(), which is not
lazy enough. That will change with my other patch though. :)

And if we removed
ri_PartitionInfo->pi_PartitionToRootMap, and always used
ri_ChildToRootMap for it.

Done in the attached.

Maybe remove PartitionRoutingInfo struct altogether, and just move its
fields directly to ResultRelInfo.

If we do that, we'll end up with 3 notations for the same thing across
releases: In v10 and v11, PartitionRoutingInfo's members are saved in
arrays in ModifyTableState, totally detached from the partition
ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo
but chose to add them into a sub-struct (PartitionRoutingInfo), which
in retrospect was not a great decision. Now if we pull them into
ResultRelInfo, we'll have invented a 3rd notation. Maybe that makes
things hard when back-patching bug fixes?

Attached updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v18-0001-Revise-child-to-root-tuple-conversion-map-manage.patch (application/octet-stream)
From 15b84595a7fd9e081bba8443cbc6116544440a2e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v18] Revise child-to-root tuple conversion map management

Transition tuple capture requires converting child tuples to the
inheritance root table format because that's the format the
transition tuplestore stores tuples in.  For INSERTs into partitioned
tables, the conversion is handled by tuple routing code which
constructs the map for a given partition only if the partition is
targeted, but for UPDATE and DELETE, maps for all result relations
are made and stored in an array in ModifyTableState during
ExecInitModifyTable, which requires their ResultRelInfos to have been
already built. During execution, the map for the currently active
result relation is set in TransitionCaptureState.tcs_map.

This commit removes TransitionCaptureState.tcs_map in favor of a new
map field in ResultRelInfo named ri_ChildToRootMap that is
initialized when the ResultRelInfo for a given result relation is.
This way is less confusing and less bug-prone than setting and
resetting tcs_map. Also, this will allow us to delay creating
the map for a given result relation to when that relation is actually
processed during execution.
---
 src/backend/commands/copy.c            |  30 +----
 src/backend/commands/trigger.c         |   7 +-
 src/backend/executor/execPartition.c   |  23 ++--
 src/backend/executor/nodeModifyTable.c | 220 +++++++++------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/executor/execPartition.h   |   6 -
 src/include/nodes/execnodes.h          |  11 +-
 7 files changed, 89 insertions(+), 218 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 531bd7c..eb326de 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3105,32 +3105,14 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition which may change the tuple, we can
+			 * just remember the original unconverted tuple to avoid a
+			 * needless round trip conversion.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 3b4fbda..e76f5d4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4293,8 +4294,8 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * tables, then return NULL.
  *
  * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * should set tcs_original_insert_tuple as appropriate when dealing with child
+ * tables
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5389,7 +5390,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	if (row_trigger && transition_capture != NULL)
 	{
 		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleConversionMap *map = relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f..08f91e5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -908,6 +908,15 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	}
 
 	/*
+	 * Also, if transition capture is required, store a map to convert tuples
+	 * from partition's rowtype to the root partition table's.
+	 */
+	if (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)
+		leaf_part_rri->ri_ChildToRootMap =
+			convert_tuples_by_name(RelationGetDescr(leaf_part_rri->ri_RelationDesc),
+								   RelationGetDescr(leaf_part_rri->ri_PartitionRoot));
+
+	/*
 	 * Since we've just initialized this ResultRelInfo, it's not in any list
 	 * attached to the estate as yet.  Add it, so that it can be found later.
 	 *
@@ -977,20 +986,6 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partrouteinfo->pi_PartitionTupleSlot = NULL;
 
 	/*
-	 * Also, if transition capture is required, store a map to convert tuples
-	 * from partition's rowtype to the root partition table's.
-	 */
-	if (mtstate &&
-		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
-	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
-	}
-	else
-		partrouteinfo->pi_PartitionToRootMap = NULL;
-
-	/*
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 0c055ed..01ce3ec 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -1087,9 +1084,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
 	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1164,38 +1159,26 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 
 	/*
 	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we should
-	 * convert the tuple into root's tuple descriptor, since ExecInsert()
-	 * starts the search from root.  The tuple conversion map list is in the
-	 * order of mtstate->resultRelInfo[], so to retrieve the one for this
-	 * resultRel, we need to know the position of the resultRel in
-	 * mtstate->resultRelInfo[].
+	 * convert the tuple into root's tuple descriptor if needed, since
+	 * ExecInsert() starts the search from root.
 	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture, so save
-	 * the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
+	/*
+	 * Reset the transition state that may possibly have been written
+	 * by INSERT.
+	 */
 	if (mtstate->mt_transition_capture)
-	{
 		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
 
 	/* We're done moving. */
 	return true;
@@ -1902,28 +1885,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1947,6 +1908,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
@@ -1960,37 +1922,15 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE triggers
+	 * on the partition which may change the tuple, we can just remember the
+	 * original unconverted tuple to avoid a needless round trip conversion.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2007,58 +1947,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2154,17 +2042,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2334,6 +2211,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRelInfo;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2355,13 +2233,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
-	/* If modifying a partitioned table, initialize the root table info */
+	/*
+	 * Initialize the designated "root" result relation.  When modifying
+	 * partitioned tables, it's given by node->rootRelation, while in other
+	 * cases, it's the first relation in node->resultRelations.  We need to
+	 * initialize this one before any others, because
+	 * ExecSetupTransitionCaptureState() needs it.
+	 */
 	if (node->rootRelation > 0)
 	{
 		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
 		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
 							   node->rootRelation);
 	}
+	else
+		ExecInitResultRelation(estate, mtstate->resultRelInfo,
+							   linitial_int(node->resultRelations));
+
+	rootResultRelInfo = getTargetResultRelInfo(mtstate);
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2371,6 +2260,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2384,8 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		subplan = (Plan *) lfirst(l1);
 
-		/* This opens the relation and fills ResultRelInfo. */
-		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+		/*
+		 * This opens result relation and fills ResultRelInfo.
+		 * ("root" relation already opened.)
+		 */
+		if (resultRelInfo != rootResultRelInfo)
+			ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
@@ -2441,12 +2341,28 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.  During INSERT, partition tuples to
+		 * store into the transition tuple store are converted using
+		 * PartitionToRoot map in the partition's PartitionRoutingInfo.
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRelInfo->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
 
 	/* Get the target relation */
-	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
+	rel = rootResultRelInfo->ri_RelationDesc;
 
 	/*
 	 * If it's not a partitioned table after all, UPDATE tuple routing should
@@ -2465,26 +2381,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5..e38d732 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -46,7 +46,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -66,14 +66,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 6d1b722..74c3991 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -37,12 +37,6 @@ typedef struct PartitionRoutingInfo
 	TupleConversionMap *pi_RootToPartitionMap;
 
 	/*
-	 * Map for converting tuples in partition format into the root partitioned
-	 * table format, or NULL if no conversion is required.
-	 */
-	TupleConversionMap *pi_PartitionToRootMap;
-
-	/*
 	 * Slot to store tuples in partition format, or NULL when no translation
 	 * is required between root and partition.
 	 */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b7e9e5d..ce83b81 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -488,6 +488,14 @@ typedef struct ResultRelInfo
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child result relation tuples to the format of the
+	 * table actually mentioned in the query (called "root").  Set only
+	 * if either transition tuple capture or update partition row
+	 * movement is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1174,9 +1182,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
1.8.3.1

#66Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#65)
Re: partition routing layering in nodeModifyTable.c

On 2020-Oct-16, Amit Langote wrote:

On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

And if we removed
ri_PartitionInfo->pi_PartitionToRootMap, and always used
ri_ChildToRootMap for it.

Done in the attached.

Hmm... Overall I like the simplification.

Maybe remove PartitionRoutingInfo struct altogether, and just move its
fields directly to ResultRelInfo.

If we do that, we'll end up with 3 notations for the same thing across
releases: In v10 and v11, PartitionRoutingInfo's members are saved in
arrays in ModifyTableState, totally detached from the partition
ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo
but chose to add them into a sub-struct (PartitionRoutingInfo), which
in retrospect was not a great decision. Now if we pull them into
ResultRelInfo, we'll have invented the 3rd notation. Maybe that makes
things hard when back-patching bug-fixes?

I don't necessarily agree that PartitionRoutingInfo was such a bad idea.
In fact I wonder if we shouldn't move *more* stuff into it
(ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of
partitioning-related stuff (other than ri_PartitionInfo and
ri_PartitionRoot); there are plenty of ResultRelInfos that are not
partitions, so I think it makes sense to keep the split. I'm thinking
that the ChildToRootMap should continue to be in PartitionRoutingInfo.

Maybe what we need in order to keep the initialization "lazy enough" is
some inline functions that act as getters, initializing members of
PartitionRoutingInfo when first needed. (This would probably need
boolean flags, to distinguish "hasn't been set up yet" from "it is not
needed for this partition" for each member that requires it).
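
To make the getter idea concrete, here is a minimal sketch in plain C. It is not PostgreSQL code: DemoMap, DemoRoutingInfo, and demo_get_child_to_root_map() are hypothetical stand-ins, and the needs_conversion field merely simulates the result of comparing tuple descriptors. The point is only that a separate boolean lets a NULL pointer keep meaning "no conversion needed" while still recording whether initialization has happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap; not the PostgreSQL type. */
typedef struct DemoMap
{
    int         natts;
} DemoMap;

/* Hypothetical stand-in for PartitionRoutingInfo with lazily built state. */
typedef struct DemoRoutingInfo
{
    bool        map_valid;          /* has the map been set up yet? */
    DemoMap    *child_to_root_map;  /* NULL means "no conversion needed" */
    bool        needs_conversion;   /* simulated descriptor comparison */
    int         init_count;         /* how many times setup actually ran */
    DemoMap     storage;            /* backing storage for the demo map */
} DemoRoutingInfo;

/*
 * Getter that initializes the map on first use.  The map_valid flag is what
 * distinguishes "not set up yet" from "set up, and no map is needed".
 */
static inline DemoMap *
demo_get_child_to_root_map(DemoRoutingInfo *info)
{
    if (!info->map_valid)
    {
        if (info->needs_conversion)
        {
            info->storage.natts = 3;    /* arbitrary demo value */
            info->child_to_root_map = &info->storage;
        }
        else
            info->child_to_root_map = NULL;

        info->init_count++;
        info->map_valid = true;
    }

    return info->child_to_root_map;
}
```

Callers would go through the getter instead of reading the field directly, so the setup cost is paid at most once per partition and only if somebody asks for the map.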

BTW it is curious that ExecInitRoutingInfo is called both in
ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
for the partition is not found) *and* from ExecFindPartition again, when
the ResultRelInfo for the partition *is* found. Doesn't this mean that
ri_PartitionInfo is set up twice for the same partition?

#67Amit Langote
amitlangote09@gmail.com
In reply to: Alvaro Herrera (#66)
Re: partition routing layering in nodeModifyTable.c

On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Oct-16, Amit Langote wrote:

On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

And if we removed
ri_PartitionInfo->pi_PartitionToRootMap, and always used
ri_ChildToRootMap for it.

Done in the attached.

Hmm... Overall I like the simplification.

Thank you for looking it over.

Maybe remove PartitionRoutingInfo struct altogether, and just move its
fields directly to ResultRelInfo.

If we do that, we'll end up with 3 notations for the same thing across
releases: In v10 and v11, PartitionRoutingInfo's members are saved in
arrays in ModifyTableState, totally detached from the partition
ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo
but chose to add them into a sub-struct (PartitionRoutingInfo), which
in retrospect was not a great decision. Now if we pull them into
ResultRelInfo, we'll have invented the 3rd notation. Maybe that makes
things hard when back-patching bug-fixes?

I don't necessarily agree that PartitionRoutingInfo was such a bad idea.
In fact I wonder if we shouldn't move *more* stuff into it
(ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of
partitioning-related stuff (other than ri_PartitionInfo and
ri_PartitionRoot); there are plenty of ResultRelInfos that are not
partitions, so I think it makes sense to keep the split. I'm thinking
that the ChildToRootMap should continue to be in PartitionRoutingInfo.

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing. I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there. This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot are the only pieces
of extra information needed by tuple routing target relations.

Maybe what we need in order to keep the initialization "lazy enough" is
some inline functions that act as getters, initializing members of
PartitionRoutingInfo when first needed. (This would probably need
boolean flags, to distinguish "hasn't been set up yet" from "it is not
needed for this partition" for each member that requires it).

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is. If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built. If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.
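
The "NULL map means same format" convention can be sketched as follows. This is a simplified model, not the executor's code: DemoDesc, DemoMap, demo_build_map(), and demo_convert() are hypothetical names, rows are plain int arrays, and both descriptors are assumed to contain the same set of column names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ATTS 8

/* Hypothetical tuple descriptor: just an ordered list of column names. */
typedef struct DemoDesc
{
    int         natts;
    const char *names[DEMO_MAX_ATTS];
} DemoDesc;

/* Hypothetical map: for each output column, the index of the input column. */
typedef struct DemoMap
{
    int         natts;
    int         in_index[DEMO_MAX_ATTS];
} DemoMap;

/*
 * Fill *map so that it converts rows laid out as "in" into the layout of
 * "out".  Returns NULL when the two layouts already match, else returns map.
 */
static DemoMap *
demo_build_map(const DemoDesc *in, const DemoDesc *out, DemoMap *map)
{
    int         same = (in->natts == out->natts);

    map->natts = out->natts;
    for (int i = 0; i < out->natts; i++)
    {
        map->in_index[i] = -1;
        for (int j = 0; j < in->natts; j++)
        {
            if (strcmp(out->names[i], in->names[j]) == 0)
            {
                map->in_index[i] = j;
                break;
            }
        }
        if (map->in_index[i] != i)
            same = 0;
    }

    return same ? NULL : map;
}

/* Convert only when a map exists; a NULL map means "use the row as is". */
static const int *
demo_convert(const DemoMap *map, const int *in_row, int *scratch)
{
    if (map == NULL)
        return in_row;

    for (int i = 0; i < map->natts; i++)
        scratch[i] = (map->in_index[i] >= 0) ? in_row[map->in_index[i]] : 0;

    return scratch;
}
```

Because the map is built in the same step that builds the per-partition state, a partition that never receives a tuple costs nothing, and a partition whose layout matches the root pays only a NULL check per row.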

BTW it is curious that ExecInitRoutingInfo is called both in
ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
for the partition is not found) *and* from ExecFindPartition again, when
the ResultRelInfo for the partition *is* found. Doesn't this mean that
ri_PartitionInfo is set up twice for the same partition?

No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for
reused update result relations, and even then only the first time a tuple
lands in such a partition. For subsequent tuples that land in
the same partition, ExecFindPartition() will be able to find that
ResultRelInfo in the proute->partitions[] array. All ResultRelInfos
in that array are assumed to have been processed by
ExecInitRoutingInfo().
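
A rough model of that control flow, as I understand it, is below. It is a sketch under stated assumptions rather than the real ExecFindPartition(): DemoRouting, DemoRelInfo, and the demo_* functions are hypothetical, partitions are identified by a small integer index, and the counter exists only to show that routing setup runs once per partition.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NPARTS 4

/* Hypothetical stand-in for a partition's ResultRelInfo. */
typedef struct DemoRelInfo
{
    int         routing_init_count; /* times routing info was set up */
} DemoRelInfo;

/* Hypothetical stand-in for PartitionTupleRouting. */
typedef struct DemoRouting
{
    DemoRelInfo *partitions[DEMO_NPARTS];   /* NULL until first tuple lands */
    DemoRelInfo *update_rels[DEMO_NPARTS];  /* pre-built by UPDATE, or NULL */
    DemoRelInfo fresh[DEMO_NPARTS];         /* storage for new entries */
} DemoRouting;

/* Models ExecInitRoutingInfo(): per-partition routing setup. */
static void
demo_init_routing_info(DemoRelInfo *rel)
{
    rel->routing_init_count++;
}

/* Models ExecInitPartitionInfo(): build a new entry, then set up routing. */
static DemoRelInfo *
demo_init_partition_info(DemoRouting *proute, int partidx)
{
    DemoRelInfo *rel = &proute->fresh[partidx];

    demo_init_routing_info(rel);
    return rel;
}

/* Models the lookup: cached entry, reused UPDATE rel, or brand-new entry. */
static DemoRelInfo *
demo_find_partition(DemoRouting *proute, int partidx)
{
    DemoRelInfo *rel;

    assert(partidx >= 0 && partidx < DEMO_NPARTS);

    /* Already routed to this partition: everything is set up. */
    if (proute->partitions[partidx] != NULL)
        return proute->partitions[partidx];

    if (proute->update_rels[partidx] != NULL)
    {
        /* Reused UPDATE result relation: set up routing on first use. */
        rel = proute->update_rels[partidx];
        demo_init_routing_info(rel);
    }
    else
        rel = demo_init_partition_info(proute, partidx);

    /* Remember it so later tuples take the fast path above. */
    proute->partitions[partidx] = rel;
    return rel;
}
```

The two call sites of the setup function sit in mutually exclusive branches, and both are skipped once the entry is in the partitions array, which is why no partition is initialized twice.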

--
Amit Langote
EDB: http://www.enterprisedb.com

#68Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#67)
Re: partition routing layering in nodeModifyTable.c

On 2020-Oct-17, Amit Langote wrote:

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing. I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there. This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot are the only pieces
of extra information needed by tuple routing target relations.

Well, I was thinking of making ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing. Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether. It'd just be a couple of additional
pointers in ResultRelInfo after all. (Remember that we wanted to get
rid of fields specific to only certain kinds of RTEs in RangeTblEntry
for example, to keep things cleanly separated, although that project
eventually found its demise for other reasons.)

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is. If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built. If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.

I see -- makes sense.

On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

BTW it is curious that ExecInitRoutingInfo is called both in
ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
for the partition is not found) *and* from ExecFindPartition again, when
the ResultRelInfo for the partition *is* found. Doesn't this mean that
ri_PartitionInfo is set up twice for the same partition?

No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for
reused update result relations, and even then only the first time a tuple
lands in such a partition. For subsequent tuples that land in
the same partition, ExecFindPartition() will be able to find that
ResultRelInfo in the proute->partitions[] array. All ResultRelInfos
in that array are assumed to have been processed by
ExecInitRoutingInfo().

Doh, right, sorry, I was misreading the if/else maze there.

#69Amit Langote
amitlangote09@gmail.com
In reply to: Alvaro Herrera (#68)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Oct-17, Amit Langote wrote:

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing. I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there. This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot are the only pieces
of extra information needed by tuple routing target relations.

Well, I was thinking of making ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing. Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether. It'd just be a couple of additional
pointers in ResultRelInfo after all.

So that's 2 votes for removing PartitionRoutingInfo from the tree.
Okay, I have tried that in the attached 0002 patch. Also, I fixed
some comments in 0001 that still referenced PartitionToRootMap.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v19-0001-Revise-child-to-root-tuple-conversion-map-manage.patch (application/octet-stream)
From 14de4960934c101e860df1249eeb3c8996617ee5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 30 Jul 2019 10:51:35 +0900
Subject: [PATCH v19 1/2] Revise child-to-root tuple conversion map management

Transition tuple capture requires converting child tuples to the
inheritance root table format because that's the format the
transition tuplestore stores tuples in.  For INSERTs into partitioned
tables, the conversion is handled by tuple routing code which
constructs the map for a given partition only if the partition is
targeted, but for UPDATE and DELETE, maps for all result relations
are made and stored in an array in ModifyTableState during
ExecInitModifyTable, which requires their ResultRelInfos to have been
already built. During execution, the map for the currently active
result relation is set in TransitionCaptureState.tcs_map.

This commit removes TransitionCaptureState.tcs_map in favor of a new
map field in ResultRelInfo named ri_ChildToRootMap that is
initialized when the ResultRelInfo for a given result relation is.
This is less confusing and less bug-prone than setting and
resetting tcs_map. It will also allow us to delay creating
the map for a given result relation to when that relation is actually
processed during execution.
---
 src/backend/commands/copy.c            |  30 +----
 src/backend/commands/trigger.c         |   7 +-
 src/backend/executor/execPartition.c   |  23 ++--
 src/backend/executor/nodeModifyTable.c | 221 +++++++++------------------------
 src/include/commands/trigger.h         |  10 +-
 src/include/executor/execPartition.h   |   6 -
 src/include/nodes/execnodes.h          |  11 +-
 7 files changed, 90 insertions(+), 218 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 531bd7c..eb326de 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3105,32 +3105,14 @@ CopyFrom(CopyState cstate)
 			}
 
 			/*
-			 * If we're capturing transition tuples, we might need to convert
-			 * from the partition rowtype to root rowtype.
+			 * If we're capturing transition tuples and there are no BEFORE
+			 * triggers on the partition which may change the tuple, we can
+			 * just remember the original unconverted tuple to avoid a
+			 * needless round trip conversion.
 			 */
 			if (cstate->transition_capture != NULL)
-			{
-				if (has_before_insert_row_trig)
-				{
-					/*
-					 * If there are any BEFORE triggers on the partition,
-					 * we'll have to be ready to convert their result back to
-					 * tuplestore format.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = NULL;
-					cstate->transition_capture->tcs_map =
-						resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
-				}
-				else
-				{
-					/*
-					 * Otherwise, just remember the original unconverted
-					 * tuple, to avoid a needless round trip conversion.
-					 */
-					cstate->transition_capture->tcs_original_insert_tuple = myslot;
-					cstate->transition_capture->tcs_map = NULL;
-				}
-			}
+				cstate->transition_capture->tcs_original_insert_tuple =
+					!has_before_insert_row_trig ? myslot : NULL;
 
 			/*
 			 * We might need to convert from the root rowtype to the partition
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 3b4fbda..e76f5d4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -35,6 +35,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "executor/execPartition.h"
 #include "miscadmin.h"
 #include "nodes/bitmapset.h"
 #include "nodes/makefuncs.h"
@@ -4293,8 +4294,8 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
  * tables, then return NULL.
  *
  * The resulting object can be passed to the ExecAR* functions.  The caller
- * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
- * with child tables.
+ * should set tcs_original_insert_tuple as appropriate when dealing with child
+ * tables.
  *
  * Note that we copy the flags from a parent table into this struct (rather
  * than subsequently using the relation's TriggerDesc directly) so that we can
@@ -5389,7 +5390,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	if (row_trigger && transition_capture != NULL)
 	{
 		TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple;
-		TupleConversionMap *map = transition_capture->tcs_map;
+		TupleConversionMap *map = relinfo->ri_ChildToRootMap;
 		bool		delete_old_table = transition_capture->tcs_delete_old_table;
 		bool		update_old_table = transition_capture->tcs_update_old_table;
 		bool		update_new_table = transition_capture->tcs_update_new_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 33d2c6f..08f91e5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -908,6 +908,15 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	}
 
 	/*
+	 * Also, if transition capture is required, store a map to convert tuples
+	 * from partition's rowtype to the root partition table's.
+	 */
+	if (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)
+		leaf_part_rri->ri_ChildToRootMap =
+			convert_tuples_by_name(RelationGetDescr(leaf_part_rri->ri_RelationDesc),
+								   RelationGetDescr(leaf_part_rri->ri_PartitionRoot));
+
+	/*
 	 * Since we've just initialized this ResultRelInfo, it's not in any list
 	 * attached to the estate as yet.  Add it, so that it can be found later.
 	 *
@@ -977,20 +986,6 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partrouteinfo->pi_PartitionTupleSlot = NULL;
 
 	/*
-	 * Also, if transition capture is required, store a map to convert tuples
-	 * from partition's rowtype to the root partition table's.
-	 */
-	if (mtstate &&
-		(mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture))
-	{
-		partrouteinfo->pi_PartitionToRootMap =
-			convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc),
-								   RelationGetDescr(partRelInfo->ri_PartitionRoot));
-	}
-	else
-		partrouteinfo->pi_PartitionToRootMap = NULL;
-
-	/*
 	 * If the partition is a foreign table, let the FDW init itself for
 	 * routing tuples to the partition.
 	 */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 0c055ed..5efbfb6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -73,9 +73,6 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
 											   TupleTableSlot *slot,
 											   ResultRelInfo **partRelInfo);
 static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node);
-static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
-static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
-												   int whichplan);
 
 /*
  * Verify that the tuples to be produced by INSERT or UPDATE match the
@@ -1087,9 +1084,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 {
 	EState	   *estate = mtstate->ps.state;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
-	int			map_index;
 	TupleConversionMap *tupconv_map;
-	TupleConversionMap *saved_tcs_map = NULL;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
 
@@ -1164,38 +1159,26 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 
 	/*
 	 * resultRelInfo is one of the per-subplan resultRelInfos.  So we should
-	 * convert the tuple into root's tuple descriptor, since ExecInsert()
-	 * starts the search from root.  The tuple conversion map list is in the
-	 * order of mtstate->resultRelInfo[], so to retrieve the one for this
-	 * resultRel, we need to know the position of the resultRel in
-	 * mtstate->resultRelInfo[].
+	 * convert the tuple into root's tuple descriptor if needed, since
+	 * ExecInsert() starts the search from root.
 	 */
-	map_index = resultRelInfo - mtstate->resultRelInfo;
-	Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-	tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+	tupconv_map = resultRelInfo->ri_ChildToRootMap;
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
 									 mtstate->mt_root_tuple_slot);
 
-	/*
-	 * ExecInsert() may scribble on mtstate->mt_transition_capture, so save
-	 * the currently active map.
-	 */
-	if (mtstate->mt_transition_capture)
-		saved_tcs_map = mtstate->mt_transition_capture->tcs_map;
-
 	/* Tuple routing starts from the root table. */
 	Assert(mtstate->rootResultRelInfo != NULL);
 	*inserted_tuple = ExecInsert(mtstate, mtstate->rootResultRelInfo, slot,
 								 planSlot, estate, canSetTag);
 
-	/* Clear the INSERT's tuple and restore the saved map. */
+	/*
+	 * Reset the transition state that may possibly have been written
+	 * by INSERT.
+	 */
 	if (mtstate->mt_transition_capture)
-	{
 		mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-		mtstate->mt_transition_capture->tcs_map = saved_tcs_map;
-	}
 
 	/* We're done moving. */
 	return true;
@@ -1902,28 +1885,6 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
 									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
 									   CMD_UPDATE);
-
-	/*
-	 * If we found that we need to collect transition tuples then we may also
-	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.  (We can share these maps
-	 * between the regular and ON CONFLICT cases.)
-	 */
-	if (mtstate->mt_transition_capture != NULL ||
-		mtstate->mt_oc_transition_capture != NULL)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-		/*
-		 * Install the conversion map for the first plan for UPDATE and DELETE
-		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time, so we need not update
-		 * mtstate->mt_oc_transition_capture here.)
-		 */
-		if (mtstate->mt_transition_capture && mtstate->operation != CMD_INSERT)
-			mtstate->mt_transition_capture->tcs_map =
-				tupconv_map_for_subplan(mtstate, 0);
-	}
 }
 
 /*
@@ -1947,6 +1908,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	ResultRelInfo *partrel;
 	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
+	bool		has_before_insert_row_trig;
 
 	/*
 	 * Lookup the target partition's ResultRelInfo.  If ExecFindPartition does
@@ -1960,37 +1922,15 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	Assert(partrouteinfo != NULL);
 
 	/*
-	 * If we're capturing transition tuples, we might need to convert from the
-	 * partition rowtype to root partitioned table's rowtype.
+	 * If we're capturing transition tuples and there are no BEFORE triggers
+	 * on the partition which may change the tuple, we can just remember the
+	 * original unconverted tuple to avoid a needless round trip conversion.
 	 */
+	has_before_insert_row_trig = (partrel->ri_TrigDesc &&
+								  partrel->ri_TrigDesc->trig_insert_before_row);
 	if (mtstate->mt_transition_capture != NULL)
-	{
-		if (partrel->ri_TrigDesc &&
-			partrel->ri_TrigDesc->trig_insert_before_row)
-		{
-			/*
-			 * If there are any BEFORE triggers on the partition, we'll have
-			 * to be ready to convert their result back to tuplestore format.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
-			mtstate->mt_transition_capture->tcs_map =
-				partrouteinfo->pi_PartitionToRootMap;
-		}
-		else
-		{
-			/*
-			 * Otherwise, just remember the original unconverted tuple, to
-			 * avoid a needless round trip conversion.
-			 */
-			mtstate->mt_transition_capture->tcs_original_insert_tuple = slot;
-			mtstate->mt_transition_capture->tcs_map = NULL;
-		}
-	}
-	if (mtstate->mt_oc_transition_capture != NULL)
-	{
-		mtstate->mt_oc_transition_capture->tcs_map =
-			partrouteinfo->pi_PartitionToRootMap;
-	}
+		mtstate->mt_transition_capture->tcs_original_insert_tuple =
+			!has_before_insert_row_trig ? slot : NULL;
 
 	/*
 	 * Convert the tuple, if necessary.
@@ -2007,58 +1947,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	return slot;
 }
 
-/*
- * Initialize the child-to-root tuple conversion map array for UPDATE subplans.
- *
- * This map array is required to convert the tuple from the subplan result rel
- * to the target table descriptor. This requirement arises for two independent
- * scenarios:
- * 1. For update-tuple-routing.
- * 2. For capturing tuples in transition tables.
- */
-static void
-ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate)
-{
-	ResultRelInfo *targetRelInfo = getTargetResultRelInfo(mtstate);
-	ResultRelInfo *resultRelInfos = mtstate->resultRelInfo;
-	TupleDesc	outdesc;
-	int			numResultRelInfos = mtstate->mt_nplans;
-	int			i;
-
-	/*
-	 * Build array of conversion maps from each child's TupleDesc to the one
-	 * used in the target relation.  The map pointers may be NULL when no
-	 * conversion is necessary, which is hopefully a common case.
-	 */
-
-	/* Get tuple descriptor of the target rel. */
-	outdesc = RelationGetDescr(targetRelInfo->ri_RelationDesc);
-
-	mtstate->mt_per_subplan_tupconv_maps = (TupleConversionMap **)
-		palloc(sizeof(TupleConversionMap *) * numResultRelInfos);
-
-	for (i = 0; i < numResultRelInfos; ++i)
-	{
-		mtstate->mt_per_subplan_tupconv_maps[i] =
-			convert_tuples_by_name(RelationGetDescr(resultRelInfos[i].ri_RelationDesc),
-								   outdesc);
-	}
-}
-
-/*
- * For a given subplan index, get the tuple conversion map.
- */
-static TupleConversionMap *
-tupconv_map_for_subplan(ModifyTableState *mtstate, int whichplan)
-{
-	/* If nobody else set the per-subplan array of maps, do so ourselves. */
-	if (mtstate->mt_per_subplan_tupconv_maps == NULL)
-		ExecSetupChildParentMapForSubplan(mtstate);
-
-	Assert(whichplan >= 0 && whichplan < mtstate->mt_nplans);
-	return mtstate->mt_per_subplan_tupconv_maps[whichplan];
-}
-
 /* ----------------------------------------------------------------
  *	   ExecModifyTable
  *
@@ -2154,17 +2042,6 @@ ExecModifyTable(PlanState *pstate)
 				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
-				/* Prepare to convert transition tuples from this child. */
-				if (node->mt_transition_capture != NULL)
-				{
-					node->mt_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
-				if (node->mt_oc_transition_capture != NULL)
-				{
-					node->mt_oc_transition_capture->tcs_map =
-						tupconv_map_for_subplan(node, node->mt_whichplan);
-				}
 				continue;
 			}
 			else
@@ -2334,6 +2211,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	int			i;
 	Relation	rel;
 	bool		update_tuple_routing_needed = node->partColsUpdated;
+	ResultRelInfo *rootResultRelInfo;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2355,13 +2233,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
 
-	/* If modifying a partitioned table, initialize the root table info */
+	/*
+	 * Initialize the designated "root" result relation.  When modifying
+	 * partitioned tables, it's given by node->rootRelation, while in other
+	 * cases, it's the first relation in node->resultRelations.  We need to
+	 * initialize this one before any others, because
+	 * ExecSetupTransitionCaptureState() needs it.
+	 */
 	if (node->rootRelation > 0)
 	{
 		mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
 		ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
 							   node->rootRelation);
 	}
+	else
+		ExecInitResultRelation(estate, mtstate->resultRelInfo,
+							   linitial_int(node->resultRelations));
+
+	rootResultRelInfo = getTargetResultRelInfo(mtstate);
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
 	mtstate->mt_nplans = nplans;
@@ -2371,6 +2260,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->fireBSTriggers = true;
 
 	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);
+
+	/*
 	 * call ExecInitNode on each of the plans to be executed and save the
 	 * results into the array "mt_plans".  This is also a convenient place to
 	 * verify that the proposed target relations are valid and open their
@@ -2384,8 +2280,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		subplan = (Plan *) lfirst(l1);
 
-		/* This opens the relation and fills ResultRelInfo. */
-		ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+		/*
+		 * This opens result relation and fills ResultRelInfo.
+		 * ("root" relation already opened.)
+		 */
+		if (resultRelInfo != rootResultRelInfo)
+			ExecInitResultRelation(estate, resultRelInfo, resultRelation);
 
 		/* Initialize the usesFdwDirectModify flag */
 		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
@@ -2441,12 +2341,29 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 															 eflags);
 		}
 
+		/*
+		 * If needed, initialize a map to convert tuples in the child format
+		 * to the format of the table mentioned in the query (root relation).
+		 * It's needed for update tuple routing, because the routing starts
+		 * from the root relation.  It's also needed for capturing transition
+		 * tuples, because the transition tuple store can only store tuples
+		 * in the root table format.
+		 *
+		 * For INSERT, the map is only initialized for a given partition when
+		 * the partition itself is first initialized by ExecFindPartition().
+		 */
+		if (update_tuple_routing_needed ||
+			(mtstate->mt_transition_capture &&
+			 mtstate->operation != CMD_INSERT))
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
+									   RelationGetDescr(rootResultRelInfo->ri_RelationDesc));
 		resultRelInfo++;
 		i++;
 	}
 
 	/* Get the target relation */
-	rel = (getTargetResultRelInfo(mtstate))->ri_RelationDesc;
+	rel = rootResultRelInfo->ri_RelationDesc;
 
 	/*
 	 * If it's not a partitioned table after all, UPDATE tuple routing should
@@ -2465,26 +2382,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
-	 * Build state for collecting transition tuples.  This requires having a
-	 * valid trigger query context, so skip it in explain-only mode.
-	 */
-	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-		ExecSetupTransitionCaptureState(mtstate, estate);
-
-	/*
-	 * Construct mapping from each of the per-subplan partition attnos to the
-	 * root attno.  This is required when during update row movement the tuple
-	 * descriptor of a source partition does not match the root partitioned
-	 * table descriptor.  In such a case we need to convert tuples to the root
-	 * tuple descriptor, because the search for destination partition starts
-	 * from the root.  We'll also need a slot to store these converted tuples.
-	 * We can skip this setup if it's not a partition key update.
+	 * For update row movement we'll need a dedicated slot to store the
+	 * tuples that have been converted from partition format to the root
+	 * table format.
 	 */
 	if (update_tuple_routing_needed)
-	{
-		ExecSetupChildParentMapForSubplan(mtstate);
 		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-	}
 
 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5..e38d732 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -46,7 +46,7 @@ typedef struct TriggerData
  * The state for capturing old and new tuples into transition tables for a
  * single ModifyTable node (or other operation source, e.g. copy.c).
  *
- * This is per-caller to avoid conflicts in setting tcs_map or
+ * This is per-caller to avoid conflicts in setting
  * tcs_original_insert_tuple.  Note, however, that the pointed-to
  * private data may be shared across multiple callers.
  */
@@ -66,14 +66,6 @@ typedef struct TransitionCaptureState
 	bool		tcs_insert_new_table;
 
 	/*
-	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
-	 * new and old tuples from a child table's format to the format of the
-	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.  The caller must store the conversion map here if so.
-	 */
-	TupleConversionMap *tcs_map;
-
-	/*
 	 * For INSERT and COPY, it would be wasteful to convert tuples from child
 	 * format to parent format after they have already been converted in the
 	 * opposite direction during routing.  In that case we bypass conversion
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 6d1b722..74c3991 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -37,12 +37,6 @@ typedef struct PartitionRoutingInfo
 	TupleConversionMap *pi_RootToPartitionMap;
 
 	/*
-	 * Map for converting tuples in partition format into the root partitioned
-	 * table format, or NULL if no conversion is required.
-	 */
-	TupleConversionMap *pi_PartitionToRootMap;
-
-	/*
 	 * Slot to store tuples in partition format, or NULL when no translation
 	 * is required between root and partition.
 	 */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b7e9e5d..ce83b81 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -488,6 +488,14 @@ typedef struct ResultRelInfo
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
+
+	/*
+	 * Map to convert child result relation tuples to the format of the
+	 * table actually mentioned in the query (called "root").  Set only
+	 * if either transition tuple capture or update partition row
+	 * movement is active.
+	 */
+	TupleConversionMap *ri_ChildToRootMap;
 } ResultRelInfo;
 
 /* ----------------
@@ -1174,9 +1182,6 @@ typedef struct ModifyTableState
 
 	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	struct TransitionCaptureState *mt_oc_transition_capture;
-
-	/* Per plan map for tuple conversion from child to root */
-	TupleConversionMap **mt_per_subplan_tupconv_maps;
 } ModifyTableState;
 
 /* ----------------
-- 
1.8.3.1

v19-0002-Remove-PartitionRoutingInfo-struct.patch (application/octet-stream)
From 7d321c1627b9ce16357a35704fee030b3ecbdedd Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 19 Oct 2020 12:32:09 +0900
Subject: [PATCH v19 2/2] Remove PartitionRoutingInfo struct

The extra indirection needed to access its members via its enclosing
ResultRelInfo seems pointless.
---
 src/backend/commands/copy.c              |  4 ++--
 src/backend/executor/execMain.c          |  3 ++-
 src/backend/executor/execPartition.c     | 32 ++++++++++++++++----------------
 src/backend/executor/nodeModifyTable.c   |  7 ++-----
 src/backend/replication/logical/worker.c | 11 ++++-------
 src/include/executor/execPartition.h     | 21 ---------------------
 src/include/nodes/execnodes.h            | 15 ++++++++++-----
 7 files changed, 36 insertions(+), 57 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index eb326de..0c3672e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3118,7 +3118,7 @@ CopyFrom(CopyState cstate)
 			 * We might need to convert from the root rowtype to the partition
 			 * rowtype.
 			 */
-			map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
+			map = resultRelInfo->ri_RootToPartitionMap;
 			if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
 			{
 				/* non batch insert */
@@ -3126,7 +3126,7 @@ CopyFrom(CopyState cstate)
 				{
 					TupleTableSlot *new_slot;
 
-					new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
+					new_slot = resultRelInfo->ri_PartitionTupleSlot;
 					myslot = execute_attr_map_slot(map->attrMap, myslot, new_slot);
 				}
 			}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 293f53d..eef3516 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1243,7 +1243,8 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_TrigOldSlot = NULL;
 	resultRelInfo->ri_TrigNewSlot = NULL;
 	resultRelInfo->ri_PartitionRoot = partition_root;
-	resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
+	resultRelInfo->ri_RootToPartitionMap = NULL; /* set by ExecInitPartitionInfo */
+	resultRelInfo->ri_PartitionTupleSlot = NULL; /* ditto */
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 08f91e5..e72487f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -261,7 +261,7 @@ ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
  * If the partition's ResultRelInfo does not yet exist in 'proute' then we set
  * one up or reuse one from mtstate's resultRelInfo array.  When reusing a
  * ResultRelInfo from the mtstate we verify that the relation is a valid
- * target for INSERTs and then set up a PartitionRoutingInfo for it.
+ * target for INSERTs and initialize tuple routing information.
  *
  * rootResultRelInfo is the relation named in the query.
  *
@@ -307,6 +307,7 @@ ExecFindPartition(ModifyTableState *mtstate,
 	while (dispatch != NULL)
 	{
 		int			partidx = -1;
+		bool		is_leaf = false;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -348,6 +349,8 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 		if (partdesc->is_leaf[partidx])
 		{
+			is_leaf = true;
+
 			/*
 			 * We've reached the leaf -- hurray, we're done.  Look to see if
 			 * we've already got a ResultRelInfo for this partition.
@@ -382,7 +385,10 @@ ExecFindPartition(ModifyTableState *mtstate,
 						/* Verify this ResultRelInfo allows INSERTs */
 						CheckValidResultRel(rri, CMD_INSERT);
 
-						/* Set up the PartitionRoutingInfo for it */
+						/*
+						 * Initialize information needed to insert this and
+						 * subsequent tuples routed to this partition.
+						 */
 						ExecInitRoutingInfo(mtstate, estate, proute, dispatch,
 											rri, partidx);
 					}
@@ -464,8 +470,6 @@ ExecFindPartition(ModifyTableState *mtstate,
 		 */
 		if (partidx == partdesc->boundinfo->default_index)
 		{
-			PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo;
-
 			/*
 			 * The tuple must match the partition's layout for the constraint
 			 * expression to be evaluated successfully.  If the partition is
@@ -478,13 +482,13 @@ ExecFindPartition(ModifyTableState *mtstate,
 			 * So if we have to convert, do it from the root slot; if not, use
 			 * the root slot as-is.
 			 */
-			if (partrouteinfo)
+			if (is_leaf)
 			{
-				TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap;
+				TupleConversionMap *map = rri->ri_RootToPartitionMap;
 
 				if (map)
 					slot = execute_attr_map_slot(map->attrMap, rootslot,
-												 partrouteinfo->pi_PartitionTupleSlot);
+												 rri->ri_PartitionTupleSlot);
 				else
 					slot = rootslot;
 			}
@@ -788,7 +792,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		{
 			TupleConversionMap *map;
 
-			map = leaf_part_rri->ri_PartitionInfo->pi_RootToPartitionMap;
+			map = leaf_part_rri->ri_RootToPartitionMap;
 
 			Assert(node->onConflictSet != NIL);
 			Assert(rootResultRelInfo->ri_onConflict != NULL);
@@ -949,18 +953,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 					int partidx)
 {
 	MemoryContext oldcxt;
-	PartitionRoutingInfo *partrouteinfo;
 	int			rri_index;
 
 	oldcxt = MemoryContextSwitchTo(proute->memcxt);
 
-	partrouteinfo = palloc(sizeof(PartitionRoutingInfo));
-
 	/*
 	 * Set up a tuple conversion map to convert a tuple routed to the
 	 * partition from the parent's type to the partition's.
 	 */
-	partrouteinfo->pi_RootToPartitionMap =
+	partRelInfo->ri_RootToPartitionMap =
 		convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_PartitionRoot),
 							   RelationGetDescr(partRelInfo->ri_RelationDesc));
 
@@ -970,7 +971,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 	 * for various operations that are applied to tuples after routing, such
 	 * as checking constraints.
 	 */
-	if (partrouteinfo->pi_RootToPartitionMap != NULL)
+	if (partRelInfo->ri_RootToPartitionMap != NULL)
 	{
 		Relation	partrel = partRelInfo->ri_RelationDesc;
 
@@ -979,11 +980,11 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		 * partition's TupleDesc; TupleDesc reference will be released at the
 		 * end of the command.
 		 */
-		partrouteinfo->pi_PartitionTupleSlot =
+		partRelInfo->ri_PartitionTupleSlot =
 			table_slot_create(partrel, &estate->es_tupleTable);
 	}
 	else
-		partrouteinfo->pi_PartitionTupleSlot = NULL;
+		partRelInfo->ri_PartitionTupleSlot = NULL;
 
 	/*
 	 * If the partition is a foreign table, let the FDW init itself for
@@ -993,7 +994,6 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
 		partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
 
-	partRelInfo->ri_PartitionInfo = partrouteinfo;
 	partRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5efbfb6..4a26a07 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1906,7 +1906,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 						ResultRelInfo **partRelInfo)
 {
 	ResultRelInfo *partrel;
-	PartitionRoutingInfo *partrouteinfo;
 	TupleConversionMap *map;
 	bool		has_before_insert_row_trig;
 
@@ -1918,8 +1917,6 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	 * UPDATE to another partition becomes a DELETE+INSERT.
 	 */
 	partrel = ExecFindPartition(mtstate, targetRelInfo, proute, slot, estate);
-	partrouteinfo = partrel->ri_PartitionInfo;
-	Assert(partrouteinfo != NULL);
 
 	/*
 	 * If we're capturing transition tuples and there are no BEFORE triggers
@@ -1935,10 +1932,10 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
 	/*
 	 * Convert the tuple, if necessary.
 	 */
-	map = partrouteinfo->pi_RootToPartitionMap;
+	map = partrel->ri_RootToPartitionMap;
 	if (map != NULL)
 	{
-		TupleTableSlot *new_slot = partrouteinfo->pi_PartitionTupleSlot;
+		TupleTableSlot *new_slot = partrel->ri_PartitionTupleSlot;
 
 		slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
 	}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b8e297c..3a5b733 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1572,7 +1572,6 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	ResultRelInfo *partrelinfo;
 	Relation	partrel;
 	TupleTableSlot *remoteslot_part;
-	PartitionRoutingInfo *partinfo;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
 
@@ -1599,11 +1598,10 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 	 * partition's rowtype. Convert if needed or just copy, using a dedicated
 	 * slot to store the tuple in any case.
 	 */
-	partinfo = partrelinfo->ri_PartitionInfo;
-	remoteslot_part = partinfo->pi_PartitionTupleSlot;
+	remoteslot_part = partrelinfo->ri_PartitionTupleSlot;
 	if (remoteslot_part == NULL)
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
-	map = partinfo->pi_RootToPartitionMap;
+	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
 		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
 												remoteslot_part);
@@ -1748,12 +1746,11 @@ apply_handle_tuple_routing(ResultRelInfo *relinfo,
 					 */
 					oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 					partrel = partrelinfo_new->ri_RelationDesc;
-					partinfo = partrelinfo_new->ri_PartitionInfo;
-					remoteslot_part = partinfo->pi_PartitionTupleSlot;
+					remoteslot_part = partrelinfo_new->ri_PartitionTupleSlot;
 					if (remoteslot_part == NULL)
 						remoteslot_part = table_slot_create(partrel,
 															&estate->es_tupleTable);
-					map = partinfo->pi_RootToPartitionMap;
+					map = partrelinfo_new->ri_RootToPartitionMap;
 					if (map != NULL)
 					{
 						remoteslot_part = execute_attr_map_slot(map->attrMap,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 74c3991..473c4cd 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -23,27 +23,6 @@ typedef struct PartitionDispatchData *PartitionDispatch;
 typedef struct PartitionTupleRouting PartitionTupleRouting;
 
 /*
- * PartitionRoutingInfo
- *
- * Additional result relation information specific to routing tuples to a
- * table partition.
- */
-typedef struct PartitionRoutingInfo
-{
-	/*
-	 * Map for converting tuples in root partitioned table format into
-	 * partition format, or NULL if no conversion is required.
-	 */
-	TupleConversionMap *pi_RootToPartitionMap;
-
-	/*
-	 * Slot to store tuples in partition format, or NULL when no translation
-	 * is required between root and partition.
-	 */
-	TupleTableSlot *pi_PartitionTupleSlot;
-} PartitionRoutingInfo;
-
-/*
  * PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
  * of partitions.  For a multilevel partitioned table, we have one of these
  * for the topmost partition plus one for each non-leaf child partition.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ce83b81..1d9265c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -33,7 +33,6 @@
 #include "utils/tuplestore.h"
 
 struct PlanState;				/* forward references in this file */
-struct PartitionRoutingInfo;
 struct ParallelHashJoinState;
 struct ExecRowMark;
 struct ExprState;
@@ -480,11 +479,17 @@ typedef struct ResultRelInfo
 	/* partition check expression state (NULL if not set up yet) */
 	ExprState  *ri_PartitionCheckExpr;
 
-	/* relation descriptor for partitioned table's root, if any */
+	/*
+	 * information needed by tuple routing target relations
+	 *
+	 * PartitionRoot gives the target relation mentioned in the query.
+	 * RootToPartitionMap and PartitionTupleSlot, initialized by
+	 * ExecInitRoutingInfo, are non-NULL if partition has a different tuple
+	 * format than the root table.
+	 */
 	Relation	ri_PartitionRoot;
-
-	/* info for partition tuple routing (NULL if not set up yet) */
-	struct PartitionRoutingInfo *ri_PartitionInfo;
+	TupleConversionMap *ri_RootToPartitionMap;
+	TupleTableSlot *ri_PartitionTupleSlot;
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
-- 
1.8.3.1

#70Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#69)
Re: partition routing layering in nodeModifyTable.c

On 19/10/2020 07:54, Amit Langote wrote:

On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Oct-17, Amit Langote wrote:

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing. I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there. This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot is the only piece
of extra information that is needed by tuple routing target relations.

Well, I was thinking on making the ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing. Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether. It'd just be a couple of additional
pointers in ResultRelInfo after all.

So that's 2 votes for removing PartitionRoutingInfo from the tree.
Okay, I have tried that in the attached 0002 patch. Also, I fixed
some comments in 0001 that still referenced PartitionToRootMap.

Pushed, with minor comment changes.

I also noticed that the way the getTargetResultRelInfo() helper function
was used was a bit messy. It was used when firing AFTER STATEMENT
triggers, but for some reason the code to fire BEFORE STATEMENT triggers
didn't use it but duplicated the logic instead. I made that a bit
simpler, by always setting the rootResultRelInfo field in
ExecInitModifyTable(), making the getTargetResultRelInfo() function
unnecessary.

Thanks!

- Heikki

#71Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Alvaro Herrera (#68)
Re: partition routing layering in nodeModifyTable.c

On 17/10/2020 18:54, Alvaro Herrera wrote:

On 2020-Oct-17, Amit Langote wrote:

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is. If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built. If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.

I see -- makes sense.

It's probably true that there's no performance gain from initializing
them more lazily. But the reasoning and logic around the initialization
is complicated. After tracing through various paths through the code, I'm
convinced enough that it's correct, or at least these patches didn't
break it, but I still think some sort of lazy initialization on first
use would make it more readable. Or perhaps there's some other
refactoring we could do.

Perhaps we should have a magic TupleConversionMap value to mean "no
conversion required". NULL could then mean "not initialized yet".
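
To make the sentinel idea concrete, here is a minimal toy sketch, not PostgreSQL code. ToyConversionMap, ToyRelInfo, NO_CONVERSION and toy_get_root_to_part_map() are all hypothetical names invented for illustration, and the needs_conversion flag merely stands in for comparing tuple descriptors. The point is only that a distinct sentinel address lets NULL mean "not initialized yet" while "no conversion required" gets its own value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap; not the real struct. */
typedef struct ToyConversionMap
{
	int			natts;			/* number of attributes to remap */
	const int  *attnos;			/* per-attribute source column numbers */
} ToyConversionMap;

/* Hypothetical stand-in for the relevant part of ResultRelInfo. */
typedef struct ToyRelInfo
{
	int			needs_conversion;	/* models "tuple descriptors differ" */
	ToyConversionMap *root_to_part_map; /* NULL = not initialized yet */
	int			build_count;	/* how often the map was computed */
} ToyRelInfo;

/* Sentinel: a unique address meaning "checked, no conversion required". */
static ToyConversionMap no_conversion_sentinel;
#define NO_CONVERSION (&no_conversion_sentinel)

/* Dummy map returned when a conversion is needed (illustration only). */
static ToyConversionMap dummy_real_map = {0, NULL};

/*
 * Return the map to apply, or NULL if tuples can be used as-is.
 * The field is computed on first use; later calls reuse the cached answer,
 * including the cached "no conversion" answer.
 */
static ToyConversionMap *
toy_get_root_to_part_map(ToyRelInfo *rri)
{
	assert(rri != NULL);

	if (rri->root_to_part_map == NULL)
	{
		rri->build_count++;
		rri->root_to_part_map =
			rri->needs_conversion ? &dummy_real_map : NO_CONVERSION;
	}

	return (rri->root_to_part_map == NO_CONVERSION)
		? NULL
		: rri->root_to_part_map;
}
```

Callers would only ever go through the accessor, so the sentinel never leaks out; whether that is actually more readable than the current arrangement is exactly the open question here.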

On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

BTW it is curious that ExecInitRoutingInfo is called both in
ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
for the partition is not found) *and* from ExecFindPartition again, when
the ResultRelInfo for the partition *is* found. Doesn't this mean that
ri_PartitionInfo is set up twice for the same partition?

No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for
reused update result relations, that too, only the first time a tuple
lands into such a partition. For the subsequent tuples that land into
the same partition, ExecFindPartition() will be able to find that
ResultRelInfo in the proute->partitions[] array. All ResultRelInfos
in that array are assumed to have been processed by
ExecInitRoutingInfo().

Doh, right, sorry, I was misreading the if/else maze there.

I think that demonstrates my point that the logic is hard to follow :-).
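
For readers trying to follow the explanation quoted above, the control flow can be modelled roughly as below. This is a toy sketch under stated assumptions, not the actual ExecFindPartition() code: ToyRelInfo, ToyRouting and the toy_* functions are hypothetical, and the "fresh" array merely stands in for what ExecInitPartitionInfo() would allocate.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPARTS 4

/* Hypothetical stand-in for a partition's ResultRelInfo. */
typedef struct ToyRelInfo
{
	int			init_calls;		/* times routing info was set up */
} ToyRelInfo;

/* Hypothetical stand-in for PartitionTupleRouting. */
typedef struct ToyRouting
{
	ToyRelInfo *partitions[TOY_NPARTS]; /* NULL until a tuple lands there */
	ToyRelInfo	fresh[TOY_NPARTS];	/* storage for newly built entries */
} ToyRouting;

/* Models ExecInitRoutingInfo(): must run exactly once per partition. */
static void
toy_init_routing_info(ToyRelInfo *rri)
{
	rri->init_calls++;
}

/*
 * Models the lookup: everything already in partitions[] is assumed to have
 * been through toy_init_routing_info(), so the init call happens only on the
 * first tuple, either for a reused UPDATE result relation or a new one.
 */
static ToyRelInfo *
toy_find_partition(ToyRouting *proute, ToyRelInfo **update_rels, int partidx)
{
	ToyRelInfo *rri;

	assert(partidx >= 0 && partidx < TOY_NPARTS);

	/* Subsequent tuples: found in the array, nothing to initialize. */
	if (proute->partitions[partidx] != NULL)
		return proute->partitions[partidx];

	rri = update_rels[partidx];
	if (rri != NULL)
	{
		/* Reused UPDATE result relation: initialize routing info here. */
		toy_init_routing_info(rri);
	}
	else
	{
		/* Not found: build a new entry, which initializes it as well. */
		rri = &proute->fresh[partidx];
		rri->init_calls = 0;
		toy_init_routing_info(rri);
	}

	proute->partitions[partidx] = rri;
	return rri;
}
```

In this model the invariant "present in partitions[] implies routing info is set up" is what guarantees the single initialization.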

- Heikki

#72Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#70)
Re: partition routing layering in nodeModifyTable.c

On Mon, Oct 19, 2020 at 8:48 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 19/10/2020 07:54, Amit Langote wrote:

On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Well, I was thinking on making the ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing. Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether. It'd just be a couple of additional
pointers in ResultRelInfo after all.

So that's 2 votes for removing PartitionRoutingInfo from the tree.
Okay, I have tried that in the attached 0002 patch. Also, I fixed
some comments in 0001 that still referenced PartitionToRootMap.

Pushed, with minor comment changes.

Thank you.

I also noticed that the way the getTargetResultRelInfo() helper function
was used was a bit messy. It was used when firing AFTER STATEMENT
triggers, but for some reason the code to fire BEFORE STATEMENT triggers
didn't use it but duplicated the logic instead. I made that a bit
simpler, by always setting the rootResultRelInfo field in
ExecInitModifyTable(), making the getTargetResultRelInfo() function
unnecessary.

Good, I was mildly annoyed by that function too.

--
Amit Langote
EDB: http://www.enterprisedb.com

#73Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#71)
Re: partition routing layering in nodeModifyTable.c

On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 17/10/2020 18:54, Alvaro Herrera wrote:

On 2020-Oct-17, Amit Langote wrote:

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is. If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built. If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.

I see -- makes sense.

It's probably true that there's no performance gain from initializing
them more lazily. But the reasoning and logic around the initialization
is complicated. After tracing through various paths through the code, I'm
convinced enough that it's correct, or at least these patches didn't
break it, but I still think some sort of lazy initialization on first
use would make it more readable. Or perhaps there's some other
refactoring we could do.

So the other patch I have mentioned is about lazy initialization of
the ResultRelInfo itself, not the individual fields, but maybe with
enough refactoring we can get the latter too.

Currently, ExecInitModifyTable() performs ExecInitResultRelation() for
all relations in ModifyTable.resultRelations, which sets most but not
all ResultRelInfo fields (whatever InitResultRelInfo() can set),
followed by initializing some other fields based on the contents of
the ModifyTable plan. My patch moves those two steps into a function
ExecBuildResultRelation() which is called lazily during
ExecModifyTable() for a given result relation on the first tuple
produced by that relation's plan. Actually, there's a "getter" named
ExecGetResultRelation() which first consults es_result_relations[rti -
1] for the requested relation and if it's NULL then calls
ExecBuildResultRelation().

Would you mind taking a look at that as a starting point? I am
thinking there's enough relevant discussion here that I should post
the rebased version of that patch here.
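
As a rough illustration of the getter described above (consult es_result_relations[rti - 1], build on a miss), here is a self-contained toy model. It is not taken from the patch: ToyEState, ToyResultRelInfo and the toy_* functions are hypothetical stand-ins, and the builder only records the range table index instead of doing what InitResultRelInfo() and the plan-dependent setup would do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct ToyResultRelInfo
{
	int			rti;			/* range table index of the relation */
} ToyResultRelInfo;

/* Hypothetical stand-in for the relevant part of EState. */
typedef struct ToyEState
{
	int			nrels;			/* length of result_relations[] */
	ToyResultRelInfo **result_relations;	/* indexed by rti - 1 */
	int			nbuilt;			/* number of entries built so far */
} ToyEState;

/* Models ExecBuildResultRelation(): the expensive, one-time setup. */
static ToyResultRelInfo *
toy_build_result_relation(ToyEState *estate, int rti)
{
	ToyResultRelInfo *rri = malloc(sizeof(*rri));

	assert(rri != NULL);
	rri->rti = rti;
	estate->nbuilt++;
	return rri;
}

/*
 * Models ExecGetResultRelation(): return the cached entry if present,
 * otherwise build it and remember it for later calls.
 */
static ToyResultRelInfo *
toy_get_result_relation(ToyEState *estate, int rti)
{
	assert(rti >= 1 && rti <= estate->nrels);

	if (estate->result_relations[rti - 1] == NULL)
		estate->result_relations[rti - 1] =
			toy_build_result_relation(estate, rti);

	return estate->result_relations[rti - 1];
}
```

Relations whose plans never produce a tuple are never requested, so in this model their slots simply stay NULL.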

Perhaps we should have a magic TupleConversionMap value to mean "no
conversion required". NULL could then mean "not initialized yet".

Perhaps, a TupleConversionMap with its attrMap set to NULL means "no
conversion required".

--
Amit Langote
EDB: http://www.enterprisedb.com

#74Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#73)
3 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It's probably true that there's no performance gain from initializing
them more lazily. But the reasoning and logic around the initialization
is complicated. After tracing through various paths through the code, I'm
convinced enough that it's correct, or at least these patches didn't
break it, but I still think some sort of lazy initialization on first
use would make it more readable. Or perhaps there's some other
refactoring we could do.

So the other patch I have mentioned is about lazy initialization of
the ResultRelInfo itself, not the individual fields, but maybe with
enough refactoring we can get the latter too.

So, I tried implementing a lazy-initialization-on-first-access
approach for both the ResultRelInfos themselves and some of the
individual fields of ResultRelInfo that don't need to be set right
away. You can see the end result in the attached 0003 patch. This
slims down ExecInitModifyTable() significantly, both in terms of code
footprint and the amount of work that it does.

0001 fixes a thinko of the recent commit 1375422c782 that I discovered
when debugging a problem with 0003.

0002 is for something I have mentioned upthread.
ForeignScanState.resultRelInfo cannot be set in ExecInit* stage as
it's done now, because with 0003, child ResultRelInfos will not have
been added to es_result_relations during that stage.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v1-0002-Initialize-ForeignScanState.resultRelInfo-in-Exec.patch (application/octet-stream)
From b4933ee484d71bd90c96cc0cafabbc2529be4ea3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 19 Oct 2020 17:17:33 +0900
Subject: [PATCH v1 2/3] Initialize ForeignScanState.resultRelInfo in
 ExecForeignScan()

An upcoming patch will delay the initialization of the individual
elements of es_result_relations[] until the ModifyTable node that the
result relations come from actually begins executing.  To allow that,
initialize ForeignScanState.resultRelInfo in ExecForeignScan() instead
of in ExecInitForeignScan(), because child result relations'
ResultRelInfos will no longer be available during the latter.
---
 contrib/postgres_fdw/postgres_fdw.c    |  2 +-
 src/backend/executor/nodeForeignscan.c | 19 ++++++++++++-------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaac..523ffac 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2357,7 +2357,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-	rtindex = node->resultRelInfo->ri_RangeTableIndex;
+	rtindex = fsplan->resultRelation;
 	rte = exec_rt_fetch(rtindex, estate);
 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 0b20f94..503ef8b 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -111,6 +111,18 @@ static TupleTableSlot *
 ExecForeignScan(PlanState *pstate)
 {
 	ForeignScanState *node = castNode(ForeignScanState, pstate);
+	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
+
+	/*
+	 * For the FDW's convenience, look up the modification target relation's
+	 * ResultRelInfo.
+	 */
+	if (plan->resultRelation > 0 && node->resultRelInfo == NULL)
+	{
+		EState   *estate = node->ss.ps.state;
+
+		node->resultRelInfo = estate->es_result_relations[plan->resultRelation - 1];
+	}
 
 	return ExecScan(&node->ss,
 					(ExecScanAccessMtd) ForeignNext,
@@ -215,13 +227,6 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
-	/*
-	 * For the FDW's convenience, look up the modification target relation's.
-	 * ResultRelInfo.
-	 */
-	if (node->resultRelation > 0)
-		scanstate->resultRelInfo = estate->es_result_relations[node->resultRelation - 1];
-
 	/* Initialize any outer plan. */
 	if (outerPlan(node))
 		outerPlanState(scanstate) =
-- 
1.8.3.1

v1-0001-Fix-a-thinko-of-1375422c782.patch (application/octet-stream)
From f4b779e7d0e4a329073d6588decd5c8ea53e5f7d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 22 Oct 2020 14:45:47 +0900
Subject: [PATCH v1 1/3] Fix a thinko of 1375422c782

EvalPlanQualStart() was mistakenly modified to reset parent EState's
es_result_relations.
---
 src/backend/executor/execMain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index aea0479..7179f58 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2693,7 +2693,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
 	 */
-	parentestate->es_result_relations = NULL;
+	rcestate->es_result_relations = NULL;
 	/* es_trig_target_relations must NOT be copied */
 	rcestate->es_top_eflags = parentestate->es_top_eflags;
 	rcestate->es_instrument = parentestate->es_instrument;
-- 
1.8.3.1

v1-0003-Initialize-result-relation-information-lazily.patch (application/octet-stream)
From a38c0349471c9b338528a6296a6d73b2242dc0b9 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 2 Jul 2020 10:51:45 +0900
Subject: [PATCH v1 3/3] Initialize result relation information lazily

Currently, all elements of the ModifyTableState.resultRelInfo array
are initialized in ExecInitModifyTable(), possibly wastefully,
because only one or a handful of potentially many result relations
appearing in that array may actually have any rows to update or
delete.

This commit refactors all places that directly access the individual
elements of the array to instead go through a lazy-initialization-on-
access function, such that only the elements corresponding to result
relations that are actually operated on are initialized.

Also, extend this lazy initialization approach to some of the
individual fields of ResultRelInfo such that even for the result
relations that are initialized, those fields are only initialized on
first access.  While no performance improvement is to be expected
there, it can simplify the initialization logic of the ResultRelInfo
itself, because the conditions for whether a given field is needed
tend to look confusing.  One side effect is that any "SubPlans"
referenced in the expressions of those fields are also initialized
lazily, which changes the output of EXPLAIN (without ANALYZE) in
some regression tests.

Another unrelated regression test output change is in update.out,
which is caused by deferred initialization of PartitionTupleRouting
for update tuple routing.  Whereas previously a partition constraint
violation error would be reported as occurring on a leaf partition,
due to the aforementioned change, it is now shown as occurring on
the query's target relation, which is valid because it is really
that table's (which is a sub-partitioned table) partition constraint
that is actually violated in the affected test cases.
---
 src/backend/commands/explain.c                |    6 +-
 src/backend/executor/execMain.c               |    7 +
 src/backend/executor/execPartition.c          |  116 ++-
 src/backend/executor/nodeModifyTable.c        | 1067 ++++++++++++++-----------
 src/include/executor/nodeModifyTable.h        |    1 +
 src/include/nodes/execnodes.h                 |    2 +
 src/test/regress/expected/insert_conflict.out |    5 +-
 src/test/regress/expected/updatable_views.out |   18 +-
 src/test/regress/expected/update.out          |   12 +-
 9 files changed, 707 insertions(+), 527 deletions(-)
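
The lazy-initialization-on-access pattern described in the commit message above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch's implementation: DemoRelInfo, DemoModifyState, demo_state_init() and demo_get_result_relation() are hypothetical names that only mirror the shape of ExecGetResultRelation().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_NRELS 4

/* Hypothetical stand-in for ResultRelInfo. */
typedef struct DemoRelInfo
{
	bool		initialized;	/* has the expensive setup been done? */
	int			rel_index;		/* stand-in for the state built by setup */
} DemoRelInfo;

/* Hypothetical stand-in for ModifyTableState. */
typedef struct DemoModifyState
{
	DemoRelInfo rels[DEMO_NRELS];	/* allocated up front, set up lazily */
	int			ninit;				/* how many elements were set up */
} DemoModifyState;

/* Allocate-time step: zero everything, but set up no element. */
static void
demo_state_init(DemoModifyState *state)
{
	memset(state, 0, sizeof(*state));
}

/*
 * Return the whichrel'th element, performing the setup only the first
 * time that element is requested.
 */
static DemoRelInfo *
demo_get_result_relation(DemoModifyState *state, int whichrel)
{
	DemoRelInfo *rel;

	assert(whichrel >= 0 && whichrel < DEMO_NRELS);
	rel = &state->rels[whichrel];

	if (!rel->initialized)
	{
		/* The expensive per-relation setup would happen here. */
		rel->rel_index = whichrel;
		rel->initialized = true;
		state->ninit++;
	}

	return rel;
}
```

Callers never index the array directly. They always go through the accessor, so an element that no row ever touches is never set up.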

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 41317f1..e51d0f2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,7 +18,9 @@
 #include "commands/createas.h"
 #include "commands/defrem.h"
 #include "commands/prepare.h"
+#include "executor/executor.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -3678,14 +3680,14 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 	/* Should we explicitly label target relations? */
 	labeltargets = (mtstate->mt_nplans > 1 ||
 					(mtstate->mt_nplans == 1 &&
-					 mtstate->resultRelInfo->ri_RangeTableIndex != node->nominalRelation));
+					 ExecGetResultRelation(mtstate, 0)->ri_RangeTableIndex != node->nominalRelation));
 
 	if (labeltargets)
 		ExplainOpenGroup("Target Tables", "Target Tables", false, es);
 
 	for (j = 0; j < mtstate->mt_nplans; j++)
 	{
-		ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
+		ResultRelInfo *resultRelInfo = ExecGetResultRelation(mtstate, j);
 		FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
 
 		if (labeltargets)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f58..f484e6a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1236,6 +1236,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_GeneratedExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
+	resultRelInfo->ri_junkFilterValid = false;
 	resultRelInfo->ri_projectReturning = NULL;
 	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
 	resultRelInfo->ri_onConflict = NULL;
@@ -1247,6 +1248,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_ChildToRootMapValid = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
@@ -1440,6 +1442,11 @@ ExecCloseResultRelations(EState *estate)
 		ResultRelInfo *resultRelInfo = lfirst(l);
 
 		ExecCloseIndices(resultRelInfo);
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
+			resultRelInfo->ri_FdwRoutine->EndForeignModify(estate,
+														   resultRelInfo);
 	}
 
 	/* Close any relations that have been opened by ExecGetTriggerResultRel(). */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd..8265db2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -20,6 +20,7 @@
 #include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
+#include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -157,10 +158,11 @@ typedef struct PartitionDispatchData
 typedef struct SubplanResultRelHashElem
 {
 	Oid			relid;			/* hash key -- must be first */
-	ResultRelInfo *rri;
+	int			index;
 } SubplanResultRelHashElem;
 
 
+static ResultRelInfo *ExecLookupUpdateResultRelByOid(ModifyTableState *mtstate, Oid reloid);
 static void ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 										   PartitionTupleRouting *proute);
 static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
@@ -218,7 +220,6 @@ ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
 							   Relation rel)
 {
 	PartitionTupleRouting *proute;
-	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Here we attempt to expend as little effort as possible in setting up
@@ -240,17 +241,6 @@ ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
 	ExecInitPartitionDispatchInfo(estate, proute, RelationGetRelid(rel),
 								  NULL, 0);
 
-	/*
-	 * If performing an UPDATE with tuple routing, we can reuse partition
-	 * sub-plan result rels.  We build a hash table to map the OIDs of
-	 * partitions present in mtstate->resultRelInfo to their ResultRelInfos.
-	 * Every time a tuple is routed to a partition that we've yet to set the
-	 * ResultRelInfo for, before we go to the trouble of making one, we check
-	 * for a pre-made one in the hash table.
-	 */
-	if (node && node->operation == CMD_UPDATE)
-		ExecHashSubPlanResultRelsByOid(mtstate, proute);
-
 	return proute;
 }
 
@@ -350,7 +340,6 @@ ExecFindPartition(ModifyTableState *mtstate,
 		is_leaf = partdesc->is_leaf[partidx];
 		if (is_leaf)
 		{
-
 			/*
 			 * We've reached the leaf -- hurray, we're done.  Look to see if
 			 * we've already got a ResultRelInfo for this partition.
@@ -367,20 +356,19 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 				/*
 				 * We have not yet set up a ResultRelInfo for this partition,
-				 * but if we have a subplan hash table, we might have one
-				 * there.  If not, we'll have to create one.
+				 * but if the partition is also an UPDATE result relation, use
+				 * the one in mtstate->resultRelInfo instead of creating a new
+				 * one with ExecInitPartitionInfo().
 				 */
-				if (proute->subplan_resultrel_htab)
+				if (mtstate->operation == CMD_UPDATE && mtstate->ps.plan)
 				{
 					Oid			partoid = partdesc->oids[partidx];
-					SubplanResultRelHashElem *elem;
 
-					elem = hash_search(proute->subplan_resultrel_htab,
-									   &partoid, HASH_FIND, NULL);
-					if (elem)
+					rri = ExecLookupUpdateResultRelByOid(mtstate, partoid);
+
+					if (rri)
 					{
 						found = true;
-						rri = elem->rri;
 
 						/* Verify this ResultRelInfo allows INSERTs */
 						CheckValidResultRel(rri, CMD_INSERT);
@@ -508,6 +496,41 @@ ExecFindPartition(ModifyTableState *mtstate,
 }
 
 /*
+ * ExecLookupUpdateResultRelByOid
+ * 		If the table with given OID appears in the list of result relations
+ * 		to be updated by the given ModifyTable node, return its
+ * 		ResultRelInfo, NULL otherwise.
+ */
+static ResultRelInfo *
+ExecLookupUpdateResultRelByOid(ModifyTableState *mtstate, Oid reloid)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	SubplanResultRelHashElem *elem;
+	ResultRelInfo *result = NULL;
+
+	Assert(proute != NULL);
+	if (proute->subplan_resultrel_htab == NULL)
+		ExecHashSubPlanResultRelsByOid(mtstate, proute);
+
+	elem = hash_search(proute->subplan_resultrel_htab, &reloid,
+					   HASH_FIND, NULL);
+
+	if (elem)
+	{
+		result = ExecGetResultRelation(mtstate, elem->index);
+
+		/*
+		 * This is required in order to convert the partition's tuple to be
+		 * compatible with the root partitioned table's tuple descriptor. When
+		 * generating the per-subplan result rels, this was not set.
+		 */
+		result->ri_PartitionRoot = proute->partition_root;
+	}
+
+	return result;
+}
+
+/*
  * ExecHashSubPlanResultRelsByOid
  *		Build a hash table to allow fast lookups of subplan ResultRelInfos by
  *		partition Oid.  We also populate the subplan ResultRelInfo with an
@@ -517,9 +540,13 @@ static void
 ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 							   PartitionTupleRouting *proute)
 {
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	ListCell   *l;
 	HASHCTL		ctl;
 	HTAB	   *htab;
 	int			i;
+	MemoryContext oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
 
 	memset(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(Oid);
@@ -530,26 +557,26 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 					   &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	proute->subplan_resultrel_htab = htab;
 
-	/* Hash all subplans by their Oid */
-	for (i = 0; i < mtstate->mt_nplans; i++)
+	/*
+	 * Map each result relation's OID to its ordinal position in
+	 * plan->resultRelations.
+	 */
+	i = 0;
+	foreach(l, plan->resultRelations)
 	{
-		ResultRelInfo *rri = &mtstate->resultRelInfo[i];
+		Index		rti = lfirst_int(l);
+		RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+		Oid			partoid = rte->relid;
 		bool		found;
-		Oid			partoid = RelationGetRelid(rri->ri_RelationDesc);
 		SubplanResultRelHashElem *elem;
 
 		elem = (SubplanResultRelHashElem *)
 			hash_search(htab, &partoid, HASH_ENTER, &found);
 		Assert(!found);
-		elem->rri = rri;
-
-		/*
-		 * This is required in order to convert the partition's tuple to be
-		 * compatible with the root partitioned table's tuple descriptor. When
-		 * generating the per-subplan result rels, this was not set.
-		 */
-		rri->ri_PartitionRoot = proute->partition_root;
+		elem->index = i++;
 	}
+
+	MemoryContextSwitchTo(oldcxt);
 }
 
 /*
@@ -570,7 +597,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	Relation	rootrel = rootResultRelInfo->ri_RelationDesc,
 				partrel;
-	Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+	Relation	firstResultRel = NULL;
+	Index		firstVarno = 0;
 	ResultRelInfo *leaf_part_rri;
 	MemoryContext oldcxt;
 	AttrMap    *part_attmap = NULL;
@@ -606,19 +634,26 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 						(node != NULL &&
 						 node->onConflictAction != ONCONFLICT_NONE));
 
+	if (node)
+	{
+		ResultRelInfo *firstResultRelInfo = ExecGetResultRelation(mtstate, 0);
+
+		firstResultRel = firstResultRelInfo->ri_RelationDesc;
+		firstVarno = firstResultRelInfo->ri_RangeTableIndex;
+	}
+
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
 	 * didn't build the withCheckOptionList for partitions within the planner,
 	 * but simple translation of varattnos will suffice.  This only occurs for
 	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
-	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 * find a result rel to reuse.
 	 */
 	if (node && node->withCheckOptionLists != NIL)
 	{
 		List	   *wcoList;
 		List	   *wcoExprs = NIL;
 		ListCell   *ll;
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 
 		/*
 		 * In the case of INSERT on a partitioned table, there is only one
@@ -682,7 +717,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		TupleTableSlot *slot;
 		ExprContext *econtext;
 		List	   *returningList;
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 
 		/* See the comment above for WCO lists. */
 		Assert((node->operation == CMD_INSERT &&
@@ -741,7 +775,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	 */
 	if (node && node->onConflictAction != ONCONFLICT_NONE)
 	{
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 		TupleDesc	partrelDesc = RelationGetDescr(partrel);
 		ExprContext *econtext = mtstate->ps.ps_ExprContext;
 		ListCell   *lc;
@@ -916,9 +949,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	 * from partition's rowtype to the root partition table's.
 	 */
 	if (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)
+	{
 		leaf_part_rri->ri_ChildToRootMap =
 			convert_tuples_by_name(RelationGetDescr(leaf_part_rri->ri_RelationDesc),
 								   RelationGetDescr(leaf_part_rri->ri_PartitionRoot));
+		/* First time creating the map for this result relation. */
+		Assert(!leaf_part_rri->ri_ChildToRootMapValid);
+		leaf_part_rri->ri_ChildToRootMapValid = true;
+	}
 
 	/*
 	 * Since we've just initialized this ResultRelInfo, it's not in any list
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index a33423c..145b49b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -144,10 +144,41 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 }
 
 /*
+ * Initialize ri_returningList and ri_projectReturning for RETURNING
+ */
+static void
+InitReturningProjection(ModifyTableState *mtstate,
+						ResultRelInfo *resultRelInfo)
+{
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	int		whichrel = resultRelInfo - mtstate->resultRelInfo;
+	List	*rlist;
+	TupleTableSlot *slot;
+	ExprContext *econtext;
+
+	Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+	rlist = (List *) list_nth(plan->returningLists, whichrel);
+	slot = mtstate->ps.ps_ResultTupleSlot;
+	Assert(slot != NULL);
+	econtext = mtstate->ps.ps_ExprContext;
+	Assert(econtext != NULL);
+
+	/* Must not do this a second time! */
+	Assert(resultRelInfo->ri_returningList == NIL &&
+		   resultRelInfo->ri_projectReturning == NULL);
+	resultRelInfo->ri_returningList = rlist;
+	resultRelInfo->ri_projectReturning =
+		ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+								resultRelInfo->ri_RelationDesc->rd_att);
+}
+
+/*
  * ExecProcessReturning --- evaluate a RETURNING list
  *
  * resultRelInfo: current result rel
- * tupleSlot: slot holding tuple actually inserted/updated/deleted
+ * tupleSlot: slot holding tuple actually inserted/updated or NULL for delete
+ * tupleid, oldtuple: when called for delete, one of these can be used to
+ * fill the RETURNING slot for the relation
  * planSlot: slot holding tuple returned by top subplan node
  *
  * Note: If tupleSlot is NULL, the FDW should have already provided econtext's
@@ -156,12 +187,50 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
  * Returns a slot holding the result tuple
  */
 static TupleTableSlot *
-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ModifyTableState *mtstate,
+					 ResultRelInfo *resultRelInfo,
 					 TupleTableSlot *tupleSlot,
+					 ItemPointer tupleid, HeapTuple oldtuple,
 					 TupleTableSlot *planSlot)
 {
-	ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
-	ExprContext *econtext = projectReturning->pi_exprContext;
+	EState *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	ProjectionInfo *projectReturning;
+	ExprContext *econtext;
+	bool		clearTupleSlot = false;
+	TupleTableSlot *result;
+
+	if (plan->returningLists == NIL)
+		return NULL;
+
+	if (resultRelInfo->ri_returningList == NIL)
+		InitReturningProjection(mtstate, resultRelInfo);
+
+	projectReturning = resultRelInfo->ri_projectReturning;
+	econtext = projectReturning->pi_exprContext;
+
+	/*
+	 * For DELETE on a non-foreign table, no tupleSlot is passed in, so fill
+	 * one from oldtuple if provided, else by fetching the row at tupleid.
+	 */
+	if (tupleSlot == NULL && resultRelInfo->ri_FdwRoutine == NULL)
+	{
+		/* Not a foreign table; an FDW would have supplied the deleted row */
+		Assert(resultRelInfo->ri_FdwRoutine == NULL);
+		tupleSlot = ExecGetReturningSlot(estate, resultRelInfo);
+		if (oldtuple != NULL)
+		{
+			ExecForceStoreHeapTuple(oldtuple, tupleSlot, false);
+		}
+		else
+		{
+			if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+											   tupleid, SnapshotAny,
+											   tupleSlot))
+				elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
+		}
+		clearTupleSlot = true;
+	}
 
 	/* Make tuple and any needed join variables available to ExecProject */
 	if (tupleSlot)
@@ -176,7 +245,362 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
 		RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 	/* Compute the RETURNING expressions */
-	return ExecProject(projectReturning);
+	result = ExecProject(projectReturning);
+
+	if (clearTupleSlot)
+		ExecClearTuple(tupleSlot);
+
+	return result;
+}
+
+/*
+ * Perform WITH CHECK OPTION checks, if any.
+ */
+static void
+ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+							TupleTableSlot *slot, WCOKind wco_kind)
+{
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	EState *estate = mtstate->ps.state;
+
+	if (node->withCheckOptionLists == NIL)
+		return;
+
+	/* Initialize expression state if not already done. */
+	if (resultRelInfo->ri_WithCheckOptions == NIL)
+	{
+		int		whichrel = resultRelInfo - mtstate->resultRelInfo;
+		List   *wcoList;
+		List   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+		wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
+			ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
+											   &mtstate->ps);
+
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		resultRelInfo->ri_WithCheckOptions = wcoList;
+		resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind
+	 * we are looking for at this point.
+	 */
+	ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
+}
+
+/*
+ * Return the list of arbiter indexes to be used for ON CONFLICT processing
+ * on given result relation, fetching it from the plan if not already done.
+ */
+static List *
+GetOnConflictArbiterIndexes(ModifyTableState *mtstate,
+							ResultRelInfo *resultRelInfo)
+{
+	if (resultRelInfo->ri_onConflictArbiterIndexes == NIL)
+	{
+		ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+
+		resultRelInfo->ri_onConflictArbiterIndexes = plan->arbiterIndexes;
+	}
+
+	return resultRelInfo->ri_onConflictArbiterIndexes;
+}
+
+/*
+ * Initialize target list, projection and qual for ON CONFLICT DO UPDATE.
+ */
+static void
+InitOnConflictState(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	EState	   *estate = mtstate->ps.state;
+	TupleDesc	relationDesc;
+	TupleDesc	tupDesc;
+	ExprContext *econtext;
+
+	/* insert may only have one relation, inheritance is not expanded */
+	Assert(mtstate->mt_nplans == 1);
+
+	/* already exists if created by RETURNING processing above */
+	if (mtstate->ps.ps_ExprContext == NULL)
+		ExecAssignExprContext(estate, &mtstate->ps);
+
+	econtext = mtstate->ps.ps_ExprContext;
+	relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
+
+	/* create state for DO UPDATE SET operation */
+	resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
+
+	/* initialize slot for the existing tuple */
+	resultRelInfo->ri_onConflict->oc_Existing =
+		table_slot_create(resultRelInfo->ri_RelationDesc,
+						  &mtstate->ps.state->es_tupleTable);
+
+	/*
+	 * Create the tuple slot for the UPDATE SET projection. We want a slot
+	 * of the table's type here, because the slot will be used to insert
+	 * into the table, and for RETURNING processing - which may access
+	 * system attributes.
+	 */
+	tupDesc = ExecTypeFromTL((List *) plan->onConflictSet);
+	resultRelInfo->ri_onConflict->oc_ProjSlot =
+		ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+							   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+
+	/* build UPDATE SET projection state */
+	resultRelInfo->ri_onConflict->oc_ProjInfo =
+		ExecBuildProjectionInfo(plan->onConflictSet, econtext,
+								resultRelInfo->ri_onConflict->oc_ProjSlot,
+								&mtstate->ps,
+								relationDesc);
+
+	/* initialize state to evaluate the WHERE clause, if any */
+	if (plan->onConflictWhere)
+	{
+		ExprState  *qualexpr;
+
+		qualexpr = ExecInitQual((List *) plan->onConflictWhere,
+								&mtstate->ps);
+		resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+	}
+}
+
+/*
+ * Initialize ri_junkFilter if needed.
+ *
+ * INSERT queries need a filter if there are any junk attrs in the tlist.
+ * UPDATE and DELETE always need a filter, since there's always at least one
+ * junk attribute present --- no need to look first.  Typically, this will be
+ * a 'ctid' or 'wholerow' attribute, but in the case of a foreign data wrapper
+ * it might be a set of junk attributes sufficient to identify the remote row.
+ *
+ * If there are multiple result relations, each one needs its own junk filter.
+ * Note multiple rels are only possible for UPDATE/DELETE, so we can't be
+ * fooled by some needing a filter and some not.
+ *
+ * This is also a convenient place to verify that the output of an INSERT or
+ * UPDATE matches the target table(s).
+ */
+static void
+InitJunkFilter(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo)
+{
+	EState	   *estate = mtstate->ps.state;
+	CmdType		operation = mtstate->operation;
+	Plan	   *subplan = mtstate->mt_plans[mtstate->mt_whichplan]->plan;
+	ListCell   *l;
+	bool		junk_filter_needed = false;
+
+	switch (operation)
+	{
+		case CMD_INSERT:
+			foreach(l, subplan->targetlist)
+			{
+				TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+				if (tle->resjunk)
+				{
+					junk_filter_needed = true;
+					break;
+				}
+			}
+			break;
+		case CMD_UPDATE:
+		case CMD_DELETE:
+			junk_filter_needed = true;
+			break;
+		default:
+			elog(ERROR, "unknown operation");
+			break;
+	}
+
+	if (junk_filter_needed)
+	{
+		JunkFilter *j;
+		TupleTableSlot *junkresslot;
+
+		junkresslot =
+			ExecInitExtraTupleSlot(estate, NULL,
+								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+		j = ExecInitJunkFilter(subplan->targetlist, junkresslot);
+
+		if (operation == CMD_UPDATE || operation == CMD_DELETE)
+		{
+			/* For UPDATE/DELETE, find the appropriate junk attr now */
+			char		relkind;
+
+			relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+			if (relkind == RELKIND_RELATION ||
+				relkind == RELKIND_MATVIEW ||
+				relkind == RELKIND_PARTITIONED_TABLE)
+			{
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
+				if (!AttributeNumberIsValid(j->jf_junkAttNo))
+					elog(ERROR, "could not find junk ctid column");
+			}
+			else if (relkind == RELKIND_FOREIGN_TABLE)
+			{
+				/*
+				 * When there is a row-level trigger, there should be
+				 * a wholerow attribute.
+				 */
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
+			}
+			else
+			{
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
+				if (!AttributeNumberIsValid(j->jf_junkAttNo))
+					elog(ERROR, "could not find junk wholerow column");
+			}
+		}
+
+		/* Must not do this a second time! */
+		Assert(resultRelInfo->ri_junkFilter == NULL);
+		resultRelInfo->ri_junkFilter = j;
+		resultRelInfo->ri_junkFilterValid = true;
+	}
+
+	if (operation == CMD_INSERT || operation == CMD_UPDATE)
+		ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+							subplan->targetlist);
+}
+
+/*
+ * Returns the map needed to convert given child relation's tuples to the
+ * root relation's format, possibly initializing if not already done.
+ */
+static TupleConversionMap *
+GetChildToRootMap(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo)
+{
+	if (!resultRelInfo->ri_ChildToRootMapValid)
+	{
+		Relation	relation = resultRelInfo->ri_RelationDesc;
+		Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+
+		resultRelInfo->ri_ChildToRootMap =
+			convert_tuples_by_name(RelationGetDescr(relation),
+								   RelationGetDescr(targetRel));
+		resultRelInfo->ri_ChildToRootMapValid = true;
+	}
+
+	return resultRelInfo->ri_ChildToRootMap;
+}
+
+/*
+ * ExecGetResultRelation
+ *		Returns mtstate->resultRelInfo[whichrel], possibly initializing it
+ *		if being requested for the first time
+ */
+ResultRelInfo *
+ExecGetResultRelation(ModifyTableState *mtstate, int whichrel)
+{
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	Index		rti;
+	ResultRelInfo *resultRelInfo = NULL;
+
+	/*
+	 * Initialized result relations are added to es_result_relations, so check
+	 * there first.  Remember that es_result_relations is indexed by RT index,
+	 * so fetch the relation's RT index from the plan.
+	 */
+	Assert(plan != NULL);
+	Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+	rti = list_nth_int(plan->resultRelations, whichrel);
+	if (estate->es_result_relations)
+		resultRelInfo = estate->es_result_relations[rti - 1];
+
+	/* Nope, so initialize. */
+	if (resultRelInfo == NULL)
+	{
+		int		eflags = estate->es_top_eflags;
+		CmdType	operation = mtstate->operation;
+		MemoryContext oldcxt;
+
+		Assert(whichrel >= 0);
+		resultRelInfo = &mtstate->resultRelInfo[whichrel];
+
+		/* Things built here have to last for the query duration. */
+		oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		/*
+		 * Perform InitResultRelInfo() and save the pointer in
+		 * es_result_relations.
+		 */
+		ExecInitResultRelation(estate, resultRelInfo, rti);
+
+		/*
+		 * A few more initializations that are not handled by
+		 * InitResultRelInfo() follow.
+		 */
+
+		/*
+		 * Verify result relation is a valid target for the current operation.
+		 */
+		CheckValidResultRel(resultRelInfo, operation);
+
+		/* Initialize the usesFdwDirectModify flag */
+		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(whichrel,
+															  plan->fdwDirectModifyPlans);
+
+		/* Also let FDWs init themselves for foreign-table result rels */
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
+		{
+			List	   *fdw_private = (List *) list_nth(plan->fdwPrivLists,
+														whichrel);
+
+			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
+															 resultRelInfo,
+															 fdw_private,
+															 whichrel,
+															 eflags);
+		}
+
+		/*
+		 * If transition tuples will be captured, initialize a map to convert
+		 * child tuples into the format of the table mentioned in the query
+		 * (root relation), because the transition tuple store can only store
+		 * tuples in the root table format.  However, for INSERT, the map is
+		 * only initialized for a given partition when the partition itself is
+		 * first initialized by ExecFindPartition().  The map is also needed
+		 * if an UPDATE ends up having to move tuples across partitions,
+		 * because the child tuple to be moved first needs to be converted
+		 * into the root table's format.  In that case, GetChildToRootMap()
+		 * returns the map built here, or creates one from scratch if we
+		 * didn't already create it here.
+		 *
+		 * Note: We cannot always initialize this map lazily, that is, use
+		 * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs
+		 * the map, doesn't have access to the "target" relation that is
+		 * needed to create the map.
+		 */
+		if (mtstate->mt_transition_capture && operation != CMD_INSERT)
+		{
+			Relation	relation = resultRelInfo->ri_RelationDesc;
+			Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(relation),
+									   RelationGetDescr(targetRel));
+			/* First time creating the map for this result relation. */
+			Assert(!resultRelInfo->ri_ChildToRootMapValid);
+			resultRelInfo->ri_ChildToRootMapValid = true;
+		}
+
+		MemoryContextSwitchTo(oldcxt);
+	}
+
+	return resultRelInfo;
 }
 
 /*
@@ -398,12 +822,27 @@ ExecInsert(ModifyTableState *mtstate,
 	{
 		ResultRelInfo *partRelInfo;
 
+		/*
+		 * ExecInitPartitionInfo() expects that the root parent's ri_onConflict
+		 * is initialized. XXX maybe it shouldn't?
+		 */
+		if (onconflict != ONCONFLICT_NONE &&
+			resultRelInfo->ri_onConflict == NULL)
+		{
+			(void) GetOnConflictArbiterIndexes(mtstate, resultRelInfo);
+			if (onconflict == ONCONFLICT_UPDATE)
+				InitOnConflictState(mtstate, resultRelInfo);
+		}
+
 		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
 									   resultRelInfo, slot,
 									   &partRelInfo);
 		resultRelInfo = partRelInfo;
 	}
 
+	if (resultRelInfo->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
+
 	ExecMaterializeSlot(slot);
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -489,12 +928,7 @@ ExecInsert(ModifyTableState *mtstate,
 		wco_kind = (mtstate->operation == CMD_UPDATE) ?
 			WCO_RLS_UPDATE_CHECK : WCO_RLS_INSERT_CHECK;
 
-		/*
-		 * ExecWithCheckOptions() will skip any WCOs which are not of the kind
-		 * we are looking for at this point.
-		 */
-		if (resultRelInfo->ri_WithCheckOptions != NIL)
-			ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
+		ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, wco_kind);
 
 		/*
 		 * Check the constraints of the tuple.
@@ -521,7 +955,8 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
+			arbiterIndexes = GetOnConflictArbiterIndexes(mtstate,
+														 resultRelInfo);
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -691,12 +1126,11 @@ ExecInsert(ModifyTableState *mtstate,
 	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind we
 	 * are looking for at this point.
 	 */
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-		ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, WCO_VIEW_CHECK);
 
 	/* Process RETURNING if present */
-	if (resultRelInfo->ri_projectReturning)
-		result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+	result = ExecProcessReturning(mtstate, resultRelInfo, slot, NULL, NULL,
+								  planSlot);
 
 	return result;
 }
@@ -1011,45 +1445,23 @@ ldelete:;
 						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
-	if (processReturning && resultRelInfo->ri_projectReturning)
+	if (processReturning)
 	{
-		/*
-		 * We have to put the target tuple into a slot, which means first we
-		 * gotta fetch it.  We can use the trigger tuple slot.
-		 */
-		TupleTableSlot *rslot;
-
-		if (resultRelInfo->ri_FdwRoutine)
-		{
-			/* FDW must have provided a slot containing the deleted row */
-			Assert(!TupIsNull(slot));
-		}
-		else
-		{
-			slot = ExecGetReturningSlot(estate, resultRelInfo);
-			if (oldtuple != NULL)
-			{
-				ExecForceStoreHeapTuple(oldtuple, slot, false);
-			}
-			else
-			{
-				if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
-												   SnapshotAny, slot))
-					elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
-			}
-		}
-
-		rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
+		TupleTableSlot *rslot = ExecProcessReturning(mtstate, resultRelInfo,
+													 slot, tupleid, oldtuple,
+													 planSlot);
 
 		/*
 		 * Before releasing the target tuple again, make sure rslot has a
 		 * local copy of any pass-by-reference values.
 		 */
-		ExecMaterializeSlot(rslot);
-
-		ExecClearTuple(slot);
-
-		return rslot;
+		if (rslot)
+		{
+			ExecMaterializeSlot(rslot);
+			if (slot)
+				ExecClearTuple(slot);
+			return rslot;
+		}
 	}
 
 	return NULL;
@@ -1082,7 +1494,6 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 						 TupleTableSlot **inserted_tuple)
 {
 	EState	   *estate = mtstate->ps.state;
-	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	TupleConversionMap *tupconv_map;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
@@ -1101,13 +1512,27 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 				 errmsg("invalid ON UPDATE specification"),
 				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
 
-	/*
-	 * When an UPDATE is run on a leaf partition, we will not have partition
-	 * tuple routing set up.  In that case, fail with partition constraint
-	 * violation error.
-	 */
-	if (proute == NULL)
-		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+	/* Initialize tuple routing info if not already done. */
+	if (mtstate->mt_partition_tuple_routing == NULL)
+	{
+		Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+		MemoryContext	oldcxt;
+
+		/* Things built here have to last for the query duration. */
+		oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(estate, mtstate, targetRel);
+
+		/*
+		 * Before a partition's tuple can be re-routed, it must first
+		 * be converted to the root's format and we need a slot for
+		 * storing such tuple.
+		 */
+		Assert(mtstate->mt_root_tuple_slot == NULL);
+		mtstate->mt_root_tuple_slot = table_slot_create(targetRel, NULL);
+		MemoryContextSwitchTo(oldcxt);
+	}
 
 	/*
 	 * Row movement, part 1.  Delete the tuple, but skip RETURNING processing.
@@ -1161,7 +1586,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 	 * convert the tuple into root's tuple descriptor if needed, since
 	 * ExecInsert() starts the search from root.
 	 */
-	tupconv_map = resultRelInfo->ri_ChildToRootMap;
+	tupconv_map = GetChildToRootMap(mtstate, resultRelInfo);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
@@ -1226,6 +1651,9 @@ ExecUpdate(ModifyTableState *mtstate,
 	if (IsBootstrapProcessingMode())
 		elog(ERROR, "cannot UPDATE during bootstrap");
 
+	if (resultRelInfo->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(resultRelInfo, false);
+
 	ExecMaterializeSlot(slot);
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1318,16 +1746,9 @@ lreplace:;
 			resultRelationDesc->rd_rel->relispartition &&
 			!ExecPartitionCheck(resultRelInfo, slot, estate, false);
 
-		if (!partition_constraint_failed &&
-			resultRelInfo->ri_WithCheckOptions != NIL)
-		{
-			/*
-			 * ExecWithCheckOptions() will skip any WCOs which are not of the
-			 * kind we are looking for at this point.
-			 */
-			ExecWithCheckOptions(WCO_RLS_UPDATE_CHECK,
-								 resultRelInfo, slot, estate);
-		}
+		if (!partition_constraint_failed)
+			ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot,
+										WCO_RLS_UPDATE_CHECK);
 
 		/*
 		 * If a partition check failed, try to move the row into the right
@@ -1340,6 +1761,13 @@ lreplace:;
 			bool		retry;
 
 			/*
+			 * When an UPDATE is run directly on a leaf partition, simply fail
+			 * with partition constraint violation error.
+			 */
+			if (resultRelInfo == mtstate->rootResultRelInfo)
+				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+			/*
 			 * ExecCrossPartitionUpdate will first DELETE the row from the
 			 * partition it's currently in and then insert it back into the
 			 * root table, which will re-route it to the correct partition.
@@ -1535,18 +1963,12 @@ lreplace:;
 	 * required to do this after testing all constraints and uniqueness
 	 * violations per the SQL spec, so we do it after actually updating the
 	 * record in the heap and all indexes.
-	 *
-	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind we
-	 * are looking for at this point.
 	 */
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-		ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, WCO_VIEW_CHECK);
 
 	/* Process RETURNING if present */
-	if (resultRelInfo->ri_projectReturning)
-		return ExecProcessReturning(resultRelInfo, slot, planSlot);
-
-	return NULL;
+	return ExecProcessReturning(mtstate, resultRelInfo, slot, NULL, NULL,
+								planSlot);
 }
 
 /*
@@ -1570,10 +1992,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 bool canSetTag,
 					 TupleTableSlot **returning)
 {
-	ExprContext *econtext = mtstate->ps.ps_ExprContext;
+	ExprContext *econtext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
-	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
-	TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
+	ExprState  *onConflictSetWhere;
+	TupleTableSlot *existing;
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 	TM_Result	test;
@@ -1581,6 +2003,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	TransactionId xmin;
 	bool		isnull;
 
+	if (resultRelInfo->ri_onConflict == NULL)
+		InitOnConflictState(mtstate, resultRelInfo);
+
+	econtext = mtstate->ps.ps_ExprContext;
+	onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
+	existing = resultRelInfo->ri_onConflict->oc_Existing;
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(estate, resultRelInfo);
 
@@ -1719,27 +2148,23 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 		return true;			/* done with the tuple */
 	}
 
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-	{
-		/*
-		 * Check target's existing tuple against UPDATE-applicable USING
-		 * security barrier quals (if any), enforced here as RLS checks/WCOs.
-		 *
-		 * The rewriter creates UPDATE RLS checks/WCOs for UPDATE security
-		 * quals, and stores them as WCOs of "kind" WCO_RLS_CONFLICT_CHECK,
-		 * but that's almost the extent of its special handling for ON
-		 * CONFLICT DO UPDATE.
-		 *
-		 * The rewriter will also have associated UPDATE applicable straight
-		 * RLS checks/WCOs for the benefit of the ExecUpdate() call that
-		 * follows.  INSERTs and UPDATEs naturally have mutually exclusive WCO
-		 * kinds, so there is no danger of spurious over-enforcement in the
-		 * INSERT or UPDATE path.
-		 */
-		ExecWithCheckOptions(WCO_RLS_CONFLICT_CHECK, resultRelInfo,
-							 existing,
-							 mtstate->ps.state);
-	}
+	/*
+	 * Check target's existing tuple against UPDATE-applicable USING
+	 * security barrier quals (if any), enforced here as RLS checks/WCOs.
+	 *
+	 * The rewriter creates UPDATE RLS checks/WCOs for UPDATE security
+	 * quals, and stores them as WCOs of "kind" WCO_RLS_CONFLICT_CHECK,
+	 * but that's almost the extent of its special handling for ON
+	 * CONFLICT DO UPDATE.
+	 *
+	 * The rewriter will also have associated UPDATE applicable straight
+	 * RLS checks/WCOs for the benefit of the ExecUpdate() call that
+	 * follows.  INSERTs and UPDATEs naturally have mutually exclusive WCO
+	 * kinds, so there is no danger of spurious over-enforcement in the
+	 * INSERT or UPDATE path.
+	 */
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, existing,
+								WCO_RLS_CONFLICT_CHECK);
 
 	/* Project the new tuple version */
 	ExecProject(resultRelInfo->ri_onConflict->oc_ProjInfo);
@@ -1929,11 +2354,12 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
+	ModifyTable *plan = (ModifyTable *) node->ps.plan;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *resultRelInfo;
+	ResultRelInfo *resultRelInfo = NULL;
 	PlanState  *subplanstate;
-	JunkFilter *junkfilter;
+	JunkFilter *junkfilter = NULL;
 	TupleTableSlot *slot;
 	TupleTableSlot *planSlot;
 	ItemPointer tupleid;
@@ -1974,9 +2400,7 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/* Preload local variables */
-	resultRelInfo = node->resultRelInfo + node->mt_whichplan;
 	subplanstate = node->mt_plans[node->mt_whichplan];
-	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
@@ -2000,17 +2424,37 @@ ExecModifyTable(PlanState *pstate)
 		if (pstate->ps_ExprContext)
 			ResetExprContext(pstate->ps_ExprContext);
 
+		/*
+		 * FDWs that can push down a modify operation would need to see the
+		 * ResultRelInfo, so fetch one if not already done before executing
+		 * the subplan, potentially opening it for the first time.
+		 */
+		if (bms_is_member(node->mt_whichplan, plan->fdwDirectModifyPlans) &&
+			resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecGetResultRelation(node, node->mt_whichplan);
+
+			/*
+			 * Must make sure to initialize the RETURNING projection as well,
+			 * because some FDWs rely on accessing ri_projectReturning to
+			 * set its "scan" tuple to use below for computing the actual
+			 * RETURNING targetlist.
+			 */
+			if (plan->returningLists && resultRelInfo->ri_returningList == NIL)
+				InitReturningProjection(node, resultRelInfo);
+		}
+
 		planSlot = ExecProcNode(subplanstate);
 
 		if (TupIsNull(planSlot))
 		{
-			/* advance to next subplan if any */
+			/* Signal to initialize the next plan's relation. */
+			resultRelInfo = NULL;
+
 			node->mt_whichplan++;
 			if (node->mt_whichplan < node->mt_nplans)
 			{
-				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
-				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				continue;
@@ -2020,8 +2464,25 @@ ExecModifyTable(PlanState *pstate)
 		}
 
 		/*
+		 * Fetch the result relation for the current plan if not already done,
+		 * potentially opening it for the first time.
+		 */
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecGetResultRelation(node, node->mt_whichplan);
+			if (!resultRelInfo->ri_junkFilterValid)
+				InitJunkFilter(node, resultRelInfo);
+			junkfilter = resultRelInfo->ri_junkFilter;
+		}
+
+		/*
 		 * Ensure input tuple is the right format for the target relation.
 		 */
+		if (node->mt_scans[node->mt_whichplan] == NULL)
+			node->mt_scans[node->mt_whichplan] =
+				ExecInitExtraTupleSlot(node->ps.state,
+									   ExecGetResultType(subplanstate),
+									   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 		if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops)
 		{
 			ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot);
@@ -2042,7 +2503,8 @@ ExecModifyTable(PlanState *pstate)
 			 * ExecProcessReturning by IterateDirectModify, so no need to
 			 * provide it here.
 			 */
-			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
+			slot = ExecProcessReturning(node, resultRelInfo, NULL, NULL, NULL,
+										planSlot);
 
 			return slot;
 		}
@@ -2175,13 +2637,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	ListCell   *l,
-			   *l1;
+	ListCell   *l;
 	int			i;
 	Relation	rel;
-	bool		update_tuple_routing_needed = node->partColsUpdated;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2198,7 +2657,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->canSetTag = node->canSetTag;
 	mtstate->mt_done = false;
 
+	/*
+	 * call ExecInitNode on each of the plans to be executed and save the
+	 * results into the array "mt_plans".
+	 */
+	mtstate->mt_nplans = nplans;
 	mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
+	i = 0;
+	foreach(l, node->plans)
+	{
+		subplan = (Plan *) lfirst(l);
+
+		mtstate->mt_plans[i++] = ExecInitNode(subplan, estate, eflags);
+	}
+
 	mtstate->resultRelInfo = (ResultRelInfo *)
 		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
@@ -2225,13 +2697,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 	else
 	{
-		mtstate->rootResultRelInfo = mtstate->resultRelInfo;
-		ExecInitResultRelation(estate, mtstate->resultRelInfo,
-							   linitial_int(node->resultRelations));
+		/*
+		 * Unlike a partitioned target relation, the target relation in this
+		 * case will be actually used by ExecModifyTable(), so use
+		 * ExecGetResultRelation() to get the ResultRelInfo, because it
+		 * initializes some fields that a bare InitResultRelInfo() doesn't.
+		 */
+		mtstate->rootResultRelInfo = ExecGetResultRelation(mtstate, 0);
+		Assert(mtstate->rootResultRelInfo == mtstate->resultRelInfo);
 	}
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
-	mtstate->mt_nplans = nplans;
 
 	/* set up epqstate with dummy subplan data for the moment */
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
@@ -2244,163 +2720,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
 		ExecSetupTransitionCaptureState(mtstate, estate);
 
-	/*
-	 * call ExecInitNode on each of the plans to be executed and save the
-	 * results into the array "mt_plans".  This is also a convenient place to
-	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.
-	 */
-	resultRelInfo = mtstate->resultRelInfo;
-	i = 0;
-	forboth(l, node->resultRelations, l1, node->plans)
-	{
-		Index		resultRelation = lfirst_int(l);
-
-		subplan = (Plan *) lfirst(l1);
-
-		/*
-		 * This opens result relation and fills ResultRelInfo. (root relation
-		 * was initialized already.)
-		 */
-		if (resultRelInfo != mtstate->rootResultRelInfo)
-			ExecInitResultRelation(estate, resultRelInfo, resultRelation);
-
-		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
-
-		/*
-		 * Verify result relation is a valid target for the current operation
-		 */
-		CheckValidResultRel(resultRelInfo, operation);
-
-		/*
-		 * If there are indices on the result relation, open them and save
-		 * descriptors in the result relation info, so that we can add new
-		 * index entries for the tuples we add/update.  We need not do this
-		 * for a DELETE, however, since deletion doesn't affect indexes. Also,
-		 * inside an EvalPlanQual operation, the indexes might be open
-		 * already, since we share the resultrel state with the original
-		 * query.
-		 */
-		if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
-			operation != CMD_DELETE &&
-			resultRelInfo->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(resultRelInfo,
-							node->onConflictAction != ONCONFLICT_NONE);
-
-		/*
-		 * If this is an UPDATE and a BEFORE UPDATE trigger is present, the
-		 * trigger itself might modify the partition-key values. So arrange
-		 * for tuple routing.
-		 */
-		if (resultRelInfo->ri_TrigDesc &&
-			resultRelInfo->ri_TrigDesc->trig_update_before_row &&
-			operation == CMD_UPDATE)
-			update_tuple_routing_needed = true;
-
-		/* Now init the plan for this result rel */
-		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
-		mtstate->mt_scans[i] =
-			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* Also let FDWs init themselves for foreign-table result rels */
-		if (!resultRelInfo->ri_usesFdwDirectModify &&
-			resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
-		{
-			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
-
-			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
-															 resultRelInfo,
-															 fdw_private,
-															 i,
-															 eflags);
-		}
-
-		/*
-		 * If needed, initialize a map to convert tuples in the child format
-		 * to the format of the table mentioned in the query (root relation).
-		 * It's needed for update tuple routing, because the routing starts
-		 * from the root relation.  It's also needed for capturing transition
-		 * tuples, because the transition tuple store can only store tuples in
-		 * the root table format.
-		 *
-		 * For INSERT, the map is only initialized for a given partition when
-		 * the partition itself is first initialized by ExecFindPartition().
-		 */
-		if (update_tuple_routing_needed ||
-			(mtstate->mt_transition_capture &&
-			 mtstate->operation != CMD_INSERT))
-			resultRelInfo->ri_ChildToRootMap =
-				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
-									   RelationGetDescr(mtstate->rootResultRelInfo->ri_RelationDesc));
-		resultRelInfo++;
-		i++;
-	}
-
-	/* Get the target relation */
-	rel = mtstate->rootResultRelInfo->ri_RelationDesc;
-
-	/*
-	 * If it's not a partitioned table after all, UPDATE tuple routing should
-	 * not be attempted.
-	 */
-	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		update_tuple_routing_needed = false;
-
-	/*
-	 * Build state for tuple routing if it's an INSERT or if it's an UPDATE of
-	 * partition key.
-	 */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
-		(operation == CMD_INSERT || update_tuple_routing_needed))
-		mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
-
-	/*
-	 * For update row movement we'll need a dedicated slot to store the tuples
-	 * that have been converted from partition format to the root table
-	 * format.
-	 */
-	if (update_tuple_routing_needed)
-		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-
-	/*
-	 * Initialize any WITH CHECK OPTION constraints if needed.
-	 */
-	resultRelInfo = mtstate->resultRelInfo;
-	i = 0;
-	foreach(l, node->withCheckOptionLists)
-	{
-		List	   *wcoList = (List *) lfirst(l);
-		List	   *wcoExprs = NIL;
-		ListCell   *ll;
-
-		foreach(ll, wcoList)
-		{
-			WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
-			ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
-											   &mtstate->ps);
-
-			wcoExprs = lappend(wcoExprs, wcoExpr);
-		}
-
-		resultRelInfo->ri_WithCheckOptions = wcoList;
-		resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		resultRelInfo++;
-		i++;
-	}
-
-	/*
-	 * Initialize RETURNING projections if needed.
-	 */
+	/* Initialize some global state for RETURNING projections. */
 	if (node->returningLists)
 	{
-		TupleTableSlot *slot;
-		ExprContext *econtext;
-
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
 		 * RETURNING list.  We assume the rest will look the same.
@@ -2409,27 +2731,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		/* Set up a slot for the output of the RETURNING projection(s) */
 		ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
-		slot = mtstate->ps.ps_ResultTupleSlot;
 
 		/* Need an econtext too */
 		if (mtstate->ps.ps_ExprContext == NULL)
 			ExecAssignExprContext(estate, &mtstate->ps);
-		econtext = mtstate->ps.ps_ExprContext;
-
-		/*
-		 * Build a projection for each result rel.
-		 */
-		resultRelInfo = mtstate->resultRelInfo;
-		foreach(l, node->returningLists)
-		{
-			List	   *rlist = (List *) lfirst(l);
-
-			resultRelInfo->ri_returningList = rlist;
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-			resultRelInfo++;
-		}
 	}
 	else
 	{
@@ -2443,67 +2748,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
-	/* Set the list of arbiter indexes if needed for ON CONFLICT */
-	resultRelInfo = mtstate->resultRelInfo;
-	if (node->onConflictAction != ONCONFLICT_NONE)
-		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+	/* Get the target relation */
+	rel = mtstate->rootResultRelInfo->ri_RelationDesc;
 
 	/*
-	 * If needed, Initialize target list, projection and qual for ON CONFLICT
-	 * DO UPDATE.
+	 * Build state for tuple routing if it's an INSERT.  An UPDATE might need
+	 * it too, but it's initialized only when it actually ends up moving
+	 * tuples between partitions; see ExecCrossPartitionUpdate().
 	 */
-	if (node->onConflictAction == ONCONFLICT_UPDATE)
-	{
-		ExprContext *econtext;
-		TupleDesc	relationDesc;
-		TupleDesc	tupDesc;
-
-		/* insert may only have one plan, inheritance is not expanded */
-		Assert(nplans == 1);
-
-		/* already exists if created by RETURNING processing above */
-		if (mtstate->ps.ps_ExprContext == NULL)
-			ExecAssignExprContext(estate, &mtstate->ps);
-
-		econtext = mtstate->ps.ps_ExprContext;
-		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
-
-		/* create state for DO UPDATE SET operation */
-		resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
-
-		/* initialize slot for the existing tuple */
-		resultRelInfo->ri_onConflict->oc_Existing =
-			table_slot_create(resultRelInfo->ri_RelationDesc,
-							  &mtstate->ps.state->es_tupleTable);
-
-		/*
-		 * Create the tuple slot for the UPDATE SET projection. We want a slot
-		 * of the table's type here, because the slot will be used to insert
-		 * into the table, and for RETURNING processing - which may access
-		 * system attributes.
-		 */
-		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
-		resultRelInfo->ri_onConflict->oc_ProjSlot =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflict->oc_ProjInfo =
-			ExecBuildProjectionInfo(node->onConflictSet, econtext,
-									resultRelInfo->ri_onConflict->oc_ProjSlot,
-									&mtstate->ps,
-									relationDesc);
-
-		/* initialize state to evaluate the WHERE clause, if any */
-		if (node->onConflictWhere)
-		{
-			ExprState  *qualexpr;
-
-			qualexpr = ExecInitQual((List *) node->onConflictWhere,
-									&mtstate->ps);
-			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
-		}
-	}
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		operation == CMD_INSERT)
+		mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
 	 * If we have any secondary relations in an UPDATE or DELETE, they need to
@@ -2541,109 +2797,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 						mtstate->mt_arowmarks[0]);
 
 	/*
-	 * Initialize the junk filter(s) if needed.  INSERT queries need a filter
-	 * if there are any junk attrs in the tlist.  UPDATE and DELETE always
-	 * need a filter, since there's always at least one junk attribute present
-	 * --- no need to look first.  Typically, this will be a 'ctid' or
-	 * 'wholerow' attribute, but in the case of a foreign data wrapper it
-	 * might be a set of junk attributes sufficient to identify the remote
-	 * row.
-	 *
-	 * If there are multiple result relations, each one needs its own junk
-	 * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we
-	 * can't be fooled by some needing a filter and some not.
-	 *
-	 * This section of code is also a convenient place to verify that the
-	 * output of an INSERT or UPDATE matches the target table(s).
-	 */
-	{
-		bool		junk_filter_needed = false;
-
-		switch (operation)
-		{
-			case CMD_INSERT:
-				foreach(l, subplan->targetlist)
-				{
-					TargetEntry *tle = (TargetEntry *) lfirst(l);
-
-					if (tle->resjunk)
-					{
-						junk_filter_needed = true;
-						break;
-					}
-				}
-				break;
-			case CMD_UPDATE:
-			case CMD_DELETE:
-				junk_filter_needed = true;
-				break;
-			default:
-				elog(ERROR, "unknown operation");
-				break;
-		}
-
-		if (junk_filter_needed)
-		{
-			resultRelInfo = mtstate->resultRelInfo;
-			for (i = 0; i < nplans; i++)
-			{
-				JunkFilter *j;
-				TupleTableSlot *junkresslot;
-
-				subplan = mtstate->mt_plans[i]->plan;
-				if (operation == CMD_INSERT || operation == CMD_UPDATE)
-					ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-										subplan->targetlist);
-
-				junkresslot =
-					ExecInitExtraTupleSlot(estate, NULL,
-										   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-				j = ExecInitJunkFilter(subplan->targetlist,
-									   junkresslot);
-
-				if (operation == CMD_UPDATE || operation == CMD_DELETE)
-				{
-					/* For UPDATE/DELETE, find the appropriate junk attr now */
-					char		relkind;
-
-					relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-					if (relkind == RELKIND_RELATION ||
-						relkind == RELKIND_MATVIEW ||
-						relkind == RELKIND_PARTITIONED_TABLE)
-					{
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
-						if (!AttributeNumberIsValid(j->jf_junkAttNo))
-							elog(ERROR, "could not find junk ctid column");
-					}
-					else if (relkind == RELKIND_FOREIGN_TABLE)
-					{
-						/*
-						 * When there is a row-level trigger, there should be
-						 * a wholerow attribute.
-						 */
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-					}
-					else
-					{
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-						if (!AttributeNumberIsValid(j->jf_junkAttNo))
-							elog(ERROR, "could not find junk wholerow column");
-					}
-				}
-
-				resultRelInfo->ri_junkFilter = j;
-				resultRelInfo++;
-			}
-		}
-		else
-		{
-			if (operation == CMD_INSERT)
-				ExecCheckPlanOutput(mtstate->resultRelInfo->ri_RelationDesc,
-									subplan->targetlist);
-		}
-	}
-
-	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
 	 * ExecPostprocessPlan.  (It'd actually work fine to add the primary
@@ -2673,20 +2826,6 @@ ExecEndModifyTable(ModifyTableState *node)
 	int			i;
 
 	/*
-	 * Allow any FDWs to shut down
-	 */
-	for (i = 0; i < node->mt_nplans; i++)
-	{
-		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
-
-		if (!resultRelInfo->ri_usesFdwDirectModify &&
-			resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
-														   resultRelInfo);
-	}
-
-	/*
 	 * Close all the partitioned tables, leaf partitions, and their indices
 	 * and release the slot used for tuple routing, if set.
 	 */
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 46a2dc9..9ae7e40 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -22,5 +22,6 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
 extern void ExecReScanModifyTable(ModifyTableState *node);
+extern ResultRelInfo *ExecGetResultRelation(ModifyTableState *mtstate, int whichrel);
 
 #endif							/* NODEMODIFYTABLE_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..f2f4bed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -463,6 +463,7 @@ typedef struct ResultRelInfo
 
 	/* for removing junk attributes from tuples */
 	JunkFilter *ri_junkFilter;
+	bool		ri_junkFilterValid;	/* has the filter been initialized? */
 
 	/* list of RETURNING expressions */
 	List	   *ri_returningList;
@@ -497,6 +498,7 @@ typedef struct ResultRelInfo
 	 * transition tuple capture or update partition row movement is active.
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
+	bool		ri_ChildToRootMapValid;	/* has the map been initialized? */
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index ff157ce..74cd7e2 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -52,10 +52,7 @@ explain (costs off) insert into insertconflicttest values(0, 'Crowberry') on con
    Conflict Arbiter Indexes: op_index_key, collation_index_key, both_index_key
    Conflict Filter: (SubPlan 1)
    ->  Result
-   SubPlan 1
-     ->  Index Only Scan using both_index_expr_key on insertconflicttest ii
-           Index Cond: (key = excluded.key)
-(8 rows)
+(5 rows)
 
 -- Neither collation nor operator class specifications are required --
 -- supplying them merely *limits* matches to indexes with matching opclasses
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index caed1c1..d8d2a3d 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1862,28 +1862,22 @@ UPDATE rw_view1 SET a = a + 5; -- should fail
 ERROR:  new row violates check option for view "rw_view1"
 DETAIL:  Failing row contains (15).
 EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
-                       QUERY PLAN                        
----------------------------------------------------------
+      QUERY PLAN      
+----------------------
  Insert on base_tbl b
    ->  Result
-   SubPlan 1
-     ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
-           Index Cond: (a = b.a)
-(5 rows)
+(2 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
-                        QUERY PLAN                         
------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Update on base_tbl b
    ->  Hash Join
          Hash Cond: (b.a = r.a)
          ->  Seq Scan on base_tbl b
          ->  Hash
                ->  Seq Scan on ref_tbl r
-   SubPlan 1
-     ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
-           Index Cond: (a = b.a)
-(9 rows)
+(6 rows)
 
 DROP TABLE base_tbl, ref_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d7..0ad0d1a 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -341,8 +341,8 @@ DETAIL:  Failing row contains (105, 85, null, b, 15).
 -- fail, no partition key update, so no attempt to move tuple,
 -- but "a = 'a'" violates partition constraint enforced by root partition)
 UPDATE part_b_10_b_20 set a = 'a';
-ERROR:  new row for relation "part_c_1_100" violates partition constraint
-DETAIL:  Failing row contains (null, 1, 96, 12, a).
+ERROR:  new row for relation "part_b_10_b_20" violates partition constraint
+DETAIL:  Failing row contains (null, 96, a, 12, 1).
 -- ok, partition key update, no constraint violation
 UPDATE range_parted set d = d - 10 WHERE d > 10;
 -- ok, no partition key update, no constraint violation
@@ -372,8 +372,8 @@ UPDATE part_b_10_b_20 set c = c + 20 returning c, b, a;
 
 -- fail, row movement happens only within the partition subtree.
 UPDATE part_b_10_b_20 set b = b - 6 WHERE c > 116 returning *;
-ERROR:  new row for relation "part_d_1_15" violates partition constraint
-DETAIL:  Failing row contains (2, 117, 2, b, 7).
+ERROR:  new row for relation "part_b_10_b_20" violates partition constraint
+DETAIL:  Failing row contains (2, 117, b, 7, 2).
 -- ok, row movement, with subset of rows moved into different partition.
 UPDATE range_parted set b = b - 6 WHERE c > 116 returning a, b + c;
  a | ?column? 
@@ -814,8 +814,8 @@ INSERT into sub_parted VALUES (1,2,10);
 -- Test partition constraint violation when intermediate ancestor is used and
 -- constraint is inherited from upper root.
 UPDATE sub_parted set a = 2 WHERE c = 10;
-ERROR:  new row for relation "sub_part2" violates partition constraint
-DETAIL:  Failing row contains (2, 10, 2).
+ERROR:  new row for relation "sub_parted" violates partition constraint
+DETAIL:  Failing row contains (2, 2, 10).
 -- Test update-partition-key, where the unpruned partitions do not have their
 -- partition keys updated.
 SELECT tableoid::regclass::text, * FROM list_parted WHERE a = 2 ORDER BY 1;
-- 
1.8.3.1

#75Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#74)
Re: partition routing layering in nodeModifyTable.c

On 2020-Oct-22, Amit Langote wrote:

0001 fixes a thinko of the recent commit 1375422c782 that I discovered
when debugging a problem with 0003.

Hmm, how hard is it to produce a test case that fails because of this
problem?

#76Amit Langote
amitlangote09@gmail.com
In reply to: Alvaro Herrera (#75)
Re: partition routing layering in nodeModifyTable.c

On Thu, Oct 22, 2020 at 11:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Oct-22, Amit Langote wrote:

0001 fixes a thinko of the recent commit 1375422c782 that I discovered
when debugging a problem with 0003.

Hmm, how hard is it to produce a test case that fails because of this
problem?

I checked and don't think there's any live bug here. You will notice
if you take a look at 1375422c7 that we've made es_result_relations an
array of pointers while the individual ModifyTableState nodes own the
actual ResultRelInfos. So, EvalPlanQualStart() setting the parent
EState's es_result_relations array to NULL implies that those pointers
become inaccessible to the parent query's execution after
EvalPlanQual() returns. However, nothing in the tree today accesses
ResultRelInfos through es_result_relations array, except during
ExecInit* stage (see ExecInitForeignScan()) but it would still be
intact at that stage.

With the lazy-initialization patch though, we do check
es_result_relations when trying to open a result relation to see if it
has already been initialized (a non-NULL pointer in that array means
yes), so resetting it in the middle of the execution can't be safe.
For one example, we will end up initializing the same relation many
times after not finding it in es_result_relations and also add it
*duplicatively* to es_opened_result_relations list, breaking the
invariant that that list contains distinct relations.
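
A minimal sketch of that lookup-or-initialize pattern, in case it helps
to see it in isolation. The Mock* types and functions below are made up
for illustration and do not exist in PostgreSQL; the nopened counter
merely stands in for the es_opened_result_relations list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ResultRelInfo; not the real struct. */
typedef struct MockResultRelInfo
{
	int			rti;			/* 1-based range table index */
} MockResultRelInfo;

/* Hypothetical stand-in for the relevant parts of EState. */
typedef struct MockEState
{
	int			nrels;
	MockResultRelInfo **result_relations;	/* NULL = not initialized yet */
	int			nopened;		/* how many initializations happened */
} MockEState;

static MockEState *
mock_create_estate(int nrels)
{
	MockEState *estate = malloc(sizeof(MockEState));

	assert(estate != NULL);
	estate->nrels = nrels;
	/* calloc leaves every slot NULL, i.e. "not initialized yet" */
	estate->result_relations = calloc(nrels, sizeof(MockResultRelInfo *));
	assert(estate->result_relations != NULL);
	estate->nopened = 0;
	return estate;
}

/* Return the entry for rti, initializing it on first access only. */
static MockResultRelInfo *
mock_get_result_relation(MockEState *estate, int rti)
{
	MockResultRelInfo *rri;

	assert(rti >= 1 && rti <= estate->nrels);
	rri = estate->result_relations[rti - 1];
	if (rri == NULL)
	{
		rri = malloc(sizeof(MockResultRelInfo));
		assert(rri != NULL);
		rri->rti = rti;
		estate->result_relations[rti - 1] = rri;
		estate->nopened++;		/* would be an lappend() to the opened list */
	}
	return rri;
}
```

Calling mock_get_result_relation(estate, 2) twice returns the same
pointer and leaves nopened at 1. Setting result_relations[1] back to
NULL between the two calls makes the second call initialize the relation
again and bump nopened to 2, which is the duplication described above.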

--
Amit Langote
EDB: http://www.enterprisedb.com

#77Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#76)
Re: partition routing layering in nodeModifyTable.c

On 23/10/2020 05:56, Amit Langote wrote:

On Thu, Oct 22, 2020 at 11:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2020-Oct-22, Amit Langote wrote:

0001 fixes a thinko of the recent commit 1375422c782 that I discovered
when debugging a problem with 0003.

Hmm, how hard is it to produce a test case that fails because of this
problem?

I checked and don't think there's any live bug here. You will notice
if you take a look at 1375422c7 that we've made es_result_relations an
array of pointers while the individual ModifyTableState nodes own the
actual ResultRelInfos. So, EvalPlanQualStart() setting the parent
EState's es_result_relations array to NULL implies that those pointers
become inaccessible to the parent query's execution after
EvalPlanQual() returns. However, nothing in the tree today accesses
ResultRelInfos through es_result_relations array, except during
ExecInit* stage (see ExecInitForeignScan()) but it would still be
intact at that stage.

With the lazy-initialization patch though, we do check
es_result_relations when trying to open a result relation to see if it
has already been initialized (a non-NULL pointer in that array means
yes), so resetting it in the middle of the execution can't be safe.
For one example, we will end up initializing the same relation many
times after not finding it in es_result_relations and also add it
*duplicatively* to es_opened_result_relations list, breaking the
invariant that that list contains distinct relations.

Pushed that thinko-fix, thanks!

- Heikki

#78Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#74)
Re: partition routing layering in nodeModifyTable.c

On 22/10/2020 16:49, Amit Langote wrote:

On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It's probably true that there's no performance gain from initializing
them more lazily. But the reasoning and logic around the initialization
is complicated. After tracing through various paths through the code, I'm
convinced enough that it's correct, or at least these patches didn't
break it, but I still think some sort of lazy initialization on first
use would make it more readable. Or perhaps there's some other
refactoring we could do.

So the other patch I have mentioned is about lazy initialization of
the ResultRelInfo itself, not the individual fields, but maybe with
enough refactoring we can get the latter too.

So, I tried implementing a lazy-initialization-on-first-access
approach for both the ResultRelInfos themselves and some of the
individual fields of ResultRelInfo that don't need to be set right
away. You can see the end result in the attached 0003 patch. This
slims down ExecInitModifyTable() significantly, both in terms of code
footprint and the amount of work that it does.

Have you done any performance testing? I'd like to know how much of a
difference this makes in practice.

Another alternative is to continue to create the ResultRelInfos in
ExecInitModifyTable(), but initialize the individual fields in them lazily.

Does this patch become moot if we do the "Overhaul UPDATE/DELETE
processing"?
(/messages/by-id/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com)

- Heikki

#79Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#78)
Re: partition routing layering in nodeModifyTable.c

On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 22/10/2020 16:49, Amit Langote wrote:

On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

It's probably true that there's no performance gain from initializing
them more lazily. But the reasoning and logic around the initialization
is complicated. After tracing through various paths through the code, I'm
convinced enough that it's correct, or at least these patches didn't
break it, but I still think some sort of lazy initialization on first
use would make it more readable. Or perhaps there's some other
refactoring we could do.

So the other patch I have mentioned is about lazy initialization of
the ResultRelInfo itself, not the individual fields, but maybe with
enough refactoring we can get the latter too.

So, I tried implementing a lazy-initialization-on-first-access
approach for both the ResultRelInfos themselves and some of the
individual fields of ResultRelInfo that don't need to be set right
away. You can see the end result in the attached 0003 patch. This
slims down ExecInitModifyTable() significantly, both in terms of code
footprint and the amount of work that it does.

Have you done any performance testing? I'd like to know how much of a
difference this makes in practice.

I have shown some numbers here:

/messages/by-id/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com

To reiterate, if you apply the following patch:

Does this patch become moot if we do the "Overhaul UPDATE/DELETE
processing"?
(/messages/by-id/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com)

...and run this benchmark with plan_cache_mode = force_generic_plan

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 1000000)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

you may see roughly the following results:

HEAD:

0 tps = 13045.485121 (excluding connections establishing)
10 tps = 9358.157433 (excluding connections establishing)
100 tps = 1878.274500 (excluding connections establishing)
1000 tps = 84.684695 (excluding connections establishing)

Patched (overhaul update/delete processing):

0 tps = 12743.487196 (excluding connections establishing)
10 tps = 12644.240748 (excluding connections establishing)
100 tps = 4158.123345 (excluding connections establishing)
1000 tps = 391.248067 (excluding connections establishing)

And if you apply the patch being discussed here, TPS shoots up a bit,
especially for higher partition counts:

Patched (lazy-ResultRelInfo-initialization)

0 tps = 13419.283168 (excluding connections establishing)
10 tps = 12588.016095 (excluding connections establishing)
100 tps = 8560.824225 (excluding connections establishing)
1000 tps = 1926.553901 (excluding connections establishing)
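
As a quick sanity check on the relative gains, the ratios against HEAD
can be computed with a small helper like the one below. This is only a
sketch: the arrays are copied from the three result sets above, the
function names are made up, and it assumes the third set was measured
with both patches applied.

```c
#include <assert.h>
#include <stdio.h>

#define NCONFIGS 4

/* Partition counts and TPS figures copied from the results above. */
static const int partitions[NCONFIGS] = {0, 10, 100, 1000};
static const double tps_head[NCONFIGS] =
	{13045.485121, 9358.157433, 1878.274500, 84.684695};
static const double tps_overhaul[NCONFIGS] =
	{12743.487196, 12644.240748, 4158.123345, 391.248067};
static const double tps_lazy[NCONFIGS] =
	{13419.283168, 12588.016095, 8560.824225, 1926.553901};

/* Ratio of a patched TPS figure to its baseline. */
static double
speedup(double patched, double baseline)
{
	assert(baseline > 0.0);
	return patched / baseline;
}

/* Print one line per partition count with both ratios against HEAD. */
static void
print_speedups(FILE *out)
{
	int			i;

	for (i = 0; i < NCONFIGS; i++)
		fprintf(out, "%4d partitions: overhaul %.2fx, lazy init %.2fx\n",
				partitions[i],
				speedup(tps_overhaul[i], tps_head[i]),
				speedup(tps_lazy[i], tps_head[i]));
}
```

At 1000 partitions this works out to roughly 4.6x for the overhaul patch
and roughly 22.7x with lazy initialization, both relative to HEAD.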

To explain these numbers a bit, "overhaul update/delete processing"
patch improves the performance of that benchmark by allowing the
updates to use run-time pruning when executing generic plans, which
they can't today.

However without "lazy-ResultRelInfo-initialization" patch,
ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can
be seen to be spending time initializing all of those result
relations, whereas only one of those will actually be used.

As mentioned further in that email, it's really the locking of all
relations by AcquireExecutorLocks() that occurs even before we enter
the executor that's a much thornier bottleneck for this benchmark.
But the ResultRelInfo initialization bottleneck sounded like it could
get alleviated in a relatively straightforward manner. The patches
that were developed for attacking the locking bottleneck would require
further reflection on whether they are correct.

(Note: I've just copy pasted the numbers I reported in that email. To
reproduce, I'll have to rebase the "overhaul update/delete processing"
patch on this one, which I haven't yet done.)

Another alternative is to continue to create the ResultRelInfos in
ExecInitModifyTable(), but initialize the individual fields in them lazily.

If you consider the above, maybe you can see how that will not really
eliminate the bottleneck I'm aiming to fix here.

--
Amit Langote
EDB: http://www.enterprisedb.com

#80Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Amit Langote (#79)
Re: partition routing layering in nodeModifyTable.c

On 23/10/2020 12:37, Amit Langote wrote:

To explain these numbers a bit, "overhaul update/delete processing"
patch improves the performance of that benchmark by allowing the
updates to use run-time pruning when executing generic plans, which
they can't today.

However without "lazy-ResultRelInfo-initialization" patch,
ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can
be seen to be spending time initializing all of those result
relations, whereas only one of those will actually be used.

As mentioned further in that email, it's really the locking of all
relations by AcquireExecutorLocks() that occurs even before we enter
the executor that's a much thornier bottleneck for this benchmark.
But the ResultRelInfo initialization bottleneck sounded like it could
get alleviated in a relatively straightforward manner. The patches
that were developed for attacking the locking bottleneck would require
further reflection on whether they are correct.

(Note: I've just copy pasted the numbers I reported in that email. To
reproduce, I'll have to rebase the "overhaul update/delete processing"
patch on this one, which I haven't yet done.)

Ok, thanks for the explanation, now I understand.

This patch looks reasonable to me at a quick glance. I'm a bit worried
or unhappy about the impact on FDWs, though. It doesn't seem nice that
the ResultRelInfo is not available in the BeginDirectModify call. It's
not too bad, the FDW can call ExecGetResultRelation() if it needs it,
but still. Perhaps it would be better to delay calling
BeginDirectModify() until the first modification is performed, to avoid
any initialization overhead there, like establishing the connection in
postgres_fdw.

But since this applies on top of the "overhaul update/delete processing"
patch, let's tackle that patch set next. Could you rebase that, please?

- Heikki

#81Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#80)
Re: partition routing layering in nodeModifyTable.c

On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 23/10/2020 12:37, Amit Langote wrote:

To explain these numbers a bit, "overhaul update/delete processing"
patch improves the performance of that benchmark by allowing the
updates to use run-time pruning when executing generic plans, which
they can't today.

However without "lazy-ResultRelInfo-initialization" patch,
ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can
be seen to be spending time initializing all of those result
relations, whereas only one of those will actually be used.

As mentioned further in that email, it's really the locking of all
relations by AcquireExecutorLocks() that occurs even before we enter
the executor that's a much thornier bottleneck for this benchmark.
But the ResultRelInfo initialization bottleneck sounded like it could
get alleviated in a relatively straightforward manner. The patches
that were developed for attacking the locking bottleneck would require
further reflection on whether they are correct.

(Note: I've just copy pasted the numbers I reported in that email. To
reproduce, I'll have to rebase the "overhaul update/delete processing"
patch on this one, which I haven't yet done.)

Ok, thanks for the explanation, now I understand.

But since this applies on top of the "overhaul update/delete processing"
patch, let's tackle that patch set next. Could you rebase that, please?

Actually, I made lazy-ResultRelInfo-initialization apply on HEAD
directly at one point because of its separate CF entry, that is, to
appease the CF app's automatic patch tester that wouldn't know to
apply the other patch first. Because both of these patch sets want to
change how ModifyTable works, there are conflicts.

The "overhaul update/delete processing" patch is somewhat complex and
I expect some amount of back and forth on its design points. OTOH,
the lazy-ResultRelInfo-initialization patch is straightforward enough
that I hoped it would be easier to bring it into a committable state
than the other. But I can see why one may find it hard to justify
committing the latter without the former already in, because the
bottleneck it purports to alleviate (that of eager ResultRelInfo
initialization) is not apparent until update/delete can use run-time
pruning.

Anyway, I will post the rebased patch on the "overhaul update/delete
processing" thread.

--
Amit Langote
EDB: http://www.enterprisedb.com

#82Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#80)
2 attachment(s)
Re: partition routing layering in nodeModifyTable.c

On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

This patch looks reasonable to me at a quick glance. I'm a bit worried
or unhappy about the impact on FDWs, though. It doesn't seem nice that
the ResultRelInfo is not available in the BeginDirectModify call. It's
not too bad, the FDW can call ExecGetResultRelation() if it needs it,
but still. Perhaps it would be better to delay calling
BeginDirectModify() until the first modification is performed, to avoid
any initialization overhead there, like establishing the connection in
postgres_fdw.

Ah, calling BeginDirectModify() itself lazily sounds like a good idea;
see attached updated 0001 to see how that looks. While updating that
patch, I realized that the ForeignScan.resultRelation that we
introduced in 178f2d560d will now be totally useless. :-(
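
To illustrate the "begin on first use" idea on its own, here is a
stripped-down sketch. It is not the attached patch: every Mock* name is
hypothetical, and the callback only mimics the shape of the FDW's
BeginDirectModify so that the call count can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockForeignScanState MockForeignScanState;

/* Hypothetical stand-in for the FdwRoutine callback table. */
typedef struct MockFdwRoutine
{
	void		(*BeginDirectModify) (MockForeignScanState *fscan, int eflags);
} MockFdwRoutine;

/* Hypothetical stand-in for ForeignScanState. */
struct MockForeignScanState
{
	const MockFdwRoutine *fdwroutine;
	bool		begun;			/* has BeginDirectModify run yet? */
	int			begin_calls;	/* how often the FDW was asked to begin */
	int			rows_modified;	/* modifications performed so far */
};

/* Pretend FDW callback: the expensive setup (e.g. connecting) goes here. */
static void
mock_begin_direct_modify(MockForeignScanState *fscan, int eflags)
{
	(void) eflags;
	fscan->begin_calls++;
}

static const MockFdwRoutine mock_routine = {mock_begin_direct_modify};

/* Perform one modification, running the begin callback on first use only. */
static void
mock_direct_modify(MockForeignScanState *fscan, int eflags)
{
	if (!fscan->begun)
	{
		fscan->fdwroutine->BeginDirectModify(fscan, eflags);
		fscan->begun = true;
	}
	fscan->rows_modified++;
}
```

With this shape, a node that is initialized but never asked to modify
anything never pays for the begin callback, and repeated modifications
pay for it exactly once.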

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Call-BeginDirectModify-from-ExecInitModifyTable.patch (application/octet-stream)
From c86c7e7fb5112ae7e95704ae8f687f50f78da29c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 19 Oct 2020 17:17:33 +0900
Subject: [PATCH v2 1/2] Call BeginDirectModify from ExecInitModifyTable

This allows ModifyTable to directly install the target foreign
table's ResultRelInfo into the ForeignScanState node that will be
used to perform the "direct modify" operation rather than have
ExecInitForeignScan() do it by getting it via es_result_relations.

This is in preparation of a later commit to make ModifyTable node
initialize ResultRelInfos lazily, whereby accessing a given target
table's ResultRelInfo directly through es_result_relations while
the ModifyTable is executing will become a deprecated pattern.
---
 src/backend/executor/nodeForeignscan.c | 15 ++++-----------
 src/backend/executor/nodeModifyTable.c | 32 +++++++++++++++++++++++---------
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 0b20f94..7101c68 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -215,24 +215,17 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
-	/*
-	 * For the FDW's convenience, look up the modification target relation's.
-	 * ResultRelInfo.
-	 */
-	if (node->resultRelation > 0)
-		scanstate->resultRelInfo = estate->es_result_relations[node->resultRelation - 1];
-
 	/* Initialize any outer plan. */
 	if (outerPlan(node))
 		outerPlanState(scanstate) =
 			ExecInitNode(outerPlan(node), estate, eflags);
 
 	/*
-	 * Tell the FDW to initialize the scan.
+	 * Tell the FDW to initialize the scan.  For modify operations, it's the
+	 * enclosing ModifyTable node's job to call the FDW after setting up the
+	 * target foreign table's ResultRelInfo.
 	 */
-	if (node->operation != CMD_SELECT)
-		fdwroutine->BeginDirectModify(scanstate, eflags);
-	else
+	if (node->operation == CMD_SELECT)
 		fdwroutine->BeginForeignScan(scanstate, eflags);
 
 	return scanstate;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..05e68ef 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2306,17 +2306,31 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 
 		/* Also let FDWs init themselves for foreign-table result rels */
-		if (!resultRelInfo->ri_usesFdwDirectModify &&
-			resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
+		if (resultRelInfo->ri_FdwRoutine != NULL)
 		{
-			List	   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+			if (resultRelInfo->ri_usesFdwDirectModify)
+			{
+				ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i];
 
-			resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
-															 resultRelInfo,
-															 fdw_private,
-															 i,
-															 eflags);
+				/*
+				 * For the FDW's convenience, set the ForeignScanState node's
+				 * ResultRelInfo to let the FDW know which result relation it
+				 * is going to work with.
+				 */
+				Assert(IsA(fscan, ForeignScanState));
+				fscan->resultRelInfo = resultRelInfo;
+				resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
+			{
+				List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
+
+				resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
+																 resultRelInfo,
+																 fdw_private,
+																 i,
+																 eflags);
+			}
 		}
 
 		/*
-- 
1.8.3.1

v2-0002-Initialize-result-relation-information-lazily.patch (application/octet-stream)
From 0a19e71adba1e97f5150229e576e3f93eb2db0de Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 2 Jul 2020 10:51:45 +0900
Subject: [PATCH v2 2/2] Initialize result relation information lazily

Currently, all elements of the ModifyTableState.resultRelInfo array
are initialized in ExecInitModifyTable(), possibly wastefully,
because only one or a handful of potentially many result relations
appearing in that array may actually have any rows to update or
delete.

This commit refactors all places that directly access the individual
elements of the array to instead go through a lazy-initialization-on-
access function, such that only the elements corresponding to result
relations that are actually operated on are initialized.

Also, extend this lazy initialization approach to some of the
individual fields of ResultRelInfo such that even for the result
relations that are initialized, those fields are only initialized on
first access.  While no performance improvement is to be expected
there, it can lead to a simpler initialization logic of the
ResultRelInfo itself, because the conditions for whether a given
field is needed or not tend to look confusing.  One side-effect
of this is that any "SubPlans" referenced in the expressions of
those fields are also lazily initialized, which changes the
output of EXPLAIN (without ANALYZE) in some regression tests.

Another unrelated regression test output change is in update.out,
which is caused by deferred initialization of PartitionTupleRouting
for update tuple routing.  Whereas previously a partition constraint
violation error would be reported as occurring on a leaf partition,
due to the aforementioned change, it is now shown as occurring on
the query's target relation, which is valid because it is really
that table's (which is a sub-partitioned table) partition constraint
that is actually violated in the affected test cases.
---
 src/backend/commands/explain.c                |    6 +-
 src/backend/executor/execMain.c               |    7 +
 src/backend/executor/execPartition.c          |  116 ++-
 src/backend/executor/nodeModifyTable.c        | 1123 ++++++++++++++-----------
 src/include/executor/nodeModifyTable.h        |    1 +
 src/include/nodes/execnodes.h                 |    2 +
 src/test/regress/expected/insert_conflict.out |    5 +-
 src/test/regress/expected/updatable_views.out |   18 +-
 src/test/regress/expected/update.out          |   12 +-
 9 files changed, 737 insertions(+), 553 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01..edd79d7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,7 +18,9 @@
 #include "commands/createas.h"
 #include "commands/defrem.h"
 #include "commands/prepare.h"
+#include "executor/executor.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -3678,14 +3680,14 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 	/* Should we explicitly label target relations? */
 	labeltargets = (mtstate->mt_nplans > 1 ||
 					(mtstate->mt_nplans == 1 &&
-					 mtstate->resultRelInfo->ri_RangeTableIndex != node->nominalRelation));
+					 ExecGetResultRelation(mtstate, 0)->ri_RangeTableIndex != node->nominalRelation));
 
 	if (labeltargets)
 		ExplainOpenGroup("Target Tables", "Target Tables", false, es);
 
 	for (j = 0; j < mtstate->mt_nplans; j++)
 	{
-		ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
+		ResultRelInfo *resultRelInfo = ExecGetResultRelation(mtstate, j);
 		FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
 
 		if (labeltargets)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7179f58..f484e6a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1236,6 +1236,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 	resultRelInfo->ri_ConstraintExprs = NULL;
 	resultRelInfo->ri_GeneratedExprs = NULL;
 	resultRelInfo->ri_junkFilter = NULL;
+	resultRelInfo->ri_junkFilterValid = false;
 	resultRelInfo->ri_projectReturning = NULL;
 	resultRelInfo->ri_onConflictArbiterIndexes = NIL;
 	resultRelInfo->ri_onConflict = NULL;
@@ -1247,6 +1248,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
 													 * ExecInitRoutingInfo */
 	resultRelInfo->ri_PartitionTupleSlot = NULL;	/* ditto */
 	resultRelInfo->ri_ChildToRootMap = NULL;
+	resultRelInfo->ri_ChildToRootMapValid = false;
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 }
 
@@ -1440,6 +1442,11 @@ ExecCloseResultRelations(EState *estate)
 		ResultRelInfo *resultRelInfo = lfirst(l);
 
 		ExecCloseIndices(resultRelInfo);
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
+			resultRelInfo->ri_FdwRoutine->EndForeignModify(estate,
+														   resultRelInfo);
 	}
 
 	/* Close any relations that have been opened by ExecGetTriggerResultRel(). */
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd..8265db2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -20,6 +20,7 @@
 #include "catalog/pg_type.h"
 #include "executor/execPartition.h"
 #include "executor/executor.h"
+#include "executor/nodeModifyTable.h"
 #include "foreign/fdwapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -157,10 +158,11 @@ typedef struct PartitionDispatchData
 typedef struct SubplanResultRelHashElem
 {
 	Oid			relid;			/* hash key -- must be first */
-	ResultRelInfo *rri;
+	int			index;
 } SubplanResultRelHashElem;
 
 
+static ResultRelInfo *ExecLookupUpdateResultRelByOid(ModifyTableState *mtstate, Oid reloid);
 static void ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 										   PartitionTupleRouting *proute);
 static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
@@ -218,7 +220,6 @@ ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
 							   Relation rel)
 {
 	PartitionTupleRouting *proute;
-	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
 	/*
 	 * Here we attempt to expend as little effort as possible in setting up
@@ -240,17 +241,6 @@ ExecSetupPartitionTupleRouting(EState *estate, ModifyTableState *mtstate,
 	ExecInitPartitionDispatchInfo(estate, proute, RelationGetRelid(rel),
 								  NULL, 0);
 
-	/*
-	 * If performing an UPDATE with tuple routing, we can reuse partition
-	 * sub-plan result rels.  We build a hash table to map the OIDs of
-	 * partitions present in mtstate->resultRelInfo to their ResultRelInfos.
-	 * Every time a tuple is routed to a partition that we've yet to set the
-	 * ResultRelInfo for, before we go to the trouble of making one, we check
-	 * for a pre-made one in the hash table.
-	 */
-	if (node && node->operation == CMD_UPDATE)
-		ExecHashSubPlanResultRelsByOid(mtstate, proute);
-
 	return proute;
 }
 
@@ -350,7 +340,6 @@ ExecFindPartition(ModifyTableState *mtstate,
 		is_leaf = partdesc->is_leaf[partidx];
 		if (is_leaf)
 		{
-
 			/*
 			 * We've reached the leaf -- hurray, we're done.  Look to see if
 			 * we've already got a ResultRelInfo for this partition.
@@ -367,20 +356,19 @@ ExecFindPartition(ModifyTableState *mtstate,
 
 				/*
 				 * We have not yet set up a ResultRelInfo for this partition,
-				 * but if we have a subplan hash table, we might have one
-				 * there.  If not, we'll have to create one.
+				 * but if the partition is also an UPDATE result relation, use
+				 * the one in mtstate->resultRelInfo instead of creating a new
+				 * one with ExecInitPartitionInfo().
 				 */
-				if (proute->subplan_resultrel_htab)
+				if (mtstate->operation == CMD_UPDATE && mtstate->ps.plan)
 				{
 					Oid			partoid = partdesc->oids[partidx];
-					SubplanResultRelHashElem *elem;
 
-					elem = hash_search(proute->subplan_resultrel_htab,
-									   &partoid, HASH_FIND, NULL);
-					if (elem)
+					rri = ExecLookupUpdateResultRelByOid(mtstate, partoid);
+
+					if (rri)
 					{
 						found = true;
-						rri = elem->rri;
 
 						/* Verify this ResultRelInfo allows INSERTs */
 						CheckValidResultRel(rri, CMD_INSERT);
@@ -508,6 +496,41 @@ ExecFindPartition(ModifyTableState *mtstate,
 }
 
 /*
+ * ExecLookupUpdateResultRelByOid
+ * 		If the table with given OID appears in the list of result relations
+ * 		to be updated by the given ModifyTable node, return its
+ * 		ResultRelInfo, NULL otherwise.
+ */
+static ResultRelInfo *
+ExecLookupUpdateResultRelByOid(ModifyTableState *mtstate, Oid reloid)
+{
+	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+	SubplanResultRelHashElem *elem;
+	ResultRelInfo *result = NULL;
+
+	Assert(proute != NULL);
+	if (proute->subplan_resultrel_htab == NULL)
+		ExecHashSubPlanResultRelsByOid(mtstate, proute);
+
+	elem = hash_search(proute->subplan_resultrel_htab, &reloid,
+					   HASH_FIND, NULL);
+
+	if (elem)
+	{
+		result = ExecGetResultRelation(mtstate, elem->index);
+
+		/*
+		 * This is required in order to convert the partition's tuple to be
+		 * compatible with the root partitioned table's tuple descriptor. When
+		 * generating the per-subplan result rels, this was not set.
+		 */
+		result->ri_PartitionRoot = proute->partition_root;
+	}
+
+	return result;
+}
+
+/*
  * ExecHashSubPlanResultRelsByOid
  *		Build a hash table to allow fast lookups of subplan ResultRelInfos by
  *		partition Oid.  We also populate the subplan ResultRelInfo with an
@@ -517,9 +540,13 @@ static void
 ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 							   PartitionTupleRouting *proute)
 {
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	ListCell   *l;
 	HASHCTL		ctl;
 	HTAB	   *htab;
 	int			i;
+	MemoryContext oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
 
 	memset(&ctl, 0, sizeof(ctl));
 	ctl.keysize = sizeof(Oid);
@@ -530,26 +557,26 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 					   &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	proute->subplan_resultrel_htab = htab;
 
-	/* Hash all subplans by their Oid */
-	for (i = 0; i < mtstate->mt_nplans; i++)
+	/*
+	 * Map each result relation's OID to its ordinal position in
+	 * plan->resultRelations.
+	 */
+	i = 0;
+	foreach(l, plan->resultRelations)
 	{
-		ResultRelInfo *rri = &mtstate->resultRelInfo[i];
+		Index		rti = lfirst_int(l);
+		RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+		Oid			partoid = rte->relid;
 		bool		found;
-		Oid			partoid = RelationGetRelid(rri->ri_RelationDesc);
 		SubplanResultRelHashElem *elem;
 
 		elem = (SubplanResultRelHashElem *)
 			hash_search(htab, &partoid, HASH_ENTER, &found);
 		Assert(!found);
-		elem->rri = rri;
-
-		/*
-		 * This is required in order to convert the partition's tuple to be
-		 * compatible with the root partitioned table's tuple descriptor. When
-		 * generating the per-subplan result rels, this was not set.
-		 */
-		rri->ri_PartitionRoot = proute->partition_root;
+		elem->index = i++;
 	}
+
+	MemoryContextSwitchTo(oldcxt);
 }
 
 /*
@@ -570,7 +597,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
 	Relation	rootrel = rootResultRelInfo->ri_RelationDesc,
 				partrel;
-	Relation	firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc;
+	Relation	firstResultRel = NULL;
+	Index		firstVarno = 0;
 	ResultRelInfo *leaf_part_rri;
 	MemoryContext oldcxt;
 	AttrMap    *part_attmap = NULL;
@@ -606,19 +634,26 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 						(node != NULL &&
 						 node->onConflictAction != ONCONFLICT_NONE));
 
+	if (node)
+	{
+		ResultRelInfo *firstResultRelInfo = ExecGetResultRelation(mtstate, 0);
+
+		firstResultRel = firstResultRelInfo->ri_RelationDesc;
+		firstVarno = firstResultRelInfo->ri_RangeTableIndex;
+	}
+
 	/*
 	 * Build WITH CHECK OPTION constraints for the partition.  Note that we
 	 * didn't build the withCheckOptionList for partitions within the planner,
 	 * but simple translation of varattnos will suffice.  This only occurs for
 	 * the INSERT case or in the case of UPDATE tuple routing where we didn't
-	 * find a result rel to reuse in ExecSetupPartitionTupleRouting().
+	 * find a result rel to reuse.
 	 */
 	if (node && node->withCheckOptionLists != NIL)
 	{
 		List	   *wcoList;
 		List	   *wcoExprs = NIL;
 		ListCell   *ll;
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 
 		/*
 		 * In the case of INSERT on a partitioned table, there is only one
@@ -682,7 +717,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 		TupleTableSlot *slot;
 		ExprContext *econtext;
 		List	   *returningList;
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 
 		/* See the comment above for WCO lists. */
 		Assert((node->operation == CMD_INSERT &&
@@ -741,7 +775,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	 */
 	if (node && node->onConflictAction != ONCONFLICT_NONE)
 	{
-		int			firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
 		TupleDesc	partrelDesc = RelationGetDescr(partrel);
 		ExprContext *econtext = mtstate->ps.ps_ExprContext;
 		ListCell   *lc;
@@ -916,9 +949,14 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
 	 * from partition's rowtype to the root partition table's.
 	 */
 	if (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)
+	{
 		leaf_part_rri->ri_ChildToRootMap =
 			convert_tuples_by_name(RelationGetDescr(leaf_part_rri->ri_RelationDesc),
 								   RelationGetDescr(leaf_part_rri->ri_PartitionRoot));
+		/* First time creating the map for this result relation. */
+		Assert(!leaf_part_rri->ri_ChildToRootMapValid);
+		leaf_part_rri->ri_ChildToRootMapValid = true;
+	}
 
 	/*
 	 * Since we've just initialized this ResultRelInfo, it's not in any list
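A note on the ri_ChildToRootMapValid flag set in the hunk above, as a minimal sketch rather than the patch's actual code. convert_tuples_by_name() returns NULL when the child and root row types already match, so a NULL ri_ChildToRootMap cannot by itself mean "not computed yet"; a separate boolean records that the computation happened. Every name below (DemoMap, DemoRelInfo, demo_build_map, demo_get_child_to_root_map) is hypothetical and stands in for the PostgreSQL types and functions; the attribute-count comparison is only a toy substitute for the real descriptor comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TupleConversionMap; not PostgreSQL's type. */
typedef struct DemoMap
{
	int			natts;			/* width of the root row format */
} DemoMap;

/* Hypothetical stand-in for the ResultRelInfo fields the patch touches. */
typedef struct DemoRelInfo
{
	int			child_natts;
	int			root_natts;
	DemoMap		map_storage;	/* backing storage for the map, if needed */
	DemoMap    *child_to_root_map;	/* NULL may mean "no conversion needed" */
	bool		child_to_root_map_valid;	/* true once the map was computed */
	int			build_count;	/* how often the map was (re)built */
} DemoRelInfo;

/*
 * Toy replacement for convert_tuples_by_name(): returns NULL when the two
 * layouts already match, otherwise a map describing the root layout.
 */
static DemoMap *
demo_build_map(DemoRelInfo *rel)
{
	rel->build_count++;
	if (rel->child_natts == rel->root_natts)
		return NULL;
	rel->map_storage.natts = rel->root_natts;
	return &rel->map_storage;
}

/*
 * Lazy getter in the style of GetChildToRootMap(): compute once, then rely
 * on the flag, never on the pointer being non-NULL.
 */
static DemoMap *
demo_get_child_to_root_map(DemoRelInfo *rel)
{
	if (!rel->child_to_root_map_valid)
	{
		rel->child_to_root_map = demo_build_map(rel);
		rel->child_to_root_map_valid = true;
	}
	return rel->child_to_root_map;
}
```

Testing the pointer instead of the flag would rebuild the map on every call for any relation whose layout matches the root, which is the common case for partitions created without dropped or reordered columns.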
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 05e68ef..db24dff 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -144,10 +144,41 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 }
 
 /*
+ * Initialize ri_returningList and ri_projectReturning for RETURNING
+ */
+static void
+InitReturningProjection(ModifyTableState *mtstate,
+						ResultRelInfo *resultRelInfo)
+{
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	int		whichrel = resultRelInfo - mtstate->resultRelInfo;
+	List	*rlist;
+	TupleTableSlot *slot;
+	ExprContext *econtext;
+
+	Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+	rlist = (List *) list_nth(plan->returningLists, whichrel);
+	slot = mtstate->ps.ps_ResultTupleSlot;
+	Assert(slot != NULL);
+	econtext = mtstate->ps.ps_ExprContext;
+	Assert(econtext != NULL);
+
+	/* Must not do this a second time! */
+	Assert(resultRelInfo->ri_returningList == NIL &&
+		   resultRelInfo->ri_projectReturning == NULL);
+	resultRelInfo->ri_returningList = rlist;
+	resultRelInfo->ri_projectReturning =
+		ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
+								resultRelInfo->ri_RelationDesc->rd_att);
+}
+
+/*
  * ExecProcessReturning --- evaluate a RETURNING list
  *
  * resultRelInfo: current result rel
- * tupleSlot: slot holding tuple actually inserted/updated/deleted
+ * tupleSlot: slot holding tuple actually inserted/updated or NULL for delete
+ * tupleid, oldtuple: when called for delete, one of these can be used to
+ * fill the RETURNING slot for the relation
  * planSlot: slot holding tuple returned by top subplan node
  *
  * Note: If tupleSlot is NULL, the FDW should have already provided econtext's
@@ -156,12 +187,50 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
  * Returns a slot holding the result tuple
  */
 static TupleTableSlot *
-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ModifyTableState *mtstate,
+					 ResultRelInfo *resultRelInfo,
 					 TupleTableSlot *tupleSlot,
+					 ItemPointer tupleid, HeapTuple oldtuple,
 					 TupleTableSlot *planSlot)
 {
-	ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
-	ExprContext *econtext = projectReturning->pi_exprContext;
+	EState *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	ProjectionInfo *projectReturning;
+	ExprContext *econtext;
+	bool		clearTupleSlot = false;
+	TupleTableSlot *result;
+
+	if (plan->returningLists == NIL)
+		return NULL;
+
+	if (resultRelInfo->ri_returningList == NIL)
+		InitReturningProjection(mtstate, resultRelInfo);
+
+	projectReturning = resultRelInfo->ri_projectReturning;
+	econtext = projectReturning->pi_exprContext;
+
+	/*
+	 * If no tupleSlot was passed for a non-FDW relation, fill one from
+	 * oldtuple, or else by fetching the row version at tupleid.
+	 */
+	if (tupleSlot == NULL && resultRelInfo->ri_FdwRoutine == NULL)
+	{
+		/* Not a foreign table, so we must fetch the deleted row ourselves */
+		Assert(resultRelInfo->ri_FdwRoutine == NULL);
+		tupleSlot = ExecGetReturningSlot(estate, resultRelInfo);
+		if (oldtuple != NULL)
+		{
+			ExecForceStoreHeapTuple(oldtuple, tupleSlot, false);
+		}
+		else
+		{
+			if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+											   tupleid, SnapshotAny,
+											   tupleSlot))
+				elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
+		}
+		clearTupleSlot = true;
+	}
 
 	/* Make tuple and any needed join variables available to ExecProject */
 	if (tupleSlot)
@@ -176,7 +245,392 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
 		RelationGetRelid(resultRelInfo->ri_RelationDesc);
 
 	/* Compute the RETURNING expressions */
-	return ExecProject(projectReturning);
+	result = ExecProject(projectReturning);
+
+	if (clearTupleSlot)
+		ExecClearTuple(tupleSlot);
+
+	return result;
+}
+
+/*
+ * Perform WITH CHECK OPTION checks, if any.
+ */
+static void
+ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
+							TupleTableSlot *slot, WCOKind wco_kind)
+{
+	ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
+	EState *estate = mtstate->ps.state;
+
+	if (node->withCheckOptionLists == NIL)
+		return;
+
+	/* Initialize expression state if not already done. */
+	if (resultRelInfo->ri_WithCheckOptions == NIL)
+	{
+		int		whichrel = resultRelInfo - mtstate->resultRelInfo;
+		List   *wcoList;
+		List   *wcoExprs = NIL;
+		ListCell   *ll;
+
+		Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+		wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel);
+		foreach(ll, wcoList)
+		{
+			WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
+			ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
+											   &mtstate->ps);
+
+			wcoExprs = lappend(wcoExprs, wcoExpr);
+		}
+
+		resultRelInfo->ri_WithCheckOptions = wcoList;
+		resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
+	}
+
+	/*
+	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind
+	 * we are looking for at this point.
+	 */
+	ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
+}
+
+/*
+ * Return the list of arbiter indexes to be used for ON CONFLICT processing
+ * on given result relation, fetching it from the plan if not already done.
+ */
+static List *
+GetOnConflictArbiterIndexes(ModifyTableState *mtstate,
+							ResultRelInfo *resultRelInfo)
+{
+	if (resultRelInfo->ri_onConflictArbiterIndexes == NIL)
+	{
+		ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+
+		resultRelInfo->ri_onConflictArbiterIndexes = plan->arbiterIndexes;
+	}
+
+	return resultRelInfo->ri_onConflictArbiterIndexes;
+}
+
+/*
+ * Initialize target list, projection and qual for ON CONFLICT DO UPDATE.
+ */
+static void
+InitOnConflictState(ModifyTableState *mtstate,
+					   ResultRelInfo *resultRelInfo)
+{
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	EState	   *estate = mtstate->ps.state;
+	TupleDesc	relationDesc;
+	TupleDesc	tupDesc;
+	ExprContext *econtext;
+
+	/* insert may only have one relation, inheritance is not expanded */
+	Assert(mtstate->mt_nplans == 1);
+
+	/* already exists if created by RETURNING processing above */
+	if (mtstate->ps.ps_ExprContext == NULL)
+		ExecAssignExprContext(estate, &mtstate->ps);
+
+	econtext = mtstate->ps.ps_ExprContext;
+	relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
+
+	/* create state for DO UPDATE SET operation */
+	resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
+
+	/* initialize slot for the existing tuple */
+	resultRelInfo->ri_onConflict->oc_Existing =
+		table_slot_create(resultRelInfo->ri_RelationDesc,
+						  &mtstate->ps.state->es_tupleTable);
+
+	/*
+	 * Create the tuple slot for the UPDATE SET projection. We want a slot
+	 * of the table's type here, because the slot will be used to insert
+	 * into the table, and for RETURNING processing - which may access
+	 * system attributes.
+	 */
+	tupDesc = ExecTypeFromTL((List *) plan->onConflictSet);
+	resultRelInfo->ri_onConflict->oc_ProjSlot =
+		ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+							   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+
+	/* build UPDATE SET projection state */
+	resultRelInfo->ri_onConflict->oc_ProjInfo =
+		ExecBuildProjectionInfo(plan->onConflictSet, econtext,
+								resultRelInfo->ri_onConflict->oc_ProjSlot,
+								&mtstate->ps,
+								relationDesc);
+
+	/* initialize state to evaluate the WHERE clause, if any */
+	if (plan->onConflictWhere)
+	{
+		ExprState  *qualexpr;
+
+		qualexpr = ExecInitQual((List *) plan->onConflictWhere,
+								&mtstate->ps);
+		resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+	}
+}
+
+/*
+ * Initialize ri_junkFilter if needed.
+ *
+ * INSERT queries need a filter if there are any junk attrs in the tlist.
+ * UPDATE and DELETE always need a filter, since there's always at least one
+ * junk attribute present --- no need to look first.  Typically, this will be
+ * a 'ctid' or 'wholerow' attribute, but in the case of a foreign data wrapper
+ * it might be a set of junk attributes sufficient to identify the remote row.
+ *
+ * If there are multiple result relations, each one needs its own junk filter.
+ * Note multiple rels are only possible for UPDATE/DELETE, so we can't be
+ * fooled by some needing a filter and some not.
+ *
+ * This is also a convenient place to verify that the output of an INSERT or
+ * UPDATE matches the target table(s).
+ */
+static void
+InitJunkFilter(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo)
+{
+	EState	   *estate = mtstate->ps.state;
+	CmdType		operation = mtstate->operation;
+	Plan	   *subplan = mtstate->mt_plans[mtstate->mt_whichplan]->plan;
+	ListCell   *l;
+	bool		junk_filter_needed = false;
+
+	switch (operation)
+	{
+		case CMD_INSERT:
+			foreach(l, subplan->targetlist)
+			{
+				TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+				if (tle->resjunk)
+				{
+					junk_filter_needed = true;
+					break;
+				}
+			}
+			break;
+		case CMD_UPDATE:
+		case CMD_DELETE:
+			junk_filter_needed = true;
+			break;
+		default:
+			elog(ERROR, "unknown operation");
+			break;
+	}
+
+	if (junk_filter_needed)
+	{
+		JunkFilter *j;
+		TupleTableSlot *junkresslot;
+
+		junkresslot =
+			ExecInitExtraTupleSlot(estate, NULL,
+								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+
+		/*
+		 * For an INSERT or UPDATE, the result tuple must always match
+		 * the target table's descriptor.  For a DELETE, it won't
+		 * (indeed, there's probably no non-junk output columns).
+		 */
+		if (operation == CMD_INSERT || operation == CMD_UPDATE)
+		{
+			ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+								subplan->targetlist);
+			j = ExecInitJunkFilterInsertion(subplan->targetlist,
+											RelationGetDescr(resultRelInfo->ri_RelationDesc),
+											junkresslot);
+		}
+		else
+			j = ExecInitJunkFilter(subplan->targetlist,
+								   junkresslot);
+
+		if (operation == CMD_UPDATE || operation == CMD_DELETE)
+		{
+			/* For UPDATE/DELETE, find the appropriate junk attr now */
+			char		relkind;
+
+			relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+			if (relkind == RELKIND_RELATION ||
+				relkind == RELKIND_MATVIEW ||
+				relkind == RELKIND_PARTITIONED_TABLE)
+			{
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
+				if (!AttributeNumberIsValid(j->jf_junkAttNo))
+					elog(ERROR, "could not find junk ctid column");
+			}
+			else if (relkind == RELKIND_FOREIGN_TABLE)
+			{
+				/*
+				 * When there is a row-level trigger, there should be
+				 * a wholerow attribute.
+				 */
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
+			}
+			else
+			{
+				j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
+				if (!AttributeNumberIsValid(j->jf_junkAttNo))
+					elog(ERROR, "could not find junk wholerow column");
+			}
+		}
+
+		/* Must not do this a second time! */
+		Assert(resultRelInfo->ri_junkFilter == NULL);
+		resultRelInfo->ri_junkFilter = j;
+		resultRelInfo->ri_junkFilterValid = true;
+	}
+
+	if (operation == CMD_INSERT || operation == CMD_UPDATE)
+		ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+							subplan->targetlist);
+}
+
+/*
+ * Returns the map needed to convert given child relation's tuples to the
+ * root relation's format, initializing it if not already done.
+ */
+static TupleConversionMap *
+GetChildToRootMap(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo)
+{
+	if (!resultRelInfo->ri_ChildToRootMapValid)
+	{
+		Relation	relation = resultRelInfo->ri_RelationDesc;
+		Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+
+		resultRelInfo->ri_ChildToRootMap =
+			convert_tuples_by_name(RelationGetDescr(relation),
+								   RelationGetDescr(targetRel));
+		resultRelInfo->ri_ChildToRootMapValid = true;
+	}
+
+	return resultRelInfo->ri_ChildToRootMap;
+}
+
+/*
+ * ExecGetResultRelation
+ *		Returns mtstate->resultRelInfo[whichrel], possibly initializing it
+ *		if being requested for the first time
+ */
+ResultRelInfo *
+ExecGetResultRelation(ModifyTableState *mtstate, int whichrel)
+{
+	EState	   *estate = mtstate->ps.state;
+	ModifyTable *plan = (ModifyTable *) mtstate->ps.plan;
+	Index		rti;
+	ResultRelInfo *resultRelInfo = NULL;
+
+	/*
+	 * Initialized result relations are added to es_result_relations, so check
+	 * there first.  Remember that es_result_relations is indexed by RT index,
+	 * so fetch the relation's RT index from the plan.
+	 */
+	Assert(plan != NULL);
+	Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
+	rti = list_nth_int(plan->resultRelations, whichrel);
+	if (estate->es_result_relations)
+		resultRelInfo = estate->es_result_relations[rti - 1];
+
+	/* Nope, so initialize. */
+	if (resultRelInfo == NULL)
+	{
+		int		eflags = estate->es_top_eflags;
+		CmdType	operation = mtstate->operation;
+		MemoryContext oldcxt;
+
+		Assert(whichrel >= 0);
+		resultRelInfo = &mtstate->resultRelInfo[whichrel];
+
+		/* Things built here have to last for the query duration. */
+		oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		/*
+		 * Perform InitResultRelInfo() and save the pointer in
+		 * es_result_relations.
+		 */
+		ExecInitResultRelation(estate, resultRelInfo, rti);
+
+		/*
+		 * A few more initializations that are not handled by
+		 * InitResultRelInfo() follow.
+		 */
+
+		/*
+		 * Verify result relation is a valid target for the current operation.
+		 */
+		CheckValidResultRel(resultRelInfo, operation);
+
+		/* Initialize the usesFdwDirectModify flag */
+		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(whichrel,
+															  plan->fdwDirectModifyPlans);
+
+		/* Also let FDWs init themselves for foreign-table result rels */
+		if (resultRelInfo->ri_FdwRoutine != NULL)
+		{
+			if (resultRelInfo->ri_usesFdwDirectModify)
+			{
+				ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[whichrel];
+
+				/*
+				 * For the FDW's convenience, set the ForeignScanState node's
+				 * ResultRelInfo to let the FDW know which result relation it
+				 * is going to work with.
+				 */
+				Assert(IsA(fscan, ForeignScanState));
+				fscan->resultRelInfo = resultRelInfo;
+				resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
+			}
+			else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
+			{
+				List   *fdw_private = (List *) list_nth(plan->fdwPrivLists,
+														whichrel);
+
+				resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
+																 resultRelInfo,
+																 fdw_private,
+																 whichrel,
+																 eflags);
+			}
+		}
+
+		/*
+		 * If transition tuples will be captured, initialize a map to convert
+		 * child tuples into the format of the table mentioned in the query
+		 * (root relation), because the transition tuple store can only store
+		 * tuples in the root table format.  However for INSERT, the map is
+		 * only initialized for a given partition when the partition itself is
+		 * first initialized by ExecFindPartition.  This map is also
+		 * needed if an UPDATE ends up having to move tuples across
+		 * partitions, because in that case the child tuple to be moved first
+		 * needs to be converted into the root table's format.  In that case,
+		 * we use GetChildToRootMap() to create one from scratch if
+		 * we didn't already create it here.
+		 *
+		 * Note: We cannot always initialize this map lazily, that is, use
+		 * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs
+		 * the map, doesn't have access to the "target" relation that is
+		 * needed to create the map.
+		 */
+		if (mtstate->mt_transition_capture && operation != CMD_INSERT)
+		{
+			Relation	relation = resultRelInfo->ri_RelationDesc;
+			Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+
+			resultRelInfo->ri_ChildToRootMap =
+				convert_tuples_by_name(RelationGetDescr(relation),
+									   RelationGetDescr(targetRel));
+			/* First time creating the map for this result relation. */
+			Assert(!resultRelInfo->ri_ChildToRootMapValid);
+			resultRelInfo->ri_ChildToRootMapValid = true;
+		}
+
+		MemoryContextSwitchTo(oldcxt);
+	}
+
+	return resultRelInfo;
 }
 
 /*
@@ -398,12 +852,27 @@ ExecInsert(ModifyTableState *mtstate,
 	{
 		ResultRelInfo *partRelInfo;
 
+		/*
+		 * ExecInitPartitionInfo() expects that the root parent's ri_onConflict
+		 * is initialized. XXX maybe it shouldn't?
+		 */
+		if (onconflict != ONCONFLICT_NONE &&
+			resultRelInfo->ri_onConflict == NULL)
+		{
+			(void) GetOnConflictArbiterIndexes(mtstate, resultRelInfo);
+			if (onconflict == ONCONFLICT_UPDATE)
+				InitOnConflictState(mtstate, resultRelInfo);
+		}
+
 		slot = ExecPrepareTupleRouting(mtstate, estate, proute,
 									   resultRelInfo, slot,
 									   &partRelInfo);
 		resultRelInfo = partRelInfo;
 	}
 
+	if (resultRelInfo->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(resultRelInfo, onconflict != ONCONFLICT_NONE);
+
 	ExecMaterializeSlot(slot);
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
@@ -489,12 +958,7 @@ ExecInsert(ModifyTableState *mtstate,
 		wco_kind = (mtstate->operation == CMD_UPDATE) ?
 			WCO_RLS_UPDATE_CHECK : WCO_RLS_INSERT_CHECK;
 
-		/*
-		 * ExecWithCheckOptions() will skip any WCOs which are not of the kind
-		 * we are looking for at this point.
-		 */
-		if (resultRelInfo->ri_WithCheckOptions != NIL)
-			ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
+		ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, wco_kind);
 
 		/*
 		 * Check the constraints of the tuple.
@@ -521,7 +985,8 @@ ExecInsert(ModifyTableState *mtstate,
 			bool		specConflict;
 			List	   *arbiterIndexes;
 
-			arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes;
+			arbiterIndexes = GetOnConflictArbiterIndexes(mtstate,
+														 resultRelInfo);
 
 			/*
 			 * Do a non-conclusive check for conflicts first.
@@ -691,12 +1156,11 @@ ExecInsert(ModifyTableState *mtstate,
 	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind we
 	 * are looking for at this point.
 	 */
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-		ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, WCO_VIEW_CHECK);
 
 	/* Process RETURNING if present */
-	if (resultRelInfo->ri_projectReturning)
-		result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+	result = ExecProcessReturning(mtstate, resultRelInfo, slot, NULL, NULL,
+								  planSlot);
 
 	return result;
 }
@@ -1011,45 +1475,23 @@ ldelete:;
 						 ar_delete_trig_tcs);
 
 	/* Process RETURNING if present and if requested */
-	if (processReturning && resultRelInfo->ri_projectReturning)
+	if (processReturning)
 	{
-		/*
-		 * We have to put the target tuple into a slot, which means first we
-		 * gotta fetch it.  We can use the trigger tuple slot.
-		 */
-		TupleTableSlot *rslot;
-
-		if (resultRelInfo->ri_FdwRoutine)
-		{
-			/* FDW must have provided a slot containing the deleted row */
-			Assert(!TupIsNull(slot));
-		}
-		else
-		{
-			slot = ExecGetReturningSlot(estate, resultRelInfo);
-			if (oldtuple != NULL)
-			{
-				ExecForceStoreHeapTuple(oldtuple, slot, false);
-			}
-			else
-			{
-				if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
-												   SnapshotAny, slot))
-					elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
-			}
-		}
-
-		rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
+		TupleTableSlot *rslot = ExecProcessReturning(mtstate, resultRelInfo,
+													 slot, tupleid, oldtuple,
+													 planSlot);
 
 		/*
 		 * Before releasing the target tuple again, make sure rslot has a
 		 * local copy of any pass-by-reference values.
 		 */
-		ExecMaterializeSlot(rslot);
-
-		ExecClearTuple(slot);
-
-		return rslot;
+		if (rslot)
+		{
+			ExecMaterializeSlot(rslot);
+			if (slot)
+				ExecClearTuple(slot);
+			return rslot;
+		}
 	}
 
 	return NULL;
@@ -1082,7 +1524,6 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 						 TupleTableSlot **inserted_tuple)
 {
 	EState	   *estate = mtstate->ps.state;
-	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	TupleConversionMap *tupconv_map;
 	bool		tuple_deleted;
 	TupleTableSlot *epqslot = NULL;
@@ -1101,13 +1542,27 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 				 errmsg("invalid ON UPDATE specification"),
 				 errdetail("The result tuple would appear in a different partition than the original tuple.")));
 
-	/*
-	 * When an UPDATE is run on a leaf partition, we will not have partition
-	 * tuple routing set up.  In that case, fail with partition constraint
-	 * violation error.
-	 */
-	if (proute == NULL)
-		ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+	/* Initialize tuple routing info if not already done. */
+	if (mtstate->mt_partition_tuple_routing == NULL)
+	{
+		Relation	targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
+		MemoryContext	oldcxt;
+
+		/* Things built here have to last for the query duration. */
+		oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+		mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(estate, mtstate, targetRel);
+
+		/*
+		 * Before a partition's tuple can be re-routed, it must first
+		 * be converted to the root's format and we need a slot for
+		 * storing such tuple.
+		 */
+		Assert(mtstate->mt_root_tuple_slot == NULL);
+		mtstate->mt_root_tuple_slot = table_slot_create(targetRel, NULL);
+		MemoryContextSwitchTo(oldcxt);
+	}
 
 	/*
 	 * Row movement, part 1.  Delete the tuple, but skip RETURNING processing.
@@ -1161,7 +1616,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
 	 * convert the tuple into root's tuple descriptor if needed, since
 	 * ExecInsert() starts the search from root.
 	 */
-	tupconv_map = resultRelInfo->ri_ChildToRootMap;
+	tupconv_map = GetChildToRootMap(mtstate, resultRelInfo);
 	if (tupconv_map != NULL)
 		slot = execute_attr_map_slot(tupconv_map->attrMap,
 									 slot,
@@ -1226,6 +1681,9 @@ ExecUpdate(ModifyTableState *mtstate,
 	if (IsBootstrapProcessingMode())
 		elog(ERROR, "cannot UPDATE during bootstrap");
 
+	if (resultRelInfo->ri_IndexRelationDescs == NULL)
+		ExecOpenIndices(resultRelInfo, false);
+
 	ExecMaterializeSlot(slot);
 
 	/* BEFORE ROW UPDATE Triggers */
@@ -1318,16 +1776,9 @@ lreplace:;
 			resultRelationDesc->rd_rel->relispartition &&
 			!ExecPartitionCheck(resultRelInfo, slot, estate, false);
 
-		if (!partition_constraint_failed &&
-			resultRelInfo->ri_WithCheckOptions != NIL)
-		{
-			/*
-			 * ExecWithCheckOptions() will skip any WCOs which are not of the
-			 * kind we are looking for at this point.
-			 */
-			ExecWithCheckOptions(WCO_RLS_UPDATE_CHECK,
-								 resultRelInfo, slot, estate);
-		}
+		if (!partition_constraint_failed)
+			ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot,
+										WCO_RLS_UPDATE_CHECK);
 
 		/*
 		 * If a partition check failed, try to move the row into the right
@@ -1340,6 +1791,13 @@ lreplace:;
 			bool		retry;
 
 			/*
+			 * When an UPDATE is run directly on a leaf partition, simply fail
+			 * with partition constraint violation error.
+			 */
+			if (resultRelInfo == mtstate->rootResultRelInfo)
+				ExecPartitionCheckEmitError(resultRelInfo, slot, estate);
+
+			/*
 			 * ExecCrossPartitionUpdate will first DELETE the row from the
 			 * partition it's currently in and then insert it back into the
 			 * root table, which will re-route it to the correct partition.
@@ -1535,18 +1993,12 @@ lreplace:;
 	 * required to do this after testing all constraints and uniqueness
 	 * violations per the SQL spec, so we do it after actually updating the
 	 * record in the heap and all indexes.
-	 *
-	 * ExecWithCheckOptions() will skip any WCOs which are not of the kind we
-	 * are looking for at this point.
 	 */
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-		ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, slot, WCO_VIEW_CHECK);
 
 	/* Process RETURNING if present */
-	if (resultRelInfo->ri_projectReturning)
-		return ExecProcessReturning(resultRelInfo, slot, planSlot);
-
-	return NULL;
+	return ExecProcessReturning(mtstate, resultRelInfo, slot, NULL, NULL,
+								planSlot);
 }
 
 /*
@@ -1570,10 +2022,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 					 bool canSetTag,
 					 TupleTableSlot **returning)
 {
-	ExprContext *econtext = mtstate->ps.ps_ExprContext;
+	ExprContext *econtext;
 	Relation	relation = resultRelInfo->ri_RelationDesc;
-	ExprState  *onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
-	TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
+	ExprState  *onConflictSetWhere;
+	TupleTableSlot *existing;
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
 	TM_Result	test;
@@ -1581,6 +2033,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	TransactionId xmin;
 	bool		isnull;
 
+	if (resultRelInfo->ri_onConflict == NULL)
+		InitOnConflictState(mtstate, resultRelInfo);
+
+	econtext = mtstate->ps.ps_ExprContext;
+	onConflictSetWhere = resultRelInfo->ri_onConflict->oc_WhereClause;
+	existing = resultRelInfo->ri_onConflict->oc_Existing;
+
 	/* Determine lock mode to use */
 	lockmode = ExecUpdateLockMode(estate, resultRelInfo);
 
@@ -1719,27 +2178,23 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 		return true;			/* done with the tuple */
 	}
 
-	if (resultRelInfo->ri_WithCheckOptions != NIL)
-	{
-		/*
-		 * Check target's existing tuple against UPDATE-applicable USING
-		 * security barrier quals (if any), enforced here as RLS checks/WCOs.
-		 *
-		 * The rewriter creates UPDATE RLS checks/WCOs for UPDATE security
-		 * quals, and stores them as WCOs of "kind" WCO_RLS_CONFLICT_CHECK,
-		 * but that's almost the extent of its special handling for ON
-		 * CONFLICT DO UPDATE.
-		 *
-		 * The rewriter will also have associated UPDATE applicable straight
-		 * RLS checks/WCOs for the benefit of the ExecUpdate() call that
-		 * follows.  INSERTs and UPDATEs naturally have mutually exclusive WCO
-		 * kinds, so there is no danger of spurious over-enforcement in the
-		 * INSERT or UPDATE path.
-		 */
-		ExecWithCheckOptions(WCO_RLS_CONFLICT_CHECK, resultRelInfo,
-							 existing,
-							 mtstate->ps.state);
-	}
+	/*
+	 * Check target's existing tuple against UPDATE-applicable USING
+	 * security barrier quals (if any), enforced here as RLS checks/WCOs.
+	 *
+	 * The rewriter creates UPDATE RLS checks/WCOs for UPDATE security
+	 * quals, and stores them as WCOs of "kind" WCO_RLS_CONFLICT_CHECK,
+	 * but that's almost the extent of its special handling for ON
+	 * CONFLICT DO UPDATE.
+	 *
+	 * The rewriter will also have associated UPDATE applicable straight
+	 * RLS checks/WCOs for the benefit of the ExecUpdate() call that
+	 * follows.  INSERTs and UPDATEs naturally have mutually exclusive WCO
+	 * kinds, so there is no danger of spurious over-enforcement in the
+	 * INSERT or UPDATE path.
+	 */
+	ExecProcessWithCheckOptions(mtstate, resultRelInfo, existing,
+								WCO_RLS_CONFLICT_CHECK);
 
 	/* Project the new tuple version */
 	ExecProject(resultRelInfo->ri_onConflict->oc_ProjInfo);
@@ -1929,11 +2384,12 @@ static TupleTableSlot *
 ExecModifyTable(PlanState *pstate)
 {
 	ModifyTableState *node = castNode(ModifyTableState, pstate);
+	ModifyTable *plan = (ModifyTable *) node->ps.plan;
 	EState	   *estate = node->ps.state;
 	CmdType		operation = node->operation;
-	ResultRelInfo *resultRelInfo;
+	ResultRelInfo *resultRelInfo = NULL;
 	PlanState  *subplanstate;
-	JunkFilter *junkfilter;
+	JunkFilter *junkfilter = NULL;
 	TupleTableSlot *slot;
 	TupleTableSlot *planSlot;
 	ItemPointer tupleid;
@@ -1974,9 +2430,7 @@ ExecModifyTable(PlanState *pstate)
 	}
 
 	/* Preload local variables */
-	resultRelInfo = node->resultRelInfo + node->mt_whichplan;
 	subplanstate = node->mt_plans[node->mt_whichplan];
-	junkfilter = resultRelInfo->ri_junkFilter;
 
 	/*
 	 * Fetch rows from subplan(s), and execute the required table modification
@@ -2000,17 +2454,37 @@ ExecModifyTable(PlanState *pstate)
 		if (pstate->ps_ExprContext)
 			ResetExprContext(pstate->ps_ExprContext);
 
+		/*
+		 * FDWs that can push down a modify operation would need to see the
+		 * ResultRelInfo, so fetch one if not already done before executing
+		 * the subplan, potentially opening it for the first time.
+		 */
+		if (bms_is_member(node->mt_whichplan, plan->fdwDirectModifyPlans) &&
+			resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecGetResultRelation(node, node->mt_whichplan);
+
+			/*
+			 * Must make sure to initialize the RETURNING projection as well,
+			 * because some FDWs rely on accessing ri_projectReturning to
+			 * set its "scan" tuple to use below for computing the actual
+			 * RETURNING targetlist.
+			 */
+			if (plan->returningLists && resultRelInfo->ri_returningList == NIL)
+				InitReturningProjection(node, resultRelInfo);
+		}
+
 		planSlot = ExecProcNode(subplanstate);
 
 		if (TupIsNull(planSlot))
 		{
-			/* advance to next subplan if any */
+			/* Signal to initialize the next plan's relation. */
+			resultRelInfo = NULL;
+
 			node->mt_whichplan++;
 			if (node->mt_whichplan < node->mt_nplans)
 			{
-				resultRelInfo++;
 				subplanstate = node->mt_plans[node->mt_whichplan];
-				junkfilter = resultRelInfo->ri_junkFilter;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
 				continue;
@@ -2020,8 +2494,25 @@ ExecModifyTable(PlanState *pstate)
 		}
 
 		/*
+		 * Fetch the result relation for the current plan if not already done,
+		 * potentially opening it for the first time.
+		 */
+		if (resultRelInfo == NULL)
+		{
+			resultRelInfo = ExecGetResultRelation(node, node->mt_whichplan);
+			if (!resultRelInfo->ri_junkFilterValid)
+				InitJunkFilter(node, resultRelInfo);
+			junkfilter = resultRelInfo->ri_junkFilter;
+		}
+
+		/*
 		 * Ensure input tuple is the right format for the target relation.
 		 */
+		if (node->mt_scans[node->mt_whichplan] == NULL)
+			node->mt_scans[node->mt_whichplan] =
+				ExecInitExtraTupleSlot(node->ps.state,
+									   ExecGetResultType(subplanstate),
+									   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
 		if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops)
 		{
 			ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot);
@@ -2042,7 +2533,8 @@ ExecModifyTable(PlanState *pstate)
 			 * ExecProcessReturning by IterateDirectModify, so no need to
 			 * provide it here.
 			 */
-			slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
+			slot = ExecProcessReturning(node, resultRelInfo, NULL, NULL, NULL,
+										planSlot);
 
 			return slot;
 		}
@@ -2175,13 +2667,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	ModifyTableState *mtstate;
 	CmdType		operation = node->operation;
 	int			nplans = list_length(node->plans);
-	ResultRelInfo *resultRelInfo;
 	Plan	   *subplan;
-	ListCell   *l,
-			   *l1;
+	ListCell   *l;
 	int			i;
 	Relation	rel;
-	bool		update_tuple_routing_needed = node->partColsUpdated;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -2198,7 +2687,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->canSetTag = node->canSetTag;
 	mtstate->mt_done = false;
 
+	/*
+	 * call ExecInitNode on each of the plans to be executed and save the
+	 * results into the array "mt_plans".
+	 */
+	mtstate->mt_nplans = nplans;
 	mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
+	i = 0;
+	foreach(l, node->plans)
+	{
+		subplan = (Plan *) lfirst(l);
+
+		mtstate->mt_plans[i++] = ExecInitNode(subplan, estate, eflags);
+	}
+
 	mtstate->resultRelInfo = (ResultRelInfo *)
 		palloc(nplans * sizeof(ResultRelInfo));
 	mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
@@ -2225,13 +2727,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	}
 	else
 	{
-		mtstate->rootResultRelInfo = mtstate->resultRelInfo;
-		ExecInitResultRelation(estate, mtstate->resultRelInfo,
-							   linitial_int(node->resultRelations));
+		/*
+		 * Unlike a partitioned target relation, the target relation in this
+		 * case will be actually used by ExecModifyTable(), so use
+		 * ExecGetResultRelation() to get the ResultRelInfo, because it
+		 * initializes some fields that a bare InitResultRelInfo() doesn't.
+		 */
+		mtstate->rootResultRelInfo = ExecGetResultRelation(mtstate, 0);
+		Assert(mtstate->rootResultRelInfo == mtstate->resultRelInfo);
 	}
 
 	mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
-	mtstate->mt_nplans = nplans;
 
 	/* set up epqstate with dummy subplan data for the moment */
 	EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
@@ -2244,177 +2750,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
 		ExecSetupTransitionCaptureState(mtstate, estate);
 
-	/*
-	 * call ExecInitNode on each of the plans to be executed and save the
-	 * results into the array "mt_plans".  This is also a convenient place to
-	 * verify that the proposed target relations are valid and open their
-	 * indexes for insertion of new index entries.
-	 */
-	resultRelInfo = mtstate->resultRelInfo;
-	i = 0;
-	forboth(l, node->resultRelations, l1, node->plans)
-	{
-		Index		resultRelation = lfirst_int(l);
-
-		subplan = (Plan *) lfirst(l1);
-
-		/*
-		 * This opens result relation and fills ResultRelInfo. (root relation
-		 * was initialized already.)
-		 */
-		if (resultRelInfo != mtstate->rootResultRelInfo)
-			ExecInitResultRelation(estate, resultRelInfo, resultRelation);
-
-		/* Initialize the usesFdwDirectModify flag */
-		resultRelInfo->ri_usesFdwDirectModify = bms_is_member(i,
-															  node->fdwDirectModifyPlans);
-
-		/*
-		 * Verify result relation is a valid target for the current operation
-		 */
-		CheckValidResultRel(resultRelInfo, operation);
-
-		/*
-		 * If there are indices on the result relation, open them and save
-		 * descriptors in the result relation info, so that we can add new
-		 * index entries for the tuples we add/update.  We need not do this
-		 * for a DELETE, however, since deletion doesn't affect indexes. Also,
-		 * inside an EvalPlanQual operation, the indexes might be open
-		 * already, since we share the resultrel state with the original
-		 * query.
-		 */
-		if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
-			operation != CMD_DELETE &&
-			resultRelInfo->ri_IndexRelationDescs == NULL)
-			ExecOpenIndices(resultRelInfo,
-							node->onConflictAction != ONCONFLICT_NONE);
-
-		/*
-		 * If this is an UPDATE and a BEFORE UPDATE trigger is present, the
-		 * trigger itself might modify the partition-key values. So arrange
-		 * for tuple routing.
-		 */
-		if (resultRelInfo->ri_TrigDesc &&
-			resultRelInfo->ri_TrigDesc->trig_update_before_row &&
-			operation == CMD_UPDATE)
-			update_tuple_routing_needed = true;
-
-		/* Now init the plan for this result rel */
-		mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
-		mtstate->mt_scans[i] =
-			ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* Also let FDWs init themselves for foreign-table result rels */
-		if (resultRelInfo->ri_FdwRoutine != NULL)
-		{
-			if (resultRelInfo->ri_usesFdwDirectModify)
-			{
-				ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i];
-
-				/*
-				 * For the FDW's convenience, set the ForeignScanState node's
-				 * ResultRelInfo to let the FDW know which result relation it
-				 * is going to work with.
-				 */
-				Assert(IsA(fscan, ForeignScanState));
-				fscan->resultRelInfo = resultRelInfo;
-				resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
-			}
-			else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
-			{
-				List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
-
-				resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
-																 resultRelInfo,
-																 fdw_private,
-																 i,
-																 eflags);
-			}
-		}
-
-		/*
-		 * If needed, initialize a map to convert tuples in the child format
-		 * to the format of the table mentioned in the query (root relation).
-		 * It's needed for update tuple routing, because the routing starts
-		 * from the root relation.  It's also needed for capturing transition
-		 * tuples, because the transition tuple store can only store tuples in
-		 * the root table format.
-		 *
-		 * For INSERT, the map is only initialized for a given partition when
-		 * the partition itself is first initialized by ExecFindPartition().
-		 */
-		if (update_tuple_routing_needed ||
-			(mtstate->mt_transition_capture &&
-			 mtstate->operation != CMD_INSERT))
-			resultRelInfo->ri_ChildToRootMap =
-				convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
-									   RelationGetDescr(mtstate->rootResultRelInfo->ri_RelationDesc));
-		resultRelInfo++;
-		i++;
-	}
-
-	/* Get the target relation */
-	rel = mtstate->rootResultRelInfo->ri_RelationDesc;
-
-	/*
-	 * If it's not a partitioned table after all, UPDATE tuple routing should
-	 * not be attempted.
-	 */
-	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		update_tuple_routing_needed = false;
-
-	/*
-	 * Build state for tuple routing if it's an INSERT or if it's an UPDATE of
-	 * partition key.
-	 */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
-		(operation == CMD_INSERT || update_tuple_routing_needed))
-		mtstate->mt_partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
-
-	/*
-	 * For update row movement we'll need a dedicated slot to store the tuples
-	 * that have been converted from partition format to the root table
-	 * format.
-	 */
-	if (update_tuple_routing_needed)
-		mtstate->mt_root_tuple_slot = table_slot_create(rel, NULL);
-
-	/*
-	 * Initialize any WITH CHECK OPTION constraints if needed.
-	 */
-	resultRelInfo = mtstate->resultRelInfo;
-	i = 0;
-	foreach(l, node->withCheckOptionLists)
-	{
-		List	   *wcoList = (List *) lfirst(l);
-		List	   *wcoExprs = NIL;
-		ListCell   *ll;
-
-		foreach(ll, wcoList)
-		{
-			WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
-			ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
-											   &mtstate->ps);
-
-			wcoExprs = lappend(wcoExprs, wcoExpr);
-		}
-
-		resultRelInfo->ri_WithCheckOptions = wcoList;
-		resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
-		resultRelInfo++;
-		i++;
-	}
-
-	/*
-	 * Initialize RETURNING projections if needed.
-	 */
+	/* Initialize some global state for RETURNING projections. */
 	if (node->returningLists)
 	{
-		TupleTableSlot *slot;
-		ExprContext *econtext;
-
 		/*
 		 * Initialize result tuple slot and assign its rowtype using the first
 		 * RETURNING list.  We assume the rest will look the same.
@@ -2423,27 +2761,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 
 		/* Set up a slot for the output of the RETURNING projection(s) */
 		ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
-		slot = mtstate->ps.ps_ResultTupleSlot;
 
 		/* Need an econtext too */
 		if (mtstate->ps.ps_ExprContext == NULL)
 			ExecAssignExprContext(estate, &mtstate->ps);
-		econtext = mtstate->ps.ps_ExprContext;
-
-		/*
-		 * Build a projection for each result rel.
-		 */
-		resultRelInfo = mtstate->resultRelInfo;
-		foreach(l, node->returningLists)
-		{
-			List	   *rlist = (List *) lfirst(l);
-
-			resultRelInfo->ri_returningList = rlist;
-			resultRelInfo->ri_projectReturning =
-				ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
-										resultRelInfo->ri_RelationDesc->rd_att);
-			resultRelInfo++;
-		}
 	}
 	else
 	{
@@ -2457,67 +2778,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->ps.ps_ExprContext = NULL;
 	}
 
-	/* Set the list of arbiter indexes if needed for ON CONFLICT */
-	resultRelInfo = mtstate->resultRelInfo;
-	if (node->onConflictAction != ONCONFLICT_NONE)
-		resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes;
+	/* Get the target relation */
+	rel = mtstate->rootResultRelInfo->ri_RelationDesc;
 
 	/*
-	 * If needed, Initialize target list, projection and qual for ON CONFLICT
-	 * DO UPDATE.
+	 * Build state for tuple routing if it's an INSERT.  An UPDATE might need
+	 * it too, but it's initialized only when it actually ends up moving
+	 * tuples between partitions; see ExecCrossPartitionUpdate().
 	 */
-	if (node->onConflictAction == ONCONFLICT_UPDATE)
-	{
-		ExprContext *econtext;
-		TupleDesc	relationDesc;
-		TupleDesc	tupDesc;
-
-		/* insert may only have one plan, inheritance is not expanded */
-		Assert(nplans == 1);
-
-		/* already exists if created by RETURNING processing above */
-		if (mtstate->ps.ps_ExprContext == NULL)
-			ExecAssignExprContext(estate, &mtstate->ps);
-
-		econtext = mtstate->ps.ps_ExprContext;
-		relationDesc = resultRelInfo->ri_RelationDesc->rd_att;
-
-		/* create state for DO UPDATE SET operation */
-		resultRelInfo->ri_onConflict = makeNode(OnConflictSetState);
-
-		/* initialize slot for the existing tuple */
-		resultRelInfo->ri_onConflict->oc_Existing =
-			table_slot_create(resultRelInfo->ri_RelationDesc,
-							  &mtstate->ps.state->es_tupleTable);
-
-		/*
-		 * Create the tuple slot for the UPDATE SET projection. We want a slot
-		 * of the table's type here, because the slot will be used to insert
-		 * into the table, and for RETURNING processing - which may access
-		 * system attributes.
-		 */
-		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
-		resultRelInfo->ri_onConflict->oc_ProjSlot =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflict->oc_ProjInfo =
-			ExecBuildProjectionInfo(node->onConflictSet, econtext,
-									resultRelInfo->ri_onConflict->oc_ProjSlot,
-									&mtstate->ps,
-									relationDesc);
-
-		/* initialize state to evaluate the WHERE clause, if any */
-		if (node->onConflictWhere)
-		{
-			ExprState  *qualexpr;
-
-			qualexpr = ExecInitQual((List *) node->onConflictWhere,
-									&mtstate->ps);
-			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
-		}
-	}
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+		operation == CMD_INSERT)
+		mtstate->mt_partition_tuple_routing =
+			ExecSetupPartitionTupleRouting(estate, mtstate, rel);
 
 	/*
 	 * If we have any secondary relations in an UPDATE or DELETE, they need to
@@ -2555,121 +2827,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 						mtstate->mt_arowmarks[0]);
 
 	/*
-	 * Initialize the junk filter(s) if needed.  INSERT queries need a filter
-	 * if there are any junk attrs in the tlist.  UPDATE and DELETE always
-	 * need a filter, since there's always at least one junk attribute present
-	 * --- no need to look first.  Typically, this will be a 'ctid' or
-	 * 'wholerow' attribute, but in the case of a foreign data wrapper it
-	 * might be a set of junk attributes sufficient to identify the remote
-	 * row.
-	 *
-	 * If there are multiple result relations, each one needs its own junk
-	 * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we
-	 * can't be fooled by some needing a filter and some not.
-	 *
-	 * This section of code is also a convenient place to verify that the
-	 * output of an INSERT or UPDATE matches the target table(s).
-	 */
-	{
-		bool		junk_filter_needed = false;
-
-		switch (operation)
-		{
-			case CMD_INSERT:
-				foreach(l, subplan->targetlist)
-				{
-					TargetEntry *tle = (TargetEntry *) lfirst(l);
-
-					if (tle->resjunk)
-					{
-						junk_filter_needed = true;
-						break;
-					}
-				}
-				break;
-			case CMD_UPDATE:
-			case CMD_DELETE:
-				junk_filter_needed = true;
-				break;
-			default:
-				elog(ERROR, "unknown operation");
-				break;
-		}
-
-		if (junk_filter_needed)
-		{
-			resultRelInfo = mtstate->resultRelInfo;
-			for (i = 0; i < nplans; i++)
-			{
-				JunkFilter *j;
-				TupleTableSlot *junkresslot;
-
-				subplan = mtstate->mt_plans[i]->plan;
-
-				junkresslot =
-					ExecInitExtraTupleSlot(estate, NULL,
-										   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-				/*
-				 * For an INSERT or UPDATE, the result tuple must always match
-				 * the target table's descriptor.  For a DELETE, it won't
-				 * (indeed, there's probably no non-junk output columns).
-				 */
-				if (operation == CMD_INSERT || operation == CMD_UPDATE)
-				{
-					ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-										subplan->targetlist);
-					j = ExecInitJunkFilterInsertion(subplan->targetlist,
-													RelationGetDescr(resultRelInfo->ri_RelationDesc),
-													junkresslot);
-				}
-				else
-					j = ExecInitJunkFilter(subplan->targetlist,
-										   junkresslot);
-
-				if (operation == CMD_UPDATE || operation == CMD_DELETE)
-				{
-					/* For UPDATE/DELETE, find the appropriate junk attr now */
-					char		relkind;
-
-					relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-					if (relkind == RELKIND_RELATION ||
-						relkind == RELKIND_MATVIEW ||
-						relkind == RELKIND_PARTITIONED_TABLE)
-					{
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
-						if (!AttributeNumberIsValid(j->jf_junkAttNo))
-							elog(ERROR, "could not find junk ctid column");
-					}
-					else if (relkind == RELKIND_FOREIGN_TABLE)
-					{
-						/*
-						 * When there is a row-level trigger, there should be
-						 * a wholerow attribute.
-						 */
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-					}
-					else
-					{
-						j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-						if (!AttributeNumberIsValid(j->jf_junkAttNo))
-							elog(ERROR, "could not find junk wholerow column");
-					}
-				}
-
-				resultRelInfo->ri_junkFilter = j;
-				resultRelInfo++;
-			}
-		}
-		else
-		{
-			if (operation == CMD_INSERT)
-				ExecCheckPlanOutput(mtstate->resultRelInfo->ri_RelationDesc,
-									subplan->targetlist);
-		}
-	}
-
-	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
 	 * to estate->es_auxmodifytables so that it will be run to completion by
 	 * ExecPostprocessPlan.  (It'd actually work fine to add the primary
@@ -2699,20 +2856,6 @@ ExecEndModifyTable(ModifyTableState *node)
 	int			i;
 
 	/*
-	 * Allow any FDWs to shut down
-	 */
-	for (i = 0; i < node->mt_nplans; i++)
-	{
-		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
-
-		if (!resultRelInfo->ri_usesFdwDirectModify &&
-			resultRelInfo->ri_FdwRoutine != NULL &&
-			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
-			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
-														   resultRelInfo);
-	}
-
-	/*
 	 * Close all the partitioned tables, leaf partitions, and their indices
 	 * and release the slot used for tuple routing, if set.
 	 */
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 46a2dc9..9ae7e40 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -22,5 +22,6 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
 extern void ExecEndModifyTable(ModifyTableState *node);
 extern void ExecReScanModifyTable(ModifyTableState *node);
+extern ResultRelInfo *ExecGetResultRelation(ModifyTableState *mtstate, int whichrel);
 
 #endif							/* NODEMODIFYTABLE_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c0a7d6..f2f4bed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -463,6 +463,7 @@ typedef struct ResultRelInfo
 
 	/* for removing junk attributes from tuples */
 	JunkFilter *ri_junkFilter;
+	bool		ri_junkFilterValid;	/* has the filter been initialized? */
 
 	/* list of RETURNING expressions */
 	List	   *ri_returningList;
@@ -497,6 +498,7 @@ typedef struct ResultRelInfo
 	 * transition tuple capture or update partition row movement is active.
 	 */
 	TupleConversionMap *ri_ChildToRootMap;
+	bool		ri_ChildToRootMapValid;	/* has the map been initialized? */
 
 	/* for use by copy.c when performing multi-inserts */
 	struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index ff157ce..74cd7e2 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -52,10 +52,7 @@ explain (costs off) insert into insertconflicttest values(0, 'Crowberry') on con
    Conflict Arbiter Indexes: op_index_key, collation_index_key, both_index_key
    Conflict Filter: (SubPlan 1)
    ->  Result
-   SubPlan 1
-     ->  Index Only Scan using both_index_expr_key on insertconflicttest ii
-           Index Cond: (key = excluded.key)
-(8 rows)
+(5 rows)
 
 -- Neither collation nor operator class specifications are required --
 -- supplying them merely *limits* matches to indexes with matching opclasses
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index caed1c1..d8d2a3d 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1862,28 +1862,22 @@ UPDATE rw_view1 SET a = a + 5; -- should fail
 ERROR:  new row violates check option for view "rw_view1"
 DETAIL:  Failing row contains (15).
 EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
-                       QUERY PLAN                        
----------------------------------------------------------
+      QUERY PLAN      
+----------------------
  Insert on base_tbl b
    ->  Result
-   SubPlan 1
-     ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
-           Index Cond: (a = b.a)
-(5 rows)
+(2 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
-                        QUERY PLAN                         
------------------------------------------------------------
+               QUERY PLAN                
+-----------------------------------------
  Update on base_tbl b
    ->  Hash Join
          Hash Cond: (b.a = r.a)
          ->  Seq Scan on base_tbl b
          ->  Hash
                ->  Seq Scan on ref_tbl r
-   SubPlan 1
-     ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
-           Index Cond: (a = b.a)
-(9 rows)
+(6 rows)
 
 DROP TABLE base_tbl, ref_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d7..0ad0d1a 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -341,8 +341,8 @@ DETAIL:  Failing row contains (105, 85, null, b, 15).
 -- fail, no partition key update, so no attempt to move tuple,
 -- but "a = 'a'" violates partition constraint enforced by root partition)
 UPDATE part_b_10_b_20 set a = 'a';
-ERROR:  new row for relation "part_c_1_100" violates partition constraint
-DETAIL:  Failing row contains (null, 1, 96, 12, a).
+ERROR:  new row for relation "part_b_10_b_20" violates partition constraint
+DETAIL:  Failing row contains (null, 96, a, 12, 1).
 -- ok, partition key update, no constraint violation
 UPDATE range_parted set d = d - 10 WHERE d > 10;
 -- ok, no partition key update, no constraint violation
@@ -372,8 +372,8 @@ UPDATE part_b_10_b_20 set c = c + 20 returning c, b, a;
 
 -- fail, row movement happens only within the partition subtree.
 UPDATE part_b_10_b_20 set b = b - 6 WHERE c > 116 returning *;
-ERROR:  new row for relation "part_d_1_15" violates partition constraint
-DETAIL:  Failing row contains (2, 117, 2, b, 7).
+ERROR:  new row for relation "part_b_10_b_20" violates partition constraint
+DETAIL:  Failing row contains (2, 117, b, 7, 2).
 -- ok, row movement, with subset of rows moved into different partition.
 UPDATE range_parted set b = b - 6 WHERE c > 116 returning a, b + c;
  a | ?column? 
@@ -814,8 +814,8 @@ INSERT into sub_parted VALUES (1,2,10);
 -- Test partition constraint violation when intermediate ancestor is used and
 -- constraint is inherited from upper root.
 UPDATE sub_parted set a = 2 WHERE c = 10;
-ERROR:  new row for relation "sub_part2" violates partition constraint
-DETAIL:  Failing row contains (2, 10, 2).
+ERROR:  new row for relation "sub_parted" violates partition constraint
+DETAIL:  Failing row contains (2, 2, 10).
 -- Test update-partition-key, where the unpruned partitions do not have their
 -- partition keys updated.
 SELECT tableoid::regclass::text, * FROM list_parted WHERE a = 2 ORDER BY 1;
-- 
1.8.3.1

#83Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#81)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 28, 2020 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

But since this applies on top of the "overhaul update/delete processing"
patch, let's tackle that patch set next. Could you rebase that, please?

Anyway, I will post the rebased patch on the "overhaul update/delete
processing" thread.

Done.

--
Amit Langote
EDB: http://www.enterprisedb.com

#84Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#82)
Re: partition routing layering in nodeModifyTable.c

On Wed, Oct 28, 2020 at 4:46 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

This patch looks reasonable to me at a quick glance. I'm a bit worried
or unhappy about the impact on FDWs, though. It doesn't seem nice that
the ResultRelInfo is not available in the BeginDirectModify call. It's
not too bad, the FDW can call ExecGetResultRelation() if it needs it,
but still. Perhaps it would be better to delay calling
BeginDirectModify() until the first modification is performed, to avoid
any initialization overhead there, like establishing the connection in
postgres_fdw.

Ah, calling BeginDirectModify() itself lazily sounds like a good idea;
see the attached updated 0001 for how that looks. While updating that
patch, I realized that the ForeignScan.resultRelation field that we
introduced in 178f2d560d will now be totally useless. :-(
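
To illustrate the lazy-initialization idea discussed above, here is a
minimal standalone sketch. It is not taken from the attached patch and
does not use any PostgreSQL APIs; DirectModifyState,
begin_direct_modify() and iterate_direct_modify() are hypothetical
names standing in for an FDW's state and callbacks. The only point is
that the potentially expensive begin step (such as establishing a
connection) runs on the first modification rather than at executor
startup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an FDW's per-relation direct-modify state. */
typedef struct DirectModifyState
{
	bool		begun;			/* has the begin step run yet? */
	int			begin_calls;	/* times begin ran, for illustration only */
	int			rows_modified;	/* modifications performed so far */
} DirectModifyState;

/* Hypothetical expensive setup, e.g. opening a remote connection. */
void
begin_direct_modify(DirectModifyState *st)
{
	st->begin_calls++;
	st->begun = true;
}

/* Perform one modification, running the begin step only on first use. */
void
iterate_direct_modify(DirectModifyState *st)
{
	if (!st->begun)
		begin_direct_modify(st);

	/* From here on the state is guaranteed to be initialized. */
	assert(st->begun);
	st->rows_modified++;
}
```

With this shape, a plan whose subplan never produces a row never pays
for the begin step, and repeated calls pay for it only once.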

Given that we've closed the CF entry for this thread and given that
there seems to be enough context to these patches, I will move these
patches back to their original thread, that is:

* ModifyTable overheads in generic plans *
/messages/by-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com

That will also make the CF-bot happy.

--
Amit Langote
EDB: http://www.enterprisedb.com