postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Started by Etsuro Fujitaabout 8 years ago33 messages
#1Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
1 attachment(s)

(2017/04/08 10:28), Robert Haas wrote:

On Wed, Mar 22, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/02/22 19:57, Rushabh Lathia wrote:

Marked this as Ready for Committer.

I noticed that this item in the CF app was incorrectly marked as

Committed.

This patch isn't committed, so I returned it to the previous status.

I also

rebased the patch. Attached is a new version of the patch.

Sorry, I marked the wrong patch as committed. Apologies for that.

No problem. My apologies for the long long delay.

This doesn't apply any more because of recent changes.

git diff --check complains:
contrib/postgres_fdw/postgres_fdw.c:3653: space before tab in indent.

I rebased the patch.

+        /* Shouldn't contain the target relation. */
+        Assert(target_rel == 0);

This comment should give a reason.

Done.

void
deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
+ RelOptInfo *foreignrel,
List *targetlist,
List *targetAttrs,
List *remote_conds,

Could you add a comment explaining the meaning of these various
arguments? It takes rtindex, rel, and foreignrel, which apparently
are all different things, but the meaning is not explained.

Done.

/*
+ * Add a RETURNING clause, if needed, to an UPDATE/DELETE on a join.
+ */
+static void
+deparseExplicitReturningList(List *rlist,
+                             List **retrieved_attrs,
+                             deparse_expr_cxt *context)
+{
+    deparseExplicitTargetList(rlist, true, retrieved_attrs, context);
+}

Do we really want to add a function for one line of code?

I don't have any strong opinion about that, so I removed that function
and modified deparseDirectUpdateSql/deparseDirectDeleteSql so it calls
deparseExplicitTargetList directly.

+/*
+ * Look for conditions mentioning the target relation in the given

join tree,

+ * which will be pulled up into the WHERE clause. Note that this is

safe due

+ * to the same reason stated in comments in deparseFromExprForRel.
+ */

The comments for deparseFromExprForRel do not seem to address the
topic of why this is safe. Also, the answer to the question "safe
from what?" is not clear.

Maybe my explanation would not be enough, but I think the reason why
this is safe would be derived from the comments for
deparseFromExprForRel. BUT: on reflection, I think I made the deparsing
logic complex beyond necessity. So I simplified the logic, which
doesn't pull_up_target_conditions any more, and added comments about that.

-    deparseReturningList(buf, root, rtindex, rel, false,
-                         returningList, retrieved_attrs);
+    if (foreignrel->reloptkind == RELOPT_JOINREL)
+        deparseExplicitReturningList(returningList,

retrieved_attrs,&context);

+    else
+        deparseReturningList(buf, root, rtindex, rel, false,
+                             returningList, retrieved_attrs);

Why do these cases need to be handled differently? Maybe add a brief

comment?

The reason for that is 1)

+       /*
+        * When performing an UPDATE/DELETE .. RETURNING on a join directly,
+        * we fetch from the foreign server any Vars specified in RETURNING
+        * that refer not only to the target relation but to non-target
+        * relations.  So we'll deparse them into the RETURNING clause
of the
+        * remote query;

and 2) deparseReturningList can retrieve Vars of the target relation,
but can't retrieve Vars of non-target relations.

+        if ((outerrel->reloptkind == RELOPT_BASEREL&&
+             outerrel->relid == target_rel) ||
+            (innerrel->reloptkind == RELOPT_BASEREL&&
+             innerrel->relid == target_rel))

1. Surely it's redundant to check the RelOptKind if the RTI matches?

Good catch! Revised.

2. Generally, the tests in this patch against various RelOptKind
values should be adapted to use the new macros introduced in
7a39b5e4d11229ece930a51fd7cb29e535db4494.

I left some of the tests alone because I think that's more strict.

The regression tests remove every remaining case where an update or
delete gets fails to get pushed to the remote side. I think we should
still test that path, because we've still got that code. Maybe use a
non-pushable function in the join clause, or something.

Done.

The new test cases could use some brief comments explaining their purpose.

Done. Also, I think some of the tests in the previous version are
redundant, so I simplified the tests a bit.

if (plan->returningLists)
+ {
returningList = (List *) list_nth(plan->returningLists,

subplan_index);

+        /*
+         * If UPDATE/DELETE on a join, create a RETURNING list used

in the

+         * remote query.
+         */
+        if (fscan->scan.scanrelid == 0)
+            returningList = make_explicit_returning_list(resultRelation,
+                                                         rel,
+                                                         returningList);
+    }

Again, the comment doesn't really explain why we're doing this. And
initializing returningList twice in a row seems strange, too.

I don't think that's strange because the second one is actually
re-computation of that list. Note that make_explicit_returning_list
takes that list as the 3rd argument. I added more comments explaining
why. (I also changed the name of make_explicit_returning_list.)

I am unfortunately too tired to finish properly reviewing this

tonight. :-(

Thanks for the review!

Attached is an updated version of the patch. I'll add this to the next CF.

Sorry for creating a new thread. I changed my mail client.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-more-update-pushdown-v5.patchtext/x-diff; name=postgres-fdw-more-update-pushdown-v5.patchDownload
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 132,138 **** static void deparseTargetList(StringInfo buf,
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  						  deparse_expr_cxt *context);
  static void deparseSubqueryTargetList(deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
--- 132,140 ----
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist,
! 						  bool is_returning,
! 						  List **retrieved_attrs,
  						  deparse_expr_cxt *context);
  static void deparseSubqueryTargetList(deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
***************
*** 168,178 **** static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 					  RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
  				   RelOptInfo *foreignrel, bool make_subquery,
! 				   List **params_list);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
--- 170,182 ----
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 					  RelOptInfo *foreignrel, bool use_alias,
! 					  Index ignore_rel, List **ignore_conds,
! 					  List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
  				   RelOptInfo *foreignrel, bool make_subquery,
! 				   Index ignore_rel, List **ignore_conds, List **params_list);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***************
*** 1028,1034 **** deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
  		 * For a join or upper relation the input tlist gives the list of
  		 * columns required to be fetched from the foreign server.
  		 */
! 		deparseExplicitTargetList(tlist, retrieved_attrs, context);
  	}
  	else
  	{
--- 1032,1038 ----
  		 * For a join or upper relation the input tlist gives the list of
  		 * columns required to be fetched from the foreign server.
  		 */
! 		deparseExplicitTargetList(tlist, false, retrieved_attrs, context);
  	}
  	else
  	{
***************
*** 1071,1077 **** deparseFromExpr(List *quals, deparse_expr_cxt *context)
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  						  (bms_num_members(scanrel->relids) > 1),
! 						  context->params_list);
  
  	/* Construct WHERE clause */
  	if (quals != NIL)
--- 1075,1081 ----
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  						  (bms_num_members(scanrel->relids) > 1),
! 						  (Index) 0, NULL, context->params_list);
  
  	/* Construct WHERE clause */
  	if (quals != NIL)
***************
*** 1340,1348 **** get_jointype_name(JoinType jointype)
   *
   * retrieved_attrs is the list of continuously increasing integers starting
   * from 1. It has same number of entries as tlist.
   */
  static void
! deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  						  deparse_expr_cxt *context)
  {
  	ListCell   *lc;
--- 1344,1357 ----
   *
   * retrieved_attrs is the list of continuously increasing integers starting
   * from 1. It has same number of entries as tlist.
+  *
+  * This is used for both SELECT and RETURNING targetlists; the is_returning
+  * parameter is true only for a RETURNING targetlist.
   */
  static void
! deparseExplicitTargetList(List *tlist,
! 						  bool is_returning,
! 						  List **retrieved_attrs,
  						  deparse_expr_cxt *context)
  {
  	ListCell   *lc;
***************
*** 1357,1369 **** deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  
  		if (i > 0)
  			appendStringInfoString(buf, ", ");
  		deparseExpr((Expr *) tle->expr, context);
  
  		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
  		i++;
  	}
  
! 	if (i == 0)
  		appendStringInfoString(buf, "NULL");
  }
  
--- 1366,1381 ----
  
  		if (i > 0)
  			appendStringInfoString(buf, ", ");
+ 		else if (is_returning)
+ 			appendStringInfoString(buf, " RETURNING ");
+ 
  		deparseExpr((Expr *) tle->expr, context);
  
  		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
  		i++;
  	}
  
! 	if (i == 0 && !is_returning)
  		appendStringInfoString(buf, "NULL");
  }
  
***************
*** 1406,1415 **** deparseSubqueryTargetList(deparse_expr_cxt *context)
   * The function constructs ... JOIN ... ON ... for join relation. For a base
   * relation it just returns schema-qualified tablename, with the appropriate
   * alias if so requested.
   */
  static void
  deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 					  bool use_alias, List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 1418,1434 ----
   * The function constructs ... JOIN ... ON ... for join relation. For a base
   * relation it just returns schema-qualified tablename, with the appropriate
   * alias if so requested.
+  *
+  * 'ignore_rel' is either zero or the RT index of a target relation.  In the
+  * latter case the function constructs FROM clause of UPDATE or USING clause
+  * of DELETE; it deparses the join relation as if it never contained the
+  * target relation, and creates a List of conditions to be deparsed into the
+  * top-level WHERE clause, which is returned to *ignore_conds.
   */
  static void
  deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 					  bool use_alias, Index ignore_rel, List **ignore_conds,
! 					  List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 1417,1432 **** deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  	{
  		StringInfoData join_sql_o;
  		StringInfoData join_sql_i;
  
! 		/* Deparse outer relation */
! 		initStringInfo(&join_sql_o);
! 		deparseRangeTblRef(&join_sql_o, root, fpinfo->outerrel,
! 						   fpinfo->make_outerrel_subquery, params_list);
  
! 		/* Deparse inner relation */
! 		initStringInfo(&join_sql_i);
! 		deparseRangeTblRef(&join_sql_i, root, fpinfo->innerrel,
! 						   fpinfo->make_innerrel_subquery, params_list);
  
  		/*
  		 * For a join relation FROM clause entry is deparsed as
--- 1436,1518 ----
  	{
  		StringInfoData join_sql_o;
  		StringInfoData join_sql_i;
+ 		RelOptInfo *outerrel = fpinfo->outerrel;
+ 		RelOptInfo *innerrel = fpinfo->innerrel;
+ 		bool		outerrel_is_target = false;
+ 		bool		innerrel_is_target = false;
  
! 		if (ignore_rel > 0 && bms_is_member(ignore_rel, foreignrel->relids))
! 		{
! 			/*
! 			 * Note that the target relation can never be within the nullable
! 			 * side of an outer join.  So, if this is an inner join, the join
! 			 * conditions can be pulled up into the top-level WHERE clause.
! 			 */
! 			if (fpinfo->jointype == JOIN_INNER)
! 			{
! 				*ignore_conds = list_concat(*ignore_conds,
! 											list_copy(fpinfo->joinclauses));
! 				fpinfo->joinclauses = NIL;
! 			}
  
! 			/*
! 			 * Check if either of the input relations is the target relation.
! 			 */
! 			if (outerrel->relid == ignore_rel)
! 				outerrel_is_target = true;
! 			else if (innerrel->relid == ignore_rel)
! 				innerrel_is_target = true;
! 		}
! 
! 		/* Deparse outer relation if not the target relation. */
! 		if (!outerrel_is_target)
! 		{
! 			initStringInfo(&join_sql_o);
! 			deparseRangeTblRef(&join_sql_o, root, outerrel,
! 							   fpinfo->make_outerrel_subquery,
! 							   ignore_rel, ignore_conds, params_list);
! 
! 			/*
! 			 * If inner relation is the target relation, skip deparsing it.
! 			 * Note that since the join of the target relation with any other
! 			 * relation in the query is an inner join and can never be within
! 			 * the nullable side of an outer join, the join can be
! 			 * interchanged with higher-level joins (cf. identity 1 on outer
! 			 * join reordering shown in src/backend/optimizer/README), which
! 			 * means it's safe to skip the target-relation deparsing here.
! 			 */
! 			if (innerrel_is_target)
! 			{
! 				Assert(fpinfo->jointype == JOIN_INNER);
! 				Assert(fpinfo->joinclauses == NIL);
! 				appendStringInfo(buf, "%s", join_sql_o.data);
! 				return;
! 			}
! 		}
! 
! 		/* Deparse inner relation if not the target relation. */
! 		if (!innerrel_is_target)
! 		{
! 			initStringInfo(&join_sql_i);
! 			deparseRangeTblRef(&join_sql_i, root, innerrel,
! 							   fpinfo->make_innerrel_subquery,
! 							   ignore_rel, ignore_conds, params_list);
! 
! 			/*
! 			 * If outer relation is the target relation, skip deparsing it.
! 			 * See the above note about safety.
! 			 */
! 			if (outerrel_is_target)
! 			{
! 				Assert(fpinfo->jointype == JOIN_INNER);
! 				Assert(fpinfo->joinclauses == NIL);
! 				appendStringInfo(buf, "%s", join_sql_i.data);
! 				return;
! 			}
! 		}
! 
! 		/* Neither of the relations is the target relation. */
! 		Assert(!outerrel_is_target && !innerrel_is_target);
  
  		/*
  		 * For a join relation FROM clause entry is deparsed as
***************
*** 1486,1492 **** deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
   */
  static void
  deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 				   bool make_subquery, List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 1572,1579 ----
   */
  static void
  deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 				   bool make_subquery, Index ignore_rel, List **ignore_conds,
! 				   List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 1501,1506 **** deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
--- 1588,1601 ----
  		List	   *retrieved_attrs;
  		int			ncols;
  
+ 		/*
+ 		 * The given relation shouldn't contain the target relation, because
+ 		 * this should only happen for input relations for a full join, and
+ 		 * such relations can never contain an UPDATE/DELETE target.
+ 		 */
+ 		Assert(ignore_rel == 0 ||
+ 			   !bms_is_member(ignore_rel, foreignrel->relids));
+ 
  		/* Deparse the subquery representing the relation. */
  		appendStringInfoChar(buf, '(');
  		deparseSelectStmtForRel(buf, root, foreignrel, NIL,
***************
*** 1534,1540 **** deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  		}
  	}
  	else
! 		deparseFromExprForRel(buf, root, foreignrel, true, params_list);
  }
  
  /*
--- 1629,1636 ----
  		}
  	}
  	else
! 		deparseFromExprForRel(buf, root, foreignrel, true, ignore_rel,
! 							  ignore_conds, params_list);
  }
  
  /*
***************
*** 1645,1657 **** deparseUpdateSql(StringInfo buf, PlannerInfo *root,
  /*
   * deparse remote UPDATE statement
   *
!  * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
--- 1741,1763 ----
  /*
   * deparse remote UPDATE statement
   *
!  * 'buf' is the output buffer to append the statement to
!  * 'rtindex' is the RT index of the associated target relation
!  * 'rel' is the relation descriptor for the target relation
!  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
!  *		containing all base relations in the query
!  * 'targetlist' is the tlist of the underlying foreign-scan plan node
!  * 'targetAttrs' is the target columns of the UPDATE
!  * 'remote_conds' is the qual clauses that must be evaluated remotely
!  * '*params_list' is an output list of exprs that will become remote Params
!  * 'returningList' is the RETURNING targetlist
!  * '*retrieved_attrs' is an output list of integers of columns being retrieved
!  *		by RETURNING (if any)
   */
  void
  deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
***************
*** 1659,1665 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
- 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
  	deparse_expr_cxt context;
  	int			nestlevel;
  	bool		first;
--- 1765,1770 ----
***************
*** 1667,1679 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = baserel;
! 	context.scanrel = baserel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "UPDATE ");
  	deparseRelation(buf, rel);
  	appendStringInfoString(buf, " SET ");
  
  	/* Make sure any constants in the exprs are printed portably */
--- 1772,1786 ----
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = foreignrel;
! 	context.scanrel = foreignrel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "UPDATE ");
  	deparseRelation(buf, rel);
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
  	appendStringInfoString(buf, " SET ");
  
  	/* Make sure any constants in the exprs are printed portably */
***************
*** 1700,1713 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  
  	reset_transmission_modes(nestlevel);
  
  	if (remote_conds)
  	{
  		appendStringInfoString(buf, " WHERE ");
  		appendConditions(remote_conds, &context);
  	}
  
! 	deparseReturningList(buf, root, rtindex, rel, false,
! 						 returningList, retrieved_attrs);
  }
  
  /*
--- 1807,1834 ----
  
  	reset_transmission_modes(nestlevel);
  
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 	{
+ 		List	   *ignore_conds = NIL;
+ 
+ 		appendStringInfo(buf, " FROM ");
+ 		deparseFromExprForRel(buf, root, foreignrel, true, rtindex,
+ 							  &ignore_conds, params_list);
+ 		remote_conds = list_concat(ignore_conds, remote_conds);
+ 	}
+ 
  	if (remote_conds)
  	{
  		appendStringInfoString(buf, " WHERE ");
  		appendConditions(remote_conds, &context);
  	}
  
! 	if (foreignrel->reloptkind == RELOPT_JOINREL)
! 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
! 								  &context);
! 	else
! 		deparseReturningList(buf, root, rtindex, rel, false,
! 							 returningList, retrieved_attrs);
  }
  
  /*
***************
*** 1735,1764 **** deparseDeleteSql(StringInfo buf, PlannerInfo *root,
  /*
   * deparse remote DELETE statement
   *
!  * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
- 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
  	deparse_expr_cxt context;
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = baserel;
! 	context.scanrel = baserel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "DELETE FROM ");
  	deparseRelation(buf, rel);
  
  	if (remote_conds)
  	{
--- 1856,1904 ----
  /*
   * deparse remote DELETE statement
   *
!  * 'buf' is the output buffer to append the statement to
!  * 'rtindex' is the RT index of the associated target relation
!  * 'rel' is the relation descriptor for the target relation
!  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
!  *		containing all base relations in the query
!  * 'remote_conds' is the qual clauses that must be evaluated remotely
!  * '*params_list' is an output list of exprs that will become remote Params
!  * 'returningList' is the RETURNING targetlist
!  * '*retrieved_attrs' is an output list of integers of columns being retrieved
!  *		by RETURNING (if any)
   */
  void
  deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
  	deparse_expr_cxt context;
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = foreignrel;
! 	context.scanrel = foreignrel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "DELETE FROM ");
  	deparseRelation(buf, rel);
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
+ 
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 	{
+ 		List	   *ignore_conds = NIL;
+ 
+ 		appendStringInfo(buf, " USING ");
+ 		deparseFromExprForRel(buf, root, foreignrel, true, rtindex,
+ 							  &ignore_conds, params_list);
+ 		remote_conds = list_concat(ignore_conds, remote_conds);
+ 	}
  
  	if (remote_conds)
  	{
***************
*** 1766,1773 **** deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  		appendConditions(remote_conds, &context);
  	}
  
! 	deparseReturningList(buf, root, rtindex, rel, false,
! 						 returningList, retrieved_attrs);
  }
  
  /*
--- 1906,1917 ----
  		appendConditions(remote_conds, &context);
  	}
  
! 	if (foreignrel->reloptkind == RELOPT_JOINREL)
! 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
! 								  &context);
! 	else
! 		deparseReturningList(buf, root, rtindex, rel, false,
! 							 returningList, retrieved_attrs);
  }
  
  /*
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 4291,4317 **** UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
  
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can't be pushed down
!                                                                                                                                                         QUERY PLAN                                                                                                                                                         
! ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
!    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c3 = $3, c7 = $4 WHERE ctid = $1
!    ->  Foreign Scan
!          Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
!                Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
!                ->  Foreign Scan on public.ft2
!                      Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
!                      Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4291,4303 ----
  
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can be pushed down
!                                                                                                    QUERY PLAN                                                                                                    
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
!    ->  Foreign Update
!          Remote SQL: UPDATE "S 1"."T 1" r1 SET c2 = (r1.c2 + 500), c3 = (r1.c3 || '_update9'::text), c7 = 'ft2       '::character(10) FROM "S 1"."T 1" r2 WHERE ((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9))
! (3 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4434,4460 **** DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  (103 rows)
  
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can't be pushed down
!                                                                                                                               QUERY PLAN                                                                                                                               
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
!    Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1
!    ->  Foreign Scan
!          Output: ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
!                Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
!                ->  Foreign Scan on public.ft2
!                      Output: ft2.ctid, ft2.c2
!                      Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
!                      Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4420,4432 ----
  (103 rows)
  
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can be pushed down
!                                                          QUERY PLAN                                                         
! ----------------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
!    ->  Foreign Delete
!          Remote SQL: DELETE FROM "S 1"."T 1" r1 USING "S 1"."T 1" r2 WHERE ((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2))
! (3 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
***************
*** 5330,5335 **** DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;
--- 5302,5496 ----
   ft2
  (1 row)
  
+ -- Test UPDATE/DELETE with RETURNING on a three-table join
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id - 1200, to_char(id, 'FM00000') FROM generate_series(1201, 1300) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;                             -- can be pushed down
+                                                                                                                                                                                    QUERY PLAN                                                                                                                                                                                    
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: ft2.ctid, ft2.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
+    ->  Foreign Update
+          Remote SQL: UPDATE "S 1"."T 1" r1 SET c3 = 'foo'::text FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (TRUE)) WHERE ((r2.c1 = r3.c1)) AND ((r1.c2 = r2.c1)) AND ((r1."C 1" > 1200)) RETURNING r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r1.ctid, r2.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3
+ (4 rows)
+ 
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;
+    ctid   |              ft2               |  c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 |  ctid  |      ft4       | c1 | c2 |   c3   
+ ----------+--------------------------------+------+----+-----+----+----+----+------------+----+--------+----------------+----+----+--------
+  (12,102) | (1206,6,foo,,,,"ft2       ",)  | 1206 |  6 | foo |    |    |    | ft2        |    | (0,6)  | (6,7,AAA006)   |  6 |  7 | AAA006
+  (12,103) | (1212,12,foo,,,,"ft2       ",) | 1212 | 12 | foo |    |    |    | ft2        |    | (0,12) | (12,13,AAA012) | 12 | 13 | AAA012
+  (12,104) | (1218,18,foo,,,,"ft2       ",) | 1218 | 18 | foo |    |    |    | ft2        |    | (0,18) | (18,19,AAA018) | 18 | 19 | AAA018
+  (12,105) | (1224,24,foo,,,,"ft2       ",) | 1224 | 24 | foo |    |    |    | ft2        |    | (0,24) | (24,25,AAA024) | 24 | 25 | AAA024
+  (12,106) | (1230,30,foo,,,,"ft2       ",) | 1230 | 30 | foo |    |    |    | ft2        |    | (0,30) | (30,31,AAA030) | 30 | 31 | AAA030
+  (12,107) | (1236,36,foo,,,,"ft2       ",) | 1236 | 36 | foo |    |    |    | ft2        |    | (0,36) | (36,37,AAA036) | 36 | 37 | AAA036
+  (12,108) | (1242,42,foo,,,,"ft2       ",) | 1242 | 42 | foo |    |    |    | ft2        |    | (0,42) | (42,43,AAA042) | 42 | 43 | AAA042
+  (12,109) | (1248,48,foo,,,,"ft2       ",) | 1248 | 48 | foo |    |    |    | ft2        |    | (0,48) | (48,49,AAA048) | 48 | 49 | AAA048
+  (12,110) | (1254,54,foo,,,,"ft2       ",) | 1254 | 54 | foo |    |    |    | ft2        |    | (0,54) | (54,55,AAA054) | 54 | 55 | AAA054
+  (12,111) | (1260,60,foo,,,,"ft2       ",) | 1260 | 60 | foo |    |    |    | ft2        |    | (0,60) | (60,61,AAA060) | 60 | 61 | AAA060
+  (12,112) | (1266,66,foo,,,,"ft2       ",) | 1266 | 66 | foo |    |    |    | ft2        |    | (0,66) | (66,67,AAA066) | 66 | 67 | AAA066
+  (12,113) | (1272,72,foo,,,,"ft2       ",) | 1272 | 72 | foo |    |    |    | ft2        |    | (0,72) | (72,73,AAA072) | 72 | 73 | AAA072
+  (12,114) | (1278,78,foo,,,,"ft2       ",) | 1278 | 78 | foo |    |    |    | ft2        |    | (0,78) | (78,79,AAA078) | 78 | 79 | AAA078
+  (12,115) | (1284,84,foo,,,,"ft2       ",) | 1284 | 84 | foo |    |    |    | ft2        |    | (0,84) | (84,85,AAA084) | 84 | 85 | AAA084
+  (12,116) | (1290,90,foo,,,,"ft2       ",) | 1290 | 90 | foo |    |    |    | ft2        |    | (0,90) | (90,91,AAA090) | 90 | 91 | AAA090
+  (12,117) | (1296,96,foo,,,,"ft2       ",) | 1296 | 96 | foo |    |    |    | ft2        |    | (0,96) | (96,97,AAA096) | 96 | 97 | AAA096
+ (16 rows)
+ 
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;                                                                    -- can be pushed down
+                                                                                             QUERY PLAN                                                                                             
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Delete on public.ft2
+    Output: 100
+    ->  Foreign Delete
+          Remote SQL: DELETE FROM "S 1"."T 1" r1 USING ("S 1"."T 3" r2 LEFT JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) WHERE ((r1.c2 = r2.c1)) AND ((r1."C 1" > 1200)) AND (((r1."C 1" % 10) = 0))
+ (4 rows)
+ 
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;
+  ?column? 
+ ----------
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+ (10 rows)
+ 
+ DELETE FROM ft2 WHERE ft2.c1 > 1200;
+ -- Test UPDATE/DELETE with WHERE or JOIN/ON conditions containing
+ -- user-defined operators/functions
+ ALTER SERVER loopback OPTIONS (DROP extensions);
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id % 10, to_char(id, 'FM00000') FROM generate_series(2001, 2010) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;            -- can't be pushed down
+                                                 QUERY PLAN                                                
+ ----------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: c1, c2, c3, c4, c5, c6, c7, c8
+    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
+    ->  Foreign Scan on public.ft2
+          Output: c1, c2, NULL::integer, 'bar'::text, c4, c5, c6, c7, c8, ctid
+          Filter: (postgres_fdw_abs(ft2.c1) > 2000)
+          Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ (7 rows)
+ 
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+   c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 
+ ------+----+-----+----+----+----+------------+----
+  2001 |  1 | bar |    |    |    | ft2        | 
+  2002 |  2 | bar |    |    |    | ft2        | 
+  2003 |  3 | bar |    |    |    | ft2        | 
+  2004 |  4 | bar |    |    |    | ft2        | 
+  2005 |  5 | bar |    |    |    | ft2        | 
+  2006 |  6 | bar |    |    |    | ft2        | 
+  2007 |  7 | bar |    |    |    | ft2        | 
+  2008 |  8 | bar |    |    |    | ft2        | 
+  2009 |  9 | bar |    |    |    | ft2        | 
+  2010 |  0 | bar |    |    |    | ft2        | 
+ (10 rows)
+ 
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
+                                                                                                                                           QUERY PLAN                                                                                                                                          
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
+    ->  Nested Loop
+          Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+          Join Filter: (ft2.c2 === ft4.c1)
+          ->  Foreign Scan on public.ft2
+                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
+                Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+          ->  Foreign Scan
+                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+                Relations: (public.ft4) INNER JOIN (public.ft5)
+                Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
+                ->  Hash Join
+                      Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+                      Hash Cond: (ft4.c1 = ft5.c1)
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+                            ->  Foreign Scan on public.ft5
+                                  Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (24 rows)
+ 
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;
+   c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 | c1 | c2 |   c3   | c1 | c2 |   c3   
+ ------+----+-----+----+----+----+------------+----+----+----+--------+----+----+--------
+  2006 |  6 | baz |    |    |    | ft2        |    |  6 |  7 | AAA006 |  6 |  7 | AAA006
+ (1 row)
+ 
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;                                       -- can't be pushed down
+                                                                                                                                                                      QUERY PLAN                                                                                                                                                                     
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Delete on public.ft2
+    Output: ft2.ctid, ft2.c1, ft2.c2, ft2.c3
+    Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c2, c3, ctid
+    ->  Foreign Scan
+          Output: ft2.ctid, ft4.*, ft5.*
+          Filter: (ft4.c1 === ft5.c1)
+          Relations: ((public.ft2) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
+          Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r2.c1, r3.c1 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c2 = r2.c1)) AND ((r1."C 1" > 2000)))) INNER JOIN "S 1"."T 4" r3 ON (TRUE)) FOR UPDATE OF r1
+          ->  Nested Loop
+                Output: ft2.ctid, ft4.*, ft5.*, ft4.c1, ft5.c1
+                ->  Nested Loop
+                      Output: ft2.ctid, ft4.*, ft4.c1
+                      Join Filter: (ft2.c2 = ft4.c1)
+                      ->  Foreign Scan on public.ft2
+                            Output: ft2.ctid, ft2.c2
+                            Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.*, ft4.c1
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                ->  Foreign Scan on public.ft5
+                      Output: ft5.*, ft5.c1
+                      Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (22 rows)
+ 
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;
+    ctid   |  c1  | c2 | c3  
+ ----------+------+----+-----
+  (12,112) | 2006 |  6 | baz
+ (1 row)
+ 
+ DELETE FROM ft2 WHERE ft2.c1 > 2000;
+ ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
  -- Test that trigger on remote table works as expected
  CREATE OR REPLACE FUNCTION "S 1".F_BRTRIG() RETURNS trigger AS $$
  BEGIN
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 210,215 **** typedef struct PgFdwDirectModifyState
--- 210,220 ----
  	PGresult   *result;			/* result for query */
  	int			num_tuples;		/* # of result tuples */
  	int			next_tuple;		/* index of next one to return */
+ 	Relation	resultRel;		/* relcache entry for the target relation */
+ 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
+ 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
+ 	AttrNumber	oidAttno;		/* attnum of input oid column */
+ 	bool		hasSystemCols;	/* are there system columns of resultRel? */
  
  	/* working memory context */
  	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
***************
*** 376,383 **** static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
--- 381,397 ----
  						 TupleTableSlot *slot);
  static void store_returning_result(PgFdwModifyState *fmstate,
  					   TupleTableSlot *slot, PGresult *res);
+ static List *build_remote_returning(Index rtindex, Relation rel,
+ 									List *returningList);
+ static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
  static void execute_dml_stmt(ForeignScanState *node);
  static TupleTableSlot *get_returning_data(ForeignScanState *node);
+ static void init_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					  List *fdw_scan_tlist,
+ 					  Index rtindex);
+ static TupleTableSlot *apply_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					   TupleTableSlot *slot,
+ 					   EState *estate);
  static void prepare_query_params(PlanState *node,
  					 List *fdw_exprs,
  					 int numParams,
***************
*** 2144,2157 **** postgresPlanDirectModify(PlannerInfo *root,
  	if (subplan->qual != NIL)
  		return false;
  
- 	/*
- 	 * We can't handle an UPDATE or DELETE on a foreign join for now.
- 	 */
- 	if (fscan->scan.scanrelid == 0)
- 		return false;
- 
  	/* Safe to fetch data about the target foreign rel */
! 	foreignrel = root->simple_rel_array[resultRelation];
  	rte = root->simple_rte_array[resultRelation];
  	fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 2158,2172 ----
  	if (subplan->qual != NIL)
  		return false;
  
  	/* Safe to fetch data about the target foreign rel */
! 	if (fscan->scan.scanrelid == 0)
! 	{
! 		foreignrel = find_join_rel(root, fscan->fs_relids);
! 		/* We should have a rel for this foreign join. */
! 		Assert(foreignrel);
! 	}
! 	else
! 		foreignrel = root->simple_rel_array[resultRelation];
  	rte = root->simple_rte_array[resultRelation];
  	fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 2212,2219 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2227,2249 ----
  	 * Extract the relevant RETURNING list if any.
  	 */
  	if (plan->returningLists)
+ 	{
  		returningList = (List *) list_nth(plan->returningLists, subplan_index);
  
+ 		/*
+ 		 * When performing an UPDATE/DELETE .. RETURNING on a join directly,
+ 		 * we fetch from the foreign server any Vars specified in RETURNING
+ 		 * that refer not only to the target relation but to non-target
+ 		 * relations.  So we'll deparse them into the RETURNING clause of the
+ 		 * remote query; use a targetlist consisting of them instead, which
+ 		 * will be adjusted to be new fdw_scan_tlist of the foreign-scan plan
+ 		 * node below.
+ 		 */
+ 		if (fscan->scan.scanrelid == 0)
+ 			returningList = build_remote_returning(resultRelation, rel,
+ 												   returningList);
+ 	}
+ 
  	/*
  	 * Construct the SQL command string.
  	 */
***************
*** 2221,2226 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2251,2257 ----
  	{
  		case CMD_UPDATE:
  			deparseDirectUpdateSql(&sql, root, resultRelation, rel,
+ 								   foreignrel,
  								   ((Plan *) fscan)->targetlist,
  								   targetAttrs,
  								   remote_exprs, &params_list,
***************
*** 2228,2233 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2259,2265 ----
  			break;
  		case CMD_DELETE:
  			deparseDirectDeleteSql(&sql, root, resultRelation, rel,
+ 								   foreignrel,
  								   remote_exprs, &params_list,
  								   returningList, &retrieved_attrs);
  			break;
***************
*** 2255,2260 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2287,2305 ----
  									retrieved_attrs,
  									makeInteger(plan->canSetTag));
  
+ 	/*
+ 	 * Update the foreign-join-related fields.
+ 	 */
+ 	if (fscan->scan.scanrelid == 0)
+ 	{
+ 		/* No need for the outer subplan. */
+ 		fscan->scan.plan.lefttree = NULL;
+ 
+ 		/* Build new fdw_scan_tlist if UPDATE/DELETE .. RETURNING. */
+ 	  	if (returningList)
+ 			rebuild_fdw_scan_tlist(fscan, returningList);
+ 	}
+ 
  	heap_close(rel, NoLock);
  	return true;
  }
***************
*** 2269,2274 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
--- 2314,2320 ----
  	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
  	EState	   *estate = node->ss.ps.state;
  	PgFdwDirectModifyState *dmstate;
+ 	Index		rtindex;
  	RangeTblEntry *rte;
  	Oid			userid;
  	ForeignTable *table;
***************
*** 2291,2301 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
  	 * Identify which user to do the remote access as.  This should match what
  	 * ExecCheckRTEPerms() does.
  	 */
! 	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
  	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
  	/* Get info about foreign table. */
! 	dmstate->rel = node->ss.ss_currentRelation;
  	table = GetForeignTable(RelationGetRelid(dmstate->rel));
  	user = GetUserMapping(userid, table->serverid);
  
--- 2337,2351 ----
  	 * Identify which user to do the remote access as.  This should match what
  	 * ExecCheckRTEPerms() does.
  	 */
! 	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
! 	rte = rt_fetch(rtindex, estate->es_range_table);
  	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
  	/* Get info about foreign table. */
! 	if (fsplan->scan.scanrelid == 0)
! 		dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
! 	else
! 		dmstate->rel = node->ss.ss_currentRelation;
  	table = GetForeignTable(RelationGetRelid(dmstate->rel));
  	user = GetUserMapping(userid, table->serverid);
  
***************
*** 2305,2310 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
--- 2355,2375 ----
  	 */
  	dmstate->conn = GetConnection(user, false);
  
+ 	/* Update the foreign-join-related fields. */
+ 	if (fsplan->scan.scanrelid == 0)
+ 	{
+ 		/* Save info about foreign table. */
+ 		dmstate->resultRel = dmstate->rel;
+ 
+ 		/*
+ 		 * Set dmstate->rel to NULL to teach get_returning_data() and
+ 		 * make_tuple_from_result_row() that columns fetched from the remote
+ 		 * server are described by fdw_scan_tlist of the foreign-scan plan
+ 		 * node, not the tuple descriptor for the target relation.
+ 		 */
+ 		dmstate->rel = NULL;
+ 	}
+ 
  	/* Initialize state variable */
  	dmstate->num_tuples = -1;	/* -1 means not set yet */
  
***************
*** 2325,2331 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
  
  	/* Prepare for input conversion of RETURNING results. */
  	if (dmstate->has_returning)
! 		dmstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(dmstate->rel));
  
  	/*
  	 * Prepare for processing of parameters used in remote query, if any.
--- 2390,2413 ----
  
  	/* Prepare for input conversion of RETURNING results. */
  	if (dmstate->has_returning)
! 	{
! 		TupleDesc	tupdesc;
! 
! 		if (fsplan->scan.scanrelid == 0)
! 			tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
! 		else
! 			tupdesc = RelationGetDescr(dmstate->rel);
! 
! 		dmstate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 		/*
! 		 * When performing an UPDATE/DELETE .. RETURNING on a join directly,
! 		 * initialize a filter to extract an updated/deleted tuple from a scan
! 		 * tuple.
! 		 */
! 		if (fsplan->scan.scanrelid == 0)
! 			init_returning_filter(dmstate, fsplan->fdw_scan_tlist, rtindex);
! 	}
  
  	/*
  	 * Prepare for processing of parameters used in remote query, if any.
***************
*** 2406,2411 **** postgresEndDirectModify(ForeignScanState *node)
--- 2488,2497 ----
  	ReleaseConnection(dmstate->conn);
  	dmstate->conn = NULL;
  
+ 	/* close the target relation. */
+ 	if (dmstate->resultRel)
+ 		ExecCloseScanRelation(dmstate->resultRel);
+ 
  	/* MemoryContext will be deleted automatically. */
  }
  
***************
*** 3273,3278 **** store_returning_result(PgFdwModifyState *fmstate,
--- 3359,3494 ----
  }
  
  /*
+  * build_remote_returning
+  *		Build a RETURNING targetlist of a remote query for performing an
+  *		UPDATE/DELETE .. RETURNING on a join directly
+  */
+ static List *
+ build_remote_returning(Index rtindex, Relation rel, List *returningList)
+ {
+ 	bool		have_wholerow = false;
+ 	List	   *tlist = NIL;
+ 	List	   *vars;
+ 	ListCell   *lc;
+ 
+ 	Assert(returningList);
+ 
+ 	vars = pull_var_clause((Node *) returningList, PVC_INCLUDE_PLACEHOLDERS);
+ 
+ 	/*
+ 	 * If there's a whole-row reference to the target relation, then we'll
+ 	 * need all the columns of the relation.
+ 	 */
+ 	foreach(lc, vars)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 
+ 		if (IsA(var, Var) &&
+ 			var->varno == rtindex &&
+ 			var->varattno == InvalidAttrNumber)
+ 		{
+ 			have_wholerow = true;
+ 			break;
+ 		}
+ 	}
+ 
+ 	if (have_wholerow)
+ 	{
+ 		TupleDesc	tupdesc = RelationGetDescr(rel);
+ 		int			i;
+ 
+ 		for (i = 1; i <= tupdesc->natts; i++)
+ 		{
+ 			Form_pg_attribute attr = TupleDescAttr(tupdesc, i - 1);
+ 			Var		   *var;
+ 
+ 			/* Ignore dropped attributes. */
+ 			if (attr->attisdropped)
+ 				continue;
+ 
+ 			var = makeVar(rtindex,
+ 						  i,
+ 						  attr->atttypid,
+ 						  attr->atttypmod,
+ 						  attr->attcollation,
+ 						  0);
+ 
+ 			tlist = lappend(tlist,
+ 							makeTargetEntry((Expr *) var,
+ 											list_length(tlist) + 1,
+ 											NULL,
+ 											false));
+ 		}
+ 	}
+ 
+ 	/* Now add any remaining columns to tlist. */
+ 	foreach(lc, vars)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 
+ 		/*
+ 		 * No need for whole-row references to the target relation.  We don't
+ 		 * need system columns other than ctid and oid either, since those are
+ 		 * set locally.
+ 		 */
+ 		if (IsA(var, Var) &&
+ 			var->varno == rtindex &&
+ 			var->varattno <= InvalidAttrNumber &&
+ 			var->varattno != SelfItemPointerAttributeNumber &&
+ 			var->varattno != ObjectIdAttributeNumber)
+ 			continue;		/* don't need it */
+ 
+ 		if (tlist_member((Expr *) var, tlist))
+ 			continue;		/* already got it */
+ 
+ 		tlist = lappend(tlist,
+ 						makeTargetEntry((Expr *) var,
+ 										list_length(tlist) + 1,
+ 										NULL,
+ 										false));
+ 	}
+ 
+ 	list_free(vars);
+ 
+ 	return tlist;
+ }
+ 
+ /*
+  * rebuild_fdw_scan_tlist
+  *		Build new fdw_scan_tlist of given foreign-scan plan node from given
+  *		tlist
+  *
+  * There might be columns that the fdw_scan_tlist of the given foreign-scan
+  * plan node contains that the given tlist doesn't.  The fdw_scan_tlist would
+  * have contained resjunk columns such as 'ctid' of the target relation and
+  * 'wholerow' of non-target relations, but the tlist might not contain them,
+  * for example.  So, adjust the tlist so it contains all the columns specified
+  * in the fdw_scan_tlist; else setrefs.c will get confused.
+  */
+ static void
+ rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist)
+ {
+ 	List	   *new_tlist = tlist;
+ 	List	   *old_tlist = fscan->fdw_scan_tlist;
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, old_tlist)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ 
+ 		if (tlist_member(tle->expr, new_tlist))
+ 			continue;		/* already got it */
+ 
+ 		new_tlist = lappend(new_tlist,
+ 							makeTargetEntry(tle->expr,
+ 											list_length(new_tlist) + 1,
+ 											NULL,
+ 											false));
+ 	}
+ 	fscan->fdw_scan_tlist = new_tlist;
+ }
+ 
+ /*
   * Execute a direct UPDATE/DELETE statement.
   */
  static void
***************
*** 3332,3337 **** get_returning_data(ForeignScanState *node)
--- 3548,3554 ----
  	EState	   *estate = node->ss.ps.state;
  	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
  	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ 	TupleTableSlot *resultSlot;
  
  	Assert(resultRelInfo->ri_projectReturning);
  
***************
*** 3349,3355 **** get_returning_data(ForeignScanState *node)
--- 3566,3575 ----
  	 * "UPDATE/DELETE .. RETURNING 1" for example.)
  	 */
  	if (!dmstate->has_returning)
+ 	{
  		ExecStoreAllNullTuple(slot);
+ 		resultSlot = slot;
+ 	}
  	else
  	{
  		/*
***************
*** 3365,3371 **** get_returning_data(ForeignScanState *node)
  												dmstate->rel,
  												dmstate->attinmeta,
  												dmstate->retrieved_attrs,
! 												NULL,
  												dmstate->temp_cxt);
  			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
  		}
--- 3585,3591 ----
  												dmstate->rel,
  												dmstate->attinmeta,
  												dmstate->retrieved_attrs,
! 												node,
  												dmstate->temp_cxt);
  			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
  		}
***************
*** 3376,3391 **** get_returning_data(ForeignScanState *node)
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  	}
  	dmstate->next_tuple++;
  
  	/* Make slot available for evaluation of the local query RETURNING list. */
! 	resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple = slot;
  
  	return slot;
  }
  
  /*
   * Prepare for processing of parameters used in remote query.
   */
  static void
--- 3596,3798 ----
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 
+ 		/* Get the updated/deleted tuple. */
+ 		if (dmstate->rel)
+ 			resultSlot = slot;
+ 		else
+ 			resultSlot = apply_returning_filter(dmstate, slot, estate);
  	}
  	dmstate->next_tuple++;
  
  	/* Make slot available for evaluation of the local query RETURNING list. */
! 	resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple = resultSlot;
  
  	return slot;
  }
  
  /*
+  * Initialize a filter to extract an updated/deleted tuple from a scan tuple.
+  */
+ static void
+ init_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					  List *fdw_scan_tlist,
+ 					  Index rtindex)
+ {
+ 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
+ 	ListCell   *lc;
+ 	int			i;
+ 
+ 	/*
+ 	 * Calculate the mapping between the fdw_scan_tlist's entries and the
+ 	 * result tuple's attributes.
+ 	 *
+ 	 * The "map" is an array of indexes of the result tuple's attributes in
+ 	 * fdw_scan_tlist, i.e., one entry for every attribute of the result
+ 	 * tuple.  We store zero for any attributes that don't have the
+ 	 * corresponding entries in that list, marking that a NULL is needed in
+ 	 * the result tuple.
+ 	 *
+ 	 * Also get the indexes of the entries for ctid and oid if any.
+ 	 */
+ 	dmstate->attnoMap = (AttrNumber *) palloc0(resultTupType->natts * sizeof(AttrNumber));
+ 
+ 	dmstate->ctidAttno = dmstate->oidAttno = 0;
+ 
+ 	i = 1;
+ 	dmstate->hasSystemCols = false;
+ 	foreach(lc, fdw_scan_tlist)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ 		Var		   *var = (Var *) tle->expr;
+ 
+ 		Assert(IsA(var, Var));
+ 
+ 		/*
+ 		 * If the Var is a column of the target relation to be retrieved from
+ 		 * the foreign server, get the index of the entry.
+ 		 */
+ 		if (var->varno == rtindex &&
+ 			list_member_int(dmstate->retrieved_attrs, i))
+ 		{
+ 			int			attrno = var->varattno;
+ 
+ 			if (attrno < 0)
+ 			{
+ 				/*
+ 				 * We don't retrieve system columns other than ctid and oid.
+ 				 */
+ 				if (attrno == SelfItemPointerAttributeNumber)
+ 					dmstate->ctidAttno = i;
+ 				else if (attrno == ObjectIdAttributeNumber)
+ 					dmstate->oidAttno = i;
+ 				else
+ 					Assert(false);
+ 				dmstate->hasSystemCols = true;
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * We don't retrieve whole-row references to the target
+ 				 * relation either.
+ 				 */
+ 				Assert(attrno > 0);
+ 
+ 				dmstate->attnoMap[attrno - 1] = i;
+ 			}
+ 		}
+ 		i++;
+ 	}
+ }
+ 
+ /*
+  * Extract and return an updated/deleted tuple from a scan tuple.
+  */
+ static TupleTableSlot *
+ apply_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					   TupleTableSlot *slot,
+ 					   EState *estate)
+ {
+ 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
+ 	TupleTableSlot *resultSlot;
+ 	Datum	   *values;
+ 	bool	   *isnull;
+ 	Datum	   *old_values;
+ 	bool	   *old_isnull;
+ 	int			i;
+ 
+ 	/*
+ 	 * Use the trigger tuple slot as a place to store the result tuple.
+ 	 */
+ 	resultSlot = estate->es_trig_tuple_slot;
+ 	if (resultSlot->tts_tupleDescriptor != resultTupType)
+ 		ExecSetSlotDescriptor(resultSlot, resultTupType);
+ 
+ 	/*
+ 	 * Extract all the values of the scan tuple.
+ 	 */
+ 	slot_getallattrs(slot);
+ 	old_values = slot->tts_values;
+ 	old_isnull = slot->tts_isnull;
+ 
+ 	/*
+ 	 * Prepare to build the result tuple.
+ 	 */
+ 	ExecClearTuple(resultSlot);
+ 	values = resultSlot->tts_values;
+ 	isnull = resultSlot->tts_isnull;
+ 
+ 	/*
+ 	 * Transpose data into proper fields of the result tuple.
+ 	 */
+ 	for (i = 0; i < resultTupType->natts; i++)
+ 	{
+ 		int			j = dmstate->attnoMap[i];
+ 
+ 		if (j == 0)
+ 		{
+ 			values[i] = (Datum) 0;
+ 			isnull[i] = true;
+ 		}
+ 		else
+ 		{
+ 			values[i] = old_values[j - 1];
+ 			isnull[i] = old_isnull[j - 1];
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Build the virtual tuple.
+ 	 */
+ 	ExecStoreVirtualTuple(resultSlot);
+ 
+ 	/*
+ 	 * If we have any system columns to return, install them.
+ 	 */
+ 	if (dmstate->hasSystemCols)
+ 	{
+ 		HeapTuple	resultTup = ExecMaterializeSlot(resultSlot);
+ 
+ 		/* ctid */
+ 		if (dmstate->ctidAttno)
+ 		{
+ 			ItemPointer ctid = NULL;
+ 
+ 			ctid = (ItemPointer) DatumGetPointer(old_values[dmstate->ctidAttno - 1]);
+ 			resultTup->t_self = *ctid;
+ 		}
+ 
+ 		/* oid */
+ 		if (dmstate->oidAttno)
+ 		{
+ 			Oid			oid = InvalidOid;
+ 
+ 			oid = DatumGetObjectId(old_values[dmstate->oidAttno - 1]);
+ 			HeapTupleSetOid(resultTup, oid);
+ 		}
+ 
+ 		/*
+ 		 * And remaining columns
+ 		 *
+ 		 * Note: since we currently don't allow the target relation to appear
+ 		 * on the nullable side of an outer join, any system columns wouldn't
+ 		 * go to NULL.
+ 		 *
+ 		 * Note: no need to care about tableoid here because it will be
+ 		 * initialized in ExecProcessReturning().
+ 		 */
+ 		HeapTupleHeaderSetXmin(resultTup->t_data, InvalidTransactionId);
+ 		HeapTupleHeaderSetXmax(resultTup->t_data, InvalidTransactionId);
+ 		HeapTupleHeaderSetCmin(resultTup->t_data, InvalidTransactionId);
+ 	}
+ 
+ 	/*
+ 	 * And return the result tuple.
+ 	 */
+ 	return resultSlot;
+ }
+ 
+ /*
   * Prepare for processing of parameters used in remote query.
   */
  static void
***************
*** 4943,4953 **** make_tuple_from_result_row(PGresult *res,
  		tupdesc = RelationGetDescr(rel);
  	else
  	{
- 		PgFdwScanState *fdw_sstate;
- 
  		Assert(fsstate);
! 		fdw_sstate = (PgFdwScanState *) fsstate->fdw_state;
! 		tupdesc = fdw_sstate->tupdesc;
  	}
  
  	values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
--- 5350,5357 ----
  		tupdesc = RelationGetDescr(rel);
  	else
  	{
  		Assert(fsstate);
! 		tupdesc = fsstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
  	}
  
  	values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
*** a/contrib/postgres_fdw/postgres_fdw.h
--- b/contrib/postgres_fdw/postgres_fdw.h
***************
*** 150,155 **** extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 150,156 ----
  				 List **retrieved_attrs);
  extern void deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
***************
*** 162,167 **** extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 163,169 ----
  				 List **retrieved_attrs);
  extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 1064,1077 **** UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
  UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING *;
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can't be pushed down
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
  EXPLAIN (verbose, costs off)
    DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;                               -- can be pushed down
  DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can't be pushed down
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
  EXPLAIN (verbose, costs off)
--- 1064,1077 ----
  UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING *;
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can be pushed down
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
  EXPLAIN (verbose, costs off)
    DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;                               -- can be pushed down
  DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can be pushed down
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
  EXPLAIN (verbose, costs off)
***************
*** 1084,1089 **** EXPLAIN (verbose, costs off)
--- 1084,1141 ----
  DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;                       -- can be pushed down
  DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;
  
+ -- Test UPDATE/DELETE with RETURNING on a three-table join
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id - 1200, to_char(id, 'FM00000') FROM generate_series(1201, 1300) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;                             -- can be pushed down
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;                                                                    -- can be pushed down
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;
+ DELETE FROM ft2 WHERE ft2.c1 > 1200;
+ 
+ -- Test UPDATE/DELETE with WHERE or JOIN/ON conditions containing
+ -- user-defined operators/functions
+ ALTER SERVER loopback OPTIONS (DROP extensions);
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id % 10, to_char(id, 'FM00000') FROM generate_series(2001, 2010) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;            -- can't be pushed down
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;                                       -- can't be pushed down
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;
+ DELETE FROM ft2 WHERE ft2.c1 > 2000;
+ ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
+ 
  -- Test that trigger on remote table works as expected
  CREATE OR REPLACE FUNCTION "S 1".F_BRTRIG() RETURNS trigger AS $$
  BEGIN
#2Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#1)
1 attachment(s)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2017/12/27 20:55), Etsuro Fujita wrote:

Attached is an updated version of the patch.

I revised code/comments a little bit. PFA new version.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-more-update-pushdown-v6.patchtext/x-diff; name=postgres-fdw-more-update-pushdown-v6.patchDownload
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 132,138 **** static void deparseTargetList(StringInfo buf,
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  						  deparse_expr_cxt *context);
  static void deparseSubqueryTargetList(deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
--- 132,140 ----
  				  Bitmapset *attrs_used,
  				  bool qualify_col,
  				  List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist,
! 						  bool is_returning,
! 						  List **retrieved_attrs,
  						  deparse_expr_cxt *context);
  static void deparseSubqueryTargetList(deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
***************
*** 168,178 **** static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 					  RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
  				   RelOptInfo *foreignrel, bool make_subquery,
! 				   List **params_list);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
--- 170,182 ----
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
  static void appendConditions(List *exprs, deparse_expr_cxt *context);
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
! 					  RelOptInfo *foreignrel, bool use_alias,
! 					  Index ignore_rel, List **ignore_conds,
! 					  List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
  static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
  				   RelOptInfo *foreignrel, bool make_subquery,
! 				   Index ignore_rel, List **ignore_conds, List **params_list);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***************
*** 1028,1034 **** deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
  		 * For a join or upper relation the input tlist gives the list of
  		 * columns required to be fetched from the foreign server.
  		 */
! 		deparseExplicitTargetList(tlist, retrieved_attrs, context);
  	}
  	else
  	{
--- 1032,1038 ----
  		 * For a join or upper relation the input tlist gives the list of
  		 * columns required to be fetched from the foreign server.
  		 */
! 		deparseExplicitTargetList(tlist, false, retrieved_attrs, context);
  	}
  	else
  	{
***************
*** 1071,1077 **** deparseFromExpr(List *quals, deparse_expr_cxt *context)
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  						  (bms_num_members(scanrel->relids) > 1),
! 						  context->params_list);
  
  	/* Construct WHERE clause */
  	if (quals != NIL)
--- 1075,1081 ----
  	appendStringInfoString(buf, " FROM ");
  	deparseFromExprForRel(buf, context->root, scanrel,
  						  (bms_num_members(scanrel->relids) > 1),
! 						  (Index) 0, NULL, context->params_list);
  
  	/* Construct WHERE clause */
  	if (quals != NIL)
***************
*** 1340,1348 **** get_jointype_name(JoinType jointype)
   *
   * retrieved_attrs is the list of continuously increasing integers starting
   * from 1. It has same number of entries as tlist.
   */
  static void
! deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  						  deparse_expr_cxt *context)
  {
  	ListCell   *lc;
--- 1344,1357 ----
   *
   * retrieved_attrs is the list of continuously increasing integers starting
   * from 1. It has same number of entries as tlist.
+  *
+  * This is used for both SELECT and RETURNING targetlists; the is_returning
+  * parameter is true only for a RETURNING targetlist.
   */
  static void
! deparseExplicitTargetList(List *tlist,
! 						  bool is_returning,
! 						  List **retrieved_attrs,
  						  deparse_expr_cxt *context)
  {
  	ListCell   *lc;
***************
*** 1357,1369 **** deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  
  		if (i > 0)
  			appendStringInfoString(buf, ", ");
  		deparseExpr((Expr *) tle->expr, context);
  
  		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
  		i++;
  	}
  
! 	if (i == 0)
  		appendStringInfoString(buf, "NULL");
  }
  
--- 1366,1381 ----
  
  		if (i > 0)
  			appendStringInfoString(buf, ", ");
+ 		else if (is_returning)
+ 			appendStringInfoString(buf, " RETURNING ");
+ 
  		deparseExpr((Expr *) tle->expr, context);
  
  		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
  		i++;
  	}
  
! 	if (i == 0 && !is_returning)
  		appendStringInfoString(buf, "NULL");
  }
  
***************
*** 1406,1415 **** deparseSubqueryTargetList(deparse_expr_cxt *context)
   * The function constructs ... JOIN ... ON ... for join relation. For a base
   * relation it just returns schema-qualified tablename, with the appropriate
   * alias if so requested.
   */
  static void
  deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 					  bool use_alias, List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 1418,1434 ----
   * The function constructs ... JOIN ... ON ... for join relation. For a base
   * relation it just returns schema-qualified tablename, with the appropriate
   * alias if so requested.
+  *
+  * 'ignore_rel' is either zero or the RT index of a target relation.  In the
+  * latter case the function constructs FROM clause of UPDATE or USING clause
+  * of DELETE; it deparses the join relation as if the relation never contained
+  * the target relation, and creates a List of conditions to be deparsed into
+  * the top-level WHERE clause, which is returned to *ignore_conds.
   */
  static void
  deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 					  bool use_alias, Index ignore_rel, List **ignore_conds,
! 					  List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 1417,1432 **** deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  	{
  		StringInfoData join_sql_o;
  		StringInfoData join_sql_i;
  
! 		/* Deparse outer relation */
! 		initStringInfo(&join_sql_o);
! 		deparseRangeTblRef(&join_sql_o, root, fpinfo->outerrel,
! 						   fpinfo->make_outerrel_subquery, params_list);
  
! 		/* Deparse inner relation */
! 		initStringInfo(&join_sql_i);
! 		deparseRangeTblRef(&join_sql_i, root, fpinfo->innerrel,
! 						   fpinfo->make_innerrel_subquery, params_list);
  
  		/*
  		 * For a join relation FROM clause entry is deparsed as
--- 1436,1524 ----
  	{
  		StringInfoData join_sql_o;
  		StringInfoData join_sql_i;
+ 		RelOptInfo *outerrel = fpinfo->outerrel;
+ 		RelOptInfo *innerrel = fpinfo->innerrel;
+ 		bool		outerrel_is_target = false;
+ 		bool		innerrel_is_target = false;
  
! 		if (ignore_rel > 0 && bms_is_member(ignore_rel, foreignrel->relids))
! 		{
! 			/*
! 			 * If this is an inner join, add joinclauses to *ignore_conds and
! 			 * set it to empty so that those can be deparsed into the WHERE
! 			 * clause.  Note that since the target relation can never be
! 			 * within the nullable side of an outer join, those could safely
! 			 * be pulled up into the WHERE clause (see foreign_join_ok()).
! 			 * Note also that since the target relation is only inner-joined
! 			 * to any other relation in the query, all conditions in the join
! 			 * tree mentioning the target relation could be deparsed into the
! 			 * WHERE clause by doing this recursively.
! 			 */
! 			if (fpinfo->jointype == JOIN_INNER)
! 			{
! 				*ignore_conds = list_concat(*ignore_conds,
! 											list_copy(fpinfo->joinclauses));
! 				fpinfo->joinclauses = NIL;
! 			}
  
! 			/*
! 			 * Check if either of the input relations is the target relation.
! 			 */
! 			if (outerrel->relid == ignore_rel)
! 				outerrel_is_target = true;
! 			else if (innerrel->relid == ignore_rel)
! 				innerrel_is_target = true;
! 		}
! 
! 		/* Deparse outer relation if not the target relation. */
! 		if (!outerrel_is_target)
! 		{
! 			initStringInfo(&join_sql_o);
! 			deparseRangeTblRef(&join_sql_o, root, outerrel,
! 							   fpinfo->make_outerrel_subquery,
! 							   ignore_rel, ignore_conds, params_list);
! 
! 			/*
! 			 * If inner relation is the target relation, skip deparsing it.
! 			 * Note that since the join of the target relation with any other
! 			 * relation in the query is an inner join and can never be within
! 			 * the nullable side of an outer join, the join could be
! 			 * interchanged with higher-level joins (cf. identity 1 on outer
! 			 * join reordering shown in src/backend/optimizer/README), which
! 			 * means it's safe to skip the target-relation deparsing here.
! 			 */
! 			if (innerrel_is_target)
! 			{
! 				Assert(fpinfo->jointype == JOIN_INNER);
! 				Assert(fpinfo->joinclauses == NIL);
! 				appendStringInfo(buf, "%s", join_sql_o.data);
! 				return;
! 			}
! 		}
! 
! 		/* Deparse inner relation if not the target relation. */
! 		if (!innerrel_is_target)
! 		{
! 			initStringInfo(&join_sql_i);
! 			deparseRangeTblRef(&join_sql_i, root, innerrel,
! 							   fpinfo->make_innerrel_subquery,
! 							   ignore_rel, ignore_conds, params_list);
! 
! 			/*
! 			 * If outer relation is the target relation, skip deparsing it.
! 			 * See the above note about safety.
! 			 */
! 			if (outerrel_is_target)
! 			{
! 				Assert(fpinfo->jointype == JOIN_INNER);
! 				Assert(fpinfo->joinclauses == NIL);
! 				appendStringInfo(buf, "%s", join_sql_i.data);
! 				return;
! 			}
! 		}
! 
! 		/* Neither of the relations is the target relation. */
! 		Assert(!outerrel_is_target && !innerrel_is_target);
  
  		/*
  		 * For a join relation FROM clause entry is deparsed as
***************
*** 1486,1492 **** deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
   */
  static void
  deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 				   bool make_subquery, List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 1578,1585 ----
   */
  static void
  deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
! 				   bool make_subquery, Index ignore_rel, List **ignore_conds,
! 				   List **params_list)
  {
  	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 1501,1506 **** deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
--- 1594,1607 ----
  		List	   *retrieved_attrs;
  		int			ncols;
  
+ 		/*
+ 		 * The given relation shouldn't contain the target relation, because
+ 		 * this should only happen for input relations for a full join, and
+ 		 * such relations can never contain an UPDATE/DELETE target.
+ 		 */
+ 		Assert(ignore_rel == 0 ||
+ 			   !bms_is_member(ignore_rel, foreignrel->relids));
+ 
  		/* Deparse the subquery representing the relation. */
  		appendStringInfoChar(buf, '(');
  		deparseSelectStmtForRel(buf, root, foreignrel, NIL,
***************
*** 1534,1540 **** deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
  		}
  	}
  	else
! 		deparseFromExprForRel(buf, root, foreignrel, true, params_list);
  }
  
  /*
--- 1635,1642 ----
  		}
  	}
  	else
! 		deparseFromExprForRel(buf, root, foreignrel, true, ignore_rel,
! 							  ignore_conds, params_list);
  }
  
  /*
***************
*** 1645,1657 **** deparseUpdateSql(StringInfo buf, PlannerInfo *root,
  /*
   * deparse remote UPDATE statement
   *
!  * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
--- 1747,1769 ----
  /*
   * deparse remote UPDATE statement
   *
!  * 'buf' is the output buffer to append the statement to
!  * 'rtindex' is the RT index of the associated target relation
!  * 'rel' is the relation descriptor for the target relation
!  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
!  *		containing all base relations in the query
!  * 'targetlist' is the tlist of the underlying foreign-scan plan node
!  * 'targetAttrs' is the target columns of the UPDATE
!  * 'remote_conds' is the qual clauses that must be evaluated remotely
!  * '*params_list' is an output list of exprs that will become remote Params
!  * 'returningList' is the RETURNING targetlist
!  * '*retrieved_attrs' is an output list of integers of columns being retrieved
!  *		by RETURNING (if any)
   */
  void
  deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
***************
*** 1659,1665 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
- 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
  	deparse_expr_cxt context;
  	int			nestlevel;
  	bool		first;
--- 1771,1776 ----
***************
*** 1667,1679 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = baserel;
! 	context.scanrel = baserel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "UPDATE ");
  	deparseRelation(buf, rel);
  	appendStringInfoString(buf, " SET ");
  
  	/* Make sure any constants in the exprs are printed portably */
--- 1778,1792 ----
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = foreignrel;
! 	context.scanrel = foreignrel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "UPDATE ");
  	deparseRelation(buf, rel);
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
  	appendStringInfoString(buf, " SET ");
  
  	/* Make sure any constants in the exprs are printed portably */
***************
*** 1700,1713 **** deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  
  	reset_transmission_modes(nestlevel);
  
  	if (remote_conds)
  	{
  		appendStringInfoString(buf, " WHERE ");
  		appendConditions(remote_conds, &context);
  	}
  
! 	deparseReturningList(buf, root, rtindex, rel, false,
! 						 returningList, retrieved_attrs);
  }
  
  /*
--- 1813,1840 ----
  
  	reset_transmission_modes(nestlevel);
  
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 	{
+ 		List	   *ignore_conds = NIL;
+ 
+ 		appendStringInfo(buf, " FROM ");
+ 		deparseFromExprForRel(buf, root, foreignrel, true, rtindex,
+ 							  &ignore_conds, params_list);
+ 		remote_conds = list_concat(remote_conds, ignore_conds);
+ 	}
+ 
  	if (remote_conds)
  	{
  		appendStringInfoString(buf, " WHERE ");
  		appendConditions(remote_conds, &context);
  	}
  
! 	if (foreignrel->reloptkind == RELOPT_JOINREL)
! 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
! 								  &context);
! 	else
! 		deparseReturningList(buf, root, rtindex, rel, false,
! 							 returningList, retrieved_attrs);
  }
  
  /*
***************
*** 1735,1764 **** deparseDeleteSql(StringInfo buf, PlannerInfo *root,
  /*
   * deparse remote DELETE statement
   *
!  * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
- 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
  	deparse_expr_cxt context;
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = baserel;
! 	context.scanrel = baserel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "DELETE FROM ");
  	deparseRelation(buf, rel);
  
  	if (remote_conds)
  	{
--- 1862,1910 ----
  /*
   * deparse remote DELETE statement
   *
!  * 'buf' is the output buffer to append the statement to
!  * 'rtindex' is the RT index of the associated target relation
!  * 'rel' is the relation descriptor for the target relation
!  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
!  *		containing all base relations in the query
!  * 'remote_conds' is the qual clauses that must be evaluated remotely
!  * '*params_list' is an output list of exprs that will become remote Params
!  * 'returningList' is the RETURNING targetlist
!  * '*retrieved_attrs' is an output list of integers of columns being retrieved
!  *		by RETURNING (if any)
   */
  void
  deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
  					   List **retrieved_attrs)
  {
  	deparse_expr_cxt context;
  
  	/* Set up context struct for recursion */
  	context.root = root;
! 	context.foreignrel = foreignrel;
! 	context.scanrel = foreignrel;
  	context.buf = buf;
  	context.params_list = params_list;
  
  	appendStringInfoString(buf, "DELETE FROM ");
  	deparseRelation(buf, rel);
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 		appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, rtindex);
+ 
+ 	if (foreignrel->reloptkind == RELOPT_JOINREL)
+ 	{
+ 		List	   *ignore_conds = NIL;
+ 
+ 		appendStringInfo(buf, " USING ");
+ 		deparseFromExprForRel(buf, root, foreignrel, true, rtindex,
+ 							  &ignore_conds, params_list);
+ 		remote_conds = list_concat(remote_conds, ignore_conds);
+ 	}
  
  	if (remote_conds)
  	{
***************
*** 1766,1773 **** deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  		appendConditions(remote_conds, &context);
  	}
  
! 	deparseReturningList(buf, root, rtindex, rel, false,
! 						 returningList, retrieved_attrs);
  }
  
  /*
--- 1912,1923 ----
  		appendConditions(remote_conds, &context);
  	}
  
! 	if (foreignrel->reloptkind == RELOPT_JOINREL)
! 		deparseExplicitTargetList(returningList, true, retrieved_attrs,
! 								  &context);
! 	else
! 		deparseReturningList(buf, root, rtindex, rel, false,
! 							 returningList, retrieved_attrs);
  }
  
  /*
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 4291,4317 **** UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
  
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can't be pushed down
!                                                                                                                                                         QUERY PLAN                                                                                                                                                         
! ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
!    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c3 = $3, c7 = $4 WHERE ctid = $1
!    ->  Foreign Scan
!          Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
!                Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
!                ->  Foreign Scan on public.ft2
!                      Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
!                      Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4291,4303 ----
  
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can be pushed down
!                                                                                                    QUERY PLAN                                                                                                    
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
!    ->  Foreign Update
!          Remote SQL: UPDATE "S 1"."T 1" r1 SET c2 = (r1.c2 + 500), c3 = (r1.c3 || '_update9'::text), c7 = 'ft2       '::character(10) FROM "S 1"."T 1" r2 WHERE ((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9))
! (3 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4434,4460 **** DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  (103 rows)
  
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can't be pushed down
!                                                                                                                               QUERY PLAN                                                                                                                               
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
!    Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1
!    ->  Foreign Scan
!          Output: ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
!                Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
!                ->  Foreign Scan on public.ft2
!                      Output: ft2.ctid, ft2.c2
!                      Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
!                      Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4420,4432 ----
  (103 rows)
  
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can be pushed down
!                                                          QUERY PLAN                                                         
! ----------------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
!    ->  Foreign Delete
!          Remote SQL: DELETE FROM "S 1"."T 1" r1 USING "S 1"."T 1" r2 WHERE ((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2))
! (3 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
***************
*** 5330,5335 **** DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;
--- 5302,5496 ----
   ft2
  (1 row)
  
+ -- Test UPDATE/DELETE with RETURNING on a three-table join
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id - 1200, to_char(id, 'FM00000') FROM generate_series(1201, 1300) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;                             -- can be pushed down
+                                                                                                                                                                                    QUERY PLAN                                                                                                                                                                                    
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: ft2.ctid, ft2.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
+    ->  Foreign Update
+          Remote SQL: UPDATE "S 1"."T 1" r1 SET c3 = 'foo'::text FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (TRUE)) WHERE ((r2.c1 = r3.c1)) AND ((r1.c2 = r2.c1)) AND ((r1."C 1" > 1200)) RETURNING r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r1.ctid, r2.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3
+ (4 rows)
+ 
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;
+    ctid   |              ft2               |  c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 |  ctid  |      ft4       | c1 | c2 |   c3   
+ ----------+--------------------------------+------+----+-----+----+----+----+------------+----+--------+----------------+----+----+--------
+  (12,102) | (1206,6,foo,,,,"ft2       ",)  | 1206 |  6 | foo |    |    |    | ft2        |    | (0,6)  | (6,7,AAA006)   |  6 |  7 | AAA006
+  (12,103) | (1212,12,foo,,,,"ft2       ",) | 1212 | 12 | foo |    |    |    | ft2        |    | (0,12) | (12,13,AAA012) | 12 | 13 | AAA012
+  (12,104) | (1218,18,foo,,,,"ft2       ",) | 1218 | 18 | foo |    |    |    | ft2        |    | (0,18) | (18,19,AAA018) | 18 | 19 | AAA018
+  (12,105) | (1224,24,foo,,,,"ft2       ",) | 1224 | 24 | foo |    |    |    | ft2        |    | (0,24) | (24,25,AAA024) | 24 | 25 | AAA024
+  (12,106) | (1230,30,foo,,,,"ft2       ",) | 1230 | 30 | foo |    |    |    | ft2        |    | (0,30) | (30,31,AAA030) | 30 | 31 | AAA030
+  (12,107) | (1236,36,foo,,,,"ft2       ",) | 1236 | 36 | foo |    |    |    | ft2        |    | (0,36) | (36,37,AAA036) | 36 | 37 | AAA036
+  (12,108) | (1242,42,foo,,,,"ft2       ",) | 1242 | 42 | foo |    |    |    | ft2        |    | (0,42) | (42,43,AAA042) | 42 | 43 | AAA042
+  (12,109) | (1248,48,foo,,,,"ft2       ",) | 1248 | 48 | foo |    |    |    | ft2        |    | (0,48) | (48,49,AAA048) | 48 | 49 | AAA048
+  (12,110) | (1254,54,foo,,,,"ft2       ",) | 1254 | 54 | foo |    |    |    | ft2        |    | (0,54) | (54,55,AAA054) | 54 | 55 | AAA054
+  (12,111) | (1260,60,foo,,,,"ft2       ",) | 1260 | 60 | foo |    |    |    | ft2        |    | (0,60) | (60,61,AAA060) | 60 | 61 | AAA060
+  (12,112) | (1266,66,foo,,,,"ft2       ",) | 1266 | 66 | foo |    |    |    | ft2        |    | (0,66) | (66,67,AAA066) | 66 | 67 | AAA066
+  (12,113) | (1272,72,foo,,,,"ft2       ",) | 1272 | 72 | foo |    |    |    | ft2        |    | (0,72) | (72,73,AAA072) | 72 | 73 | AAA072
+  (12,114) | (1278,78,foo,,,,"ft2       ",) | 1278 | 78 | foo |    |    |    | ft2        |    | (0,78) | (78,79,AAA078) | 78 | 79 | AAA078
+  (12,115) | (1284,84,foo,,,,"ft2       ",) | 1284 | 84 | foo |    |    |    | ft2        |    | (0,84) | (84,85,AAA084) | 84 | 85 | AAA084
+  (12,116) | (1290,90,foo,,,,"ft2       ",) | 1290 | 90 | foo |    |    |    | ft2        |    | (0,90) | (90,91,AAA090) | 90 | 91 | AAA090
+  (12,117) | (1296,96,foo,,,,"ft2       ",) | 1296 | 96 | foo |    |    |    | ft2        |    | (0,96) | (96,97,AAA096) | 96 | 97 | AAA096
+ (16 rows)
+ 
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;                                                                    -- can be pushed down
+                                                                                             QUERY PLAN                                                                                             
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Delete on public.ft2
+    Output: 100
+    ->  Foreign Delete
+          Remote SQL: DELETE FROM "S 1"."T 1" r1 USING ("S 1"."T 3" r2 LEFT JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) WHERE ((r1.c2 = r2.c1)) AND ((r1."C 1" > 1200)) AND (((r1."C 1" % 10) = 0))
+ (4 rows)
+ 
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;
+  ?column? 
+ ----------
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+       100
+ (10 rows)
+ 
+ DELETE FROM ft2 WHERE ft2.c1 > 1200;
+ -- Test UPDATE/DELETE with WHERE or JOIN/ON conditions containing
+ -- user-defined operators/functions
+ ALTER SERVER loopback OPTIONS (DROP extensions);
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id % 10, to_char(id, 'FM00000') FROM generate_series(2001, 2010) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;            -- can't be pushed down
+                                                 QUERY PLAN                                                
+ ----------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: c1, c2, c3, c4, c5, c6, c7, c8
+    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
+    ->  Foreign Scan on public.ft2
+          Output: c1, c2, NULL::integer, 'bar'::text, c4, c5, c6, c7, c8, ctid
+          Filter: (postgres_fdw_abs(ft2.c1) > 2000)
+          Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ (7 rows)
+ 
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+   c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 
+ ------+----+-----+----+----+----+------------+----
+  2001 |  1 | bar |    |    |    | ft2        | 
+  2002 |  2 | bar |    |    |    | ft2        | 
+  2003 |  3 | bar |    |    |    | ft2        | 
+  2004 |  4 | bar |    |    |    | ft2        | 
+  2005 |  5 | bar |    |    |    | ft2        | 
+  2006 |  6 | bar |    |    |    | ft2        | 
+  2007 |  7 | bar |    |    |    | ft2        | 
+  2008 |  8 | bar |    |    |    | ft2        | 
+  2009 |  9 | bar |    |    |    | ft2        | 
+  2010 |  0 | bar |    |    |    | ft2        | 
+ (10 rows)
+ 
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
+                                                                                                                                           QUERY PLAN                                                                                                                                          
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Update on public.ft2
+    Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
+    ->  Nested Loop
+          Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+          Join Filter: (ft2.c2 === ft4.c1)
+          ->  Foreign Scan on public.ft2
+                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
+                Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+          ->  Foreign Scan
+                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+                Relations: (public.ft4) INNER JOIN (public.ft5)
+                Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
+                ->  Hash Join
+                      Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+                      Hash Cond: (ft4.c1 = ft5.c1)
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+                            ->  Foreign Scan on public.ft5
+                                  Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (24 rows)
+ 
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;
+   c1  | c2 | c3  | c4 | c5 | c6 |     c7     | c8 | c1 | c2 |   c3   | c1 | c2 |   c3   
+ ------+----+-----+----+----+----+------------+----+----+----+--------+----+----+--------
+  2006 |  6 | baz |    |    |    | ft2        |    |  6 |  7 | AAA006 |  6 |  7 | AAA006
+ (1 row)
+ 
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;                                       -- can't be pushed down
+                                                                                                                                                                      QUERY PLAN                                                                                                                                                                     
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Delete on public.ft2
+    Output: ft2.ctid, ft2.c1, ft2.c2, ft2.c3
+    Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c2, c3, ctid
+    ->  Foreign Scan
+          Output: ft2.ctid, ft4.*, ft5.*
+          Filter: (ft4.c1 === ft5.c1)
+          Relations: ((public.ft2) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
+          Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r2.c1, r3.c1 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c2 = r2.c1)) AND ((r1."C 1" > 2000)))) INNER JOIN "S 1"."T 4" r3 ON (TRUE)) FOR UPDATE OF r1
+          ->  Nested Loop
+                Output: ft2.ctid, ft4.*, ft5.*, ft4.c1, ft5.c1
+                ->  Nested Loop
+                      Output: ft2.ctid, ft4.*, ft4.c1
+                      Join Filter: (ft2.c2 = ft4.c1)
+                      ->  Foreign Scan on public.ft2
+                            Output: ft2.ctid, ft2.c2
+                            Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.*, ft4.c1
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                ->  Foreign Scan on public.ft5
+                      Output: ft5.*, ft5.c1
+                      Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (22 rows)
+ 
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;
+    ctid   |  c1  | c2 | c3  
+ ----------+------+----+-----
+  (12,112) | 2006 |  6 | baz
+ (1 row)
+ 
+ DELETE FROM ft2 WHERE ft2.c1 > 2000;
+ ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
  -- Test that trigger on remote table works as expected
  CREATE OR REPLACE FUNCTION "S 1".F_BRTRIG() RETURNS trigger AS $$
  BEGIN
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 210,215 **** typedef struct PgFdwDirectModifyState
--- 210,220 ----
  	PGresult   *result;			/* result for query */
  	int			num_tuples;		/* # of result tuples */
  	int			next_tuple;		/* index of next one to return */
+ 	Relation	resultRel;		/* relcache entry for the target relation */
+ 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
+ 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
+ 	AttrNumber	oidAttno;		/* attnum of input oid column */
+ 	bool		hasSystemCols;	/* are there system columns of resultRel? */
  
  	/* working memory context */
  	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
***************
*** 376,383 **** static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
--- 381,397 ----
  						 TupleTableSlot *slot);
  static void store_returning_result(PgFdwModifyState *fmstate,
  					   TupleTableSlot *slot, PGresult *res);
+ static List *build_remote_returning(Index rtindex, Relation rel,
+ 									List *returningList);
+ static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
  static void execute_dml_stmt(ForeignScanState *node);
  static TupleTableSlot *get_returning_data(ForeignScanState *node);
+ static void init_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					  List *fdw_scan_tlist,
+ 					  Index rtindex);
+ static TupleTableSlot *apply_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					   TupleTableSlot *slot,
+ 					   EState *estate);
  static void prepare_query_params(PlanState *node,
  					 List *fdw_exprs,
  					 int numParams,
***************
*** 2144,2157 **** postgresPlanDirectModify(PlannerInfo *root,
  	if (subplan->qual != NIL)
  		return false;
  
- 	/*
- 	 * We can't handle an UPDATE or DELETE on a foreign join for now.
- 	 */
- 	if (fscan->scan.scanrelid == 0)
- 		return false;
- 
  	/* Safe to fetch data about the target foreign rel */
! 	foreignrel = root->simple_rel_array[resultRelation];
  	rte = root->simple_rte_array[resultRelation];
  	fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
--- 2158,2172 ----
  	if (subplan->qual != NIL)
  		return false;
  
  	/* Safe to fetch data about the target foreign rel */
! 	if (fscan->scan.scanrelid == 0)
! 	{
! 		foreignrel = find_join_rel(root, fscan->fs_relids);
! 		/* We should have a rel for this foreign join. */
! 		Assert(foreignrel);
! 	}
! 	else
! 		foreignrel = root->simple_rel_array[resultRelation];
  	rte = root->simple_rte_array[resultRelation];
  	fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
  
***************
*** 2212,2219 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2227,2249 ----
  	 * Extract the relevant RETURNING list if any.
  	 */
  	if (plan->returningLists)
+ 	{
  		returningList = (List *) list_nth(plan->returningLists, subplan_index);
  
+ 		/*
+ 		 * When performing an UPDATE/DELETE .. RETURNING on a join directly,
+ 		 * we fetch from the foreign server any Vars specified in RETURNING
+ 		 * that refer not only to the target relation but to non-target
+ 		 * relations.  So we'll deparse them into the RETURNING clause of the
+ 		 * remote query; use a targetlist consisting of them instead, which
+ 		 * will be adjusted to be new fdw_scan_tlist of the foreign-scan plan
+ 		 * node below.
+ 		 */
+ 		if (fscan->scan.scanrelid == 0)
+ 			returningList = build_remote_returning(resultRelation, rel,
+ 												   returningList);
+ 	}
+ 
  	/*
  	 * Construct the SQL command string.
  	 */
***************
*** 2221,2226 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2251,2257 ----
  	{
  		case CMD_UPDATE:
  			deparseDirectUpdateSql(&sql, root, resultRelation, rel,
+ 								   foreignrel,
  								   ((Plan *) fscan)->targetlist,
  								   targetAttrs,
  								   remote_exprs, &params_list,
***************
*** 2228,2233 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2259,2265 ----
  			break;
  		case CMD_DELETE:
  			deparseDirectDeleteSql(&sql, root, resultRelation, rel,
+ 								   foreignrel,
  								   remote_exprs, &params_list,
  								   returningList, &retrieved_attrs);
  			break;
***************
*** 2255,2260 **** postgresPlanDirectModify(PlannerInfo *root,
--- 2287,2305 ----
  									retrieved_attrs,
  									makeInteger(plan->canSetTag));
  
+ 	/*
+ 	 * Update the foreign-join-related fields.
+ 	 */
+ 	if (fscan->scan.scanrelid == 0)
+ 	{
+ 		/* No need for the outer subplan. */
+ 		fscan->scan.plan.lefttree = NULL;
+ 
+ 		/* Build new fdw_scan_tlist if UPDATE/DELETE .. RETURNING. */
+ 	  	if (returningList)
+ 			rebuild_fdw_scan_tlist(fscan, returningList);
+ 	}
+ 
  	heap_close(rel, NoLock);
  	return true;
  }
***************
*** 2269,2274 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
--- 2314,2320 ----
  	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
  	EState	   *estate = node->ss.ps.state;
  	PgFdwDirectModifyState *dmstate;
+ 	Index		rtindex;
  	RangeTblEntry *rte;
  	Oid			userid;
  	ForeignTable *table;
***************
*** 2291,2301 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
  	 * Identify which user to do the remote access as.  This should match what
  	 * ExecCheckRTEPerms() does.
  	 */
! 	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
  	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
  	/* Get info about foreign table. */
! 	dmstate->rel = node->ss.ss_currentRelation;
  	table = GetForeignTable(RelationGetRelid(dmstate->rel));
  	user = GetUserMapping(userid, table->serverid);
  
--- 2337,2351 ----
  	 * Identify which user to do the remote access as.  This should match what
  	 * ExecCheckRTEPerms() does.
  	 */
! 	rtindex = estate->es_result_relation_info->ri_RangeTableIndex;
! 	rte = rt_fetch(rtindex, estate->es_range_table);
  	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
  	/* Get info about foreign table. */
! 	if (fsplan->scan.scanrelid == 0)
! 		dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
! 	else
! 		dmstate->rel = node->ss.ss_currentRelation;
  	table = GetForeignTable(RelationGetRelid(dmstate->rel));
  	user = GetUserMapping(userid, table->serverid);
  
***************
*** 2305,2310 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
--- 2355,2375 ----
  	 */
  	dmstate->conn = GetConnection(user, false);
  
+ 	/* Update the foreign-join-related fields. */
+ 	if (fsplan->scan.scanrelid == 0)
+ 	{
+ 		/* Save info about foreign table. */
+ 		dmstate->resultRel = dmstate->rel;
+ 
+ 		/*
+ 		 * Set dmstate->rel to NULL to teach get_returning_data() and
+ 		 * make_tuple_from_result_row() that columns fetched from the remote
+ 		 * server are described by fdw_scan_tlist of the foreign-scan plan
+ 		 * node, not the tuple descriptor for the target relation.
+ 		 */
+ 		dmstate->rel = NULL;
+ 	}
+ 
  	/* Initialize state variable */
  	dmstate->num_tuples = -1;	/* -1 means not set yet */
  
***************
*** 2325,2331 **** postgresBeginDirectModify(ForeignScanState *node, int eflags)
  
  	/* Prepare for input conversion of RETURNING results. */
  	if (dmstate->has_returning)
! 		dmstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(dmstate->rel));
  
  	/*
  	 * Prepare for processing of parameters used in remote query, if any.
--- 2390,2413 ----
  
  	/* Prepare for input conversion of RETURNING results. */
  	if (dmstate->has_returning)
! 	{
! 		TupleDesc	tupdesc;
! 
! 		if (fsplan->scan.scanrelid == 0)
! 			tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
! 		else
! 			tupdesc = RelationGetDescr(dmstate->rel);
! 
! 		dmstate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 		/*
! 		 * When performing an UPDATE/DELETE .. RETURNING on a join directly,
! 		 * initialize a filter to extract an updated/deleted tuple from a scan
! 		 * tuple.
! 		 */
! 		if (fsplan->scan.scanrelid == 0)
! 			init_returning_filter(dmstate, fsplan->fdw_scan_tlist, rtindex);
! 	}
  
  	/*
  	 * Prepare for processing of parameters used in remote query, if any.
***************
*** 2406,2411 **** postgresEndDirectModify(ForeignScanState *node)
--- 2488,2497 ----
  	ReleaseConnection(dmstate->conn);
  	dmstate->conn = NULL;
  
+ 	/* close the target relation. */
+ 	if (dmstate->resultRel)
+ 		ExecCloseScanRelation(dmstate->resultRel);
+ 
  	/* MemoryContext will be deleted automatically. */
  }
  
***************
*** 3273,3278 **** store_returning_result(PgFdwModifyState *fmstate,
--- 3359,3494 ----
  }
  
  /*
+  * build_remote_returning
+  *		Build a RETURNING targetlist of a remote query for performing an
+  *		UPDATE/DELETE .. RETURNING on a join directly
+  */
+ static List *
+ build_remote_returning(Index rtindex, Relation rel, List *returningList)
+ {
+ 	bool		have_wholerow = false;
+ 	List	   *tlist = NIL;
+ 	List	   *vars;
+ 	ListCell   *lc;
+ 
+ 	Assert(returningList);
+ 
+ 	vars = pull_var_clause((Node *) returningList, PVC_INCLUDE_PLACEHOLDERS);
+ 
+ 	/*
+ 	 * If there's a whole-row reference to the target relation, then we'll
+ 	 * need all the columns of the relation.
+ 	 */
+ 	foreach(lc, vars)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 
+ 		if (IsA(var, Var) &&
+ 			var->varno == rtindex &&
+ 			var->varattno == InvalidAttrNumber)
+ 		{
+ 			have_wholerow = true;
+ 			break;
+ 		}
+ 	}
+ 
+ 	if (have_wholerow)
+ 	{
+ 		TupleDesc	tupdesc = RelationGetDescr(rel);
+ 		int			i;
+ 
+ 		for (i = 1; i <= tupdesc->natts; i++)
+ 		{
+ 			Form_pg_attribute attr = TupleDescAttr(tupdesc, i - 1);
+ 			Var		   *var;
+ 
+ 			/* Ignore dropped attributes. */
+ 			if (attr->attisdropped)
+ 				continue;
+ 
+ 			var = makeVar(rtindex,
+ 						  i,
+ 						  attr->atttypid,
+ 						  attr->atttypmod,
+ 						  attr->attcollation,
+ 						  0);
+ 
+ 			tlist = lappend(tlist,
+ 							makeTargetEntry((Expr *) var,
+ 											list_length(tlist) + 1,
+ 											NULL,
+ 											false));
+ 		}
+ 	}
+ 
+ 	/* Now add any remaining columns to tlist. */
+ 	foreach(lc, vars)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 
+ 		/*
+ 		 * No need for whole-row references to the target relation.  We don't
+ 		 * need system columns other than ctid and oid either, since those are
+ 		 * set locally.
+ 		 */
+ 		if (IsA(var, Var) &&
+ 			var->varno == rtindex &&
+ 			var->varattno <= InvalidAttrNumber &&
+ 			var->varattno != SelfItemPointerAttributeNumber &&
+ 			var->varattno != ObjectIdAttributeNumber)
+ 			continue;		/* don't need it */
+ 
+ 		if (tlist_member((Expr *) var, tlist))
+ 			continue;		/* already got it */
+ 
+ 		tlist = lappend(tlist,
+ 						makeTargetEntry((Expr *) var,
+ 										list_length(tlist) + 1,
+ 										NULL,
+ 										false));
+ 	}
+ 
+ 	list_free(vars);
+ 
+ 	return tlist;
+ }
+ 
+ /*
+  * rebuild_fdw_scan_tlist
+  *		Build new fdw_scan_tlist of given foreign-scan plan node from given
+  *		tlist
+  *
+  * There might be columns that the fdw_scan_tlist of the given foreign-scan
+  * plan node contains that the given tlist doesn't.  The fdw_scan_tlist would
+  * have contained resjunk columns such as 'ctid' of the target relation and
+  * 'wholerow' of non-target relations, but the tlist might not contain them,
+  * for example.  So, adjust the tlist so it contains all the columns specified
+  * in the fdw_scan_tlist; else setrefs.c will get confused.
+  */
+ static void
+ rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist)
+ {
+ 	List	   *new_tlist = tlist;
+ 	List	   *old_tlist = fscan->fdw_scan_tlist;
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, old_tlist)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ 
+ 		if (tlist_member(tle->expr, new_tlist))
+ 			continue;		/* already got it */
+ 
+ 		new_tlist = lappend(new_tlist,
+ 							makeTargetEntry(tle->expr,
+ 											list_length(new_tlist) + 1,
+ 											NULL,
+ 											false));
+ 	}
+ 	fscan->fdw_scan_tlist = new_tlist;
+ }
+ 
+ /*
   * Execute a direct UPDATE/DELETE statement.
   */
  static void
***************
*** 3332,3337 **** get_returning_data(ForeignScanState *node)
--- 3548,3554 ----
  	EState	   *estate = node->ss.ps.state;
  	ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
  	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ 	TupleTableSlot *resultSlot;
  
  	Assert(resultRelInfo->ri_projectReturning);
  
***************
*** 3349,3355 **** get_returning_data(ForeignScanState *node)
--- 3566,3575 ----
  	 * "UPDATE/DELETE .. RETURNING 1" for example.)
  	 */
  	if (!dmstate->has_returning)
+ 	{
  		ExecStoreAllNullTuple(slot);
+ 		resultSlot = slot;
+ 	}
  	else
  	{
  		/*
***************
*** 3365,3371 **** get_returning_data(ForeignScanState *node)
  												dmstate->rel,
  												dmstate->attinmeta,
  												dmstate->retrieved_attrs,
! 												NULL,
  												dmstate->temp_cxt);
  			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
  		}
--- 3585,3591 ----
  												dmstate->rel,
  												dmstate->attinmeta,
  												dmstate->retrieved_attrs,
! 												node,
  												dmstate->temp_cxt);
  			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
  		}
***************
*** 3376,3391 **** get_returning_data(ForeignScanState *node)
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  	}
  	dmstate->next_tuple++;
  
  	/* Make slot available for evaluation of the local query RETURNING list. */
! 	resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple = slot;
  
  	return slot;
  }
  
  /*
   * Prepare for processing of parameters used in remote query.
   */
  static void
--- 3596,3798 ----
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 
+ 		/* Get the updated/deleted tuple. */
+ 		if (dmstate->rel)
+ 			resultSlot = slot;
+ 		else
+ 			resultSlot = apply_returning_filter(dmstate, slot, estate);
  	}
  	dmstate->next_tuple++;
  
  	/* Make slot available for evaluation of the local query RETURNING list. */
! 	resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple = resultSlot;
  
  	return slot;
  }
  
  /*
+  * Initialize a filter to extract an updated/deleted tuple from a scan tuple.
+  */
+ static void
+ init_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					  List *fdw_scan_tlist,
+ 					  Index rtindex)
+ {
+ 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
+ 	ListCell   *lc;
+ 	int			i;
+ 
+ 	/*
+ 	 * Calculate the mapping between the fdw_scan_tlist's entries and the
+ 	 * result tuple's attributes.
+ 	 *
+ 	 * The "map" is an array of indexes of the result tuple's attributes in
+ 	 * fdw_scan_tlist, i.e., one entry for every attribute of the result
+ 	 * tuple.  We store zero for any attributes that don't have the
+ 	 * corresponding entries in that list, marking that a NULL is needed in
+ 	 * the result tuple.
+ 	 *
+ 	 * Also get the indexes of the entries for ctid and oid if any.
+ 	 */
+ 	dmstate->attnoMap = (AttrNumber *) palloc0(resultTupType->natts * sizeof(AttrNumber));
+ 
+ 	dmstate->ctidAttno = dmstate->oidAttno = 0;
+ 
+ 	i = 1;
+ 	dmstate->hasSystemCols = false;
+ 	foreach(lc, fdw_scan_tlist)
+ 	{
+ 		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ 		Var		   *var = (Var *) tle->expr;
+ 
+ 		Assert(IsA(var, Var));
+ 
+ 		/*
+ 		 * If the Var is a column of the target relation to be retrieved from
+ 		 * the foreign server, get the index of the entry.
+ 		 */
+ 		if (var->varno == rtindex &&
+ 			list_member_int(dmstate->retrieved_attrs, i))
+ 		{
+ 			int			attrno = var->varattno;
+ 
+ 			if (attrno < 0)
+ 			{
+ 				/*
+ 				 * We don't retrieve system columns other than ctid and oid.
+ 				 */
+ 				if (attrno == SelfItemPointerAttributeNumber)
+ 					dmstate->ctidAttno = i;
+ 				else if (attrno == ObjectIdAttributeNumber)
+ 					dmstate->oidAttno = i;
+ 				else
+ 					Assert(false);
+ 				dmstate->hasSystemCols = true;
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * We don't retrieve whole-row references to the target
+ 				 * relation either.
+ 				 */
+ 				Assert(attrno > 0);
+ 
+ 				dmstate->attnoMap[attrno - 1] = i;
+ 			}
+ 		}
+ 		i++;
+ 	}
+ }
+ 
+ /*
+  * Extract and return an updated/deleted tuple from a scan tuple.
+  */
+ static TupleTableSlot *
+ apply_returning_filter(PgFdwDirectModifyState *dmstate,
+ 					   TupleTableSlot *slot,
+ 					   EState *estate)
+ {
+ 	TupleDesc	resultTupType = RelationGetDescr(dmstate->resultRel);
+ 	TupleTableSlot *resultSlot;
+ 	Datum	   *values;
+ 	bool	   *isnull;
+ 	Datum	   *old_values;
+ 	bool	   *old_isnull;
+ 	int			i;
+ 
+ 	/*
+ 	 * Use the trigger tuple slot as a place to store the result tuple.
+ 	 */
+ 	resultSlot = estate->es_trig_tuple_slot;
+ 	if (resultSlot->tts_tupleDescriptor != resultTupType)
+ 		ExecSetSlotDescriptor(resultSlot, resultTupType);
+ 
+ 	/*
+ 	 * Extract all the values of the scan tuple.
+ 	 */
+ 	slot_getallattrs(slot);
+ 	old_values = slot->tts_values;
+ 	old_isnull = slot->tts_isnull;
+ 
+ 	/*
+ 	 * Prepare to build the result tuple.
+ 	 */
+ 	ExecClearTuple(resultSlot);
+ 	values = resultSlot->tts_values;
+ 	isnull = resultSlot->tts_isnull;
+ 
+ 	/*
+ 	 * Transpose data into proper fields of the result tuple.
+ 	 */
+ 	for (i = 0; i < resultTupType->natts; i++)
+ 	{
+ 		int			j = dmstate->attnoMap[i];
+ 
+ 		if (j == 0)
+ 		{
+ 			values[i] = (Datum) 0;
+ 			isnull[i] = true;
+ 		}
+ 		else
+ 		{
+ 			values[i] = old_values[j - 1];
+ 			isnull[i] = old_isnull[j - 1];
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Build the virtual tuple.
+ 	 */
+ 	ExecStoreVirtualTuple(resultSlot);
+ 
+ 	/*
+ 	 * If we have any system columns to return, install them.
+ 	 */
+ 	if (dmstate->hasSystemCols)
+ 	{
+ 		HeapTuple	resultTup = ExecMaterializeSlot(resultSlot);
+ 
+ 		/* ctid */
+ 		if (dmstate->ctidAttno)
+ 		{
+ 			ItemPointer ctid = NULL;
+ 
+ 			ctid = (ItemPointer) DatumGetPointer(old_values[dmstate->ctidAttno - 1]);
+ 			resultTup->t_self = *ctid;
+ 		}
+ 
+ 		/* oid */
+ 		if (dmstate->oidAttno)
+ 		{
+ 			Oid			oid = InvalidOid;
+ 
+ 			oid = DatumGetObjectId(old_values[dmstate->oidAttno - 1]);
+ 			HeapTupleSetOid(resultTup, oid);
+ 		}
+ 
+ 		/*
+ 		 * And remaining columns
+ 		 *
+ 		 * Note: since we currently don't allow the target relation to appear
+ 		 * on the nullable side of an outer join, any system columns wouldn't
+ 		 * go to NULL.
+ 		 *
+ 		 * Note: no need to care about tableoid here because it will be
+ 		 * initialized in ExecProcessReturning().
+ 		 */
+ 		HeapTupleHeaderSetXmin(resultTup->t_data, InvalidTransactionId);
+ 		HeapTupleHeaderSetXmax(resultTup->t_data, InvalidTransactionId);
+ 		HeapTupleHeaderSetCmin(resultTup->t_data, InvalidTransactionId);
+ 	}
+ 
+ 	/*
+ 	 * And return the result tuple.
+ 	 */
+ 	return resultSlot;
+ }
+ 
+ /*
   * Prepare for processing of parameters used in remote query.
   */
  static void
***************
*** 4943,4953 **** make_tuple_from_result_row(PGresult *res,
  		tupdesc = RelationGetDescr(rel);
  	else
  	{
- 		PgFdwScanState *fdw_sstate;
- 
  		Assert(fsstate);
! 		fdw_sstate = (PgFdwScanState *) fsstate->fdw_state;
! 		tupdesc = fdw_sstate->tupdesc;
  	}
  
  	values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
--- 5350,5357 ----
  		tupdesc = RelationGetDescr(rel);
  	else
  	{
  		Assert(fsstate);
! 		tupdesc = fsstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
  	}
  
  	values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
*** a/contrib/postgres_fdw/postgres_fdw.h
--- b/contrib/postgres_fdw/postgres_fdw.h
***************
*** 150,155 **** extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 150,156 ----
  				 List **retrieved_attrs);
  extern void deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *targetlist,
  					   List *targetAttrs,
  					   List *remote_conds,
***************
*** 162,167 **** extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 163,169 ----
  				 List **retrieved_attrs);
  extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
  					   Index rtindex, Relation rel,
+ 					   RelOptInfo *foreignrel,
  					   List *remote_conds,
  					   List **params_list,
  					   List *returningList,
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 1064,1077 **** UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
  UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING *;
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can't be pushed down
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
  EXPLAIN (verbose, costs off)
    DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;                               -- can be pushed down
  DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can't be pushed down
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
  EXPLAIN (verbose, costs off)
--- 1064,1077 ----
  UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING *;
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
!   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;                               -- can be pushed down
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
  EXPLAIN (verbose, costs off)
    DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;                               -- can be pushed down
  DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  EXPLAIN (verbose, costs off)
! DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;                -- can be pushed down
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
  EXPLAIN (verbose, costs off)
***************
*** 1084,1089 **** EXPLAIN (verbose, costs off)
--- 1084,1141 ----
  DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;                       -- can be pushed down
  DELETE FROM ft2 WHERE c1 = 9999 RETURNING tableoid::regclass;
  
+ -- Test UPDATE/DELETE with RETURNING on a three-table join
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id - 1200, to_char(id, 'FM00000') FROM generate_series(1201, 1300) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;                             -- can be pushed down
+ UPDATE ft2 SET c3 = 'foo'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2, ft2.*, ft4.ctid, ft4, ft4.*;
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;                                                                    -- can be pushed down
+ DELETE FROM ft2
+   USING ft4 LEFT JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 1200 AND ft2.c1 % 10 = 0 AND ft2.c2 = ft4.c1
+   RETURNING 100;
+ DELETE FROM ft2 WHERE ft2.c1 > 1200;
+ 
+ -- Test UPDATE/DELETE with WHERE or JOIN/ON conditions containing
+ -- user-defined operators/functions
+ ALTER SERVER loopback OPTIONS (DROP extensions);
+ INSERT INTO ft2 (c1,c2,c3)
+   SELECT id, id % 10, to_char(id, 'FM00000') FROM generate_series(2001, 2010) id;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;            -- can't be pushed down
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
+ UPDATE ft2 SET c3 = 'baz'
+   FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
+   RETURNING ft2.*, ft4.*, ft5.*;
+ EXPLAIN (verbose, costs off)
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;                                       -- can't be pushed down
+ DELETE FROM ft2
+   USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
+   WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
+   RETURNING ft2.ctid, ft2.c1, ft2.c2, ft2.c3;
+ DELETE FROM ft2 WHERE ft2.c1 > 2000;
+ ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
+ 
  -- Test that trigger on remote table works as expected
  CREATE OR REPLACE FUNCTION "S 1".F_BRTRIG() RETURNS trigger AS $$
  BEGIN
#3Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#2)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Thu, Jan 11, 2018 at 2:58 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2017/12/27 20:55), Etsuro Fujita wrote:

Attached is an updated version of the patch.

I revised code/comments a little bit. PFA new version.

I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Robert Haas <robertmhaas@gmail.com> writes:

I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.

The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Wed, Feb 7, 2018 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.

The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?

Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#3)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/02/08 5:39), Robert Haas wrote:

On Thu, Jan 11, 2018 at 2:58 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2017/12/27 20:55), Etsuro Fujita wrote:

Attached is an updated version of the patch.

I revised code/comments a little bit. PFA new version.

I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.

Thanks for committing, Robert! Thanks for reviewing, Rushabh and Ashutosh!

Best regards,
Etsuro Fujita

#7Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#5)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/02/08 10:40), Robert Haas wrote:

On Wed, Feb 7, 2018 at 6:01 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?

Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.

That was my purpose, but I agree with the instability. Thanks again,
Robert!

Best regards,
Etsuro Fujita

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Etsuro Fujita (#7)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2018/02/08 10:40), Robert Haas wrote:

Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.

That was my purpose, but I agree with the instability. Thanks again,
Robert!

According to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-02-08%2001%3A45%3A01

there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.

regards, tom lane

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2018/02/08 10:40), Robert Haas wrote:

Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.

That was my purpose, but I agree with the instability. Thanks again,
Robert!

According to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-02-08%2001%3A45%3A01

there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.

Hmm. Maybe inserting an ANALYZE command in the right place would fix it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#9)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/02/09 4:32), Robert Haas wrote:

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

According to

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-02-08%2001%3A45%3A01

there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.

Will look into this.

Hmm. Maybe inserting an ANALYZE command in the right place would fix it?

VACUUM to the right tables in the right place might better fix it?
Another idea would be to modify test cases added by that commit so that
they don't modify the existing tables to not break the existing test cases?

Best regards,
Etsuro Fujita

#11Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#10)
1 attachment(s)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/02/09 10:48), Etsuro Fujita wrote:

(2018/02/09 4:32), Robert Haas wrote:

On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.

Will look into this.

I tried to reproduce that in my environment, but I couldn't. On
reflection I think an easy and reliable way to address that concern is
to use local stats on foreign tables. Attached is a patch for that.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-regress.patchtext/x-diff; name=postgres-fdw-regress.patchDownload
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 4244,4249 **** explain (verbose, costs off) select * from ft3 f, loct3 l
--- 4244,4253 ----
  -- ===================================================================
  -- test writable foreign table stuff
  -- ===================================================================
+ -- Autovacuum on the remote side might affect remote estimates,
+ -- so use local stats on ft2 as well
+ ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'false');
+ ANALYZE ft2;
  EXPLAIN (verbose, costs off)
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
                                                                                                                      QUERY PLAN                                                                                                                    
***************
*** 5520,5551 **** UPDATE ft2 SET c3 = 'baz'
    FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
    WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
    RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
!                                                                                                                                           QUERY PLAN                                                                                                                                          
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
     Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
!    ->  Nested Loop
           Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
!          Join Filter: (ft2.c2 === ft4.c1)
!          ->  Foreign Scan on public.ft2
!                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
!                Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
           ->  Foreign Scan
!                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!                Relations: (public.ft4) INNER JOIN (public.ft5)
!                Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
!                ->  Hash Join
!                      Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!                      Hash Cond: (ft4.c1 = ft5.c1)
                       ->  Foreign Scan on public.ft4
                             Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
!                      ->  Hash
!                            Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                            ->  Foreign Scan on public.ft5
!                                  Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
  (24 rows)
  
  UPDATE ft2 SET c3 = 'baz'
--- 5524,5555 ----
    FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
    WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
    RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
!                                                                                                                                     QUERY PLAN                                                                                                                                     
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
     Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
!    ->  Hash Join
           Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
!          Hash Cond: (ft4.c1 = ft5.c1)
           ->  Foreign Scan
!                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
!                Filter: (ft2.c2 === ft4.c1)
!                Relations: (public.ft2) INNER JOIN (public.ft4)
!                Remote SQL: SELECT r1."C 1", r1.c2, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1."C 1" > 2000)))) FOR UPDATE OF r1
!                ->  Nested Loop
!                      Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
!                      ->  Foreign Scan on public.ft2
!                            Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
!                            Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
                       ->  Foreign Scan on public.ft4
                             Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
!          ->  Hash
!                Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                ->  Foreign Scan on public.ft5
!                      Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                      Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
  (24 rows)
  
  UPDATE ft2 SET c3 = 'baz'
***************
*** 5999,6004 **** select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
--- 6003,6011 ----
   407 |   100
  (13 rows)
  
+ -- Go back to use remote-estimate mode on ft2
+ VACUUM ANALYZE "S 1"."T 1";
+ ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'true');
  -- Above DMLs add data with c6 as NULL in ft1, so test ORDER BY NULLS LAST and NULLs
  -- FIRST behavior here.
  -- ORDER BY DESC NULLS LAST options
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 1068,1073 **** explain (verbose, costs off) select * from ft3 f, loct3 l
--- 1068,1078 ----
  -- ===================================================================
  -- test writable foreign table stuff
  -- ===================================================================
+ -- Autovacuum on the remote side might affect remote estimates,
+ -- so use local stats on ft2 as well
+ ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'false');
+ ANALYZE ft2;
+ 
  EXPLAIN (verbose, costs off)
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
***************
*** 1208,1213 **** commit;
--- 1213,1222 ----
  select c2, count(*) from ft2 where c2 < 500 group by 1 order by 1;
  select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
  
+ -- Go back to use remote-estimate mode on ft2
+ VACUUM ANALYZE "S 1"."T 1";
+ ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'true');
+ 
  -- Above DMLs add data with c6 as NULL in ft1, so test ORDER BY NULLS LAST and NULLs
  -- FIRST behavior here.
  -- ORDER BY DESC NULLS LAST options
#12Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#11)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Fri, Feb 9, 2018 at 5:24 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I tried to reproduce that in my environment, but I couldn't. On reflection
I think an easy and reliable way to address that concern is to use local
stats on foreign tables. Attached is a patch for that.

Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
1 attachment(s)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Robert Haas <robertmhaas@gmail.com> writes:

Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.

regards, tom lane

Attachments:

add-sleep.patchtext/x-diff; charset=us-ascii; name=add-sleep.patchDownload
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 0b2c528..3f8bd6d 100644
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*************** INSERT INTO ft2 (c1,c2,c3)
*** 1133,1138 ****
--- 1133,1139 ----
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;            -- can't be pushed down
  UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+ select pg_sleep(60);
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c3 = 'baz'
    FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.

Not for me, but when I pushed the pg_sleep up to 180 seconds, then it failed.

With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.

Not for me, but when I pushed the pg_sleep up to 180 seconds, then it failed.

With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.

FWIW, I ran a thousand cycles of postgres_fdw installcheck without seeing
further problems. So this fixes it at least for my configuration.
However, jaguarundi still shows a problem:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&amp;dt=2018-02-10%2008%3A41%3A32

(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.

regards, tom lane

#16Etsuro Fujita
etsuro.fujita@gmail.com
In reply to: Tom Lane (#15)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-11 6:24 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?

I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.

Not for me, but when I pushed the pg_sleep up to 180 seconds, then it

failed.

With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.

FWIW, I ran a thousand cycles of postgres_fdw installcheck without seeing
further problems. So this fixes it at least for my configuration.

Thank you both for working on this issue!

However, jaguarundi still shows a problem:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=
jaguarundi&dt=2018-02-10%2008%3A41%3A32

(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.

I'll look into this and send a patch by Tuesday.

Best regards,
Etsuro Fujita

#17Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#15)
1 attachment(s)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/02/11 6:24), Tom Lane wrote:

However, jaguarundi still shows a problem:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&amp;dt=2018-02-10%2008%3A41%3A32

(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.

I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in both
cases. I suspect the changes in the order of the RETURNING output there
was still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-regress-2.patchtext/x-diff; name=postgres-fdw-regress-2.patchDownload
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 35,40 **** CREATE TABLE "S 1"."T 1" (
--- 35,43 ----
  	c8 user_enum,
  	CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
  );
+ -- "S 1"."T 1" will be heavily updated below, so disable autovacuum for
+ -- the table to avoid unexpected effects of that
+ ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
  CREATE TABLE "S 1"."T 2" (
  	c1 int NOT NULL,
  	c2 text,
***************
*** 4244,4253 **** explain (verbose, costs off) select * from ft3 f, loct3 l
  -- ===================================================================
  -- test writable foreign table stuff
  -- ===================================================================
- -- Autovacuum on the remote side might affect remote estimates,
- -- so use local stats on ft2 as well
- ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'false');
- ANALYZE ft2;
  EXPLAIN (verbose, costs off)
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
                                                                                                                      QUERY PLAN                                                                                                                    
--- 4247,4252 ----
***************
*** 5524,5555 **** UPDATE ft2 SET c3 = 'baz'
    FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
    WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
    RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
!                                                                                                                                     QUERY PLAN                                                                                                                                     
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
     Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
!    ->  Hash Join
           Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
!          Hash Cond: (ft4.c1 = ft5.c1)
           ->  Foreign Scan
!                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
!                Filter: (ft2.c2 === ft4.c1)
!                Relations: (public.ft2) INNER JOIN (public.ft4)
!                Remote SQL: SELECT r1."C 1", r1.c2, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1."C 1" > 2000)))) FOR UPDATE OF r1
!                ->  Nested Loop
!                      Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft4.c1, ft4.c2, ft4.c3
!                      ->  Foreign Scan on public.ft2
!                            Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
!                            Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
                       ->  Foreign Scan on public.ft4
                             Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
!          ->  Hash
!                Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                ->  Foreign Scan on public.ft5
!                      Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                      Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
  (24 rows)
  
  UPDATE ft2 SET c3 = 'baz'
--- 5523,5554 ----
    FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
    WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
    RETURNING ft2.*, ft4.*, ft5.*;                                                    -- can't be pushed down
!                                                                                                                                           QUERY PLAN                                                                                                                                          
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
     Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
!    ->  Nested Loop
           Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
!          Join Filter: (ft2.c2 === ft4.c1)
!          ->  Foreign Scan on public.ft2
!                Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
!                Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
           ->  Foreign Scan
!                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!                Relations: (public.ft4) INNER JOIN (public.ft5)
!                Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
!                ->  Hash Join
!                      Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
!                      Hash Cond: (ft4.c1 = ft5.c1)
                       ->  Foreign Scan on public.ft4
                             Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
!                      ->  Hash
!                            Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                            ->  Foreign Scan on public.ft5
!                                  Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
!                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
  (24 rows)
  
  UPDATE ft2 SET c3 = 'baz'
***************
*** 6003,6011 **** select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
   407 |   100
  (13 rows)
  
- -- Go back to use remote-estimate mode on ft2
  VACUUM ANALYZE "S 1"."T 1";
- ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'true');
  -- Above DMLs add data with c6 as NULL in ft1, so test ORDER BY NULLS LAST and NULLs
  -- FIRST behavior here.
  -- ORDER BY DESC NULLS LAST options
--- 6002,6008 ----
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 39,44 **** CREATE TABLE "S 1"."T 1" (
--- 39,47 ----
  	c8 user_enum,
  	CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
  );
+ -- "S 1"."T 1" will be heavily updated below, so disable autovacuum for
+ -- the table to avoid unexpected effects of that
+ ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
  CREATE TABLE "S 1"."T 2" (
  	c1 int NOT NULL,
  	c2 text,
***************
*** 1068,1078 **** explain (verbose, costs off) select * from ft3 f, loct3 l
  -- ===================================================================
  -- test writable foreign table stuff
  -- ===================================================================
- -- Autovacuum on the remote side might affect remote estimates,
- -- so use local stats on ft2 as well
- ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'false');
- ANALYZE ft2;
- 
  EXPLAIN (verbose, costs off)
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
  INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
--- 1071,1076 ----
***************
*** 1213,1221 **** commit;
  select c2, count(*) from ft2 where c2 < 500 group by 1 order by 1;
  select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
  
- -- Go back to use remote-estimate mode on ft2
  VACUUM ANALYZE "S 1"."T 1";
- ALTER FOREIGN TABLE ft2 OPTIONS (SET use_remote_estimate 'true');
  
  -- Above DMLs add data with c6 as NULL in ft1, so test ORDER BY NULLS LAST and NULLs
  -- FIRST behavior here.
--- 1211,1217 ----
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Etsuro Fujita (#17)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2018/02/11 6:24), Tom Lane wrote:

However, jaguarundi still shows a problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&amp;dt=2018-02-10%2008%3A41%3A32

I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in both
cases. I suspect the changes in the order of the RETURNING output there
was still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.

Ping? We're still seeing those failures on jaguarundi.

regards, tom lane

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#18)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in both
cases. I suspect the changes in the order of the RETURNING output there
was still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.

Ping? We're still seeing those failures on jaguarundi.

This is another patch that got past me. Committed now.

I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ping? We're still seeing those failures on jaguarundi.

This is another patch that got past me. Committed now.

I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.

The buildfarm news on this isn't very good; jaguarundi seems happier,
but we've seen intermittent failures on four other critters since
this went in.

The idea of disabling autovacuum seems reasonable to me, but perhaps
it needs to be done to more of the tables.

regards, tom lane

#21Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#20)
1 attachment(s)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/03/02 3:34), Tom Lane wrote:

Robert Haas<robertmhaas@gmail.com> writes:

This is another patch that got past me. Committed now.

I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.

The buildfarm news on this isn't very good; jaguarundi seems happier,
but we've seen intermittent failures on four other critters since
this went in.

The idea of disabling autovacuum seems reasonable to me, but perhaps
it needs to be done to more of the tables.

Agreed. Better safe than sorry, so I disabled autovacuum for all the
tables created in the postgres_fdw regression test, except the ones with
no data modification created in the sections "test handling of
collations" and "test IMPORT FOREIGN SCHEMA". I'm attaching a patch for
that.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-regress-3.patchtext/x-diff; name=postgres-fdw-regress-3.patchDownload
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 35,43 **** CREATE TABLE "S 1"."T 1" (
  	c8 user_enum,
  	CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
  );
- -- "S 1"."T 1" will be heavily updated below, so disable autovacuum for
- -- the table to avoid unexpected effects of that
- ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
  CREATE TABLE "S 1"."T 2" (
  	c1 int NOT NULL,
  	c2 text,
--- 35,40 ----
***************
*** 55,60 **** CREATE TABLE "S 1"."T 4" (
--- 52,62 ----
  	c3 text,
  	CONSTRAINT t4_pkey PRIMARY KEY (c1)
  );
+ -- Disable autovacuum for these tables to avoid unexpected effects of that
+ ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 3" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 4" SET (autovacuum_enabled = 'false');
  INSERT INTO "S 1"."T 1"
  	SELECT id,
  	       id % 10,
***************
*** 6172,6177 **** ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2negative;
--- 6174,6180 ----
  -- test WITH CHECK OPTION constraints
  -- ===================================================================
  CREATE TABLE base_tbl (a int, b int);
+ ALTER TABLE base_tbl SET (autovacuum_enabled = 'false');
  CREATE FOREIGN TABLE foreign_tbl (a int, b int)
    SERVER loopback OPTIONS(table_name 'base_tbl');
  CREATE VIEW rw_view AS SELECT * FROM foreign_tbl
***************
*** 6232,6237 **** DROP TABLE base_tbl;
--- 6235,6241 ----
  -- test serial columns (ie, sequence-based defaults)
  -- ===================================================================
  create table loc1 (f1 serial, f2 text);
+ alter table loc1 set (autovacuum_enabled = 'false');
  create foreign table rem1 (f1 serial, f2 text)
    server loopback options(table_name 'loc1');
  select pg_catalog.setval('rem1_f1_seq', 10, false);
***************
*** 6779,6784 **** DROP TRIGGER trig_row_after_delete ON rem1;
--- 6783,6790 ----
  -- ===================================================================
  CREATE TABLE a (aa TEXT);
  CREATE TABLE loct (aa TEXT, bb TEXT);
+ ALTER TABLE a SET (autovacuum_enabled = 'false');
+ ALTER TABLE loct SET (autovacuum_enabled = 'false');
  CREATE FOREIGN TABLE b (bb TEXT) INHERITS (a)
    SERVER loopback OPTIONS (table_name 'loct');
  INSERT INTO a(aa) VALUES('aaa');
***************
*** 6920,6931 **** DROP TABLE loct;
--- 6926,6941 ----
  -- Check SELECT FOR UPDATE/SHARE with an inherited source table
  create table loct1 (f1 int, f2 int, f3 int);
  create table loct2 (f1 int, f2 int, f3 int);
+ alter table loct1 set (autovacuum_enabled = 'false');
+ alter table loct2 set (autovacuum_enabled = 'false');
  create table foo (f1 int, f2 int);
  create foreign table foo2 (f3 int) inherits (foo)
    server loopback options (table_name 'loct1');
  create table bar (f1 int, f2 int);
  create foreign table bar2 (f3 int) inherits (bar)
    server loopback options (table_name 'loct2');
+ alter table foo set (autovacuum_enabled = 'false');
+ alter table bar set (autovacuum_enabled = 'false');
  insert into foo values(1,1);
  insert into foo values(3,3);
  insert into foo2 values(2,2,2);
***************
*** 7685,7690 **** SET enable_partitionwise_join=on;
--- 7695,7702 ----
  CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
  CREATE TABLE fprt1_p1 (LIKE fprt1);
  CREATE TABLE fprt1_p2 (LIKE fprt1);
+ ALTER TABLE fprt1_p1 SET (autovacuum_enabled = 'false');
+ ALTER TABLE fprt1_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
  INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
  CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
***************
*** 7697,7702 **** ANALYZE fprt1_p2;
--- 7709,7716 ----
  CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
  CREATE TABLE fprt2_p1 (LIKE fprt2);
  CREATE TABLE fprt2_p2 (LIKE fprt2);
+ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
+ ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
  CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 39,47 **** CREATE TABLE "S 1"."T 1" (
  	c8 user_enum,
  	CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
  );
- -- "S 1"."T 1" will be heavily updated below, so disable autovacuum for
- -- the table to avoid unexpected effects of that
- ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
  CREATE TABLE "S 1"."T 2" (
  	c1 int NOT NULL,
  	c2 text,
--- 39,44 ----
***************
*** 60,65 **** CREATE TABLE "S 1"."T 4" (
--- 57,68 ----
  	CONSTRAINT t4_pkey PRIMARY KEY (c1)
  );
  
+ -- Disable autovacuum for these tables to avoid unexpected effects of that
+ ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 2" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 3" SET (autovacuum_enabled = 'false');
+ ALTER TABLE "S 1"."T 4" SET (autovacuum_enabled = 'false');
+ 
  INSERT INTO "S 1"."T 1"
  	SELECT id,
  	       id % 10,
***************
*** 1260,1265 **** ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2negative;
--- 1263,1269 ----
  -- ===================================================================
  
  CREATE TABLE base_tbl (a int, b int);
+ ALTER TABLE base_tbl SET (autovacuum_enabled = 'false');
  CREATE FOREIGN TABLE foreign_tbl (a int, b int)
    SERVER loopback OPTIONS(table_name 'base_tbl');
  CREATE VIEW rw_view AS SELECT * FROM foreign_tbl
***************
*** 1283,1288 **** DROP TABLE base_tbl;
--- 1287,1293 ----
  -- test serial columns (ie, sequence-based defaults)
  -- ===================================================================
  create table loc1 (f1 serial, f2 text);
+ alter table loc1 set (autovacuum_enabled = 'false');
  create foreign table rem1 (f1 serial, f2 text)
    server loopback options(table_name 'loc1');
  select pg_catalog.setval('rem1_f1_seq', 10, false);
***************
*** 1599,1604 **** DROP TRIGGER trig_row_after_delete ON rem1;
--- 1604,1611 ----
  
  CREATE TABLE a (aa TEXT);
  CREATE TABLE loct (aa TEXT, bb TEXT);
+ ALTER TABLE a SET (autovacuum_enabled = 'false');
+ ALTER TABLE loct SET (autovacuum_enabled = 'false');
  CREATE FOREIGN TABLE b (bb TEXT) INHERITS (a)
    SERVER loopback OPTIONS (table_name 'loct');
  
***************
*** 1645,1650 **** DROP TABLE loct;
--- 1652,1660 ----
  create table loct1 (f1 int, f2 int, f3 int);
  create table loct2 (f1 int, f2 int, f3 int);
  
+ alter table loct1 set (autovacuum_enabled = 'false');
+ alter table loct2 set (autovacuum_enabled = 'false');
+ 
  create table foo (f1 int, f2 int);
  create foreign table foo2 (f3 int) inherits (foo)
    server loopback options (table_name 'loct1');
***************
*** 1652,1657 **** create table bar (f1 int, f2 int);
--- 1662,1670 ----
  create foreign table bar2 (f3 int) inherits (bar)
    server loopback options (table_name 'loct2');
  
+ alter table foo set (autovacuum_enabled = 'false');
+ alter table bar set (autovacuum_enabled = 'false');
+ 
  insert into foo values(1,1);
  insert into foo values(3,3);
  insert into foo2 values(2,2,2);
***************
*** 1866,1871 **** SET enable_partitionwise_join=on;
--- 1879,1886 ----
  CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
  CREATE TABLE fprt1_p1 (LIKE fprt1);
  CREATE TABLE fprt1_p2 (LIKE fprt1);
+ ALTER TABLE fprt1_p1 SET (autovacuum_enabled = 'false');
+ ALTER TABLE fprt1_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
  INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
  CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
***************
*** 1879,1884 **** ANALYZE fprt1_p2;
--- 1894,1901 ----
  CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
  CREATE TABLE fprt2_p1 (LIKE fprt2);
  CREATE TABLE fprt2_p2 (LIKE fprt2);
+ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
+ ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
  CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#21)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Fri, Mar 2, 2018 at 5:35 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Agreed. Better safe than sorry, so I disabled autovacuum for all the tables
created in the postgres_fdw regression test, except the ones with no data
modification created in the sections "test handling of collations" and "test
IMPORT FOREIGN SCHEMA". I'm attaching a patch for that.

I have committed this patch. Whee!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#22)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Fri, Mar 2, 2018 at 1:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 5:35 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Agreed. Better safe than sorry, so I disabled autovacuum for all the tables
created in the postgres_fdw regression test, except the ones with no data
modification created in the sections "test handling of collations" and "test
IMPORT FOREIGN SCHEMA". I'm attaching a patch for that.

I have committed this patch. Whee!

And that seems to have made things worse. The failures on rhinocerous
look related:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=rhinoceros&amp;br=HEAD

And so does this failure on chub:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chub&amp;dt=2018-03-02%2016%3A10%3A02

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#23)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/03/03 5:01), Robert Haas wrote:

On Fri, Mar 2, 2018 at 1:20 PM, Robert Haas<robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 5:35 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Agreed. Better safe than sorry, so I disabled autovacuum for all the tables
created in the postgres_fdw regression test, except the ones with no data
modification created in the sections "test handling of collations" and "test
IMPORT FOREIGN SCHEMA". I'm attaching a patch for that.

I have committed this patch. Whee!

Thank you for taking the time on this!

And that seems to have made things worse. The failures on rhinocerous
look related:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=rhinoceros&amp;br=HEAD

Totally outdated stats used in query planning causes the failures?
ANALYZE right before the plan-changing queries would fix the failures?

And so does this failure on chub:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chub&amp;dt=2018-03-02%2016%3A10%3A02

The Git log shows that this was done before the commit.

Best regards,
Etsuro Fujita

#25Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Etsuro Fujita (#24)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On Mon, Mar 5, 2018 at 8:51 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

(2018/03/03 5:01), Robert Haas wrote:

On Fri, Mar 2, 2018 at 1:20 PM, Robert Haas<robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 5:35 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Agreed. Better safe than sorry, so I disabled autovacuum for all the
tables
created in the postgres_fdw regression test, except the ones with no
data
modification created in the sections "test handling of collations" and
"test
IMPORT FOREIGN SCHEMA". I'm attaching a patch for that.

I have committed this patch. Whee!

Thank you for taking the time on this!

And that seems to have made things worse. The failures on rhinocerous
look related:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=rhinoceros&amp;br=HEAD

Totally outdated stats used in query planning causes the failures? ANALYZE
right before the plan-changing queries would fix the failures?

And so does this failure on chub:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chub&amp;dt=2018-03-02%2016%3A10%3A02

The Git log shows that this was done before the commit.

I think the testcase file for postgres_fdw has grown really large over
the time, as we have added more pushdown. Since most of the queries in
that file use the same tables created at the beginning of the file,
changes somewhere in-between (esp. DMLs) affect the following queries.
It's getting hard to add a test query in that file and expect stable
results. I think, it's time to split the file into at least two one
for queries and one for DMLs. In long term, we may have, multiple
files each handling one functionality like regress/sql/. If we do so,
it will be easier to fix such problems.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Bapat (#25)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Mon, Mar 5, 2018 at 8:51 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Totally outdated stats used in query planning causes the failures? ANALYZE
right before the plan-changing queries would fix the failures?

I think the testcase file for postgres_fdw has grown really large over
the time, as we have added more pushdown. Since most of the queries in
that file use the same tables created at the beginning of the file,
changes somewhere in-between (esp. DMLs) affect the following queries.
It's getting hard to add a test query in that file and expect stable
results.

The thing that I find curious, now that we've shut off autovacuum
altogether on those tables, is that we *still* aren't getting stable
results. How can that be? Yeah, if you add another query in the middle
you might find that changing later results, but for any fixed contents of
the test script, why isn't the buildfarm getting the same answers that the
patch author and committer got? I could believe different results on
32-bit and 64-bit hardware, but that does not seem to be quite the pattern
we're seeing.

I think, it's time to split the file into at least two one
for queries and one for DMLs.

That seems largely irrelevant right at this point. First we have to
explain the instability.

regards, tom lane

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#26)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

I wrote:

The thing that I find curious, now that we've shut off autovacuum
altogether on those tables, is that we *still* aren't getting stable
results. How can that be?

I spent some time trying to figure out what's going on here. I've still
not been able to replicate the failure on any of my machines, but I have
learned a thing or two.

On the first query that's still failing on rhinoceros (and has also been
seen to fail on other machines, before the last round of changes), which
is at line 1142 in postgres_fdw.sql in current HEAD:

EXPLAIN (verbose, costs off)
UPDATE ft2 SET c3 = 'baz'
FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
RETURNING ft2.*, ft4.*, ft5.*; -- can't be pushed down

(call this Q1), we are expecting to get a plan of the shape of

-> Nested Loop
-> Foreign Scan on public.ft2
Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
-> Foreign Scan
Relations: (public.ft4) INNER JOIN (public.ft5)

Now, there is no possible way that the planner would pick that plan if its
estimate of the number of rows out of the ft2 scan was more than 1.
Re-executing the foreign join would have high enough overhead to push the
plan to some other shape. In fact, probably the shape we see in the
actual plan choice, on the failing machines:

-> Nested Loop
-> Foreign Scan
Relations: (public.ft4) INNER JOIN (public.ft5)
-> Materialize
-> Foreign Scan on public.ft2
Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

I instrumented costsize.c, and determined that the estimated size of
"S 1"."T 1" WHERE (("C 1" > 2000)), before the clamp_row_est() rounding,
is only 0.1159 rows on my machine. So there's no way to explain the plan
change as the result of small platform-specific roundoff differences ---
we need something more than a 12X change in the selectivity estimate
before the plan shape would change like this. There are a bunch of things
going on under the hood, such as that the table's raw rowcount estimate
gets scaled up from the original 1000 rows because it's now got more pages
than before, but none come close to explaining 12X.

However, the planner is working with quite old statistics, dating back to
the manual ANALYZE at postgres_fdw.sql line 93. If I stick in another
manual ANALYZE just before Q1, I get exactly the same plan change reported
by rhinoceros. (And underneath, the rowcount estimate for ft2 has gone
from 1 row to 8 rows, which is much closer to the true value of 10 rows,
so the plan change is not surprising.) What's more, doing this also
reproduces the one other plan change seen later in rhinoceros' output.

It is, therefore, very hard to avoid the conclusion that something is
causing an ANALYZE to happen while the script runs, despite our fooling
about with the table's reloptions. I'm not sure that that something is
autovacuum. A platform-specific bug in reloptions handling doesn't seem
out of the question, but poking around in the code didn't spot anything
obvious.

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily? Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Another thing I'd like to do is temporarily add

select relpages, reltuples from pg_class where relname = 'T 1';

to the test script, both just after the manual ANALYZE and just before Q1.
If we see a change between those reports on any of the affected machines,
we'll know that *something* is changing the stats. Now the problem with
doing that is that the expected value of relpages is platform-dependent
(I see 11 on 64-bit and 10 on 32-bit on my machines). We can work around
that, perhaps by capturing the initial value in a temp table and printing
only the delta, but I'm not sure if it's worth the effort as opposed to
just letting it fail on 32-bit critters for a day or two.

regards, tom lane

#28Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#27)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On 03/05/2018 11:19 AM, Tom Lane wrote:

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily? Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Done just now. Do you want me to force a run?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#28)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Joe Conway <mail@joeconway.com> writes:

On 03/05/2018 11:19 AM, Tom Lane wrote:

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily? Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Done just now. Do you want me to force a run?

Thanks. I'm about to push something, so no need for a force.

regards, tom lane

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#28)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

Joe Conway <mail@joeconway.com> writes:

On 03/05/2018 11:19 AM, Tom Lane wrote:

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily? Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Done just now. Do you want me to force a run?

Both of these runs

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-03-05%2020%3A35%3A00
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-03-05%2021%3A45%3A02

appear to exonerate autovacuum, and the second seems to destroy the
behind-the-scenes-ANALYZE theory entirely, since there was no change
in the outputs of the extra instrumentation queries. (In theory
there's a window for ANALYZE to have occurred in between, but that's
not very credible.)

So you can revert the rhinoceros config change if you like --- thanks
for making it so quickly!

Meanwhile, I'm back to wondering what could possibly have affected
the planner's estimates, if pg_proc and pg_statistic didn't change.
I confess bafflement ... but we've now eliminated the autovacuum-
did-it theory entirely, so it's time to start looking someplace else.
I wonder if something in the postgres_fdw remote join machinery
is not as deterministic as it should be.

regards, tom lane

#31Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#30)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On 03/05/2018 02:07 PM, Tom Lane wrote:

So you can revert the rhinoceros config change if you like --- thanks
for making it so quickly!

Ok, reverted.

Meanwhile, I'm back to wondering what could possibly have affected
the planner's estimates, if pg_proc and pg_statistic didn't change.
I confess bafflement ... but we've now eliminated the autovacuum-
did-it theory entirely, so it's time to start looking someplace else.
I wonder if something in the postgres_fdw remote join machinery
is not as deterministic as it should be.

Do you want me to do anything manual locally on this VM?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#32Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#30)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

On 2018-03-05 17:07:10 -0500, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

On 03/05/2018 11:19 AM, Tom Lane wrote:

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily? Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Done just now. Do you want me to force a run?

Both of these runs

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-03-05%2020%3A35%3A00
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&amp;dt=2018-03-05%2021%3A45%3A02

appear to exonerate autovacuum, and the second seems to destroy the
behind-the-scenes-ANALYZE theory entirely, since there was no change
in the outputs of the extra instrumentation queries. (In theory
there's a window for ANALYZE to have occurred in between, but that's
not very credible.)

So you can revert the rhinoceros config change if you like --- thanks
for making it so quickly!

Meanwhile, I'm back to wondering what could possibly have affected
the planner's estimates, if pg_proc and pg_statistic didn't change.
I confess bafflement ... but we've now eliminated the autovacuum-
did-it theory entirely, so it's time to start looking someplace else.
I wonder if something in the postgres_fdw remote join machinery
is not as deterministic as it should be.

I wonder if temporarily changing postgres_fdw's test to specify an extra
config that installs auto_explain in full aggressiveness (i.e. including
costs etc) and enables debug3 logging could help narrow this down?

- Andres

#33Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Andres Freund (#32)
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

(2018/04/07 4:17), Andres Freund wrote:

On 2018-03-05 17:07:10 -0500, Tom Lane wrote:

Meanwhile, I'm back to wondering what could possibly have affected
the planner's estimates, if pg_proc and pg_statistic didn't change.
I confess bafflement ... but we've now eliminated the autovacuum-
did-it theory entirely, so it's time to start looking someplace else.
I wonder if something in the postgres_fdw remote join machinery
is not as deterministic as it should be.

I wonder if temporarily changing postgres_fdw's test to specify an extra
config that installs auto_explain in full aggressiveness (i.e. including
costs etc) and enables debug3 logging could help narrow this down?

+1 because we cannot deny the possibility that the plan instability is
caused by such an unexpected behavior of postgres_fdw.

Best regards,
Etsuro Fujita