Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Started by Kouhei Kaigai, about 11 years ago, 101 messages
#1 Kouhei Kaigai
kaigai@ak.jp.nec.com
1 attachment(s)

On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Today I had a talk with Hanada-san to clarify which parts of the two
features can be shared and how to implement them. We concluded that both
features can share most of the infrastructure.
Let me introduce the join replacement by foreign-/custom-scan below.

Its overall design is to inject a foreign-/custom-scan node in place of
the built-in join logic (based on the estimated cost). From the viewpoint
of the core backend, it looks like a sub-query scan that performs the
relation join internally.

What we need to do is as follows:

(1) Add a hook to add_paths_to_joinrel()
It gives extensions (including FDW drivers and custom-scan providers) a
chance to add alternative paths for a particular join of relations, using
ForeignScanPath or CustomScanPath, if they can run instead of the built-in
ones.

(2) Inform the core backend of the varno/varattno mapping
One thing we need to pay attention to is that a foreign-/custom-scan node
that runs instead of the built-in join node must return a mixture of values
coming from both relations. When the FDW driver fetches a remote record
(or a record computed by an external computing resource), the most
reasonable way is to store it in ecxt_scantuple of the ExprContext, then
kick projection with varnodes that reference this slot.
This needs an infrastructure that tracks the relationship between the
original varnodes and the alternative varno/varattno. We thought it should
be mapped to INDEX_VAR and a virtual attribute number so that it references
ecxt_scantuple naturally, and this infrastructure is quite helpful for
both ForeignScan and CustomScan.

We'd like to add a List *fdw_varmap / *custom_varmap field to both plan
nodes.
It contains a list of the original Var nodes that shall be mapped to
positions according to the list index (e.g., the first varnode becomes
varno=INDEX_VAR and varattno=1).
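
For illustration only (fdw_varmap/custom_varmap is the proposal above, not
committed code, and the relations/columns are made up), the mapping idea
is simply:

/* suppose the remote join returns t1.id and t2.val */
List   *varmap = NIL;

varmap = lappend(varmap, makeVar(1, 1, INT4OID, -1, InvalidOid, 0)); /* t1.id  -> (INDEX_VAR, 1) */
varmap = lappend(varmap, makeVar(2, 3, TEXTOID, -1, InvalidOid, 0)); /* t2.val -> (INDEX_VAR, 2) */

so the n-th element of the list tells the planner which original Var the
n-th column of the pseudo scan tuple stands for.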

(3) Reverse mapping on EXPLAIN
For EXPLAIN support, the above varnodes on the pseudo relation scan need
to be resolved. All we need to do is initialize dpns->inner_tlist in
set_deparse_planstate() according to the above mapping.

(4) Case of scanrelid == 0
To skip opening/closing (foreign) tables, we need a mark that tells the
backend not to initialize the scan node according to the table definition,
but according to the pseudo varnode list.
As the earlier custom-scan patch did, scanrelid == 0 is a straightforward
mark to show that the scan node is not combined with a particular real
relation.
So, it also needs special-case handling around the foreign-/custom-scan
code.

We expect the above changes are small enough to implement basic join
push-down functionality (one that does not involve external computation of
complicated expression nodes), yet valuable enough to support in v9.5.

Please comment on the proposition above.

I don't really have any technical comments on this design right at the moment,
but I think it's an important area where PostgreSQL needs to make some
progress sooner rather than later, so I hope that we can get something
committed in time for 9.5.

I tried to implement the interface portion, as attached.
Hanada-san may be developing postgres_fdw on top of this interface
definition, towards the next commitfest.

The overall design of this patch is identical to what I described above.
It intends to allow extensions (FDW drivers or custom-scan providers) to
replace a join with a foreign/custom-scan that internally contains the
result set of a relation join computed externally. It looks like a relation
scan on a pseudo relation.

One thing we need to pay attention to is how setrefs.c fixes up
varno/varattno, since it differs from a regular join structure. I found
that IndexOnlyScan already has a similar infrastructure that redirects
varnode references to a certain column of ecxt_scantuple in the
ExprContext, using a pair of INDEX_VAR and an alternative varattno.

This patch adds a new field: fdw_ps_tlist on ForeignScan and
custom_ps_tlist on CustomScan. It is the extension's role to set a pseudo-
scan target list (hence "ps_tlist") on the foreign/custom-scan that
replaced a join.
If it is not NIL, set_plan_refs() takes another strategy to fix them up.
It calls fix_upper_expr() to map varnodes of the expression lists onto
INDEX_VAR according to the ps_tlist; the extension is then expected to put
values/isnull pairs on ss_ScanTupleSlot of the scan state according to the
ps_tlist it constructed beforehand.
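
As a rough, hypothetical sketch of the planner-side half (the helper name
is invented; joinrel->reltargetlist is the list of Vars the join has to
emit), the FDW's GetForeignPlan() or the provider's PlanCustomPath() could
build the ps_tlist like this and assign it to fdw_ps_tlist /
custom_ps_tlist:

static List *
build_pseudo_scan_tlist(RelOptInfo *joinrel)
{
    List       *ps_tlist = NIL;
    AttrNumber  resno = 1;
    ListCell   *lc;

    foreach(lc, joinrel->reltargetlist)
    {
        Var    *var = (Var *) lfirst(lc);

        /* n-th entry is referenced as (INDEX_VAR, n) after setrefs.c */
        ps_tlist = lappend(ps_tlist,
                           makeTargetEntry((Expr *) copyObject(var),
                                           resno++, NULL, false));
    }
    return ps_tlist;
}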

Regarding the primary hook for adding alternative foreign/custom-scan
paths instead of the built-in join paths, I added the following hook in
add_paths_to_joinrel().

/* Hook for plugins to get control in add_paths_to_joinrel() */
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
                                             RelOptInfo *joinrel,
                                             RelOptInfo *outerrel,
                                             RelOptInfo *innerrel,
                                             List *restrictlist,
                                             JoinType jointype,
                                             SpecialJoinInfo *sjinfo,
                                             SemiAntiJoinFactors *semifactors,
                                             Relids param_source_rels,
                                             Relids extra_lateral_rels);
extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;

It shall give extensions enough information to determine whether they can
offer alternative paths or not.
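
For illustration, here is a minimal sketch of how an extension might
install and use this hook (everything named my_* is invented; only the
hook type, joinrel->fdw_handler and add_path() come from the patch or
core):

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static set_join_pathlist_hook_type prev_set_join_pathlist_hook = NULL;

static void
my_join_pathlist_hook(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      List *restrictlist,
                      JoinType jointype,
                      SpecialJoinInfo *sjinfo,
                      SemiAntiJoinFactors *semifactors,
                      Relids param_source_rels,
                      Relids extra_lateral_rels)
{
    /* keep any previously installed hook working */
    if (prev_set_join_pathlist_hook)
        prev_set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                    restrictlist, jointype, sjinfo,
                                    semifactors, param_source_rels,
                                    extra_lateral_rels);

    /*
     * Decide whether this join can be replaced.  An FDW would typically
     * check that joinrel->fdw_handler is valid (i.e. both sides belong
     * to the same FDW), build and cost a ForeignPath/CustomPath, then:
     *
     *     add_path(joinrel, (Path *) alternative_path);
     */
}

void
_PG_init(void)
{
    prev_set_join_pathlist_hook = set_join_pathlist_hook;
    set_join_pathlist_hook = my_join_pathlist_hook;
}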

One thing I'm concerned about is that the fdw_handler to be called for a
joinrel is not obvious, unlike a custom-scan that holds a reference to its
CustomScanMethods, because a joinrel is not managed by any particular FDW
driver.
So, I had to add an "Oid fdw_handler" field to RelOptInfo to track which
foreign tables are involved in the relation join. This field shall hold
the OID of a valid FDW handler if both the inner and outer relations are
managed by the same FDW handler; otherwise, InvalidOid. Even if either or
both of them are themselves relation joins, fdw_handler shall be set as
long as everything is managed by the same FDW handler. This allows a join
to be replaced by a foreign scan that involves more than two tables.

One new interface contract is the case of scanrelid == 0. If the foreign-/
custom-scan is not associated with a particular relation, ExecInitXXX()
initializes ss_ScanTupleSlot according to the ps_tlist, and no relation is
opened.
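
To illustrate the executor-side half of this contract (hypothetical code;
the function name and values are invented), a custom-scan provider's exec
callback for scanrelid == 0 would fill the scan slot in ps_tlist order and
return it as a virtual tuple:

static TupleTableSlot *
my_exec_pseudo_scan(CustomScanState *node)
{
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

    ExecClearTuple(slot);

    /* one datum per ps_tlist entry, produced by the external join */
    slot->tts_values[0] = Int32GetDatum(42);
    slot->tts_isnull[0] = false;
    slot->tts_values[1] = CStringGetTextDatum("joined value");
    slot->tts_isnull[1] = false;

    return ExecStoreVirtualTuple(slot);
}

Projection then resolves the INDEX_VAR references fixed up by setrefs.c
against this slot.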

Because the working example is still under development, this patch is not
tested/validated yet. However, it briefly implements the concept of how
we'd like to enhance the foreign-/custom-scan functionality.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v1.patch (application/octet-stream)
 src/backend/executor/execScan.c         |  4 +++
 src/backend/executor/nodeCustom.c       | 39 ++++++++++++++++-----
 src/backend/executor/nodeForeignscan.c  | 35 +++++++++++++------
 src/backend/foreign/foreign.c           | 32 ++++++++++++++----
 src/backend/nodes/copyfuncs.c           |  3 ++
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/optimizer/path/joinpath.c   | 15 +++++++++
 src/backend/optimizer/plan/createplan.c | 33 +++++++++++-------
 src/backend/optimizer/plan/setrefs.c    | 60 +++++++++++++++++++++++++++++++++
 src/backend/optimizer/util/plancat.c    |  7 +++-
 src/backend/optimizer/util/relnode.c    | 13 +++++++
 src/backend/utils/adt/ruleutils.c       |  4 +++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           | 20 ++++++++---
 src/include/nodes/relation.h            |  2 ++
 src/include/optimizer/paths.h           | 13 +++++++
 16 files changed, 240 insertions(+), 44 deletions(-)

diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 1319519..e8784d9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 576b295..27c5790 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,32 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hands, custom-scan may scan on a pseudo relation;
+	 * that is usually a result-set of relations join by external
+	 * computing resource, or others. It has to get the scan type from
+	 * the pseudo-scan target-list that should be assigned by custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		Assert(cscan->custom_ps_tlist != NULL);
+		ps_tupdesc = ExecTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +110,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 9cc5345..07cd883 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,29 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hands, foreign-scan may scan on a pseudo relation;
+	 * that is usually a result-set of remote relations join. It has
+	 * to get the scan type from the pseudo-scan target-list that should
+	 * be assigned by FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		Assert(node->fdw_ps_tlist != NULL);
+		ps_tupdesc = ExecTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +175,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +207,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 4f5f6ae..860b6ca 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6b1bf7b..b88339c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -590,7 +590,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -615,6 +617,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index edbd09f..fa7dd37 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -556,7 +556,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -570,6 +572,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index be54f3d..030158d 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,19 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In case of FDW, we may be able to omit invocation if joinrel's
+	 * fdwhandler (set only if both relations are managed by same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bf8dbe0..a35809d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1957,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch relation-id, if this foreign-scan node actuall scans on
+	 * a particular real relation. Elsewhere, InvalidOid shall be
+	 * informed to the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1981,13 +1991,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2051,12 +2064,6 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	RelOptInfo *rel = best_path->path.parent;
 
 	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
-
-	/*
 	 * Sort clauses into the best execution order, although custom-scan
 	 * provider can reorder them again.
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..cf7e8e9a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -569,6 +569,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->fdw_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -583,6 +613,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->custom_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..c269ac0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4c76f54..26589e3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -427,6 +428,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 24ade6c..0cf2768 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3815,6 +3815,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index dc0a7fc7..09b0823 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -157,6 +157,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..26c992e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -470,7 +470,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this. If FDW returns records as foreign-
+ * table definition, just put NIL here.
+ * Note that everything in above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -479,7 +485,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -487,10 +495,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_varmap and fdw_private fields
+ * apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ *  Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -511,6 +520,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..5fa4e39 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index afa5f9b..093f9d1 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
#2 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#1)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Hello,

The attached patch is a newer revision of the custom-/foreign-join
interface.

I've ported my PG-Strom extension to the latest custom-scan (+ this patch)
interface during this winter vacation.

The concept of "join replaced by foreign-/custom-scan" almost works well;
however, here are two small oversights in the v1 interface.

1. EXPLAIN didn't work when scanrelid == 0.
ExplainNode() always called ExplainScanTarget() for T_ForeignScan or
T_CustomScan; however, a foreign-/custom-scan node that replaces a join
relation does not have a particular base relation.
So, I put in a check to skip this call when scanrelid == 0.
2. create_plan_recurse() needs to be available to extensions.
When a CustomScan node takes underlying plan nodes, its PlanCustomPath()
method is also responsible for invoking the plan-creation routine of the
underlying path node. However, the existing code declared
create_plan_recurse() as a static function.
So, this patch re-declares it as an external function.
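
As a rough illustration of why this export matters (hypothetical fragment;
where the provider keeps its child Path, and whether it runs it as the
outer sub-plan, is entirely up to the provider), a PlanCustomPath()
implementation can now do something like:

static void
attach_child_plan(PlannerInfo *root, CustomScan *cscan, Path *child_path)
{
    /* let the core planner build the plan tree of the underlying path */
    Plan   *child_plan = create_plan_recurse(root, child_path);

    outerPlan(cscan) = child_plan;
}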

Also, there is one other point I'd like to have in this interface.
When a foreign-/custom-scan node has a pseudo-scan target list, the list
may contain target entries which are not actually in use, but which need
to be there to look up column names for the EXPLAIN command.
I'd like to add a flag that tells the core backend to ignore target
entries in the pseudo-scan tlist with resjunk=true when it initializes
the foreign-/custom-scan-state node and sets up the scan type descriptor;
a rough sketch of such an entry follows below.
This would reduce unnecessary projection if the foreign-/custom-scan node
can produce a tuple matching the expectation of the tlist.
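
As a sketch of the kind of entry I mean (hypothetical; the skip-resjunk
behaviour is only proposed here, and ps_tlist is assumed to be the list
under construction):

/* ordinary entry: actually produced by the node at execution time */
ps_tlist = lappend(ps_tlist,
                   makeTargetEntry((Expr *) makeVar(1, 1, INT4OID, -1,
                                                    InvalidOid, 0),
                                   1, NULL, false));

/*
 * name-resolution-only entry: kept so EXPLAIN can deparse column names,
 * but marked resjunk so that, under this proposal, it would be ignored
 * when the scan tuple descriptor is built.
 */
ps_tlist = lappend(ps_tlist,
                   makeTargetEntry((Expr *) makeVar(2, 1, TEXTOID, -1,
                                                    InvalidOid, 0),
                                   2, pstrdup("t2_c1"), true));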

I'd like to see comments on this point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v2.patch (application/octet-stream)
 src/backend/commands/explain.c          |  5 ++-
 src/backend/executor/execScan.c         |  4 +++
 src/backend/executor/nodeCustom.c       | 38 ++++++++++++++++-----
 src/backend/executor/nodeForeignscan.c  | 34 +++++++++++++------
 src/backend/foreign/foreign.c           | 32 ++++++++++++++----
 src/backend/nodes/copyfuncs.c           |  3 ++
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/optimizer/path/joinpath.c   | 15 +++++++++
 src/backend/optimizer/plan/createplan.c | 36 +++++++++++---------
 src/backend/optimizer/plan/setrefs.c    | 60 +++++++++++++++++++++++++++++++++
 src/backend/optimizer/util/plancat.c    |  7 +++-
 src/backend/optimizer/util/relnode.c    | 13 +++++++
 src/backend/utils/adt/ruleutils.c       |  4 +++
 src/include/foreign/fdwapi.h            |  1 +
 src/include/nodes/plannodes.h           | 20 ++++++++---
 src/include/nodes/relation.h            |  2 ++
 src/include/optimizer/paths.h           | 13 +++++++
 src/include/optimizer/planmain.h        |  1 +
 18 files changed, 244 insertions(+), 47 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 064f880..cf6f885 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1065,9 +1065,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 1319519..e8784d9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 576b295..24d0e5c 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hands, custom-scan may scan on a pseudo relation;
+	 * that is usually a result-set of relations join by external
+	 * computing resource, or others. It has to get the scan type from
+	 * the pseudo-scan target-list that should be assigned by custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 9cc5345..fe3bbba 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hands, foreign-scan may scan on a pseudo relation;
+	 * that is usually a result-set of remote relations join. It has
+	 * to get the scan type from the pseudo-scan target-list that should
+	 * be assigned by FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 4f5f6ae..860b6ca 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a737d7d..a964f5a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -590,7 +590,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -615,6 +617,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e3e29f5..69af422 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -556,7 +556,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -570,6 +572,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index be54f3d..030158d 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,19 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In case of FDW, we may be able to omit invocation if joinrel's
+	 * fdwhandler (set only if both relations are managed by same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f9ae4f..58c2fce 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch relation-id, if this foreign-scan node actuall scans on
+	 * a particular real relation. Elsewhere, InvalidOid shall be
+	 * informed to the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2052,12 +2064,6 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	RelOptInfo *rel = best_path->path.parent;
 
 	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
-
-	/*
 	 * Sort clauses into the best execution order, although custom-scan
 	 * provider can reorder them again.
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..cf7e8e9a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -569,6 +569,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->fdw_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -583,6 +613,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->custom_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..c269ac0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4c76f54..26589e3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -427,6 +428,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 24ade6c..0cf2768 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3815,6 +3815,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index dc0a7fc7..09b0823 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -157,6 +157,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..26c992e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -470,7 +470,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this. If FDW returns records as foreign-
+ * table definition, just put NIL here.
+ * Note that everything in above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -479,7 +485,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -487,10 +495,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private fields
+ * apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -511,6 +520,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..5fa4e39 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index afa5f9b..093f9d1 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3fdc2cb..ad969df 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Kouhei Kaigai (#2)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 1/6/15, 8:17 AM, Kouhei Kaigai wrote:

The attached patch is newer revision of custom-/foreign-join
interface.

Shouldn't instances of

scan_relid > 0

be

scan_relid != InvalidOid

?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#4Petr Jelinek
petr@2ndquadrant.com
In reply to: Jim Nasby (#3)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 07/01/15 00:05, Jim Nasby wrote:

On 1/6/15, 8:17 AM, Kouhei Kaigai wrote:

The attached patch is newer revision of custom-/foreign-join
interface.

Shouldn't instances of

scan_relid > 0

be

scan_relid != InvalidOid

Ideally, they should be OidIsValid(scan_relid)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Petr Jelinek (#4)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

scan_relid != InvalidOid

Ideally, they should be OidIsValid(scan_relid)

Scan.scanrelid is an index into the range-table list, not an object id,
so InvalidOid or OidIsValid() are not a good choice here.

The bare relation OID is kept in the relid field of the RangeTblEntry,
which can be pulled using rt_fetch(scanrelid, range_tables).

I found an assertion below at ExecScanFetch():
Assert(scanrelid > 0);
so this seems to be the usual convention.
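
As a minimal sketch (not part of the patch), this is how the relation OID for
an ordinary scan node can be pulled from scanrelid through the range table:

#include "postgres.h"
#include "nodes/plannodes.h"
#include "parser/parsetree.h"       /* rt_fetch() */

/* scanrelid is a 1-based range-table index; the relation OID lives in the RTE */
static Oid
scan_relation_oid(Scan *scan, List *rtable)
{
    RangeTblEntry *rte;

    Assert(scan->scanrelid > 0);    /* scanrelid == 0 would mean "pseudo relation" */
    rte = rt_fetch(scan->scanrelid, rtable);
    return rte->relid;
}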

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#6Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#2)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is newer revision of custom-/foreign-join
interface.

It seems that the basic purpose of this patch is to allow a foreign
scan or custom scan to have scanrelid == 0, reflecting the case where
we are scanning a joinrel rather than a baserel. The major problem
that seems to create is that we can't set the target list from the
relation descriptor, because there isn't one. To work around that,
you've added fdw_ps_list and custom_ps_tlist, which the FDW or
custom-plan provider must set. I don't know off-hand whether that's a
good interface or not. How does the FDW know what to stick in there?
There's a comment that seems to be trying to explain this:

+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this. If FDW returns records as foreign-
+ * table definition, just put NIL here.

...but I can't understand what that's telling me.

You've added an "Oid fdw_handler" field to the ForeignScan and
RelOptInfo structures. I think this is the OID of the pg_proc entry
for the handler function; and I think we need it because, if scanrelid
== 0 then we don't have a relation that we can trace to a foreign
table, to a server, to an FDW, and then to a handler. So we need to
get that information some other way. When building joinrels, the
fdw_handler OID, and the associated routine, are propagated from any
two relations that share the same fdw_handler OID to the resulting
joinrel. I guess that's reasonable, although it feels a little weird
that we're copying around both the OID and the structure-pointer.

For non-obvious reasons, you've made create_plan_recurse() non-static.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#7Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#6)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is newer revision of custom-/foreign-join
interface.

It seems that the basic purpose of this patch is to allow a foreign scan
or custom scan to have scanrelid == 0, reflecting the case where we are
scanning a joinrel rather than a baserel. The major problem that seems
to create is that we can't set the target list from the relation descriptor,
because there isn't one. To work around that, you've added fdw_ps_list
and custom_ps_tlist, which the FDW or custom-plan provider must set. I
don't know off-hand whether that's a good interface or not. How does the
FDW know what to stick in there?

In the most usual scenario, the FDW/CSP will make a ps_tlist according to the
target list of the joinrel (which contains a mixture of var-nodes referencing
the left and right sides), plus the qualifier's expression tree if any.
As long as the FDW can construct a remote query, it knows which attributes shall
be returned and which relation each of them comes from. That is exactly the
information the ps_tlist conveys to the core optimizer.
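
For instance, a rough sketch of how a provider might build such a ps_tlist
from the joinrel's target list (assuming, for simplicity, that the join only
emits plain Vars held in joinrel->reltargetlist) could look like this:

#include "postgres.h"
#include "nodes/makefuncs.h"
#include "nodes/relation.h"

/*
 * Build a pseudo-scan target list: each column the join shall emit becomes
 * one TargetEntry, and its position (resno) is the varattno that INDEX_VAR
 * references will later resolve to.
 */
static List *
build_pseudo_scan_tlist(RelOptInfo *joinrel)
{
    List       *ps_tlist = NIL;
    AttrNumber  resno = 1;
    ListCell   *lc;

    foreach(lc, joinrel->reltargetlist)
    {
        Var        *var = (Var *) lfirst(lc);

        ps_tlist = lappend(ps_tlist,
                           makeTargetEntry((Expr *) copyObject(var),
                                           resno++,
                                           NULL,        /* no column alias */
                                           false));     /* not resjunk */
    }
    return ps_tlist;
}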

There's a comment that seems to be trying to explain this:

+ * An optional fdw_ps_tlist is used to map a reference to an attribute
+ of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible
+ to
+ * set expected target list for this. If FDW returns records as
+ foreign-
+ * table definition, just put NIL here.

...but I can't understand what that's telling me.

Sorry, let me explain it another way.

A joinrel has a target list that may contain references to both the
left and right relations. In a regular join these are eventually mapped to
INNER_VAR or OUTER_VAR, and the executor switches the TupleTableSlot
(ecxt_innertuple or ecxt_outertuple) according to the special varno.
On the other hand, because ForeignScan/CustomScan is a scan plan, it has
just one TupleTableSlot at execution time. Thus, we need a mechanism that
maps attributes from both relations onto particular positions of that slot;
such a reference is eventually translated to a var-node with INDEX_VAR that
references ecxt_scantuple.
Of course, the ps_tlist is not necessary if the ForeignScan/CustomScan scans
a base relation literally. In that case, the interface contract expects NIL
to be set on the ps_tlist field.
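
To illustrate (this just mirrors the existing Var-resolution switch in
ExecEvalScalarVar; it is not new code): the special varnos select which slot
of the expression context a Var is read from, and INDEX_VAR falls through to
ecxt_scantuple:

#include "postgres.h"
#include "nodes/execnodes.h"
#include "nodes/primnodes.h"

static TupleTableSlot *
slot_for_var(Var *variable, ExprContext *econtext)
{
    switch (variable->varno)
    {
        case INNER_VAR:         /* refers to the join's inner (right) child */
            return econtext->ecxt_innertuple;
        case OUTER_VAR:         /* refers to the join's outer (left) child */
            return econtext->ecxt_outertuple;
        default:                /* ordinary relation scan, or INDEX_VAR */
            return econtext->ecxt_scantuple;
    }
}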

You've added an "Oid fdw_handler" field to the ForeignScan and RelOptInfo
structures. I think this is the OID of the pg_proc entry for the handler
function; and I think we need it because, if scanrelid == 0 then we don't
have a relation that we can trace to a foreign table, to a server, to an
FDW, and then to a handler. So we need to get that information some other
way. When building joinrels, the fdw_handler OID, and the associated
routine, are propagated from any two relations that share the same
fdw_handler OID to the resulting joinrel. I guess that's reasonable,
although it feels a little weird that we're copying around both the OID
and the structure-pointer.

Unlike the CustomScan node, the ForeignScan node does not carry function
pointers. In addition, the FdwRoutine is dynamically allocated by palloc(),
so we have no guarantee that a pointer constructed at plan stage is still
valid when the executor starts.
That is the reason why I put the OID of the FDW handler routine there.
Any better idea?
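
So at executor startup the routine table would presumably be rebuilt from
that OID, roughly like this (a sketch, not the patch's actual executor change):

#include "postgres.h"
#include "foreign/fdwapi.h"
#include "nodes/plannodes.h"

/*
 * The handler OID survives copyObject() of the plan tree, so the FdwRoutine
 * can be looked up again when execution begins, even when scanrelid == 0.
 */
static FdwRoutine *
routine_for_foreign_scan(ForeignScan *fscan)
{
    return GetFdwRoutine(fscan->fdw_handler);
}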

For non-obvious reasons, you've made create_plan_recurse() non-static.

When a custom-scan node replaces a join plan, it has at least two child
plan nodes. The PlanCustomPath callback needs to be able to call
create_plan_recurse() to transform the underlying path nodes into plan
nodes, because this custom-scan node may take other built-in scan or
sub-join nodes as its inner/outer input.
In the FDW case, the driver pushes any underlying scan relations to the
remote side, so we do not expect a ForeignScan to have underlying plans...
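
A condensed, hypothetical PlanCustomPath callback along those lines (the
struct and function names here are made up, and the CustomScan setup is
abbreviated) might look like:

#include "postgres.h"
#include "nodes/relation.h"
#include "nodes/plannodes.h"
#include "optimizer/planmain.h"     /* create_plan_recurse(), per this patch */

typedef struct MyJoinPath           /* provider-defined path, inherits CustomPath */
{
    CustomPath  cpath;
    Path       *outer_path;         /* child paths chosen at path-creation time */
    Path       *inner_path;
} MyJoinPath;

static Plan *
my_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
                    CustomPath *best_path, List *tlist, List *clauses)
{
    MyJoinPath *jpath = (MyJoinPath *) best_path;
    CustomScan *cscan = makeNode(CustomScan);

    cscan->scan.scanrelid = 0;      /* no base relation: a pseudo relation scan */
    cscan->scan.plan.targetlist = tlist;
    /* custom_ps_tlist, custom_exprs, methods, costs etc. omitted here */

    /* turn the underlying paths into child plans */
    outerPlan(cscan) = create_plan_recurse(root, jpath->outer_path);
    innerPlan(cscan) = create_plan_recurse(root, jpath->inner_path);

    return &cscan->scan.plan;
}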

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#8Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Kouhei Kaigai (#5)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:

scan_relid != InvalidOid

Ideally, they should be OidIsValid(scan_relid)

Scan.scanrelid is an index of range-tables list, not an object-id.
So, InvalidOid or OidIsValid() are not a good choice.

I think the name needs to change then; scan_relid certainly looks like the OID of a relation.

scan_index?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#9Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Jim Nasby (#8)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-01-10 8:18 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:

scan_relid != InvalidOid

Ideally, they should be OidIsValid(scan_relid)

Scan.scanrelid is an index of range-tables list, not an object-id.
So, InvalidOid or OidIsValid() are not a good choice.

I think the name needs to change then; scan_relid certainly looks like the
OID of a relation.

scan_index?

Yep, I had the same impression when I looked at the code the first time;
however, it is defined as below. It is not something custom-scan itself introduced.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table */
} Scan;

--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#10Petr Jelinek
petr@2ndquadrant.com
In reply to: Kohei KaiGai (#9)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 10/01/15 01:19, Kohei KaiGai wrote:

2015-01-10 8:18 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:

scan_relid != InvalidOid

Ideally, they should be OidIsValid(scan_relid)

Scan.scanrelid is an index of range-tables list, not an object-id.
So, InvalidOid or OidIsValid() are not a good choice.

I think the name needs to change then; scan_relid certainly looks like the
OID of a relation.

scan_index?

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table */
} Scan;

Yeah, there are actually several places in the code where "relid" means an
index into the range table and not the OID of a relation; it still manages to
confuse me. There is nothing this patch can do about that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#11Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Petr Jelinek (#10)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 1/9/15, 6:44 PM, Petr Jelinek wrote:

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table */
} Scan;

Yeah there are actually several places in the code where "relid" means index in range table and not oid of relation, it still manages to confuse me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a separate patch)? I'm willing to do that work but don't want to waste time if it'll just be rejected.

Any other examples of this I should fix too?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#12Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#11)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 1/9/15, 6:54 PM, Jim Nasby wrote:

On 1/9/15, 6:44 PM, Petr Jelinek wrote:

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table */
} Scan;

Yeah there are actually several places in the code where "relid" means index in range table and not oid of relation, it still manages to confuse me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a separate patch)? I'm willing to do that work but don't want to waste time if it'll just be rejected.

Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should fix?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#13Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Jim Nasby (#12)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/9/15, 6:54 PM, Jim Nasby wrote:

On 1/9/15, 6:44 PM, Petr Jelinek wrote:

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table
*/
} Scan;

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to confuse
me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste time if
it'll just be rejected.

Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?

This naming is a little bit confusing; however, I don't think it "should" be
changed. This structure has been used for a long time, so renaming it would
complicate back-patching whenever we find bugs around "scanrelid".

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Kohei KaiGai (#13)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 1/9/15, 8:51 PM, Kohei KaiGai wrote:

2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/9/15, 6:54 PM, Jim Nasby wrote:

On 1/9/15, 6:44 PM, Petr Jelinek wrote:

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range table
*/
} Scan;

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to confuse
me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste time if
it'll just be rejected.

Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?

This naming is a little bit confusing, however, I don't think it "should" be
changed because this structure has been used for a long time, so reworking
will prevent back-patching when we find bugs around "scanrelid".

We can still backpatch; it just requires more work. And how many bugs do we actually expect to find around this anyway?

If folks think this just isn't worth fixing fine, but I find the backpatching argument rather dubious.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#15Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Jim Nasby (#14)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/9/15, 8:51 PM, Kohei KaiGai wrote:

2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/9/15, 6:54 PM, Jim Nasby wrote:

On 1/9/15, 6:44 PM, Petr Jelinek wrote:

Yep, I had a same impression when I looked at the code first time,
however, it is defined as below. Not a manner of custom-scan itself.

/*
* ==========
* Scan nodes
* ==========
*/
typedef struct Scan
{
Plan plan;
Index scanrelid; /* relid is index into the range
table
*/
} Scan;

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to
confuse
me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste
time if
it'll just be rejected.

Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?

This naming is a little bit confusing, however, I don't think it "should"
be
changed because this structure has been used for a long time, so reworking
will prevent back-patching when we find bugs around "scanrelid".

We can still backpatch; it just requires more work. And how many bugs do we
actually expect to find around this anyway?

If folks think this just isn't worth fixing fine, but I find the
backpatching argument rather dubious.

Even if there is no problem in the Scan structure itself, a bugfix may touch
code that uses the "scanrelid" name. If we renamed it in v9.5, we would also
need a small adjustment to apply such a bugfix to prior versions.
That seems to me a waste of committers' time.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#16Petr Jelinek
petr@2ndquadrant.com
In reply to: Kohei KaiGai (#15)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 11/01/15 08:56, Kohei KaiGai wrote:

2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to
confuse
me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste
time if
it'll just be rejected.

Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?

This naming is a little bit confusing, however, I don't think it "should"
be
changed because this structure has been used for a long time, so reworking
will prevent back-patching when we find bugs around "scanrelid".

We can still backpatch; it just requires more work. And how many bugs do we
actually expect to find around this anyway?

If folks think this just isn't worth fixing fine, but I find the
backpatching argument rather dubious.

Even though here is no problem around Scan structure itself, a bugfix may
use the variable name of "scanrelid" to fix it. If we renamed it on v9.5, we
also need a little adjustment to apply this bugfix on prior versions.
It seems to me a waste of time for committers.

I tend to agree, especially as there are multiple places in the code this
would affect - RelOptInfo and RestrictInfo have the same issue, etc.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Petr Jelinek (#16)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Petr Jelinek <petr@2ndquadrant.com> writes:

On 11/01/15 08:56, Kohei KaiGai wrote:

2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to
confuse me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste
time if it'll just be rejected.

It seems to me a waste of time for committers.

I tend to agree, especially as there is multiple places in code this
would affect - RelOptInfo and RestrictInfo have same issue, etc.

Generally speaking, if you're not sure whether a "relid" variable in the
planner is meant to be a table OID or a rangetable index, you can tell by
noting whether it's declared as type Oid or type int. So I'm also -1 on
any wholesale renaming, especially given the complete lack of an obviously
superior naming convention to change to.

If there are any places where such variables are improperly declared, then
of course we ought to fix that.

regards, tom lane


#18Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#7)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

When custom-scan node replaced a join-plan, it shall have at least two
child plan-nodes. The callback handler of PlanCustomPath needs to be
able to call create_plan_recurse() to transform the underlying path-nodes
to plan-nodes, because this custom-scan node may take other built-in
scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to remote
side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#18)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

When custom-scan node replaced a join-plan, it shall have at least two
child plan-nodes. The callback handler of PlanCustomPath needs to be
able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take other
built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to remote
side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, even though the full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
It takes a GpuHashJoinPath, inherited from CustomPath, that has multiple
underlying scan/join paths.
Once it is called back from the backend, it calls create_plan_recurse()
to make the inner/outer plan nodes according to those paths.

As a result, we can see the following query execution plan, in which the
CustomScan node has underlying scan plans:

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
QUERY PLAN
----------------------------------------------------------------------------------
Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 width=143)
Hash clause 1: (aid = aid)
Hash clause 2: (bid = bid)
Bulkload: On
-> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009 width=77)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 width=37)
hash keys: aid
nBatches: 1 Buckets: 46000 Memory Usage: 99.99%
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 width=37)
hash keys: bid
nBatches: 1 Buckets: 46000 Memory Usage: 49.99%
-> Seq Scan on t2 (cost=0.00..734.00 rows=40000 width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#20Michael Paquier
michael.paquier@gmail.com
In reply to: Kouhei Kaigai (#19)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Thu, Jan 15, 2015 at 8:02 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least two
child plan-nodes. The callback handler of PlanCustomPath needs to be
able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take other
built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to remote
side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, even though full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
It takes GpuHashJoinPath inherited from CustomPath that has multiple
underlying scan/join paths.
Once it is called back from the backend, it also calls create_plan_recurse()
to make inner/outer plan nodes according to the paths.

In the result, we can see the following query execution plan that CustomScan
takes underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
QUERY PLAN
----------------------------------------------------------------------------------
Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 width=143)
Hash clause 1: (aid = aid)
Hash clause 2: (bid = bid)
Bulkload: On
-> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009 width=77)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 width=37)
hash keys: aid
nBatches: 1 Buckets: 46000 Memory Usage: 99.99%
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 width=37)
hash keys: bid
nBatches: 1 Buckets: 46000 Memory Usage: 49.99%
-> Seq Scan on t2 (cost=0.00..734.00 rows=40000 width=37)
(13 rows)

Where are we on this? AFAIK, we now have a feature with no documentation
and no in-core example to test those custom routine APIs, hence it has been
moved to the next CF.
--
Michael

#21Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Michael Paquier (#20)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Where are we on this? AFAIK, we have now a feature with no documentation
and no example in-core to test those custom routine APIs, hence moved to
next CF.

Hanada-san is now working on an example module that uses this new
infrastructure on top of postgres_fdw. He will probably submit the
patch within a couple of days, for the upcoming commit fest.

Regarding the documentation, the consensus was to put up a wiki page
so that everyone can edit the description, which shall then become the
source of the SGML file.
The latest version is here:
https://wiki.postgresql.org/wiki/CustomScanInterface

Anyway, the next commit fest starts within a couple of days. I'd like to
have discussion of the feature there.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#22Michael Paquier
michael.paquier@gmail.com
In reply to: Kouhei Kaigai (#21)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Feb 13, 2015 at 4:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Where are we on this? AFAIK, we have now a feature with no documentation
and no example in-core to test those custom routine APIs, hence moved to
next CF.

Now Hanada-san is working on the example module that use this new
infrastructure on top of postgres_fdw. Probably, he will submit the
patch within a couple of days, for the upcoming commit fest.

I am a bit surprised by that. Are you planning to give up on the ctidscan
module module and

Regarding to the documentation, a consensus was to make up a wikipage
to edit the description by everyone, then it shall become source of
SGML file.
The latest one is here:
https://wiki.postgresql.org/wiki/CustomScanInterface

OK. This looks like a good base. It would be good to have an actual patch
for review as well at this stage.
--
Michael

#23Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#22)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Feb 13, 2015 at 6:12 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Feb 13, 2015 at 4:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
wrote:

Where are we on this? AFAIK, we have now a feature with no documentation
and no example in-core to test those custom routine APIs, hence moved to
next CF.

Now Hanada-san is working on the example module that use this new
infrastructure on top of postgres_fdw. Probably, he will submit the
patch within a couple of days, for the upcoming commit fest.

I am a bit surprised by that. Are you planning to give up on the ctidscan
module module and

Sorry I typed the wrong key.
So... Are you planning to give up on the ctidscan module and submit only
the module written by Hanada-san on top of postgres_fdw? As I imagine the
goal is just to have a test module to exercise the APIs, why would the module
submitted by Hanada-san be necessary?
--
Michael

#24Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Michael Paquier (#23)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Sorry I typed the wrong key.

So... Are you planning to give up on the ctidscan module and submit only
the module written by Hanada-san on top of postgres_fdw? As I imagine that
the goal is just to have a test module to run the APIs why would the module
submitted by Hanada-san be that necessary?

No. The ctidscan module is a reference implementation for the existing
custom-scan interface; it just supports scanning a relation its own way, with
no support for joining relations at this moment.

The upcoming enhancement to postgres_fdw will support remote join, which looks
like a scan on a pseudo materialized relation on the local side. It is the
proof of concept for the new interface I'd like to discuss in this thread.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#25Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#19)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

The attached patch is a rebased version of join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation is added (according to Michael's comment).

This infrastructure allows a foreign-data wrapper or custom-scan
provider to add alternative scan paths for a join of relations.
From the viewpoint of the executor, it looks like a scan on a pseudo
relation that is materialized from multiple relations, even though
the FDW/CSP internally processes the join with its own logic.

The basic idea is: (1) scanrelid == 0 indicates that this foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist supply the definition of that pseudo relation, because
it is not associated with a tangible relation as in the simple scan
case, so the planner cannot know the expected record type to be
returned without this additional information.
These two enhancements enable extensions to process a join of relations
internally and behave like an existing scan node from the viewpoint of
the core backend.
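
As a rough usage sketch of the new hook (names and cost numbers here are
placeholders, and the CustomPathMethods table is assumed to be defined
elsewhere), an extension would install itself like this:

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

void _PG_init(void);

extern const CustomPathMethods my_path_methods;     /* assumed defined elsewhere */

static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

static void
my_join_pathlist_hook(PlannerInfo *root, RelOptInfo *joinrel,
                      RelOptInfo *outerrel, RelOptInfo *innerrel,
                      List *restrictlist, JoinType jointype,
                      SpecialJoinInfo *sjinfo,
                      SemiAntiJoinFactors *semifactors,
                      Relids param_source_rels, Relids extra_lateral_rels)
{
    CustomPath *cpath;

    /* give previously installed providers a chance as well */
    if (prev_join_pathlist_hook)
        prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                restrictlist, jointype, sjinfo, semifactors,
                                param_source_rels, extra_lateral_rels);

    cpath = makeNode(CustomPath);
    cpath->path.pathtype = T_CustomScan;
    cpath->path.parent = joinrel;
    cpath->path.rows = joinrel->rows;
    cpath->path.startup_cost = 10.0;        /* placeholder cost estimate */
    cpath->path.total_cost = 1000.0;        /* placeholder cost estimate */
    cpath->methods = &my_path_methods;

    add_path(joinrel, &cpath->path);
}

void
_PG_init(void)
{
    prev_join_pathlist_hook = set_join_pathlist_hook;
    set_join_pathlist_hook = my_join_pathlist_hook;
}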

Also, as an aside: I had an off-list discussion with Hanada-san about this
interface. He had an idea to keep create_plan_recurse() static, using a
special list field in the CustomPath structure to chain the underlying Path
nodes. If the core backend translates those Path nodes to Plan nodes when a
valid list is given, the extension does not need to call
create_plan_recurse() by itself.
I have no preference about this. Does anybody have an opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachments:

pgsql-v9.5-custom-join.v4.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 278 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |   5 +-
 src/backend/executor/execScan.c         |   4 +
 src/backend/executor/nodeCustom.c       |  38 +++--
 src/backend/executor/nodeForeignscan.c  |  34 ++--
 src/backend/foreign/foreign.c           |  32 +++-
 src/backend/nodes/copyfuncs.c           |   3 +
 src/backend/nodes/outfuncs.c            |   3 +
 src/backend/optimizer/path/joinpath.c   |  15 ++
 src/backend/optimizer/plan/createplan.c |  36 +++--
 src/backend/optimizer/plan/setrefs.c    |  60 +++++++
 src/backend/optimizer/util/plancat.c    |   7 +-
 src/backend/optimizer/util/relnode.c    |  13 ++
 src/backend/utils/adt/ruleutils.c       |   4 +
 src/include/foreign/fdwapi.h            |   1 +
 src/include/nodes/plannodes.h           |  20 ++-
 src/include/nodes/relation.h            |   2 +
 src/include/optimizer/paths.h           |  13 ++
 src/include/optimizer/planmain.h        |   1 +
 21 files changed, 524 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  Prior to query execution, the PostgreSQL planner constructs a plan tree
+  that usually consists of built-in plan nodes (eg: SeqScan, HashJoin, etc).
+  The custom-scan interface allows extensions to provide a custom-scan plan
+  that implements its own logic, in addition to the built-in nodes, to scan
+  a relation or join relations. Once a custom-scan node is chosen by planner,
+  callback functions associated with this custom-scan node shall be invoked
+  during query execution. Custom-scan provider is responsible for returning
+  equivalent result set as built-in logic would, but it is free to scan or
+  join the target relations according to its own logic.
+  This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+  The first thing a custom-scan provider has to do is add alternative paths
+  to scan a relation (on the <literal>set_rel_pathlist_hook</>) or
+  to join relations (on the <literal>set_join_pathlist_hook</>).
+  It is expected to add a <literal>CustomPath</> node with an estimated
+  execution cost and a set of callbacks defined in <literal>CustomPathMethods</>.
+  Both hooks also give extensions enough information to construct the
+  <literal>CustomPath</> node, like the <literal>RelOptInfo</> of the relations
+  to be scanned, joined, or read as join sources. The custom-scan provider
+  is responsible for computing a reasonable cost estimate that is
+  comparable to the built-in logic.
+ </para>
+
+ <para>
+  Once a custom path has been chosen by the planner, the custom-scan provider
+  has to populate a plan node according to the <literal>CustomPath</> node.
+  At this moment, <literal>CustomScan</> is the only node type available for
+  implementing custom logic for a <literal>CustomPath</> node.
+  The <literal>CustomScan</> structure has two special fields to keep
+  private information: <literal>custom_exprs</> and <literal>custom_private</>.
+  The <literal>custom_exprs</> field is intended to hold expression trees
+  that shall be updated by <filename>setrefs.c</> and <filename>subselect.c</>.
+  On the other hand, <literal>custom_private</> is expected to hold truly
+  private information that nobody touches except the custom-scan provider
+  itself. A plan tree that contains a custom-scan node can be duplicated
+  using <literal>copyObject()</>, so all the data structures stored within
+  these two fields must be safe for <literal>copyObject()</>.
+ </para>
+
+ <para>
+  When an extension implements its own logic to join relations, the result
+  looks, from the standpoint of the core executor, like a simple scan of a
+  pseudo relation materialized from multiple source relations.
+  The custom-scan provider is expected to process the relation join with its
+  own logic internally, then return a set of records according to the tuple
+  descriptor of the scan node.
+  A <literal>CustomScan</> node that replaced a join of relations is not
+  associated with a particular tangible relation, unlike the simple scan case,
+  so the extension needs to inform the core planner of the record type
+  expected to be fetched from this node.
+  What we should do here is set <literal>scanrelid</> to zero and put
+  a valid list of <literal>TargetEntry</> nodes on <literal>custom_ps_tlist</>
+  instead. This configuration informs the core planner that this custom-scan
+  node is not associated with a particular physical table, and what record
+  type it returns.
+ </para>
+
+ <para>
+  Once a plan tree is handed to the executor, plan-state objects are
+  constructed according to the supplied plan nodes.
+  Custom scans are no exception: the executor invokes a callback to populate
+  a <literal>CustomScanState</> node whenever a <literal>CustomScan</> node is
+  found in the supplied plan tree.
+  Unlike the <literal>CustomScan</> node, it has no fields to save private
+  information, because the custom-scan provider can allocate an object
+  larger than the bare <literal>CustomScanState</> to store various
+  private execution state.
+  This mirrors the relationship of the <literal>ScanState</> structure to
+  <literal>PlanState</>, which adds scan-specific fields to the generic
+  plan state. In addition, the custom-scan provider can add fields on demand.
+  Once a CustomScanState is constructed, BeginCustomScan is invoked during
+  executor initialization; ExecCustomScan is repeatedly called during
+  execution (returning a TupleTableSlot with each fetched record), and
+  EndCustomScan is invoked at executor cleanup.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+  <title>Custom Scan Hooks and Callbacks</title>
+
+  <sect2 id="custom-scan-hooks">
+   <title>Custom Scan Hooks</title>
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    scan a particular relation. An extension can add alternative paths if it
+    can provide its own logic to perform the given scan and qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+   </para>
+
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    join a particular combination of relations. An extension can add
+    alternative paths that replace the relation join with its own logic.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+   </para>
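+
+   <para>
+    An extension typically installs these hooks from its
+    <literal>_PG_init()</> function, remembering and chaining to any hook
+    installed previously. The sketch below is illustrative only; the name
+    <literal>my_join_pathlist_hook</> is a placeholder for the extension's
+    own function, which would add a <literal>CustomPath</> or
+    <literal>ForeignPath</> to the joinrel when appropriate.
+<programlisting>
+static set_join_pathlist_hook_type prev_set_join_pathlist_hook = NULL;
+
+static void my_join_pathlist_hook(PlannerInfo *root,
+                                  RelOptInfo *joinrel,
+                                  RelOptInfo *outerrel,
+                                  RelOptInfo *innerrel,
+                                  List *restrictlist,
+                                  JoinType jointype,
+                                  SpecialJoinInfo *sjinfo,
+                                  SemiAntiJoinFactors *semifactors,
+                                  Relids param_source_rels,
+                                  Relids extra_lateral_rels);
+
+void
+_PG_init(void)
+{
+    /* remember and chain to any hook installed by another module */
+    prev_set_join_pathlist_hook = set_join_pathlist_hook;
+    set_join_pathlist_hook = my_join_pathlist_hook;
+}
+</programlisting>
+   </para>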
+  </sect2>
+
+  <sect2 id="custom-path-callbacks">
+   <title>Custom Path Callbacks</title>
+   <para>
+    A <literal>CustomPathMethods</> table contains a set of callbacks
+    related to a <literal>CustomPath</> node. The core backend invokes
+    these callbacks during query planning.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScan</> node according to the supplied
+    <literal>CustomPath</> node.
+    The custom-scan provider is responsible for allocating a
+    <literal>CustomScan</> node and initializing its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    tries to create a text representation of a <literal>CustomPath</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that expression nodes linked to
+    <literal>custom_private</> are transformed to their text representation
+    by the core, so the extension has nothing to do for them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+    A <literal>CustomScanMethods</> table contains a set of callbacks
+    related to a <literal>CustomScan</> node; the core backend invokes
+    these callbacks during query planning and executor initialization.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScanState</> node according to the supplied
+    <literal>CustomScan</> node. The custom-scan provider is responsible
+    for allocating a <literal>CustomScanState</> (or its own data type
+    extending it), but it does not need to initialize the fields here,
+    because <literal>ExecInitCustomScan</> initializes the fields of
+    <literal>CustomScanState</>, and <literal>BeginCustomScan</> is then
+    invoked at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+   </para>
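+
+   <para>
+    For example (a sketch only; the type and function names below are
+    placeholders), a provider usually embeds <literal>CustomScanState</>
+    at the head of a larger private structure and returns that structure
+    from this callback:
+<programlisting>
+typedef struct MyJoinState
+{
+    CustomScanState css;        /* common part must be the first field */
+    /* provider-private execution state follows */
+    int             next_chunk;
+} MyJoinState;
+
+static Node *
+my_create_scan_state(CustomScan *cscan)
+{
+    MyJoinState *mjs = (MyJoinState *) palloc0(sizeof(MyJoinState));
+
+    NodeSetTag(mjs, T_CustomScanState);
+    /* remaining initialization is deferred to BeginCustomScan */
+
+    return (Node *) mjs;
+}
+</programlisting>
+   </para>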
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    tries to make a text representation of a <literal>CustomScan</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that it is not allowed to extend the data
+    structure of the <literal>CustomScan</> node itself, so this callback
+    usually does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-exec-callbacks">
+   <title>Custom Exec Callbacks</title>
+   <para>
+    A <literal>CustomExecMethods</> table contains a set of callbacks
+    related to a <literal>CustomScanState</> node; the core backend
+    invokes these callbacks during query execution.
+   </para>
+   <para>
+    This callback performs the final initialization of the
+    <literal>CustomScanState</> node.
+    The supplied <literal>CustomScanState</> node has already been
+    partially initialized according to either <literal>scanrelid</> or
+    <literal>custom_ps_tlist</> of the <literal>CustomScan</> node. If the
+    custom-scan provider wants to apply additional initialization to its
+    private fields, it can do so in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to produce the next tuple
+    of the relation scan. If one is available, it should be stored in
+    <literal>ps_ResultTupleSlot</> and that tuple slot returned. Otherwise,
+    <literal>NULL</> or an empty slot shall be returned to signal the end
+    of the relation scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
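+
+   <para>
+    A minimal sketch of this callback might look as follows;
+    <literal>my_fetch_next_tuple()</> stands for whatever logic the
+    provider uses to produce the next record, and
+    <literal>MyJoinState</> is the provider-private state structure from
+    the earlier example:
+<programlisting>
+static TupleTableSlot *
+my_exec_custom_scan(CustomScanState *node)
+{
+    TupleTableSlot *slot = node->ss.ps.ps_ResultTupleSlot;
+    HeapTuple       tuple;
+
+    tuple = my_fetch_next_tuple((MyJoinState *) node);
+    if (tuple == NULL)
+        return ExecClearTuple(slot);    /* empty slot: end of scan */
+
+    return ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+}
+</programlisting>
+   </para>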
+   <para>
+    This callback allows a custom-scan provider to clean up the
+    <literal>CustomScanState</> node. If the supplied node holds any
+    private resources that are not released automatically, they can be
+    released here, prior to the cleanup of the common portion.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to rewind the current scan
+    position to the head of the relation. The custom-scan provider is
+    expected to reset its internal state so that the relation scan can be
+    restarted.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to save the
+    current scan position in its internal state, such that it can later be
+    restored by the <literal>RestrPosCustomScan</> callback. It is never
+    called unless the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is
+    set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to restore the
+    previous scan position that was saved by the
+    <literal>MarkPosCustomScan</> callback. It is never called unless the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback allows a custom-scan provider to output
+    additional information in the <command>EXPLAIN</> output of a
+    custom-scan node. Note that the common items (target list, qualifiers,
+    and the relation to be scanned) are printed by the core, so this
+    callback is for showing anything the provider wants in addition to
+    those items.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+   </para>
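+
+   <para>
+    For example (illustrative only; the label and value below are
+    placeholders for provider-specific details), a provider could emit one
+    extra property like this:
+<programlisting>
+static void
+my_explain_custom_scan(CustomScanState *node, List *ancestors,
+                       ExplainState *es)
+{
+    ExplainPropertyText("Join Strategy", "hash", es);
+}
+</programlisting>
+   </para>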
+  </sect2>
+ </sect1>
+</chapter>
+
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..0b8de3f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1073,9 +1073,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..ca51333 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * On the other hand, a custom-scan may scan a pseudo relation; that
+	 * is usually the result-set of a relations join processed by an
+	 * external computing resource, or similar. In that case the scan type
+	 * has to be taken from the pseudo-scan target-list assigned by the
+	 * custom-scan provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..f25eb6f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * On the other hand, a foreign-scan may scan a pseudo relation; that
+	 * is usually the result-set of a remote relations join. In that case
+	 * the scan type has to be taken from the pseudo-scan target-list
+	 * assigned by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..cb85468 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -590,7 +590,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -615,6 +617,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..c4a06fc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -556,7 +556,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -570,6 +572,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index e6aa21c..5a24efa 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,19 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In the FDW case, we may be able to omit the invocation if the
+	 * joinrel's fdw_handler is not set (it is set only if both relations
+	 * are managed by the same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..06bea4d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation-id, if this foreign-scan node actually scans
+	 * a particular real relation. Otherwise, InvalidOid shall be
+	 * passed to the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2052,12 +2064,6 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	RelOptInfo *rel = best_path->path.parent;
 
 	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
-
-	/*
 	 * Sort clauses into the best execution order, although custom-scan
 	 * provider can reorder them again.
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..d567c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -569,6 +569,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->fdw_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -583,6 +613,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->custom_ps_tlist != NIL)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					Assert(splan->scan.scanrelid == 0);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fb7db6d..a4a35c3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..ca71093 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -427,6 +428,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c1d860c..eb9eaf0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..b494ff2 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -157,6 +157,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..6717c6d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -470,7 +470,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * the underlying relation(s) onto a pair of INDEX_VAR and an alternative
+ * varattno. The node then looks like a scan on a pseudo relation, usually
+ * the result of a relations join on the remote data source, and the FDW
+ * driver is responsible for setting the expected target list for it. If
+ * the FDW returns records per the foreign-table definition, just put NIL here.
+ * Note that everything in above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -479,7 +485,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -487,10 +495,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ *  Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -511,6 +520,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9ef0b56 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 082f7d7..e66eaa5 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#26Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#25)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

The attached version of the custom/foreign-join interface patch
fixes the problem reported on the join-pushdown support thread.

The previous version referenced *_ps_tlist in setrefs.c to check
whether a Custom/ForeignScan node is associated with a particular
base relation or not.
That logic assumed the node performs a base-relation scan whenever
*_ps_tlist is valid; however, this was incorrect when the underlying
pseudo-scan relation has an empty targetlist.
Instead, the logic is revised to check scanrelid itself. If it is
zero, the Custom/ForeignScan node is not associated with a
particular base relation, so its slot descriptor for the scan shall
be constructed based on *_ps_tlist.

Also, I noticed a potential problem if a CSP/FDW driver wants to
display expression nodes using deparse_expression() but a varnode
within that expression does not appear in the *_ps_tlist.
For example, the remote query below shall return rows with two
columns.

SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, the ForeignScan behaves like a scan on a relation with two
columns, and the FDW driver will set two TargetEntry nodes on the
fdw_ps_tlist. If the FDW is designed to keep the join condition
(aid = bid) in expression-node form, it is expected to be saved in
the custom/fdw_exprs variable, and setrefs.c then rewrites the
varnodes according to the *_ps_tlist.
That means we also have to add both "aid" and "bid" to the
*_ps_tlist to avoid failure on variable lookup. However, these
additional entries change the definition of the slot descriptor.
So, I adjusted ExecInitForeignScan and ExecInitCustomScan to use
ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct
the slot descriptor based on the *_ps_tlist.
A CSP/FDW driver is expected to add target entries with
resjunk=true if it wants additional entries for variable lookups
on the EXPLAIN command.
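
Just for illustration (not part of the patch), such a junk entry
could be built roughly as below; the range-table index and the
attribute number/type of "aid" are placeholders here:

    /* append a junk entry for "aid" so the variable lookup succeeds */
    Var         *var = makeVar(tbl_a_rtindex, 1, INT4OID, -1,
                               InvalidOid, 0);
    TargetEntry *tle = makeTargetEntry((Expr *) var,
                                       list_length(fdw_ps_tlist) + 1,
                                       NULL,     /* no column name */
                                       true);    /* resjunk */

    fdw_ps_tlist = lappend(fdw_ps_tlist, tle);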

Fortunately or unfortunately, postgres_fdw keeps its remote query
in cstring form, so it does not need to add junk entries to the
fdw_ps_tlist.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Sunday, February 15, 2015 11:01 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached patch is a rebased version of join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation is added (according to Michael's comment).

This infrastructure allows a foreign-data wrapper or custom-scan
provider to add alternative scan paths for a relations join.
From the viewpoint of the executor, it looks like a scan on a
pseudo relation that is materialized from multiple relations, even
though the FDW/CSP internally processes the relations join with its
own logic.

Its basic idea is that (1) scanrelid==0 indicates the foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist introduce the definition of that pseudo relation,
because it is not associated with a tangible relation, unlike the
simple scan case, so the planner cannot know the expected record
type to be returned without this additional information.
These two enhancements enable extensions to process a relations
join internally while behaving like an existing scan node from the
viewpoint of the core backend.

Also, as an aside: I had an off-list discussion with Hanada-san
about this interface. He had an idea to keep create_plan_recurse()
static, using a special list field in the CustomPath structure to
chain the underlying Path nodes. If the core backend translates
those Path nodes to Plan nodes when a valid list is given, the
extension does not need to call create_plan_recurse() by itself.
I have no preference about this. Does anybody have an opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Thursday, January 15, 2015 8:03 AM
To: Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, even though the full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
It takes a GpuHashJoinPath, inherited from CustomPath, that has multiple
underlying scan/join paths.
Once it is called back from the backend, it also calls create_plan_recurse()
to make the inner/outer plan nodes according to those paths.

As a result, we can see the following query execution plan, in which the
CustomScan takes the underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                  QUERY PLAN
----------------------------------------------------------------------------------
 Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
   Hash clause 1: (aid = aid)
   Hash clause 2: (bid = bid)
   Bulkload: On
   ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: aid
         nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: bid
         nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, January 15, 2015 2:07 AM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
[v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

pgsql-v9.5-custom-join.v5.patch (application/octet-stream)
 contrib/chkpass/chkpass.c                       |   6 +-
 contrib/cube/cubedata.h                         |  15 +-
 contrib/file_fdw/file_fdw.c                     |   2 +-
 contrib/hstore/hstore_gist.c                    |   2 +-
 contrib/hstore/hstore_io.c                      |  18 +-
 contrib/intarray/_int.h                         |   4 +-
 contrib/intarray/_int_gist.c                    |  16 +-
 contrib/ltree/ltree.h                           |  14 +-
 contrib/pageinspect/heapfuncs.c                 |  17 +-
 contrib/pageinspect/rawpage.c                   |   2 +-
 contrib/pg_trgm/trgm.h                          |   2 +-
 contrib/postgres_fdw/postgres_fdw.c             |   2 +-
 contrib/spi/timetravel.c                        |  25 +-
 doc/src/sgml/bki.sgml                           |   9 +-
 doc/src/sgml/client-auth.sgml                   |  36 +-
 doc/src/sgml/config.sgml                        |  58 +--
 doc/src/sgml/custom-scan.sgml                   | 278 ++++++++++++++
 doc/src/sgml/event-trigger.sgml                 | 134 +------
 doc/src/sgml/fdwhandler.sgml                    |  54 +++
 doc/src/sgml/filelist.sgml                      |   1 +
 doc/src/sgml/func.sgml                          |   8 +-
 doc/src/sgml/libpq.sgml                         |   9 +-
 doc/src/sgml/monitoring.sgml                    |   8 -
 doc/src/sgml/perform.sgml                       |  16 +-
 doc/src/sgml/postgres.sgml                      |   1 +
 doc/src/sgml/ref/create_type.sgml               |  25 +-
 doc/src/sgml/release-8.3.sgml                   |   2 +-
 doc/src/sgml/storage.sgml                       | 145 ++------
 doc/src/sgml/wal.sgml                           |  69 ++--
 doc/src/sgml/xfunc.sgml                         |  25 +-
 doc/src/sgml/xtypes.sgml                        |  52 +--
 src/Makefile.shlib                              |   3 -
 src/backend/access/common/heaptuple.c           |   2 +-
 src/backend/access/gist/gistscan.c              |   4 +-
 src/backend/access/heap/heapam.c                |  56 +--
 src/backend/access/heap/syncscan.c              |   5 +-
 src/backend/access/heap/tuptoaster.c            |  18 +-
 src/backend/access/nbtree/nbtutils.c            |   4 +-
 src/backend/access/transam/multixact.c          |   5 +-
 src/backend/access/transam/twophase.c           |   9 +-
 src/backend/access/transam/xact.c               |  29 +-
 src/backend/access/transam/xlog.c               | 299 ++++-----------
 src/backend/bootstrap/bootparse.y               |  17 +-
 src/backend/bootstrap/bootscanner.l             |   3 -
 src/backend/bootstrap/bootstrap.c               |  56 ++-
 src/backend/catalog/Catalog.pm                  |  24 +-
 src/backend/catalog/genbki.pl                   |  65 +---
 src/backend/catalog/namespace.c                 |  13 +-
 src/backend/catalog/objectaddress.c             |  10 +-
 src/backend/catalog/toasting.c                  |   2 +-
 src/backend/commands/async.c                    |  22 +-
 src/backend/commands/event_trigger.c            |  37 +-
 src/backend/commands/explain.c                  |  19 +-
 src/backend/commands/prepare.c                  |   5 +-
 src/backend/commands/sequence.c                 |  23 --
 src/backend/commands/tablecmds.c                |  26 +-
 src/backend/commands/tablespace.c               |   2 +-
 src/backend/commands/trigger.c                  |  10 +-
 src/backend/commands/typecmds.c                 | 127 +++++++
 src/backend/executor/execQual.c                 | 132 +++----
 src/backend/executor/execScan.c                 |   4 +
 src/backend/executor/functions.c                |   6 +-
 src/backend/executor/nodeAgg.c                  |  14 +-
 src/backend/executor/nodeCustom.c               |  38 +-
 src/backend/executor/nodeForeignscan.c          |  34 +-
 src/backend/executor/nodeHash.c                 |   2 +-
 src/backend/executor/nodeSubplan.c              |   4 +-
 src/backend/executor/spi.c                      |   5 +-
 src/backend/foreign/foreign.c                   |  32 +-
 src/backend/lib/pairingheap.c                   |  59 ---
 src/backend/libpq/auth.c                        |   3 +-
 src/backend/libpq/be-secure-openssl.c           | 104 +++++-
 src/backend/libpq/be-secure.c                   | 143 ++++---
 src/backend/libpq/hba.c                         |  42 ++-
 src/backend/libpq/ip.c                          |  73 ++++
 src/backend/libpq/pqcomm.c                      |   3 +-
 src/backend/main/main.c                         |   6 +
 src/backend/nodes/copyfuncs.c                   |   8 +-
 src/backend/nodes/equalfuncs.c                  |   3 -
 src/backend/nodes/nodeFuncs.c                   |  18 +-
 src/backend/nodes/outfuncs.c                    |  38 +-
 src/backend/nodes/params.c                      |   5 +-
 src/backend/nodes/readfuncs.c                   |   2 -
 src/backend/optimizer/README                    |  34 +-
 src/backend/optimizer/path/costsize.c           |   6 +-
 src/backend/optimizer/path/joinpath.c           |  75 ++--
 src/backend/optimizer/plan/createplan.c         |  78 +++-
 src/backend/optimizer/plan/planagg.c            |   1 -
 src/backend/optimizer/plan/planner.c            |  23 +-
 src/backend/optimizer/plan/setrefs.c            |  63 +++-
 src/backend/optimizer/plan/subselect.c          |   8 +-
 src/backend/optimizer/prep/prepqual.c           |   2 -
 src/backend/optimizer/prep/prepsecurity.c       |  37 +-
 src/backend/optimizer/prep/prepunion.c          |   2 +-
 src/backend/optimizer/util/clauses.c            |   3 -
 src/backend/optimizer/util/plancat.c            |  10 +-
 src/backend/optimizer/util/relnode.c            |  13 +
 src/backend/parser/gram.y                       | 155 +++-----
 src/backend/parser/parse_clause.c               |   9 +-
 src/backend/parser/parse_expr.c                 | 278 ++++++--------
 src/backend/parser/parse_utilcmd.c              |  11 -
 src/backend/parser/parser.c                     | 105 +++---
 src/backend/postmaster/checkpointer.c           |   8 +-
 src/backend/postmaster/pgstat.c                 | 110 +-----
 src/backend/postmaster/syslogger.c              |   4 +-
 src/backend/replication/README                  |   2 +-
 src/backend/replication/basebackup.c            |  21 +-
 src/backend/replication/logical/decode.c        |  30 +-
 src/backend/replication/logical/reorderbuffer.c |  26 +-
 src/backend/rewrite/rewriteHandler.c            |   2 +-
 src/backend/rewrite/rewriteManip.c              |   1 -
 src/backend/storage/buffer/bufmgr.c             |  14 -
 src/backend/storage/buffer/freelist.c           |   2 +-
 src/backend/storage/ipc/pmsignal.c              |   2 +-
 src/backend/storage/ipc/procarray.c             |   7 +-
 src/backend/storage/ipc/sinvaladt.c             |  13 +-
 src/backend/storage/large_object/inv_api.c      |  16 +-
 src/backend/tcop/postgres.c                     |   5 +-
 src/backend/tcop/utility.c                      |  64 +---
 src/backend/utils/Gen_fmgrtab.pl                |   2 +-
 src/backend/utils/adt/array_userfuncs.c         | 244 +++++-------
 src/backend/utils/adt/arrayfuncs.c              | 264 +++++--------
 src/backend/utils/adt/domains.c                 |  78 ++--
 src/backend/utils/adt/geo_ops.c                 |  22 +-
 src/backend/utils/adt/json.c                    |  43 ++-
 src/backend/utils/adt/jsonb.c                   |  81 ++--
 src/backend/utils/adt/jsonfuncs.c               |  18 +-
 src/backend/utils/adt/numeric.c                 |  10 +-
 src/backend/utils/adt/pgstatfuncs.c             |   8 -
 src/backend/utils/adt/rowtypes.c                |  52 +--
 src/backend/utils/adt/ruleutils.c               |  43 +--
 src/backend/utils/adt/timestamp.c               | 128 -------
 src/backend/utils/adt/trigfuncs.c               |   6 +-
 src/backend/utils/adt/tsgistidx.c               |   2 +-
 src/backend/utils/adt/tsrank.c                  |  28 +-
 src/backend/utils/adt/tsvector_op.c             |   2 +-
 src/backend/utils/adt/txid.c                    |   3 +-
 src/backend/utils/adt/xml.c                     |   2 +-
 src/backend/utils/cache/catcache.c              |   2 +-
 src/backend/utils/cache/inval.c                 |  12 +-
 src/backend/utils/cache/typcache.c              | 407 +-------------------
 src/backend/utils/fmgr/dfmgr.c                  |  11 +-
 src/backend/utils/misc/guc.c                    | 473 ++++++++++++------------
 src/backend/utils/misc/postgresql.conf.sample   |   5 +-
 src/backend/utils/mmgr/README                   | 243 ++++++------
 src/backend/utils/mmgr/aset.c                   |  34 +-
 src/backend/utils/mmgr/mcxt.c                   |  99 ++---
 src/backend/utils/sort/logtape.c                |  14 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl    |  15 +-
 src/bin/pg_dump/compress_io.c                   |  38 +-
 src/bin/pg_dump/dumputils.c                     |   3 +-
 src/bin/pg_dump/dumputils.h                     |   2 +-
 src/bin/pg_dump/pg_backup_archiver.c            |  17 +-
 src/bin/pg_dump/pg_dump.c                       |  19 +-
 src/common/Makefile                             |   2 -
 src/include/access/brin_page.h                  |   7 +-
 src/include/access/gin_private.h                |  32 +-
 src/include/access/gist_private.h               |   7 +-
 src/include/access/heapam_xlog.h                |   2 +-
 src/include/access/htup_details.h               |  12 +-
 src/include/access/spgist_private.h             |  10 +-
 src/include/access/tuptoaster.h                 |   2 +-
 src/include/access/xact.h                       |   8 +-
 src/include/access/xlog.h                       |   9 +-
 src/include/bootstrap/bootstrap.h               |   6 +-
 src/include/c.h                                 |  10 +-
 src/include/catalog/catversion.h                |   2 +-
 src/include/catalog/genbki.h                    |   2 -
 src/include/catalog/namespace.h                 |   4 +-
 src/include/catalog/pg_authid.h                 |   2 -
 src/include/catalog/pg_description.h            |   2 +-
 src/include/catalog/pg_extension.h              |   4 +-
 src/include/catalog/pg_largeobject.h            |   2 +-
 src/include/catalog/pg_pltemplate.h             |   4 +-
 src/include/catalog/pg_proc.h                   |  20 +-
 src/include/catalog/pg_seclabel.h               |   4 +-
 src/include/catalog/pg_shdescription.h          |   2 +-
 src/include/catalog/pg_shseclabel.h             |   4 +-
 src/include/catalog/pg_trigger.h                |   2 +-
 src/include/commands/dbcommands.h               |  15 +
 src/include/commands/event_trigger.h            |   1 -
 src/include/commands/tablespace.h               |   2 +-
 src/include/commands/typecmds.h                 |   2 +
 src/include/executor/hashjoin.h                 |   2 +-
 src/include/foreign/fdwapi.h                    |  15 +
 src/include/lib/pairingheap.h                   |  11 -
 src/include/libpq/ip.h                          |   5 +
 src/include/libpq/libpq-be.h                    |   4 +-
 src/include/nodes/bitmapset.h                   |   4 +-
 src/include/nodes/execnodes.h                   |   5 +-
 src/include/nodes/memnodes.h                    |   8 +-
 src/include/nodes/params.h                      |   2 +-
 src/include/nodes/parsenodes.h                  |  10 +-
 src/include/nodes/plannodes.h                   |  23 +-
 src/include/nodes/primnodes.h                   |   3 +-
 src/include/nodes/relation.h                    |   2 +
 src/include/nodes/tidbitmap.h                   |   4 +-
 src/include/optimizer/paths.h                   |  13 +
 src/include/optimizer/planmain.h                |   2 +-
 src/include/parser/gramparse.h                  |   2 -
 src/include/pgstat.h                            |   8 +-
 src/include/pgtar.h                             |  10 +-
 src/include/port.h                              |   2 +
 src/include/postgres.h                          |   8 +-
 src/include/postmaster/syslogger.h              |   2 +-
 src/include/replication/reorderbuffer.h         |  10 +-
 src/include/replication/slot.h                  |   4 -
 src/include/replication/walsender_private.h     |   4 +-
 src/include/storage/bufpage.h                   |   2 +-
 src/include/storage/fsm_internals.h             |   2 +-
 src/include/storage/s_lock.h                    |   2 +-
 src/include/storage/standby.h                   |   4 +-
 src/include/tsearch/dicts/regis.h               |   2 +-
 src/include/tsearch/dicts/spell.h               |   6 +-
 src/include/tsearch/ts_type.h                   |  13 +-
 src/include/utils/array.h                       |  32 +-
 src/include/utils/catcache.h                    |   4 +-
 src/include/utils/datetime.h                    |   4 +-
 src/include/utils/geo_decls.h                   |   4 +-
 src/include/utils/guc.h                         |  24 +-
 src/include/utils/jsonb.h                       |   2 +-
 src/include/utils/memutils.h                    |   5 +-
 src/include/utils/palloc.h                      |  20 -
 src/include/utils/relmapper.h                   |   2 +-
 src/include/utils/timestamp.h                   |   2 -
 src/include/utils/typcache.h                    |  37 --
 src/include/utils/varbit.h                      |   3 +-
 src/interfaces/ecpg/ecpglib/data.c              |   8 +-
 src/interfaces/ecpg/ecpglib/extern.h            |   2 +-
 src/interfaces/ecpg/preproc/parse.pl            |   6 +-
 src/interfaces/ecpg/preproc/parser.c            | 116 +++---
 src/interfaces/libpq/fe-connect.c               |  60 +--
 src/interfaces/libpq/fe-exec.c                  |   3 +-
 src/interfaces/libpq/fe-misc.c                  |  27 +-
 src/interfaces/libpq/fe-secure-openssl.c        |   9 +-
 src/interfaces/libpq/libpq-int.h                |   4 +-
 src/pl/plperl/plperl.c                          |   2 +-
 src/pl/plpgsql/src/pl_exec.c                    | 333 ++++++++---------
 src/pl/plpgsql/src/pl_funcs.c                   |   2 +-
 src/pl/plpgsql/src/pl_gram.y                    |  42 +--
 src/pl/plpgsql/src/plpgsql.h                    |   3 +-
 src/port/Makefile                               |   2 -
 src/port/dirmod.c                               |  12 +-
 src/port/gettimeofday.c                         |  23 +-
 src/port/pgcheckdir.c                           |  23 +-
 src/port/tar.c                                  |  10 +-
 src/test/regress/expected/copy2.out             |  34 --
 src/test/regress/expected/domain.out            |  30 --
 src/test/regress/expected/event_trigger.out     |  19 -
 src/test/regress/expected/foreign_data.out      |   2 +-
 src/test/regress/expected/join.out              |  19 -
 src/test/regress/expected/json.out              |  24 --
 src/test/regress/expected/json_1.out            |  24 --
 src/test/regress/expected/jsonb.out             |  24 --
 src/test/regress/expected/jsonb_1.out           |  24 --
 src/test/regress/expected/object_address.out    |   6 +-
 src/test/regress/expected/opr_sanity.out        |   6 +-
 src/test/regress/expected/plpgsql.out           |  12 -
 src/test/regress/expected/rowsecurity.out       |  64 ++--
 src/test/regress/expected/rules.out             |  54 ---
 src/test/regress/expected/stats.out             |  74 +---
 src/test/regress/expected/updatable_views.out   | 264 ++++++-------
 src/test/regress/expected/with.out              |  15 -
 src/test/regress/pg_regress.c                   |   2 +-
 src/test/regress/sql/copy2.sql                  |  19 -
 src/test/regress/sql/domain.sql                 |  27 --
 src/test/regress/sql/event_trigger.sql          |  19 -
 src/test/regress/sql/join.sql                   |   9 -
 src/test/regress/sql/json.sql                   |   6 -
 src/test/regress/sql/jsonb.sql                  |   6 -
 src/test/regress/sql/plpgsql.sql                |  10 +-
 src/test/regress/sql/rules.sql                  |  18 -
 src/test/regress/sql/stats.sql                  |  70 +---
 src/test/regress/sql/with.sql                   |   5 -
 src/test/ssl/Makefile                           |   2 +-
 src/test/ssl/ssl/both-cas-1.crt                 |  50 +--
 src/test/ssl/ssl/both-cas-2.crt                 |  50 +--
 src/test/ssl/ssl/client-revoked.crt             |  16 +-
 src/test/ssl/ssl/client-revoked.key             |  26 +-
 src/test/ssl/ssl/client.crl                     |  12 +-
 src/test/ssl/ssl/client.crt                     |  16 +-
 src/test/ssl/ssl/client.key                     |  26 +-
 src/test/ssl/ssl/client_ca.crt                  |  16 +-
 src/test/ssl/ssl/client_ca.key                  |  26 +-
 src/test/ssl/ssl/root+client.crl                |  22 +-
 src/test/ssl/ssl/root+client_ca.crt             |  34 +-
 src/test/ssl/ssl/root+server.crl                |  22 +-
 src/test/ssl/ssl/root+server_ca.crt             |  34 +-
 src/test/ssl/ssl/root.crl                       |  10 +-
 src/test/ssl/ssl/root_ca.crt                    |  18 +-
 src/test/ssl/ssl/root_ca.key                    |  26 +-
 src/test/ssl/ssl/server-cn-and-alt-names.crt    |  18 +-
 src/test/ssl/ssl/server-cn-and-alt-names.key    |  26 +-
 src/test/ssl/ssl/server-cn-only.crt             |  16 +-
 src/test/ssl/ssl/server-cn-only.key             |  26 +-
 src/test/ssl/ssl/server-multiple-alt-names.crt  |  16 +-
 src/test/ssl/ssl/server-multiple-alt-names.key  |  26 +-
 src/test/ssl/ssl/server-no-names.crt            |  14 +-
 src/test/ssl/ssl/server-no-names.key            |  26 +-
 src/test/ssl/ssl/server-revoked.crt             |  16 +-
 src/test/ssl/ssl/server-revoked.key             |  26 +-
 src/test/ssl/ssl/server-single-alt-name.crt     |  14 +-
 src/test/ssl/ssl/server-single-alt-name.key     |  26 +-
 src/test/ssl/ssl/server-ss.crt                  |  16 +-
 src/test/ssl/ssl/server-ss.key                  |  26 +-
 src/test/ssl/ssl/server.crl                     |  12 +-
 src/test/ssl/ssl/server_ca.crt                  |  16 +-
 src/test/ssl/ssl/server_ca.key                  |  26 +-
 src/tools/msvc/Solution.pm                      |  14 +-
 src/tools/msvc/VSObjectFactory.pm               |  28 +-
 310 files changed, 3810 insertions(+), 5469 deletions(-)

diff --git a/contrib/chkpass/chkpass.c b/contrib/chkpass/chkpass.c
index 9425c08..283ad9a 100644
--- a/contrib/chkpass/chkpass.c
+++ b/contrib/chkpass/chkpass.c
@@ -65,7 +65,7 @@ chkpass_in(PG_FUNCTION_ARGS)
 	/* special case to let us enter encrypted passwords */
 	if (*str == ':')
 	{
-		result = (chkpass *) palloc0(sizeof(chkpass));
+		result = (chkpass *) palloc(sizeof(chkpass));
 		strlcpy(result->password, str + 1, 13 + 1);
 		PG_RETURN_POINTER(result);
 	}
@@ -75,7 +75,7 @@ chkpass_in(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_DATA_EXCEPTION),
 				 errmsg("password \"%s\" is weak", str)));
 
-	result = (chkpass *) palloc0(sizeof(chkpass));
+	result = (chkpass *) palloc(sizeof(chkpass));
 
 	mysalt[0] = salt_chars[random() & 0x3f];
 	mysalt[1] = salt_chars[random() & 0x3f];
@@ -107,7 +107,7 @@ chkpass_out(PG_FUNCTION_ARGS)
 
 	result = (char *) palloc(16);
 	result[0] = ':';
-	strlcpy(result + 1, password->password, 15);
+	strcpy(result + 1, password->password);
 
 	PG_RETURN_CSTRING(result);
 }
diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 719e43d..5d44e11 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -23,10 +23,11 @@ typedef struct NDBOX
 	unsigned int header;
 
 	/*
-	 * The lower left coordinates for each dimension come first, followed by
-	 * upper right coordinates unless the point flag is set.
+	 * Variable length array. The lower left coordinates for each dimension
+	 * come first, followed by upper right coordinates unless the point flag
+	 * is set.
 	 */
-	double		x[FLEXIBLE_ARRAY_MEMBER];
+	double		x[1];
 } NDBOX;
 
 #define POINT_BIT			0x80000000
@@ -40,9 +41,9 @@ typedef struct NDBOX
 #define LL_COORD(cube, i) ( (cube)->x[i] )
 #define UR_COORD(cube, i) ( IS_POINT(cube) ? (cube)->x[i] : (cube)->x[(i) + DIM(cube)] )
 
-#define POINT_SIZE(_dim)	(offsetof(NDBOX, x) + sizeof(double)*(_dim))
-#define CUBE_SIZE(_dim)		(offsetof(NDBOX, x) + sizeof(double)*(_dim)*2)
+#define POINT_SIZE(_dim) (offsetof(NDBOX, x[0]) + sizeof(double)*(_dim))
+#define CUBE_SIZE(_dim) (offsetof(NDBOX, x[0]) + sizeof(double)*(_dim)*2)
 
-#define DatumGetNDBOX(x)	((NDBOX *) PG_DETOAST_DATUM(x))
-#define PG_GETARG_NDBOX(x)	DatumGetNDBOX(PG_GETARG_DATUM(x))
+#define DatumGetNDBOX(x)	((NDBOX*)DatumGetPointer(x))
+#define PG_GETARG_NDBOX(x)	DatumGetNDBOX( PG_DETOAST_DATUM(PG_GETARG_DATUM(x)) )
 #define PG_RETURN_NDBOX(x)	PG_RETURN_POINTER(x)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 4368897..d569760 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -932,7 +932,7 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 		int			tuple_width;
 
 		tuple_width = MAXALIGN(baserel->width) +
-			MAXALIGN(SizeofHeapTupleHeader);
+			MAXALIGN(sizeof(HeapTupleHeaderData));
 		ntuples = clamp_row_est((double) stat_buf.st_size /
 								(double) tuple_width);
 	}
diff --git a/contrib/hstore/hstore_gist.c b/contrib/hstore/hstore_gist.c
index f375f5d..54e96fc 100644
--- a/contrib/hstore/hstore_gist.c
+++ b/contrib/hstore/hstore_gist.c
@@ -41,7 +41,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } GISTTYPE;
 
 #define ALLISTRUE		0x04
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 7d89867..079f662 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -747,7 +747,7 @@ typedef struct RecordIOData
 	Oid			record_type;
 	int32		record_typmod;
 	int			ncolumns;
-	ColumnIOData columns[FLEXIBLE_ARRAY_MEMBER];
+	ColumnIOData columns[1];	/* VARIABLE LENGTH ARRAY */
 } RecordIOData;
 
 PG_FUNCTION_INFO_V1(hstore_from_record);
@@ -805,8 +805,8 @@ hstore_from_record(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -816,8 +816,8 @@ hstore_from_record(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -990,8 +990,8 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -1001,8 +1001,8 @@ hstore_populate_record(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index d524f0f..7f93206 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -73,7 +73,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } GISTTYPE;
 
 #define ALLISTRUE		0x04
@@ -133,7 +133,7 @@ typedef struct QUERYTYPE
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;			/* number of ITEMs */
-	ITEM		items[FLEXIBLE_ARRAY_MEMBER];
+	ITEM		items[1];		/* variable length array */
 } QUERYTYPE;
 
 #define HDRSIZEQT	offsetof(QUERYTYPE, items)
diff --git a/contrib/intarray/_int_gist.c b/contrib/intarray/_int_gist.c
index 876a7b9..53abcc4 100644
--- a/contrib/intarray/_int_gist.c
+++ b/contrib/intarray/_int_gist.c
@@ -416,7 +416,9 @@ g_int_picksplit(PG_FUNCTION_ARGS)
 			size_waste = size_union - size_inter;
 
 			pfree(union_d);
-			pfree(inter_d);
+
+			if (inter_d != (ArrayType *) NULL)
+				pfree(inter_d);
 
 			/*
 			 * are these a more promising split that what we've already seen?
@@ -515,8 +517,10 @@ g_int_picksplit(PG_FUNCTION_ARGS)
 		/* pick which page to add it to */
 		if (size_alpha - size_l < size_beta - size_r + WISH_F(v->spl_nleft, v->spl_nright, 0.01))
 		{
-			pfree(datum_l);
-			pfree(union_dr);
+			if (datum_l)
+				pfree(datum_l);
+			if (union_dr)
+				pfree(union_dr);
 			datum_l = union_dl;
 			size_l = size_alpha;
 			*left++ = i;
@@ -524,8 +528,10 @@ g_int_picksplit(PG_FUNCTION_ARGS)
 		}
 		else
 		{
-			pfree(datum_r);
-			pfree(union_dl);
+			if (datum_r)
+				pfree(datum_r);
+			if (union_dl)
+				pfree(union_dl);
 			datum_r = union_dr;
 			size_r = size_beta;
 			*right++ = i;
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index c604357..1b1305b 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -10,7 +10,7 @@
 typedef struct
 {
 	uint16		len;
-	char		name[FLEXIBLE_ARRAY_MEMBER];
+	char		name[1];
 } ltree_level;
 
 #define LEVEL_HDRSIZE	(offsetof(ltree_level,name))
@@ -20,7 +20,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint16		numlevel;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } ltree;
 
 #define LTREE_HDRSIZE	MAXALIGN( offsetof(ltree, data) )
@@ -34,7 +34,7 @@ typedef struct
 	int32		val;
 	uint16		len;
 	uint8		flag;
-	char		name[FLEXIBLE_ARRAY_MEMBER];
+	char		name[1];
 } lquery_variant;
 
 #define LVAR_HDRSIZE   MAXALIGN(offsetof(lquery_variant, name))
@@ -51,7 +51,7 @@ typedef struct
 	uint16		numvar;
 	uint16		low;
 	uint16		high;
-	char		variants[FLEXIBLE_ARRAY_MEMBER];
+	char		variants[1];
 } lquery_level;
 
 #define LQL_HDRSIZE MAXALIGN( offsetof(lquery_level,variants) )
@@ -72,7 +72,7 @@ typedef struct
 	uint16		numlevel;
 	uint16		firstgood;
 	uint16		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } lquery;
 
 #define LQUERY_HDRSIZE	 MAXALIGN( offsetof(lquery, data) )
@@ -107,7 +107,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } ltxtquery;
 
 #define HDRSIZEQT		MAXALIGN(VARHDRSZ + sizeof(int32))
@@ -208,7 +208,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint32		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } ltree_gist;
 
 #define LTG_ONENODE 0x01
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 8d1666c..c8876f3 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -149,7 +149,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 		 * many other ways, but at least we won't crash.
 		 */
 		if (ItemIdHasStorage(id) &&
-			lp_len >= MinHeapTupleSize &&
+			lp_len >= sizeof(HeapTupleHeader) &&
 			lp_offset == MAXALIGN(lp_offset) &&
 			lp_offset + lp_len <= raw_page_size)
 		{
@@ -169,19 +169,18 @@ heap_page_items(PG_FUNCTION_ARGS)
 			values[10] = UInt8GetDatum(tuphdr->t_hoff);
 
 			/*
-			 * We already checked that the item is completely within the raw
-			 * page passed to us, with the length given in the line pointer.
-			 * Let's check that t_hoff doesn't point over lp_len, before using
-			 * it to access t_bits and oid.
+			 * We already checked that the item as is completely within the
+			 * raw page passed to us, with the length given in the line
+			 * pointer.. Let's check that t_hoff doesn't point over lp_len,
+			 * before using it to access t_bits and oid.
 			 */
-			if (tuphdr->t_hoff >= SizeofHeapTupleHeader &&
-				tuphdr->t_hoff <= lp_len &&
-				tuphdr->t_hoff == MAXALIGN(tuphdr->t_hoff))
+			if (tuphdr->t_hoff >= sizeof(HeapTupleHeader) &&
+				tuphdr->t_hoff <= lp_len)
 			{
 				if (tuphdr->t_infomask & HEAP_HASNULL)
 				{
 					bits_len = tuphdr->t_hoff -
-						offsetof(HeapTupleHeaderData, t_bits);
+						(((char *) tuphdr->t_bits) -((char *) tuphdr));
 
 					values[11] = CStringGetTextDatum(
 								 bits_to_text(tuphdr->t_bits, bits_len * 8));
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 38c136f..1dada09 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -192,7 +192,7 @@ page_header(PG_FUNCTION_ARGS)
 	 * Check that enough data was supplied, so that we don't try to access
 	 * fields outside the supplied buffer.
 	 */
-	if (raw_page_size < SizeOfPageHeaderData)
+	if (raw_page_size < sizeof(PageHeaderData))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("input page too small (%d bytes)", raw_page_size)));
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index f030558..ed649b8 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -63,7 +63,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint8		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } TRGM;
 
 #define TRGMHDRSIZE		  (VARHDRSZ + sizeof(uint8))
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..d76e739 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -519,7 +519,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		{
 			baserel->pages = 10;
 			baserel->tuples =
-				(10 * BLCKSZ) / (baserel->width + MAXALIGN(SizeofHeapTupleHeader));
+				(10 * BLCKSZ) / (baserel->width + sizeof(HeapTupleHeaderData));
 		}
 
 		/* Estimate baserel size as best we can with local statistics. */
diff --git a/contrib/spi/timetravel.c b/contrib/spi/timetravel.c
index 0699438..a37cbee 100644
--- a/contrib/spi/timetravel.c
+++ b/contrib/spi/timetravel.c
@@ -35,10 +35,10 @@ static int	nPlans = 0;
 typedef struct _TTOffList
 {
 	struct _TTOffList *next;
-	char		name[FLEXIBLE_ARRAY_MEMBER];
+	char		name[1];
 } TTOffList;
 
-static TTOffList *TTOff = NULL;
+static TTOffList TTOff = {NULL, {0}};
 
 static int	findTTStatus(char *name);
 static EPlan *find_plan(char *ident, EPlan **eplan, int *nplans);
@@ -428,11 +428,10 @@ set_timetravel(PG_FUNCTION_ARGS)
 	char	   *d;
 	char	   *s;
 	int32		ret;
-	TTOffList  *prev,
+	TTOffList  *p,
 			   *pp;
 
-	prev = NULL;
-	for (pp = TTOff; pp; prev = pp, pp = pp->next)
+	for (pp = (p = &TTOff)->next; pp; pp = (p = pp)->next)
 	{
 		if (namestrcmp(relname, pp->name) == 0)
 			break;
@@ -443,10 +442,7 @@ set_timetravel(PG_FUNCTION_ARGS)
 		if (on != 0)
 		{
 			/* turn ON */
-			if (prev)
-				prev->next = pp->next;
-			else
-				TTOff = pp->next;
+			p->next = pp->next;
 			free(pp);
 		}
 		ret = 0;
@@ -460,18 +456,15 @@ set_timetravel(PG_FUNCTION_ARGS)
 			s = rname = DatumGetCString(DirectFunctionCall1(nameout, NameGetDatum(relname)));
 			if (s)
 			{
-				pp = malloc(offsetof(TTOffList, name) +strlen(rname) + 1);
+				pp = malloc(sizeof(TTOffList) + strlen(rname));
 				if (pp)
 				{
 					pp->next = NULL;
+					p->next = pp;
 					d = pp->name;
 					while (*s)
 						*d++ = tolower((unsigned char) *s++);
 					*d = '\0';
-					if (prev)
-						prev->next = pp;
-					else
-						TTOff = pp;
 				}
 				pfree(rname);
 			}
@@ -493,7 +486,7 @@ get_timetravel(PG_FUNCTION_ARGS)
 	Name		relname = PG_GETARG_NAME(0);
 	TTOffList  *pp;
 
-	for (pp = TTOff; pp; pp = pp->next)
+	for (pp = TTOff.next; pp; pp = pp->next)
 	{
 		if (namestrcmp(relname, pp->name) == 0)
 			PG_RETURN_INT32(0);
@@ -506,7 +499,7 @@ findTTStatus(char *name)
 {
 	TTOffList  *pp;
 
-	for (pp = TTOff; pp; pp = pp->next)
+	for (pp = TTOff.next; pp; pp = pp->next)
 		if (pg_strcasecmp(name, pp->name) == 0)
 			return 0;
 	return 1;
diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index af6d8d1..aaf500a 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -75,12 +75,9 @@
      <optional><literal>without_oids</></optional>
      <optional><literal>rowtype_oid</> <replaceable>oid</></optional>
      (<replaceable class="parameter">name1</replaceable> =
-     <replaceable class="parameter">type1</replaceable>
-     <optional>FORCE NOT NULL | FORCE NULL </optional> <optional>,
-     <replaceable class="parameter">name2</replaceable> =
-     <replaceable class="parameter">type2</replaceable>
-     <optional>FORCE NOT NULL | FORCE NULL </optional>,
-     ...</optional>)
+     <replaceable class="parameter">type1</replaceable> <optional>,
+     <replaceable class="parameter">name2</replaceable> = <replaceable
+     class="parameter">type2</replaceable>, ...</optional>)
     </term>
 
     <listitem>
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index d27dd49..7704f73 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -229,15 +229,14 @@ hostnossl  <replaceable>database</replaceable>  <replaceable>user</replaceable>
      <term><replaceable>address</replaceable></term>
      <listitem>
       <para>
-       Specifies the client machine address(es) that this record
+       Specifies the client machine addresses that this record
        matches.  This field can contain either a host name, an IP
        address range, or one of the special key words mentioned below.
       </para>
 
       <para>
-       An IP address range is specified using standard numeric notation
-       for the range's starting address, then a slash (<literal>/</literal>)
-       and a <acronym>CIDR</> mask length.  The mask
+       An IP address is specified in standard dotted decimal
+       notation with a <acronym>CIDR</> mask length.  The mask
        length indicates the number of high-order bits of the client
        IP address that must match.  Bits to the right of this should
        be zero in the given IP address.
@@ -246,27 +245,25 @@ hostnossl  <replaceable>database</replaceable>  <replaceable>user</replaceable>
       </para>
 
       <para>
-       Typical examples of an IPv4 address range specified this way are
+       Typical examples of an IP address range specified this way are
        <literal>172.20.143.89/32</literal> for a single host, or
        <literal>172.20.143.0/24</literal> for a small network, or
        <literal>10.6.0.0/16</literal> for a larger one.
-       An IPv6 address range might look like <literal>::1/128</literal>
-       for a single host (in this case the IPv6 loopback address) or
-       <literal>fe80::7a31:c1ff:0000:0000/96</literal> for a small
-       network.
        <literal>0.0.0.0/0</literal> represents all
-       IPv4 addresses, and <literal>::0/0</literal> represents
+       IPv4 addresses, and <literal>::/0</literal> represents
        all IPv6 addresses.
-       To specify a single host, use a mask length of 32 for IPv4 or
+       To specify a single host, use a CIDR mask of 32 for IPv4 or
        128 for IPv6.  In a network address, do not omit trailing zeroes.
       </para>
 
       <para>
-       An entry given in IPv4 format will match only IPv4 connections,
-       and an entry given in IPv6 format will match only IPv6 connections,
-       even if the represented address is in the IPv4-in-IPv6 range.
-       Note that entries in IPv6 format will be rejected if the system's
-       C library does not have support for IPv6 addresses.
+       An IP address given in IPv4 format will match IPv6 connections that
+       have the corresponding address, for example <literal>127.0.0.1</>
+       will match the IPv6 address <literal>::ffff:127.0.0.1</>.  An entry
+       given in IPv6 format will match only IPv6 connections, even if the
+       represented address is in the IPv4-in-IPv6 range.  Note that entries
+       in IPv6 format will be rejected if the system's C library does not have
+       support for IPv6 addresses.
       </para>
 
       <para>
@@ -278,7 +275,7 @@ hostnossl  <replaceable>database</replaceable>  <replaceable>user</replaceable>
 
       <para>
        If a host name is specified (anything that is not an IP address
-       range or a special key word is treated as a host name),
+       or a special key word is treated as a host name),
        that name is compared with the result of a reverse name
        resolution of the client's IP address (e.g., reverse DNS
        lookup, if DNS is used).  Host name comparisons are case
@@ -357,9 +354,8 @@ hostnossl  <replaceable>database</replaceable>  <replaceable>user</replaceable>
      <term><replaceable>IP-mask</replaceable></term>
      <listitem>
       <para>
-       These two fields can be used as an alternative to the
-       <replaceable>IP-address</><literal>/</><replaceable>mask-length</>
-       notation.  Instead of
+       These fields can be used as an alternative to the
+       <replaceable>CIDR-address</replaceable> notation. Instead of
        specifying the mask length, the actual mask is specified in a
        separate column. For example, <literal>255.0.0.0</> represents an IPv4
        CIDR mask length of 8, and <literal>255.255.255.255</> represents a
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9261e7f..6bcb106 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1325,7 +1325,7 @@ include_dir 'conf.d'
         40% of RAM to <varname>shared_buffers</varname> will work better than a
         smaller amount.  Larger settings for <varname>shared_buffers</varname>
         usually require a corresponding increase in
-        <varname>max_wal_size</varname>, in order to spread out the
+        <varname>checkpoint_segments</varname>, in order to spread out the
         process of writing large quantities of new or changed data over a
         longer period of time.
        </para>
@@ -2394,21 +2394,18 @@ include_dir 'conf.d'
      <title>Checkpoints</title>
 
     <variablelist>
-     <varlistentry id="guc-max-wal-size" xreflabel="max_wal_size">
-      <term><varname>max_wal_size</varname> (<type>integer</type>)
+     <varlistentry id="guc-checkpoint-segments" xreflabel="checkpoint_segments">
+      <term><varname>checkpoint_segments</varname> (<type>integer</type>)
       <indexterm>
-       <primary><varname>max_wal_size</> configuration parameter</primary>
+       <primary><varname>checkpoint_segments</> configuration parameter</primary>
       </indexterm>
       </term>
       <listitem>
        <para>
-        Maximum size to let the WAL grow to between automatic WAL
-        checkpoints. This is a soft limit; WAL size can exceed
-        <varname>max_wal_size</> under special circumstances, like
-        under heavy load, a failing <varname>archive_command</>, or a high
-        <varname>wal_keep_segments</> setting. The default is 128 MB.
-        Increasing this parameter can increase the amount of time needed for
-        crash recovery.
+        Maximum number of log file segments between automatic WAL
+        checkpoints (each segment is normally 16 megabytes). The default
+        is three segments.  Increasing this parameter can increase the
+        amount of time needed for crash recovery.
         This parameter can only be set in the <filename>postgresql.conf</>
         file or on the server command line.
        </para>
@@ -2461,7 +2458,7 @@ include_dir 'conf.d'
         Write a message to the server log if checkpoints caused by
         the filling of checkpoint segment files happen closer together
         than this many seconds (which suggests that
-        <varname>max_wal_size</> ought to be raised).  The default is
+        <varname>checkpoint_segments</> ought to be raised).  The default is
         30 seconds (<literal>30s</>).  Zero disables the warning.
         No warnings will be generated if <varname>checkpoint_timeout</varname>
         is less than <varname>checkpoint_warning</varname>.
@@ -2471,25 +2468,6 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-min-wal-size" xreflabel="min_wal_size">
-      <term><varname>min_wal_size</varname> (<type>integer</type>)
-      <indexterm>
-       <primary><varname>min_wal_size</> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        As long as WAL disk usage stays below this setting, old WAL files are
-        always recycled for future use at a checkpoint, rather than removed.
-        This can be used to ensure that enough WAL space is reserved to
-        handle spikes in WAL usage, for example when running large batch
-        jobs. The default is 80 MB.
-        This parameter can only be set in the <filename>postgresql.conf</>
-        file or on the server command line.
-       </para>
-      </listitem>
-     </varlistentry>
-
      </variablelist>
      </sect2>
      <sect2 id="runtime-config-wal-archiving">
@@ -3007,24 +2985,6 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-wal-retrieve-retry-interval" xreflabel="wal_retrieve_retry_interval">
-      <term><varname>wal_retrieve_retry_interval</varname> (<type>integer</type>)
-      <indexterm>
-       <primary><varname>wal_retrieve_retry_interval</> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Specify how long the standby server should wait when WAL data is not
-        available from any sources (streaming replication,
-        local <filename>pg_xlog</> or WAL archive) before retrying to
-        retrieve WAL data.  This parameter can only be set in the
-        <filename>postgresql.conf</> file or on the server command line.
-        The default value is 5 seconds. Units are milliseconds if not specified.
-       </para>
-      </listitem>
-     </varlistentry>
-
      </variablelist>
     </sect2>
    </sect1>
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  Prior to query execution, the PostgreSQL planner constructs a plan tree
+  that normally consists of built-in plan nodes (e.g. SeqScan, HashJoin).
+  The custom-scan interface allows extensions to offer, in addition to those
+  built-in nodes, a custom-scan plan that implements its own logic to scan
+  a relation or to join relations. Once the planner chooses a custom-scan
+  node, the callback functions associated with it are invoked during query
+  execution. A custom-scan provider must return the same result set that the
+  built-in logic would produce, but it is free to scan or join the target
+  relations in whatever way it sees fit.
+  This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+  The first thing a custom-scan provider has to do is add alternative paths,
+  either to scan a relation (via <literal>set_rel_pathlist_hook</>) or
+  to join relations (via <literal>set_join_pathlist_hook</>).
+  The provider is expected to add a <literal>CustomPath</> node carrying an
+  estimated execution cost and a set of callbacks defined by
+  <literal>CustomPathMethods</>.
+  Both hooks give extensions enough information to construct the
+  <literal>CustomPath</> node, such as the <literal>RelOptInfo</> of the
+  relations to be scanned or joined. The custom-scan provider is responsible
+  for computing a cost estimate that is comparable to those of the built-in
+  paths.
+ </para>
+
+ <para>
+  Once the planner chooses a custom path, the custom-scan provider has to
+  populate a plan node according to the <literal>CustomPath</> node.
+  At present, <literal>CustomScan</> is the only plan-node type available
+  for implementing custom logic on behalf of a <literal>CustomPath</> node.
+  The <literal>CustomScan</> structure has two special fields for keeping
+  private information: <literal>custom_exprs</> and <literal>custom_private</>.
+  <literal>custom_exprs</> is intended to hold expression trees that need to
+  be fixed up by <filename>setrefs.c</> and <filename>subselect.c</>.
+  <literal>custom_private</>, on the other hand, is expected to hold truly
+  private information that nothing other than the custom-scan provider
+  itself will touch. A plan tree that contains a custom-scan node can be
+  duplicated using <literal>copyObject()</>, so any data structures stored
+  in these two fields must be safe for <literal>copyObject()</> to handle.
+ </para>
+
+ <para>
+  When an extension implements its own logic to join relations, the result
+  looks to the core executor like a simple scan of a pseudo materialized
+  relation built from multiple source relations.
+  The custom-scan provider is expected to process the join internally with
+  its own logic, then return a set of records that match the tuple
+  descriptor of the scan node.
+  Unlike the simple-scan case, a <literal>CustomScan</> node that replaces a
+  join is not associated with any one tangible relation, so the extension
+  needs to tell the core planner what record type will be fetched from this
+  node.
+  To do so, set <literal>scanrelid</> to zero and put a valid list of
+  <literal>TargetEntry</> nodes on <literal>custom_ps_tlist</> instead.
+  This configuration informs the core planner that the custom-scan node is
+  not associated with a particular physical table, and what record type it
+  is expected to return.
+ </para>
+
+ <para>
+  Once a plan tree is handed to the executor, the executor has to construct
+  plan-state objects for the supplied plan nodes, and custom scans are no
+  exception. When a <literal>CustomScan</> node is found in the plan tree,
+  the executor invokes a callback to populate a <literal>CustomScanState</>
+  node.
+  Unlike <literal>CustomScan</>, this node has no fields reserved for
+  private information, because a custom-scan provider can instead allocate
+  an object larger than the bare <literal>CustomScanState</> and keep its
+  private execution state there.
+  This mirrors the relationship of the <literal>ScanState</> structure to
+  <literal>PlanState</>, which extends the generic plan state with
+  scan-specific fields; a custom-scan provider can likewise extend the state
+  with whatever fields it needs.
+  Once a <literal>CustomScanState</> has been constructed,
+  <literal>BeginCustomScan</> is invoked during executor initialization,
+  <literal>ExecCustomScan</> is called repeatedly during execution
+  (returning a <literal>TupleTableSlot</> for each fetched record), and
+  <literal>EndCustomScan</> is invoked when the executor is shut down.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+  <title>Custom Scan Hooks and Callbacks</title>
+
+  <sect2 id="custom-scan-hooks">
+   <title>Custom Scan Hooks</title>
+   <para>
+    This hook is invoked when the planner looks for the best way to scan a
+    particular relation. An extension can add alternative paths if it can
+    provide its own logic to scan the given relation under the given
+    qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+   </para>
+
+   <para>
+    This hook is invoked when the planner looks for the best way to join a
+    combination of relations. An extension can add alternative paths that
+    replace the join with its own logic.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+   </para>
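+   <para>
+    For illustration, a minimal provider might install the scan-path hook
+    and register a <literal>CustomPath</> as in the following sketch. All
+    names prefixed with <literal>my_</> are hypothetical, the cost values
+    are placeholders, and the field layout follows the structures described
+    in this chapter:
+<programlisting>
+#include "postgres.h"
+#include "fmgr.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+
+PG_MODULE_MAGIC;
+
+static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;
+
+/* the provider's CustomPathMethods table; see the next section */
+extern const CustomPathMethods my_path_methods;
+
+static void
+my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
+                    Index rti, RangeTblEntry *rte)
+{
+    CustomPath *cpath;
+
+    /* keep any previously installed hook working */
+    if (prev_set_rel_pathlist_hook)
+        prev_set_rel_pathlist_hook(root, rel, rti, rte);
+
+    cpath = makeNode(CustomPath);
+    cpath->path.pathtype = T_CustomScan;
+    cpath->path.parent = rel;
+    cpath->path.rows = rel->rows;
+    cpath->path.startup_cost = 0.0;     /* the provider's own estimate */
+    cpath->path.total_cost = 1000.0;    /* placeholder value */
+    cpath->methods = &my_path_methods;
+
+    add_path(rel, (Path *) cpath);
+}
+
+void
+_PG_init(void)
+{
+    prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
+    set_rel_pathlist_hook = my_set_rel_pathlist;
+}
+</programlisting>
+   </para>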
+  </sect2>
+
+  <sect2 id="custom-path-callbacks">
+   <title>Custom Path Callbacks</title>
+   <para>
+    A <literal>CustomPathMethods</> table contains a set of callbacks related
+    to a <literal>CustomPath</> node. The core backend invokes these callbacks
+    during query planning.
+   </para>
+   <para>
+    This callback is invoked when the core backend needs to populate a
+    <literal>CustomScan</> node according to the supplied
+    <literal>CustomPath</> node.
+    The custom-scan provider is responsible for allocating a
+    <literal>CustomScan</> node and initializing each of its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+   </para>
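+   <para>
+    Continuing the hypothetical <literal>my_</> provider above, a sketch of
+    this callback could build the <literal>CustomScan</> node as follows,
+    assuming the node embeds the usual <literal>Scan</>/<literal>Plan</>
+    headers. A provider that replaces a join would set
+    <literal>scanrelid</> to zero and supply <literal>custom_ps_tlist</>
+    instead:
+<programlisting>
+/* the provider's CustomScanMethods table; see the next section */
+extern const CustomScanMethods my_scan_methods;
+
+static Plan *
+my_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
+                    CustomPath *best_path, List *tlist, List *clauses)
+{
+    CustomScan *cscan = makeNode(CustomScan);
+
+    cscan->scan.plan.targetlist = tlist;
+    /* assuming plain expressions here; a real provider may have to reduce
+     * RestrictInfo entries to bare clauses first */
+    cscan->scan.plan.qual = clauses;
+    cscan->scan.scanrelid = rel->relid;     /* zero for a join replacement */
+    /* a join-replacing provider would also build cscan->custom_ps_tlist */
+    cscan->methods = &my_scan_methods;
+
+    return (Plan *) cscan;
+}
+
+const CustomPathMethods my_path_methods = {
+    .PlanCustomPath = my_plan_custom_path
+};
+</programlisting>
+   </para>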
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</> creates
+    a text representation of a <literal>CustomPath</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that expression nodes linked from
+    <literal>custom_private</> are transformed to their text representation
+    by the core, so the extension does not need to handle them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+    A <literal>CustomScanMethods</> table contains a set of callbacks related
+    to a <literal>CustomScan</> node; the core backend invokes these callbacks
+    during query planning and executor initialization.
+   </para>
+   <para>
+    This callback is invoked when the core backend needs to populate a
+    <literal>CustomScanState</> node according to the supplied
+    <literal>CustomScan</> node. The custom-scan provider is responsible for
+    allocating a <literal>CustomScanState</> (or its own data type that
+    extends it), but it need not initialize the fields here, because
+    <literal>ExecInitCustomScan</> initializes the fields of
+    <literal>CustomScanState</>, and <literal>BeginCustomScan</> is called
+    at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+   </para>
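+   <para>
+    In the hypothetical <literal>my_</> provider, the execution state could
+    be an enlarged structure whose first member is the bare
+    <literal>CustomScanState</>, with the callback merely allocating it and
+    attaching the provider's <literal>CustomExecMethods</> table (the node
+    tag and <literal>methods</> field used here are assumptions of this
+    sketch):
+<programlisting>
+/* provider-specific execution state, extending CustomScanState */
+typedef struct MyScanState
+{
+    CustomScanState css;        /* must be the first member */
+    long            next_row;   /* private state of the sketch provider */
+} MyScanState;
+
+/* the provider's CustomExecMethods table; see the next section */
+extern const CustomExecMethods my_exec_methods;
+
+static Node *
+my_create_scan_state(CustomScan *cscan)
+{
+    MyScanState *mss = (MyScanState *) palloc0(sizeof(MyScanState));
+
+    /* common fields are filled in later by ExecInitCustomScan */
+    NodeSetTag(mss, T_CustomScanState);
+    mss->css.methods = &my_exec_methods;
+
+    return (Node *) mss;
+}
+
+const CustomScanMethods my_scan_methods = {
+    .CreateCustomScanState = my_create_scan_state
+};
+</programlisting>
+   </para>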
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</> creates
+    a text representation of a <literal>CustomScan</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Since the data structure of the
+    <literal>CustomScan</> node itself cannot be extended, this callback
+    usually does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-exec-callbacks">
+   <title>Custom Exec Callbacks</title>
+   <para>
+    A <literal>CustomExecMethods</> table contains a set of callbacks related
+    to a <literal>CustomScanState</> node; the core backend invokes these
+    callbacks during query execution.
+   </para>
+   <para>
+    This callback lets a custom-scan provider perform final initialization
+    of the <literal>CustomScanState</> node.
+    The supplied <literal>CustomScanState</> node is partially initialized
+    according to either <literal>scanrelid</> or <literal>custom_ps_tlist</>
+    of the <literal>CustomScan</> node. If the custom-scan provider needs to
+    initialize its private fields as well, it can do so in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to produce the next tuple of
+    the scan. If a tuple is available, it should be stored in
+    <literal>ps_ResultTupleSlot</> and that slot returned. Otherwise,
+    <literal>NULL</> or an empty slot is returned to signal the end of the
+    scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
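+   <para>
+    A sketch of this callback for the hypothetical <literal>my_</> provider,
+    assuming <literal>CustomScanState</> embeds a <literal>ScanState</> as
+    its first member and <literal>my_fetch_next_tuple()</> is the provider's
+    own routine for producing the next heap tuple:
+<programlisting>
+static TupleTableSlot *
+my_exec_custom_scan(CustomScanState *node)
+{
+    MyScanState    *mss = (MyScanState *) node;
+    TupleTableSlot *slot = node->ss.ps.ps_ResultTupleSlot;
+    HeapTuple       tuple;
+
+    tuple = my_fetch_next_tuple(mss);   /* hypothetical provider routine */
+    if (tuple == NULL)
+        return ExecClearTuple(slot);    /* empty slot marks end of scan */
+
+    ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+    return slot;
+}
+
+const CustomExecMethods my_exec_methods = {
+    /* BeginCustomScan, EndCustomScan and ReScanCustomScan are set likewise */
+    .ExecCustomScan = my_exec_custom_scan
+};
+</programlisting>
+   </para>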
+   <para>
+    This callback lets a custom-scan provider clean up the
+    <literal>CustomScanState</> node. If the supplied node holds any private
+    resources that are not released automatically, they can be released here,
+    before the common portion is cleaned up.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to rewind the current scan
+    position to the beginning of the relation. The provider is expected to
+    reset its internal state so that the scan can be restarted.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to save the current
+    scan position in its internal state, such that the position can later be
+    restored by the <literal>RestrPosCustomScan</> callback. It is never
+    called unless the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to restore the scan
+    position that was previously saved by the <literal>MarkPosCustomScan</>
+    callback. It is never called unless the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback lets a custom-scan provider output additional
+    information in <command>EXPLAIN</> output that involves a custom-scan
+    node. Common items such as the target list, qualifiers, and the relation
+    to be scanned are printed by the core, so this callback is useful when
+    the custom-scan provider wants to show something in addition to them.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
+
diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index f151eb7..156c463 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -36,9 +36,7 @@
 
    <para>
      The <literal>ddl_command_start</> event occurs just before the
-     execution of a <literal>CREATE</>, <literal>ALTER</>, <literal>DROP</>,
-     <literal>SECURITY LABEL</>,
-     <literal>COMMENT</>, <literal>GRANT</> or <literal>REVOKE</>
+     execution of a <literal>CREATE</>, <literal>ALTER</>, or <literal>DROP</>
      command.  No check whether the affected object exists or doesn't exist is
      performed before the event trigger fires.
      As an exception, however, this event does not occur for
@@ -68,11 +66,12 @@
 
    <para>
     The <literal>table_rewrite</> event occurs just before a table is
-    rewritten by some actions of the commands <literal>ALTER TABLE</> and
-    <literal>ALTER TYPE</>.  While other
+    rewritten by the command <literal>ALTER TABLE</literal>. While other
     control statements are available to rewrite a table,
     like <literal>CLUSTER</literal> and <literal>VACUUM</literal>,
-    the <literal>table_rewrite</> event is not triggered by them.
+    the <literal>table_rewrite</> event is currently only triggered by
+    the <literal>ALTER TABLE</literal> command, and only when that command
+    attempts to rewrite the table.
    </para>
 
    <para>
@@ -124,15 +123,14 @@
 
    <table id="event-trigger-by-command-tag">
      <title>Event Trigger Support by Command Tag</title>
-     <tgroup cols="6">
+     <tgroup cols="5">
       <thead>
        <row>
-        <entry>Command Tag</entry>
+        <entry>command tag</entry>
         <entry><literal>ddl_command_start</literal></entry>
         <entry><literal>ddl_command_end</literal></entry>
         <entry><literal>sql_drop</literal></entry>
         <entry><literal>table_rewrite</literal></entry>
-        <entry>Notes</entry>
        </row>
       </thead>
       <tbody>
@@ -142,7 +140,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER COLLATION</literal></entry>
@@ -150,7 +147,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER CONVERSION</literal></entry>
@@ -158,7 +154,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER DOMAIN</literal></entry>
@@ -166,7 +161,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER EXTENSION</literal></entry>
@@ -174,7 +168,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER FOREIGN DATA WRAPPER</literal></entry>
@@ -182,7 +175,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER FOREIGN TABLE</literal></entry>
@@ -190,7 +182,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER FUNCTION</literal></entry>
@@ -198,7 +189,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER LANGUAGE</literal></entry>
@@ -206,7 +196,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER OPERATOR</literal></entry>
@@ -214,7 +203,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER OPERATOR CLASS</literal></entry>
@@ -222,7 +210,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER OPERATOR FAMILY</literal></entry>
@@ -230,7 +217,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER POLICY</literal></entry>
@@ -238,7 +224,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER SCHEMA</literal></entry>
@@ -246,7 +231,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER SEQUENCE</literal></entry>
@@ -254,7 +238,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER SERVER</literal></entry>
@@ -262,7 +245,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TABLE</literal></entry>
@@ -270,7 +252,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TEXT SEARCH CONFIGURATION</literal></entry>
@@ -278,7 +259,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TEXT SEARCH DICTIONARY</literal></entry>
@@ -286,7 +266,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TEXT SEARCH PARSER</literal></entry>
@@ -294,7 +273,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TEXT SEARCH TEMPLATE</literal></entry>
@@ -302,7 +280,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TRIGGER</literal></entry>
@@ -310,15 +287,13 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER TYPE</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"></entry>
+        <entry align="center"><literal>-</literal></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER USER MAPPING</literal></entry>
@@ -326,7 +301,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>ALTER VIEW</literal></entry>
@@ -334,7 +308,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE AGGREGATE</literal></entry>
@@ -342,15 +315,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
-       </row>
-       <row>
-        <entry align="left"><literal>COMMENT</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center">Only for local objects</entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE CAST</literal></entry>
@@ -358,7 +322,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE COLLATION</literal></entry>
@@ -366,7 +329,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE CONVERSION</literal></entry>
@@ -374,7 +336,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE DOMAIN</literal></entry>
@@ -382,7 +343,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE EXTENSION</literal></entry>
@@ -390,7 +350,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE FOREIGN DATA WRAPPER</literal></entry>
@@ -398,7 +357,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE FOREIGN TABLE</literal></entry>
@@ -406,7 +364,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE FUNCTION</literal></entry>
@@ -414,7 +371,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE INDEX</literal></entry>
@@ -422,7 +378,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE LANGUAGE</literal></entry>
@@ -430,7 +385,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE OPERATOR</literal></entry>
@@ -438,7 +392,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE OPERATOR CLASS</literal></entry>
@@ -446,7 +399,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE OPERATOR FAMILY</literal></entry>
@@ -454,7 +406,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE POLICY</literal></entry>
@@ -462,7 +413,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE RULE</literal></entry>
@@ -470,7 +420,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE SCHEMA</literal></entry>
@@ -478,7 +427,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE SEQUENCE</literal></entry>
@@ -486,7 +434,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE SERVER</literal></entry>
@@ -494,7 +441,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TABLE</literal></entry>
@@ -502,7 +448,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TABLE AS</literal></entry>
@@ -510,7 +455,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TEXT SEARCH CONFIGURATION</literal></entry>
@@ -518,7 +462,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TEXT SEARCH DICTIONARY</literal></entry>
@@ -526,7 +469,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TEXT SEARCH PARSER</literal></entry>
@@ -534,7 +476,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TEXT SEARCH TEMPLATE</literal></entry>
@@ -542,7 +483,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TRIGGER</literal></entry>
@@ -550,7 +490,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE TYPE</literal></entry>
@@ -565,7 +504,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>CREATE VIEW</literal></entry>
@@ -573,7 +511,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP AGGREGATE</literal></entry>
@@ -581,7 +518,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP CAST</literal></entry>
@@ -589,7 +525,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP COLLATION</literal></entry>
@@ -597,7 +532,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP CONVERSION</literal></entry>
@@ -605,7 +539,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP DOMAIN</literal></entry>
@@ -613,7 +546,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP EXTENSION</literal></entry>
@@ -621,7 +553,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP FOREIGN DATA WRAPPER</literal></entry>
@@ -629,7 +560,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP FOREIGN TABLE</literal></entry>
@@ -637,7 +567,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP FUNCTION</literal></entry>
@@ -645,7 +574,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP INDEX</literal></entry>
@@ -653,7 +581,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP LANGUAGE</literal></entry>
@@ -661,7 +588,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP OPERATOR</literal></entry>
@@ -669,7 +595,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP OPERATOR CLASS</literal></entry>
@@ -677,7 +602,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP OPERATOR FAMILY</literal></entry>
@@ -685,7 +609,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP OWNED</literal></entry>
@@ -693,7 +616,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP POLICY</literal></entry>
@@ -701,7 +623,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP RULE</literal></entry>
@@ -709,7 +630,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP SCHEMA</literal></entry>
@@ -717,7 +637,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP SEQUENCE</literal></entry>
@@ -725,7 +644,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP SERVER</literal></entry>
@@ -733,7 +651,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TABLE</literal></entry>
@@ -741,7 +658,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TEXT SEARCH CONFIGURATION</literal></entry>
@@ -749,7 +665,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TEXT SEARCH DICTIONARY</literal></entry>
@@ -757,7 +672,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TEXT SEARCH PARSER</literal></entry>
@@ -765,7 +679,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TEXT SEARCH TEMPLATE</literal></entry>
@@ -773,7 +686,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TRIGGER</literal></entry>
@@ -781,7 +693,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP TYPE</literal></entry>
@@ -789,7 +700,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP USER MAPPING</literal></entry>
@@ -797,7 +707,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
        <row>
         <entry align="left"><literal>DROP VIEW</literal></entry>
@@ -805,15 +714,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
-       </row>
-       <row>
-        <entry align="left"><literal>GRANT</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center">Only for local objects</entry>
        </row>
        <row>
         <entry align="left"><literal>IMPORT FOREIGN SCHEMA</literal></entry>
@@ -821,23 +721,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
-       </row>
-       <row>
-        <entry align="left"><literal>REVOKE</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center">Only for local objects</entry>
-       </row>
-       <row>
-        <entry align="left"><literal>SECURITY LABEL</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>X</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center"><literal>-</literal></entry>
-        <entry align="center">Only for local objects</entry>
        </row>
        <row>
         <entry align="left"><literal>SELECT INTO</literal></entry>
@@ -845,7 +728,6 @@
         <entry align="center"><literal>X</literal></entry>
         <entry align="center"><literal>-</literal></entry>
         <entry align="center"><literal>-</literal></entry>
-        <entry align="center"></entry>
        </row>
       </tbody>
      </tgroup>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..d25d5c9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,60 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for Remote Joins</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPath(PlannerInfo *root,
+                   RelOptInfo *joinrel,
+                   RelOptInfo *outerrel,
+                   RelOptInfo *innerrel,
+                   JoinType jointype,
+                   SpecialJoinInfo *sjinfo,
+                   SemiAntiJoinFactors *semifactors,
+                   List *restrictlist,
+                   Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     joined relations, both of which must be managed by the same
+     FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path for the supplied <literal>joinrel</>. From the query planner's
+     standpoint, it looks as if a scan node were added for the join
+     relation. This means that a <literal>ForeignScan</> path added in
+     place of the built-in local join logic has to generate tuples as if
+     it were scanning an already joined and materialized relation.
+    </para>
+    <para>
+     Usually, an FDW driver is expected to issue a remote query that
+     performs the join on the remote side, and then fetch the joined
+     result locally.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relation is determined on the fly, so its definition cannot be
+     looked up in the system catalogs.
+     The FDW driver is therefore responsible for telling the planner
+     the expected form of the joined relation. When a <literal>ForeignScan</>
+     replaces a join of relations, the <literal>scanrelid</> of the
+     generated plan node must be zero, marking that this
+     <literal>ForeignScan</> node is not associated with a particular
+     foreign table. It also needs to construct a pseudo scan target list
+     (<literal>fdw_ps_tlist</>) that describes the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> is zero, the executor initializes the
+     scan slot according to <literal>fdw_ps_tlist</>, excluding junk
+     entries. This list is also used to resolve the names of the
+     original relations and columns, so the FDW can attach expression
+     nodes that are not actually evaluated on the local side, such as a
+     join clause to be executed remotely; the target entries for such
+     expressions are marked <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
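
As a rough illustration of the calling convention above (not part of the patch),
a minimal driver-side GetForeignJoinPath might look like the sketch below. The
cost figures are placeholders, and the create_foreignscan_path() arguments
follow the pre-existing signature, which the final patch may extend:

#include "postgres.h"
#include "optimizer/pathnode.h"

static void
exampleGetForeignJoinPath(PlannerInfo *root,
                          RelOptInfo *joinrel,
                          RelOptInfo *outerrel,
                          RelOptInfo *innerrel,
                          JoinType jointype,
                          SpecialJoinInfo *sjinfo,
                          SemiAntiJoinFactors *semifactors,
                          List *restrictlist,
                          Relids extra_lateral_rels)
{
    ForeignPath *path;

    /* Only consider plain inner joins in this sketch */
    if (jointype != JOIN_INNER)
        return;

    /*
     * A real driver would also verify that both input relations are
     * foreign (or already-pushed-down joins) on the same server.
     */

    /* Placeholder costs; a real driver would estimate the remote cost */
    path = create_foreignscan_path(root, joinrel,
                                   joinrel->rows,
                                   10.0,    /* startup_cost */
                                   1000.0,  /* total_cost */
                                   NIL,     /* no pathkeys */
                                   NULL,    /* no required outer rels */
                                   NIL);    /* fdw_private filled later */
    add_path(joinrel, (Path *) path);
}
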
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index da2ed67..d57243a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10391,7 +10391,7 @@ table2-mapping
       </row>
       <row>
        <entry><para><literal>json_object(keys text[], values text[])</literal>
-          </para><para><literal>jsonb_object(keys text[], values text[])</literal>
+          </para><para><literal>json_object(keys text[], values text[])</literal>
        </para></entry>
        <entry>
          This form of <function>json_object</> takes keys and values pairwise from two separate
@@ -10505,12 +10505,6 @@ table2-mapping
   <indexterm>
    <primary>jsonb_to_recordset</primary>
   </indexterm>
-  <indexterm>
-   <primary>json_strip_nulls</primary>
-  </indexterm>
-  <indexterm>
-   <primary>jsonb_strip_nulls</primary>
-  </indexterm>
 
   <table id="functions-json-processing-table">
     <title>JSON Processing Functions</title>
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 34d4f1c..39aede4 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4503,14 +4503,7 @@ int PQflush(PGconn *conn);
   <para>
    After sending any command or data on a nonblocking connection, call
    <function>PQflush</function>.  If it returns 1, wait for the socket
-   to become read- or write-ready.  If it becomes write-ready, call
-   <function>PQflush</function> again.  If it becomes read-ready, call
-   <function>PQconsumeInput</function>, then call
-   <function>PQflush</function> again.  Repeat until
-   <function>PQflush</function> returns 0.  (It is necessary to check for
-   read-ready and drain the input with <function>PQconsumeInput</function>,
-   because the server can block trying to send us data, e.g. NOTICE
-   messages, and won't read our data until we read its.)  Once
+   to be write-ready and call it again; repeat until it returns 0.  Once
    <function>PQflush</function> returns 0, wait for the socket to be
    read-ready and then read the response as described above.
   </para>
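
As an illustrative sketch only (not part of the patch), the flush loop described
above can be driven like this; the select() plumbing and error handling are
simplified:

#include <sys/select.h>
#include <libpq-fe.h>

static int
flush_outgoing_example(PGconn *conn)
{
    int rc;

    while ((rc = PQflush(conn)) == 1)
    {
        int     sock = PQsocket(conn);
        fd_set  rfds, wfds;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &rfds);
        FD_SET(sock, &wfds);
        if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
            return -1;

        /* Drain any incoming data so the server never blocks on us. */
        if (FD_ISSET(sock, &rfds) && !PQconsumeInput(conn))
            return -1;
        /* then loop and call PQflush() again */
    }
    return rc;          /* 0 on success, -1 on failure */
}
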
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9eaf144..3ce7e80 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1710,14 +1710,6 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
-      <entry><literal><function>pg_stat_get_snapshot_timestamp()</function></literal><indexterm><primary>pg_stat_get_snapshot_timestamp</primary></indexterm></entry>
-      <entry><type>timestamp with time zone</type></entry>
-      <entry>
-       Returns the timestamp of the current statistics snapshot
-      </entry>
-     </row>
-
-     <row>
       <entry><literal><function>pg_stat_clear_snapshot()</function></literal><indexterm><primary>pg_stat_clear_snapshot</primary></indexterm></entry>
       <entry><type>void</type></entry>
       <entry>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index c73580e..5a087fb 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1328,19 +1328,19 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    </para>
   </sect2>
 
-  <sect2 id="populate-max-wal-size">
-   <title>Increase <varname>max_wal_size</varname></title>
+  <sect2 id="populate-checkpoint-segments">
+   <title>Increase <varname>checkpoint_segments</varname></title>
 
    <para>
-    Temporarily increasing the <xref linkend="guc-max-wal-size">
-    configuration variable can also
+    Temporarily increasing the <xref
+    linkend="guc-checkpoint-segments"> configuration variable can also
     make large data loads faster.  This is because loading a large
     amount of data into <productname>PostgreSQL</productname> will
     cause checkpoints to occur more often than the normal checkpoint
     frequency (specified by the <varname>checkpoint_timeout</varname>
     configuration variable). Whenever a checkpoint occurs, all dirty
     pages must be flushed to disk. By increasing
-    <varname>max_wal_size</varname> temporarily during bulk
+    <varname>checkpoint_segments</varname> temporarily during bulk
     data loads, the number of checkpoints that are required can be
     reduced.
    </para>
@@ -1445,7 +1445,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
       <para>
        Set appropriate (i.e., larger than normal) values for
        <varname>maintenance_work_mem</varname> and
-       <varname>max_wal_size</varname>.
+       <varname>checkpoint_segments</varname>.
       </para>
      </listitem>
      <listitem>
@@ -1512,7 +1512,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 
     So when loading a data-only dump, it is up to you to drop and recreate
     indexes and foreign keys if you wish to use those techniques.
-    It's still useful to increase <varname>max_wal_size</varname>
+    It's still useful to increase <varname>checkpoint_segments</varname>
     while loading the data, but don't bother increasing
     <varname>maintenance_work_mem</varname>; rather, you'd do that while
     manually recreating indexes and foreign keys afterwards.
@@ -1577,7 +1577,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 
      <listitem>
       <para>
-       Increase <xref linkend="guc-max-wal-size"> and <xref
+       Increase <xref linkend="guc-checkpoint-segments"> and <xref
        linkend="guc-checkpoint-timeout"> ; this reduces the frequency
        of checkpoints, but increases the storage requirements of
        <filename>/pg_xlog</>.
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index f9e1297..e5d7992 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -329,17 +329,15 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
    to <literal>VARIABLE</literal>.  (Internally, this is represented
    by setting <literal>typlen</> to -1.)  The internal representation of all
    variable-length types must start with a 4-byte integer giving the total
-   length of this value of the type.  (Note that the length field is often
-   encoded, as described in <xref linkend="storage-toast">; it's unwise
-   to access it directly.)
+   length of this value of the type.
   </para>
 
   <para>
    The optional flag <literal>PASSEDBYVALUE</literal> indicates that
    values of this data type are passed by value, rather than by
-   reference.  Types passed by value must be fixed-length, and their internal
-   representation cannot be larger than the size of the <type>Datum</> type
-   (4 bytes on some machines, 8 bytes on others).
+   reference.  You cannot pass by value types whose internal
+   representation is larger than the size of the <type>Datum</> type
+   (4 bytes on most machines, 8 bytes on a few).
   </para>
 
   <para>
@@ -370,17 +368,6 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
   </para>
 
   <para>
-   All <replaceable class="parameter">storage</replaceable> values other
-   than <literal>plain</literal> imply that the functions of the data type
-   can handle values that have been <firstterm>toasted</>, as described
-   in <xref linkend="storage-toast"> and <xref linkend="xtypes-toast">.
-   The specific other value given merely determines the default TOAST
-   storage strategy for columns of a toastable data type; users can pick
-   other strategies for individual columns using <literal>ALTER TABLE
-   SET STORAGE</>.
-  </para>
-
-  <para>
    The <replaceable class="parameter">like_type</replaceable> parameter
    provides an alternative method for specifying the basic representation
    properties of a data type: copy them from some existing type. The values of
@@ -478,8 +465,8 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
     identical things, and you want to allow these things to be accessed
     directly by subscripting, in addition to whatever operations you plan
     to provide for the type as a whole.  For example, type <type>point</>
-    is represented as just two floating-point numbers, which can be accessed
-    using <literal>point[0]</> and <literal>point[1]</>.
+    is represented as just two floating-point numbers, each can be accessed using
+    <literal>point[0]</> and <literal>point[1]</>.
     Note that
     this facility only works for fixed-length types whose internal form
     is exactly a sequence of identical fixed-length fields.  A subscriptable
diff --git a/doc/src/sgml/release-8.3.sgml b/doc/src/sgml/release-8.3.sgml
index b56edb0..3ce96f1 100644
--- a/doc/src/sgml/release-8.3.sgml
+++ b/doc/src/sgml/release-8.3.sgml
@@ -5303,7 +5303,7 @@
     <listitem>
      <para>
       Fix incorrect archive truncation point calculation for the
-      <literal>%r</> macro in <varname>restore_command</> parameters
+      <literal>%r</> macro in <varname>recovery_command</> parameters
       (Simon)
      </para>
 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index d8c5287..cb76b98 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -303,33 +303,25 @@ Oversized-Attribute Storage Technique).
 
 <para>
 <productname>PostgreSQL</productname> uses a fixed page size (commonly
-8 kB), and does not allow tuples to span multiple pages.  Therefore, it is
+8 kB), and does not allow tuples to span multiple pages.  Therefore,  it is
 not possible to store very large field values directly.  To overcome
-this limitation, large field values are compressed and/or broken up into
-multiple physical rows.  This happens transparently to the user, with only
+this limitation, large  field values are compressed and/or broken up into
+multiple physical rows. This happens transparently to the user, with only
 small impact on most of the backend code.  The technique is affectionately
-known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
-The <acronym>TOAST</> infrastructure is also used to improve handling of
-large data values in-memory.
+known as <acronym>TOAST</>  (or <quote>the best thing since sliced bread</>).
 </para>
 
 <para>
 Only certain data types support <acronym>TOAST</> &mdash; there is no need to
 impose the overhead on data types that cannot produce large field values.
 To support <acronym>TOAST</>, a data type must have a variable-length
-(<firstterm>varlena</>) representation, in which, ordinarily, the first
-four-byte word of any stored value contains the total length of the value in
-bytes (including itself).  <acronym>TOAST</> does not constrain the rest
-of the data type's representation.  The special representations collectively
-called <firstterm><acronym>TOAST</>ed values</firstterm> work by modifying or
-reinterpreting this initial length word.  Therefore, the C-level functions
-supporting a <acronym>TOAST</>-able data type must be careful about how they
-handle potentially <acronym>TOAST</>ed input values: an input might not
-actually consist of a four-byte length word and contents until after it's
-been <firstterm>detoasted</>.  (This is normally done by invoking
-<function>PG_DETOAST_DATUM</> before doing anything with an input value,
-but in some cases more efficient approaches are possible.
-See <xref linkend="xtypes-toast"> for more detail.)
+(<firstterm>varlena</>) representation, in which the first 32-bit word of any
+stored value contains the total length of the value in bytes (including
+itself).  <acronym>TOAST</> does not constrain the rest of the representation.
+All the C-level functions supporting a <acronym>TOAST</>-able data type must
+be careful to handle <acronym>TOAST</>ed input values.  (This is normally done
+by invoking <function>PG_DETOAST_DATUM</> before doing anything with an input
+value, but in some cases more efficient approaches are possible.)
 </para>
 
 <para>
@@ -341,84 +333,58 @@ the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and
 the remaining bits of the length word give the total datum size (including
 length word) in bytes.  When the highest-order or lowest-order bit is set,
 the value has only a single-byte header instead of the normal four-byte
-header, and the remaining bits of that byte give the total datum size
-(including length byte) in bytes.  This alternative supports space-efficient
-storage of values shorter than 127 bytes, while still allowing the data type
-to grow to 1 GB at need.  Values with single-byte headers aren't aligned on
-any particular boundary, whereas values with four-byte headers are aligned on
-at least a four-byte boundary; this omission of alignment padding provides
-additional space savings that is significant compared to short values.
-As a special case, if the remaining bits of a single-byte header are all
-zero (which would be impossible for a self-inclusive length), the value is
-a pointer to out-of-line data, with several possible alternatives as
-described below.  The type and size of such a <firstterm>TOAST pointer</>
-are determined by a code stored in the second byte of the datum.
-Lastly, when the highest-order or lowest-order bit is clear but the adjacent
-bit is set, the content of the datum has been compressed and must be
-decompressed before use.  In this case the remaining bits of the four-byte
-length word give the total size of the compressed datum, not the
+header, and the remaining bits give the total datum size (including length
+byte) in bytes.  As a special case, if the remaining bits are all zero
+(which would be impossible for a self-inclusive length), the value is a
+pointer to out-of-line data stored in a separate TOAST table.  (The size of
+a TOAST pointer is given in the second byte of the datum.)
+Values with single-byte headers aren't aligned on any particular
+boundary, either.  Lastly, when the highest-order or lowest-order bit is
+clear but the adjacent bit is set, the content of the datum has been
+compressed and must be decompressed before use.  In this case the remaining
+bits of the length word give the total size of the compressed datum, not the
 original data.  Note that compression is also possible for out-of-line data
 but the varlena header does not tell whether it has occurred &mdash;
-the content of the <acronym>TOAST</> pointer tells that, instead.
+the content of the TOAST pointer tells that, instead.
 </para>
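
For illustration only (not part of the patch), the header cases described above
map onto the varlena macros from postgres.h roughly as follows:

#include "postgres.h"

static const char *
varlena_kind_example(struct varlena *v)
{
    if (VARATT_IS_EXTERNAL(v))
        return "TOAST pointer (out-of-line data)";
    if (VARATT_IS_SHORT(v))
        return "inline, single-byte header";       /* size = VARSIZE_SHORT(v) */
    if (VARATT_IS_COMPRESSED(v))
        return "inline, compressed, 4-byte header";
    return "inline, uncompressed, 4-byte header";  /* size = VARSIZE(v) */
}
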
 
 <para>
-As mentioned, there are multiple types of <acronym>TOAST</> pointer datums.
-The oldest and most common type is a pointer to out-of-line data stored in
-a <firstterm><acronym>TOAST</> table</firstterm> that is separate from, but
-associated with, the table containing the <acronym>TOAST</> pointer datum
-itself.  These <firstterm>on-disk</> pointer datums are created by the
-<acronym>TOAST</> management code (in <filename>access/heap/tuptoaster.c</>)
-when a tuple to be stored on disk is too large to be stored as-is.
-Further details appear in <xref linkend="storage-toast-ondisk">.
-Alternatively, a <acronym>TOAST</> pointer datum can contain a pointer to
-out-of-line data that appears elsewhere in memory.  Such datums are
-necessarily short-lived, and will never appear on-disk, but they are very
-useful for avoiding copying and redundant processing of large data values.
-Further details appear in <xref linkend="storage-toast-inmemory">.
-</para>
-
-<para>
-The compression technique used for either in-line or out-of-line compressed
-data is a fairly simple and very fast member
-of the LZ family of compression techniques.  See
-<filename>src/common/pg_lzcompress.c</> for the details.
-</para>
-
-<sect2 id="storage-toast-ondisk">
- <title>Out-of-line, on-disk TOAST storage</title>
-
-<para>
 If any of the columns of a table are <acronym>TOAST</>-able, the table will
 have an associated <acronym>TOAST</> table, whose OID is stored in the table's
-<structname>pg_class</>.<structfield>reltoastrelid</> entry.  On-disk
+<structname>pg_class</>.<structfield>reltoastrelid</> entry.  Out-of-line
 <acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as
 described in more detail below.
 </para>
 
 <para>
+The compression technique used is a fairly simple and very fast member
+of the LZ family of compression techniques.  See
+<filename>src/backend/utils/adt/pg_lzcompress.c</> for the details.
+</para>
+
+<para>
 Out-of-line values are divided (after compression if used) into chunks of at
 most <symbol>TOAST_MAX_CHUNK_SIZE</> bytes (by default this value is chosen
 so that four chunk rows will fit on a page, making it about 2000 bytes).
-Each chunk is stored as a separate row in the <acronym>TOAST</> table
-belonging to the owning table.  Every
+Each chunk is stored
+as a separate row in the <acronym>TOAST</> table for the owning table.  Every
 <acronym>TOAST</> table has the columns <structfield>chunk_id</> (an OID
 identifying the particular <acronym>TOAST</>ed value),
 <structfield>chunk_seq</> (a sequence number for the chunk within its value),
 and <structfield>chunk_data</> (the actual data of the chunk).  A unique index
 on <structfield>chunk_id</> and <structfield>chunk_seq</> provides fast
-retrieval of the values.  A pointer datum representing an out-of-line on-disk
+retrieval of the values.  A pointer datum representing an out-of-line
 <acronym>TOAST</>ed value therefore needs to store the OID of the
 <acronym>TOAST</> table in which to look and the OID of the specific value
 (its <structfield>chunk_id</>).  For convenience, pointer datums also store the
-logical datum size (original uncompressed data length) and physical stored size
+logical datum size (original uncompressed data length) and actual stored size
 (different if compression was applied).  Allowing for the varlena header bytes,
-the total size of an on-disk <acronym>TOAST</> pointer datum is therefore 18
-bytes regardless of the actual size of the represented value.
+the total size of a <acronym>TOAST</> pointer datum is therefore 18 bytes
+regardless of the actual size of the represented value.
 </para>
 
 <para>
-The <acronym>TOAST</> management code is triggered only
+The <acronym>TOAST</> code is triggered only
 when a row value to be stored in a table is wider than
 <symbol>TOAST_TUPLE_THRESHOLD</> bytes (normally 2 kB).
 The <acronym>TOAST</> code will compress and/or move
@@ -431,8 +397,8 @@ none of the out-of-line values change.
 </para>
 
 <para>
-The <acronym>TOAST</> management code recognizes four different strategies
-for storing <acronym>TOAST</>-able columns on disk:
+The <acronym>TOAST</> code recognizes four different strategies for storing
+<acronym>TOAST</>-able columns:
 
    <itemizedlist>
     <listitem>
@@ -494,41 +460,6 @@ pages). There was no run time difference compared to an un-<acronym>TOAST</>ed
 comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 </para>
 
-</sect2>
-
-<sect2 id="storage-toast-inmemory">
- <title>Out-of-line, in-memory TOAST storage</title>
-
-<para>
-<acronym>TOAST</> pointers can point to data that is not on disk, but is
-elsewhere in the memory of the current server process.  Such pointers
-obviously cannot be long-lived, but they are nonetheless useful.  There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
-</para>
-
-<para>
-Indirect <acronym>TOAST</> pointers simply point at a non-indirect varlena
-value stored somewhere in memory.  This case was originally created merely
-as a proof of concept, but it is currently used during logical decoding to
-avoid possibly having to create physical tuples exceeding 1 GB (as pulling
-all out-of-line field values into the tuple might do).  The case is of
-limited use since the creator of the pointer datum is entirely responsible
-that the referenced data survives for as long as the pointer could exist,
-and there is no infrastructure to help with this.
-</para>
-
-<para>
-For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
-management code ensures that no such pointer datum can accidentally get
-stored on disk.  In-memory <acronym>TOAST</> pointers are automatically
-expanded to normal in-line varlena values before storage &mdash; and then
-possibly converted to on-disk <acronym>TOAST</> pointers, if the containing
-tuple would otherwise be too big.
-</para>
-
-</sect2>
-
 </sect1>
 
 <sect1 id="storage-fsm">
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index b57749f..1254c03 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -472,10 +472,9 @@
   <para>
    The server's checkpointer process automatically performs
    a checkpoint every so often.  A checkpoint is begun every <xref
-   linkend="guc-checkpoint-timeout"> seconds, or if
-   <xref linkend="guc-max-wal-size"> is about to be exceeded,
-   whichever comes first.
-   The default settings are 5 minutes and 128 MB, respectively.
+   linkend="guc-checkpoint-segments"> log segments, or every <xref
+   linkend="guc-checkpoint-timeout"> seconds, whichever comes first.
+   The default settings are 3 segments and 300 seconds (5 minutes), respectively.
    If no WAL has been written since the previous checkpoint, new checkpoints
    will be skipped even if <varname>checkpoint_timeout</> has passed.
    (If WAL archiving is being used and you want to put a lower limit on how
@@ -487,8 +486,8 @@
   </para>
 
   <para>
-   Reducing <varname>checkpoint_timeout</varname> and/or
-   <varname>max_wal_size</varname> causes checkpoints to occur
+   Reducing <varname>checkpoint_segments</varname> and/or
+   <varname>checkpoint_timeout</varname> causes checkpoints to occur
    more often. This allows faster after-crash recovery, since less work
    will need to be redone. However, one must balance this against the
    increased cost of flushing dirty data pages more often. If
@@ -511,11 +510,11 @@
    parameter.  If checkpoints happen closer together than
    <varname>checkpoint_warning</> seconds,
    a message will be output to the server log recommending increasing
-   <varname>max_wal_size</varname>.  Occasional appearance of such
+   <varname>checkpoint_segments</varname>.  Occasional appearance of such
    a message is not cause for alarm, but if it appears often then the
    checkpoint control parameters should be increased. Bulk operations such
    as large <command>COPY</> transfers might cause a number of such warnings
-   to appear if you have not set <varname>max_wal_size</> high
+   to appear if you have not set <varname>checkpoint_segments</> high
    enough.
   </para>
 
@@ -526,10 +525,10 @@
    <xref linkend="guc-checkpoint-completion-target">, which is
    given as a fraction of the checkpoint interval.
    The I/O rate is adjusted so that the checkpoint finishes when the
-   given fraction of
-   <varname>checkpoint_timeout</varname> seconds have elapsed, or before
-   <varname>max_wal_size</varname> is exceeded, whichever is sooner.
-   With the default value of 0.5,
+   given fraction of <varname>checkpoint_segments</varname> WAL segments
+   have been consumed since checkpoint start, or the given fraction of
+   <varname>checkpoint_timeout</varname> seconds have elapsed,
+   whichever is sooner.  With the default value of 0.5,
    <productname>PostgreSQL</> can be expected to complete each checkpoint
    in about half the time before the next checkpoint starts.  On a system
    that's very close to maximum I/O throughput during normal operation,
@@ -546,35 +545,18 @@
   </para>
 
   <para>
-   The number of WAL segment files in <filename>pg_xlog</> directory depends on
-   <varname>min_wal_size</>, <varname>max_wal_size</> and
-   the amount of WAL generated in previous checkpoint cycles. When old log
-   segment files are no longer needed, they are removed or recycled (that is,
-   renamed to become future segments in the numbered sequence). If, due to a
-   short-term peak of log output rate, <varname>max_wal_size</> is
-   exceeded, the unneeded segment files will be removed until the system
-   gets back under this limit. Below that limit, the system recycles enough
-   WAL files to cover the estimated need until the next checkpoint, and
-   removes the rest. The estimate is based on a moving average of the number
-   of WAL files used in previous checkpoint cycles. The moving average
-   is increased immediately if the actual usage exceeds the estimate, so it
-   accommodates peak usage rather average usage to some extent.
-   <varname>min_wal_size</> puts a minimum on the amount of WAL files
-   recycled for future usage; that much WAL is always recycled for future use,
-   even if the system is idle and the WAL usage estimate suggests that little
-   WAL is needed.
-  </para>
-
-  <para>
-   Independently of <varname>max_wal_size</varname>,
-   <xref linkend="guc-wal-keep-segments"> + 1 most recent WAL files are
-   kept at all times. Also, if WAL archiving is used, old segments can not be
-   removed or recycled until they are archived. If WAL archiving cannot keep up
-   with the pace that WAL is generated, or if <varname>archive_command</varname>
-   fails repeatedly, old WAL files will accumulate in <filename>pg_xlog</>
-   until the situation is resolved. A slow or failed standby server that
-   uses a replication slot will have the same effect (see
-   <xref linkend="streaming-replication-slots">).
+   There will always be at least one WAL segment file, and will normally
+   not be more than (2 + <varname>checkpoint_completion_target</varname>) * <varname>checkpoint_segments</varname> + 1
+   or <varname>checkpoint_segments</> + <xref linkend="guc-wal-keep-segments"> + 1
+   files.  Each segment file is normally 16 MB (though this size can be
+   altered when building the server).  You can use this to estimate space
+   requirements for <acronym>WAL</acronym>.
+   Ordinarily, when old log segment files are no longer needed, they
+   are recycled (that is, renamed to become future segments in the numbered
+   sequence). If, due to a short-term peak of log output rate, there
+   are more than 3 * <varname>checkpoint_segments</varname> + 1
+   segment files, the unneeded segment files will be deleted instead
+   of recycled until the system gets back under this limit.
   </para>
 
   <para>
@@ -589,8 +571,9 @@
    master because restartpoints can only be performed at checkpoint records.
    A restartpoint is triggered when a checkpoint record is reached if at
    least <varname>checkpoint_timeout</> seconds have passed since the last
-   restartpoint, or if WAL size is about to exceed
-   <varname>max_wal_size</>.
+   restartpoint. In standby mode, a restartpoint is also triggered if at
+   least <varname>checkpoint_segments</> log segments have been replayed
+   since the last restartpoint.
   </para>
 
   <para>
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index b85f2ad..f40504c 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -1885,12 +1885,17 @@ typedef struct
 <programlisting>
 typedef struct {
     int32 length;
-    char data[FLEXIBLE_ARRAY_MEMBER];
+    char data[1];
 } text;
 </programlisting>
 
-     The <literal>[FLEXIBLE_ARRAY_MEMBER]</> notation means that the actual
-     length of the data part is not specified by this declaration.
+     Obviously,  the  data  field declared here is not long enough to hold
+     all possible strings.  Since it's impossible to declare a variable-size
+     structure in <acronym>C</acronym>, we rely on the knowledge that the
+     <acronym>C</acronym> compiler won't range-check array subscripts.  We
+     just allocate the necessary amount of space and then access the array as
+     if it were declared the right length.  (This is a common trick, which
+     you can read about in many textbooks about C.)
     </para>
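
For illustration only (not part of the patch), the allocate-then-fill pattern
described above looks like this for the text type; cstring_to_text_example()
is a made-up name:

#include "postgres.h"
#include <string.h>

static text *
cstring_to_text_example(const char *s)
{
    size_t  len = strlen(s);
    text   *result = (text *) palloc(VARHDRSZ + len);

    SET_VARSIZE(result, VARHDRSZ + len);    /* total size, header included */
    memcpy(VARDATA(result), s, len);        /* payload follows the header */
    return result;
}
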
 
     <para>
@@ -2981,20 +2986,6 @@ SRF_RETURN_DONE(funcctx)
      <structfield>multi_call_memory_ctx</> while doing the first-call setup.
     </para>
 
-    <warning>
-     <para>
-      While the actual arguments to the function remain unchanged between
-      calls, if you detoast the argument values (which is normally done
-      transparently by the
-      <function>PG_GETARG_<replaceable>xxx</replaceable></function> macro)
-      in the transient context then the detoasted copies will be freed on
-      each cycle. Accordingly, if you keep references to such values in
-      your <structfield>user_fctx</>, you must either copy them into the
-      <structfield>multi_call_memory_ctx</> after detoasting, or ensure
-      that you detoast the values only in that context.
-     </para>
-    </warning>
-
     <para>
      A complete pseudo-code example looks like the following:
 <programlisting>
diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml
index 2459616..e1340ba 100644
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -234,49 +234,35 @@ CREATE TYPE complex (
  </para>
 
  <para>
-  If the internal representation of the data type is variable-length, the
-  internal representation must follow the standard layout for variable-length
-  data: the first four bytes must be a <type>char[4]</type> field which is
-  never accessed directly (customarily named <structfield>vl_len_</>). You
-  must use the <function>SET_VARSIZE()</function> macro to store the total
-  size of the datum (including the length field itself) in this field
-  and <function>VARSIZE()</function> to retrieve it.  (These macros exist
-  because the length field may be encoded depending on platform.)
- </para>
-
- <para>
-  For further details see the description of the
-  <xref linkend="sql-createtype"> command.
- </para>
-
- <sect2 id="xtypes-toast">
-  <title>TOAST Considerations</title>
    <indexterm>
     <primary>TOAST</primary>
     <secondary>and user-defined types</secondary>
    </indexterm>
-
- <para>
-  If the values of your data type vary in size (in internal form), it's
-  usually desirable to make the data type <acronym>TOAST</>-able (see <xref
-  linkend="storage-toast">). You should do this even if the values are always
+  If the values of your data type vary in size (in internal form), you should
+  make the data type <acronym>TOAST</>-able (see <xref
+  linkend="storage-toast">). You should do this even if the data are always
   too small to be compressed or stored externally, because
   <acronym>TOAST</> can save space on small data too, by reducing header
   overhead.
  </para>
 
  <para>
-  To support <acronym>TOAST</> storage, the C functions operating on the data
-  type must always be careful to unpack any toasted values they are handed
-  by using <function>PG_DETOAST_DATUM</>.  (This detail is customarily hidden
-  by defining type-specific <function>GETARG_DATATYPE_P</function> macros.)
-  Then, when running the <command>CREATE TYPE</command> command, specify the
-  internal length as <literal>variable</> and select some appropriate storage
-  option other than <literal>plain</>.
+  To do this, the internal representation must follow the standard layout for
+  variable-length data: the first four bytes must be a <type>char[4]</type>
+  field which is never accessed directly (customarily named
+  <structfield>vl_len_</>). You
+  must use <function>SET_VARSIZE()</function> to store the size of the datum
+  in this field and <function>VARSIZE()</function> to retrieve it. The C
+  functions operating on the data type must always be careful to unpack any
+  toasted values they are handed, by using <function>PG_DETOAST_DATUM</>.
+  (This detail is customarily hidden by defining type-specific
+  <function>GETARG_DATATYPE_P</function> macros.) Then, when running the
+  <command>CREATE TYPE</command> command, specify the internal length as
+  <literal>variable</> and select the appropriate storage option.
  </para>
 
  <para>
-  If data alignment is unimportant (either just for a specific function or
+  If the alignment is unimportant (either just for a specific function or
   because the data type specifies byte alignment anyway) then it's possible
   to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use
   <function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by
@@ -300,6 +286,8 @@ CREATE TYPE complex (
   </para>
  </note>
 
- </sect2>
-
+ <para>
+  For further details see the description of the
+  <xref linkend="sql-createtype"> command.
+ </para>
 </sect1>
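
For illustration only (not part of the patch), the customary type-specific
fetch macros for a hypothetical TOAST-able varlena type "complexstr" might be
defined like this:

#include "postgres.h"
#include "fmgr.h"

typedef struct
{
    char    vl_len_[4];     /* varlena header, never accessed directly */
    char    data[1];        /* actual payload, variable length */
} complexstr;

/* PG_DETOAST_DATUM() expands compressed or out-of-line values before use */
#define DatumGetComplexstrP(X)    ((complexstr *) PG_DETOAST_DATUM(X))
#define PG_GETARG_COMPLEXSTR_P(n) DatumGetComplexstrP(PG_GETARG_DATUM(n))
#define PG_RETURN_COMPLEXSTR_P(x) PG_RETURN_POINTER(x)
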
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index f96c709..739033f 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -296,7 +296,6 @@ all-shared-lib: $(shlib)
 
 ifndef haslibarule
 $(stlib): $(OBJS) | $(SHLIB_PREREQS)
-	rm -f $@
 	$(LINK.static) $@ $^
 	$(RANLIB) $@
 endif #haslibarule
@@ -338,7 +337,6 @@ else # PORTNAME == aix
 
 # AIX case
 $(shlib) $(stlib): $(OBJS) | $(SHLIB_PREREQS)
-	rm -f $(stlib)
 	$(LINK.static) $(stlib) $^
 	$(RANLIB) $(stlib)
 	$(MKLDEXPORT) $(stlib) >$(exports_file)
@@ -358,7 +356,6 @@ $(shlib): $(OBJS) | $(SHLIB_PREREQS)
 	$(CC) $(CFLAGS)  -shared -o $@  $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(LDAP_LIBS_BE)
 
 $(stlib): $(OBJS) | $(SHLIB_PREREQS)
-	rm -f $@
 	$(LINK.static) $@ $^
 	$(RANLIB) $@
 
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 6cd4e8e..867035d 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -1434,7 +1434,7 @@ heap_form_minimal_tuple(TupleDesc tupleDescriptor,
 	/*
 	 * Determine total space needed
 	 */
-	len = SizeofMinimalTupleHeader;
+	len = offsetof(MinimalTupleData, t_bits);
 
 	if (hasnull)
 		len += BITMAPLEN(numberOfAttributes);
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index 991858f..cc8d818 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -41,9 +41,9 @@ pairingheap_GISTSearchItem_cmp(const pairingheap_node *a, const pairingheap_node
 
 	/* Heap items go before inner pages, to ensure a depth-first search */
 	if (GISTSearchItemIsHeap(*sa) && !GISTSearchItemIsHeap(*sb))
-		return 1;
-	if (!GISTSearchItemIsHeap(*sa) && GISTSearchItemIsHeap(*sb))
 		return -1;
+	if (!GISTSearchItemIsHeap(*sa) && GISTSearchItemIsHeap(*sb))
+		return 1;
 
 	return 0;
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cb6f8a3..46060bc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2186,8 +2186,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		XLogRegisterBufData(0, (char *) &xlhdr, SizeOfHeapHeader);
 		/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
 		XLogRegisterBufData(0,
-							(char *) heaptup->t_data + SizeofHeapTupleHeader,
-							heaptup->t_len - SizeofHeapTupleHeader);
+			(char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits),
+					 heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits));
 
 		recptr = XLogInsert(RM_HEAP_ID, info);
 
@@ -2460,9 +2460,9 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 				tuphdr->t_hoff = heaptup->t_data->t_hoff;
 
 				/* write bitmap [+ padding] [+ oid] + data */
-				datalen = heaptup->t_len - SizeofHeapTupleHeader;
+				datalen = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
 				memcpy(scratchptr,
-					   (char *) heaptup->t_data + SizeofHeapTupleHeader,
+					   (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits),
 					   datalen);
 				tuphdr->datalen = datalen;
 				scratchptr += datalen;
@@ -2904,9 +2904,9 @@ l1:
 
 			XLogRegisterData((char *) &xlhdr, SizeOfHeapHeader);
 			XLogRegisterData((char *) old_key_tuple->t_data
-							 + SizeofHeapTupleHeader,
+							 + offsetof(HeapTupleHeaderData, t_bits),
 							 old_key_tuple->t_len
-							 - SizeofHeapTupleHeader);
+							 - offsetof(HeapTupleHeaderData, t_bits));
 		}
 
 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);
@@ -6732,7 +6732,7 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
 	xlhdr.t_infomask = newtup->t_data->t_infomask;
 	xlhdr.t_hoff = newtup->t_data->t_hoff;
-	Assert(SizeofHeapTupleHeader + prefixlen + suffixlen <= newtup->t_len);
+	Assert(offsetof(HeapTupleHeaderData, t_bits) + prefixlen + suffixlen <= newtup->t_len);
 
 	/*
 	 * PG73FORMAT: write bitmap [+ padding] [+ oid] + data
@@ -6743,8 +6743,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	if (prefixlen == 0)
 	{
 		XLogRegisterBufData(0,
-							((char *) newtup->t_data) + SizeofHeapTupleHeader,
-							newtup->t_len - SizeofHeapTupleHeader - suffixlen);
+		   ((char *) newtup->t_data) + offsetof(HeapTupleHeaderData, t_bits),
+		   newtup->t_len - offsetof(HeapTupleHeaderData, t_bits) -suffixlen);
 	}
 	else
 	{
@@ -6753,11 +6753,11 @@ log_heap_update(Relation reln, Buffer oldbuf,
 		 * two separate rdata entries.
 		 */
 		/* bitmap [+ padding] [+ oid] */
-		if (newtup->t_data->t_hoff - SizeofHeapTupleHeader > 0)
+		if (newtup->t_data->t_hoff - offsetof(HeapTupleHeaderData, t_bits) >0)
 		{
 			XLogRegisterBufData(0,
-								((char *) newtup->t_data) + SizeofHeapTupleHeader,
-								newtup->t_data->t_hoff - SizeofHeapTupleHeader);
+			((char *) newtup->t_data) + offsetof(HeapTupleHeaderData, t_bits),
+			 newtup->t_data->t_hoff - offsetof(HeapTupleHeaderData, t_bits));
 		}
 
 		/* data after common prefix */
@@ -6777,8 +6777,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 		XLogRegisterData((char *) &xlhdr_idx, SizeOfHeapHeader);
 
 		/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
-		XLogRegisterData((char *) old_key_tuple->t_data + SizeofHeapTupleHeader,
-						 old_key_tuple->t_len - SizeofHeapTupleHeader);
+		XLogRegisterData((char *) old_key_tuple->t_data + offsetof(HeapTupleHeaderData, t_bits),
+			   old_key_tuple->t_len - offsetof(HeapTupleHeaderData, t_bits));
 	}
 
 	recptr = XLogInsert(RM_HEAP_ID, info);
@@ -7351,7 +7351,7 @@ heap_xlog_insert(XLogReaderState *record)
 	xl_heap_insert *xlrec = (xl_heap_insert *) XLogRecGetData(record);
 	Buffer		buffer;
 	Page		page;
-	union
+	struct
 	{
 		HeapTupleHeaderData hdr;
 		char		data[MaxHeapTupleSize];
@@ -7415,12 +7415,12 @@ heap_xlog_insert(XLogReaderState *record)
 		data += SizeOfHeapHeader;
 
 		htup = &tbuf.hdr;
-		MemSet((char *) htup, 0, SizeofHeapTupleHeader);
+		MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
 		/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
-		memcpy((char *) htup + SizeofHeapTupleHeader,
+		memcpy((char *) htup + offsetof(HeapTupleHeaderData, t_bits),
 			   data,
 			   newlen);
-		newlen += SizeofHeapTupleHeader;
+		newlen += offsetof(HeapTupleHeaderData, t_bits);
 		htup->t_infomask2 = xlhdr.t_infomask2;
 		htup->t_infomask = xlhdr.t_infomask;
 		htup->t_hoff = xlhdr.t_hoff;
@@ -7469,7 +7469,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	BlockNumber blkno;
 	Buffer		buffer;
 	Page		page;
-	union
+	struct
 	{
 		HeapTupleHeaderData hdr;
 		char		data[MaxHeapTupleSize];
@@ -7548,14 +7548,14 @@ heap_xlog_multi_insert(XLogReaderState *record)
 			newlen = xlhdr->datalen;
 			Assert(newlen <= MaxHeapTupleSize);
 			htup = &tbuf.hdr;
-			MemSet((char *) htup, 0, SizeofHeapTupleHeader);
+			MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
 			/* PG73FORMAT: get bitmap [+ padding] [+ oid] + data */
-			memcpy((char *) htup + SizeofHeapTupleHeader,
+			memcpy((char *) htup + offsetof(HeapTupleHeaderData, t_bits),
 				   (char *) tupdata,
 				   newlen);
 			tupdata += newlen;
 
-			newlen += SizeofHeapTupleHeader;
+			newlen += offsetof(HeapTupleHeaderData, t_bits);
 			htup->t_infomask2 = xlhdr->t_infomask2;
 			htup->t_infomask = xlhdr->t_infomask;
 			htup->t_hoff = xlhdr->t_hoff;
@@ -7618,7 +7618,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	uint16		prefixlen = 0,
 				suffixlen = 0;
 	char	   *newp;
-	union
+	struct
 	{
 		HeapTupleHeaderData hdr;
 		char		data[MaxHeapTupleSize];
@@ -7780,19 +7780,19 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		Assert(tuplen <= MaxHeapTupleSize);
 
 		htup = &tbuf.hdr;
-		MemSet((char *) htup, 0, SizeofHeapTupleHeader);
+		MemSet((char *) htup, 0, sizeof(HeapTupleHeaderData));
 
 		/*
 		 * Reconstruct the new tuple using the prefix and/or suffix from the
 		 * old tuple, and the data stored in the WAL record.
 		 */
-		newp = (char *) htup + SizeofHeapTupleHeader;
+		newp = (char *) htup + offsetof(HeapTupleHeaderData, t_bits);
 		if (prefixlen > 0)
 		{
 			int			len;
 
 			/* copy bitmap [+ padding] [+ oid] from WAL record */
-			len = xlhdr.t_hoff - SizeofHeapTupleHeader;
+			len = xlhdr.t_hoff - offsetof(HeapTupleHeaderData, t_bits);
 			memcpy(newp, recdata, len);
 			recdata += len;
 			newp += len;
@@ -7802,7 +7802,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 			newp += prefixlen;
 
 			/* copy new tuple data from WAL record */
-			len = tuplen - (xlhdr.t_hoff - SizeofHeapTupleHeader);
+			len = tuplen - (xlhdr.t_hoff - offsetof(HeapTupleHeaderData, t_bits));
 			memcpy(newp, recdata, len);
 			recdata += len;
 			newp += len;
@@ -7823,7 +7823,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		if (suffixlen > 0)
 			memcpy(newp, (char *) oldtup.t_data + oldtup.t_len - suffixlen, suffixlen);
 
-		newlen = SizeofHeapTupleHeader + tuplen + prefixlen + suffixlen;
+		newlen = offsetof(HeapTupleHeaderData, t_bits) + tuplen + prefixlen + suffixlen;
 		htup->t_infomask2 = xlhdr.t_infomask2;
 		htup->t_infomask = xlhdr.t_infomask;
 		htup->t_hoff = xlhdr.t_hoff;
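
The heapam.c hunks above swap SizeofHeapTupleHeader for offsetof(HeapTupleHeaderData, t_bits) throughout; the substitution is mechanical because the macro is conventionally defined as exactly that offset, i.e. the size of the fixed part of the tuple header that precedes the variable-length null bitmap. A minimal standalone sketch of the idiom, using a made-up TupleHdr struct rather than the real HeapTupleHeaderData layout:

    /* toy_header.c -- simplified stand-in, not the real header layout */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct TupleHdr
    {
        uint32_t    t_len;          /* fixed fields ... */
        uint16_t    t_infomask;
        uint8_t     t_hoff;
        uint8_t     t_bits[];       /* null bitmap, variable length */
    } TupleHdr;

    /* Both spellings name the same byte count: the fixed header size. */
    #define SizeofTupleHdr  offsetof(TupleHdr, t_bits)

    int
    main(void)
    {
        printf("fixed header size = %zu\n", SizeofTupleHdr);
        printf("offsetof(t_bits)  = %zu\n", offsetof(TupleHdr, t_bits));
        return 0;
    }
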
diff --git a/src/backend/access/heap/syncscan.c b/src/backend/access/heap/syncscan.c
index 266c330..ae7589a 100644
--- a/src/backend/access/heap/syncscan.c
+++ b/src/backend/access/heap/syncscan.c
@@ -103,11 +103,10 @@ typedef struct ss_scan_locations_t
 {
 	ss_lru_item_t *head;
 	ss_lru_item_t *tail;
-	ss_lru_item_t items[FLEXIBLE_ARRAY_MEMBER]; /* SYNC_SCAN_NELEM items */
+	ss_lru_item_t items[1];		/* SYNC_SCAN_NELEM items */
 } ss_scan_locations_t;
 
-#define SizeOfScanLocations(N) \
-	(offsetof(ss_scan_locations_t, items) + (N) * sizeof(ss_lru_item_t))
+#define SizeOfScanLocations(N) offsetof(ss_scan_locations_t, items[N])
 
 /* Pointer to struct in shared memory */
 static ss_scan_locations_t *scan_locations;
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 8464e87..f8c1401 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -677,7 +677,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 	 */
 
 	/* compute header overhead --- this should match heap_form_tuple() */
-	hoff = SizeofHeapTupleHeader;
+	hoff = offsetof(HeapTupleHeaderData, t_bits);
 	if (has_nulls)
 		hoff += BITMAPLEN(numAttrs);
 	if (newtup->t_data->t_infomask & HEAP_HASOID)
@@ -963,7 +963,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		 * different conclusion about the size of the null bitmap, or even
 		 * whether there needs to be one at all.
 		 */
-		new_header_len = SizeofHeapTupleHeader;
+		new_header_len = offsetof(HeapTupleHeaderData, t_bits);
 		if (has_nulls)
 			new_header_len += BITMAPLEN(numAttrs);
 		if (olddata->t_infomask & HEAP_HASOID)
@@ -986,7 +986,7 @@ toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
 		/*
 		 * Copy the existing tuple header, but adjust natts and t_hoff.
 		 */
-		memcpy(new_data, olddata, SizeofHeapTupleHeader);
+		memcpy(new_data, olddata, offsetof(HeapTupleHeaderData, t_bits));
 		HeapTupleHeaderSetNatts(new_data, numAttrs);
 		new_data->t_hoff = new_header_len;
 		if (olddata->t_infomask & HEAP_HASOID)
@@ -1196,7 +1196,7 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	 *
 	 * This should match the reconstruction code in toast_insert_or_update.
 	 */
-	new_header_len = SizeofHeapTupleHeader;
+	new_header_len = offsetof(HeapTupleHeaderData, t_bits);
 	if (has_nulls)
 		new_header_len += BITMAPLEN(numAttrs);
 	if (tup->t_infomask & HEAP_HASOID)
@@ -1211,7 +1211,7 @@ toast_flatten_tuple_to_datum(HeapTupleHeader tup,
 	/*
 	 * Copy the existing tuple header, but adjust natts and t_hoff.
 	 */
-	memcpy(new_data, tup, SizeofHeapTupleHeader);
+	memcpy(new_data, tup, offsetof(HeapTupleHeaderData, t_bits));
 	HeapTupleHeaderSetNatts(new_data, numAttrs);
 	new_data->t_hoff = new_header_len;
 	if (tup->t_infomask & HEAP_HASOID)
@@ -1365,13 +1365,11 @@ toast_save_datum(Relation rel, Datum value,
 	CommandId	mycid = GetCurrentCommandId(true);
 	struct varlena *result;
 	struct varatt_external toast_pointer;
-	union
+	struct
 	{
 		struct varlena hdr;
-		/* this is to make the union big enough for a chunk: */
-		char		data[TOAST_MAX_CHUNK_SIZE + VARHDRSZ];
-		/* ensure union is aligned well enough: */
-		int32		align_it;
+		char		data[TOAST_MAX_CHUNK_SIZE]; /* make struct big enough */
+		int32		align_it;	/* ensure struct is aligned well enough */
 	}			chunk_data;
 	int32		chunk_size;
 	int32		chunk_seq = 0;
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 379dac9..43e048c 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1836,7 +1836,7 @@ typedef struct BTVacInfo
 	BTCycleId	cycle_ctr;		/* cycle ID most recently assigned */
 	int			num_vacuums;	/* number of currently active VACUUMs */
 	int			max_vacuums;	/* allocated length of vacuums[] array */
-	BTOneVacInfo vacuums[FLEXIBLE_ARRAY_MEMBER];
+	BTOneVacInfo vacuums[1];	/* VARIABLE LENGTH ARRAY */
 } BTVacInfo;
 
 static BTVacInfo *btvacinfo;
@@ -1984,7 +1984,7 @@ BTreeShmemSize(void)
 {
 	Size		size;
 
-	size = offsetof(BTVacInfo, vacuums);
+	size = offsetof(BTVacInfo, vacuums[0]);
 	size = add_size(size, mul_size(MaxBackends, sizeof(BTOneVacInfo)));
 	return size;
 }
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f9ca028..b2cf770 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -258,7 +258,7 @@ typedef struct MultiXactStateData
 	 * stored in pg_control and used as truncation point for pg_multixact.  At
 	 * checkpoint or restartpoint, unneeded segments are removed.
 	 */
-	MultiXactId perBackendXactIds[FLEXIBLE_ARRAY_MEMBER];
+	MultiXactId perBackendXactIds[1];	/* VARIABLE LENGTH ARRAY */
 } MultiXactStateData;
 
 /*
@@ -1744,9 +1744,8 @@ MultiXactShmemSize(void)
 {
 	Size		size;
 
-	/* We need 2*MaxOldestSlot + 1 perBackendXactIds[] entries */
 #define SHARED_MULTIXACT_STATE_SIZE \
-	add_size(offsetof(MultiXactStateData, perBackendXactIds) + sizeof(MultiXactId), \
+	add_size(sizeof(MultiXactStateData), \
 			 mul_size(sizeof(MultiXactId) * 2, MaxOldestSlot))
 
 	size = SHARED_MULTIXACT_STATE_SIZE;
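
The shared-memory sizing change here (the same pattern appears in the syncscan.c and nbtutils.c hunks above and in twophase.c, async.c and tablespace.c further down) trades between two spellings of the same arithmetic: offsetof(struct, array) + n * sizeof(elem), which reserves exactly n trailing elements, and sizeof(struct) + extra, which already includes the declared one-element array plus any tail padding. A minimal sketch of the offsetof-based form, with State as a made-up stand-in for the shared structs in these hunks:

    /* var_array_alloc.c -- illustrative only */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct State
    {
        int     nentries;
        long    entries[];          /* variable-length trailing array */
    } State;

    /* Bytes needed for a State holding n entries. */
    #define STATE_SIZE(n)  (offsetof(State, entries) + (n) * sizeof(long))

    int
    main(void)
    {
        int     n = 8;
        State  *s = calloc(1, STATE_SIZE(n));

        if (s == NULL)
            return 1;
        s->nentries = n;
        printf("allocated %zu bytes for %d entries\n", STATE_SIZE(n), n);
        free(s);
        return 0;
    }
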
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6edc227..6c7029e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -134,9 +134,12 @@ typedef struct TwoPhaseStateData
 	/* Number of valid prepXacts entries. */
 	int			numPrepXacts;
 
-	/* There are max_prepared_xacts items in this array */
-	GlobalTransaction prepXacts[FLEXIBLE_ARRAY_MEMBER];
-} TwoPhaseStateData;
+	/*
+	 * There are max_prepared_xacts items in this array, but C wants a
+	 * fixed-size array.
+	 */
+	GlobalTransaction prepXacts[1];		/* VARIABLE LENGTH ARRAY */
+} TwoPhaseStateData;			/* VARIABLE LENGTH STRUCT */
 
 static TwoPhaseStateData *TwoPhaseState;
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 89769ea..97000ef 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1031,9 +1031,10 @@ RecordTransactionCommit(void)
 
 		/*
 		 * If we didn't create XLOG entries, we're done here; otherwise we
-		 * should trigger flushing those entries the same as a commit record
-		 * would.  This will primarily happen for HOT pruning and the like; we
-		 * want these to be flushed to disk in due time.
+		 * should flush those entries the same as a commit record.  (An
+		 * example of a possible record that wouldn't cause an XID to be
+		 * assigned is a sequence advance record due to nextval() --- we want
+		 * to flush that to disk before reporting commit.)
 		 */
 		if (!wrote_xlog)
 			goto cleanup;
@@ -1152,13 +1153,11 @@ RecordTransactionCommit(void)
 	/*
 	 * Check if we want to commit asynchronously.  We can allow the XLOG flush
 	 * to happen asynchronously if synchronous_commit=off, or if the current
-	 * transaction has not performed any WAL-logged operation or didn't assign
-	 * a xid.  The transaction can end up not writing any WAL, even if it has
-	 * a xid, if it only wrote to temporary and/or unlogged tables.  It can
-	 * end up having written WAL without an xid if it did HOT pruning.  In
-	 * case of a crash, the loss of such a transaction will be irrelevant;
-	 * temp tables will be lost anyway, unlogged tables will be truncated and
-	 * HOT pruning will be done again later. (Given the foregoing, you might
+	 * transaction has not performed any WAL-logged operation.  The latter
+	 * case can arise if the current transaction wrote only to temporary
+	 * and/or unlogged tables.  In case of a crash, the loss of such a
+	 * transaction will be irrelevant since temp tables will be lost anyway,
+	 * and unlogged tables will be truncated.  (Given the foregoing, you might
 	 * think that it would be unnecessary to emit the XLOG record at all in
 	 * this case, but we don't currently try to do that.  It would certainly
 	 * cause problems at least in Hot Standby mode, where the
@@ -1174,8 +1173,7 @@ RecordTransactionCommit(void)
 	 * if all to-be-deleted tables are temporary though, since they are lost
 	 * anyway if we crash.)
 	 */
-	if ((wrote_xlog && markXidCommitted &&
-		 synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
+	if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
 		forceSyncCommit || nrels > 0)
 	{
 		XLogFlush(XactLastRecEnd);
@@ -1224,15 +1222,12 @@ RecordTransactionCommit(void)
 	latestXid = TransactionIdLatest(xid, nchildren, children);
 
 	/*
-	 * Wait for synchronous replication, if required. Similar to the decision
-	 * above about using committing asynchronously we only want to wait if
-	 * this backend assigned a xid and wrote WAL.  No need to wait if a xid
-	 * was assigned due to temporary/unlogged tables or due to HOT pruning.
+	 * Wait for synchronous replication, if required.
 	 *
 	 * Note that at this stage we have marked clog, but still show as running
 	 * in the procarray and continue to hold locks.
 	 */
-	if (wrote_xlog && markXidCommitted)
+	if (wrote_xlog)
 		SyncRepWaitForLSN(XactLastRecEnd);
 
 	/* Reset XactLastRecEnd until the next transaction writes something */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a28155f..629a457 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -79,8 +79,7 @@ extern uint32 bootstrap_data_checksum_version;
 
 
 /* User-settable parameters */
-int			max_wal_size = 8;		/* 128 MB */
-int			min_wal_size = 5;		/* 80 MB */
+int			CheckPointSegments = 3;
 int			wal_keep_segments = 0;
 int			XLOGbuffers = -1;
 int			XLogArchiveTimeout = 0;
@@ -94,7 +93,6 @@ int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
-int			wal_retrieve_retry_interval = 5000;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -108,14 +106,18 @@ bool		XLOG_DEBUG = false;
 #define NUM_XLOGINSERT_LOCKS  8
 
 /*
- * Max distance from last checkpoint, before triggering a new xlog-based
- * checkpoint.
+ * XLOGfileslop is the maximum number of preallocated future XLOG segments.
+ * When we are done with an old XLOG segment file, we will recycle it as a
+ * future XLOG segment as long as there aren't already XLOGfileslop future
+ * segments; else we'll delete it.  This could be made a separate GUC
+ * variable, but at present I think it's sufficient to hardwire it as
+ * 2*CheckPointSegments+1.  Under normal conditions, a checkpoint will free
+ * no more than 2*CheckPointSegments log segments, and we want to recycle all
+ * of them; the +1 allows boundary cases to happen without wasting a
+ * delete/create-segment cycle.
  */
-int			CheckPointSegments;
+#define XLOGfileslop	(2*CheckPointSegments + 1)
 
-/* Estimated distance between checkpoints, in bytes */
-static double CheckPointDistanceEstimate = 0;
-static double PrevCheckPointDistance = 0;
 
 /*
  * GUC support
@@ -776,7 +778,7 @@ static void AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic);
 static bool XLogCheckpointNeeded(XLogSegNo new_segno);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
-					   bool find_free, XLogSegNo max_segno,
+					   bool find_free, int *max_advance,
 					   bool use_lock);
 static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
@@ -789,7 +791,7 @@ static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
-static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr);
+static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
@@ -1956,104 +1958,6 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 }
 
 /*
- * Calculate CheckPointSegments based on max_wal_size and
- * checkpoint_completion_target.
- */
-static void
-CalculateCheckpointSegments(void)
-{
-	double		target;
-
-	/*-------
-	 * Calculate the distance at which to trigger a checkpoint, to avoid
-	 * exceeding max_wal_size. This is based on two assumptions:
-	 *
-	 * a) we keep WAL for two checkpoint cycles, back to the "prev" checkpoint.
-	 * b) during checkpoint, we consume checkpoint_completion_target *
-	 *    number of segments consumed between checkpoints.
-	 *-------
-	 */
-	target = (double ) max_wal_size / (2.0 + CheckPointCompletionTarget);
-
-	/* round down */
-	CheckPointSegments = (int) target;
-
-	if (CheckPointSegments < 1)
-		CheckPointSegments = 1;
-}
-
-void
-assign_max_wal_size(int newval, void *extra)
-{
-	max_wal_size = newval;
-	CalculateCheckpointSegments();
-}
-
-void
-assign_checkpoint_completion_target(double newval, void *extra)
-{
-	CheckPointCompletionTarget = newval;
-	CalculateCheckpointSegments();
-}
-
-/*
- * At a checkpoint, how many WAL segments to recycle as preallocated future
- * XLOG segments? Returns the highest segment that should be preallocated.
- */
-static XLogSegNo
-XLOGfileslop(XLogRecPtr PriorRedoPtr)
-{
-	XLogSegNo	minSegNo;
-	XLogSegNo	maxSegNo;
-	double		distance;
-	XLogSegNo	recycleSegNo;
-
-	/*
-	 * Calculate the segment numbers that min_wal_size and max_wal_size
-	 * correspond to. Always recycle enough segments to meet the minimum, and
-	 * remove enough segments to stay below the maximum.
-	 */
-	minSegNo = PriorRedoPtr / XLOG_SEG_SIZE + min_wal_size - 1;
-	maxSegNo =  PriorRedoPtr / XLOG_SEG_SIZE + max_wal_size - 1;
-
-	/*
-	 * Between those limits, recycle enough segments to get us through to the
-	 * estimated end of next checkpoint.
-	 *
-	 * To estimate where the next checkpoint will finish, assume that the
-	 * system runs steadily consuming CheckPointDistanceEstimate
-	 * bytes between every checkpoint.
-	 *
-	 * The reason this calculation is done from the prior checkpoint, not the
-	 * one that just finished, is that this behaves better if some checkpoint
-	 * cycles are abnormally short, like if you perform a manual checkpoint
-	 * right after a timed one. The manual checkpoint will make almost a full
-	 * cycle's worth of WAL segments available for recycling, because the
-	 * segments from the prior's prior, fully-sized checkpoint cycle are no
-	 * longer needed. However, the next checkpoint will make only few segments
-	 * available for recycling, the ones generated between the timed
-	 * checkpoint and the manual one right after that. If at the manual
-	 * checkpoint we only retained enough segments to get us to the next timed
-	 * one, and removed the rest, then at the next checkpoint we would not
-	 * have enough segments around for recycling, to get us to the checkpoint
-	 * after that. Basing the calculations on the distance from the prior redo
-	 * pointer largely fixes that problem.
-	 */
-	distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
-	/* add 10% for good measure. */
-	distance *= 1.10;
-
-	recycleSegNo = (XLogSegNo) ceil(((double) PriorRedoPtr + distance) / XLOG_SEG_SIZE);
-
-	if (recycleSegNo < minSegNo)
-		recycleSegNo = minSegNo;
-	if (recycleSegNo > maxSegNo)
-		recycleSegNo = maxSegNo;
-
-	return recycleSegNo;
-}
-
-/*
  * Check whether we've consumed enough xlog space that a checkpoint is needed.
  *
  * new_segno indicates a log file that has just been filled up (or read
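
For reference, the CalculateCheckpointSegments() logic removed above reduces to CheckPointSegments = floor(max_wal_size / (2.0 + checkpoint_completion_target)), clamped to at least 1. A small sketch of that calculation, assuming the stock checkpoint_completion_target of 0.5 together with the max_wal_size default of 8 segments shown earlier in this diff:

    /* checkpoint_segments_calc.c -- sketch of the removed calculation */
    #include <stdio.h>

    static int
    calc_checkpoint_segments(int max_wal_size, double completion_target)
    {
        double  target = (double) max_wal_size / (2.0 + completion_target);
        int     segs = (int) target;    /* round down */

        return (segs < 1) ? 1 : segs;
    }

    int
    main(void)
    {
        /* 8 / 2.5 = 3.2, truncated to 3 -- which lines up with the
         * default "CheckPointSegments = 3" on the other side of the diff. */
        printf("%d\n", calc_checkpoint_segments(8, 0.5));
        return 0;
    }
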
@@ -2860,7 +2764,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	char		zbuffer_raw[XLOG_BLCKSZ + MAXIMUM_ALIGNOF];
 	char	   *zbuffer;
 	XLogSegNo	installed_segno;
-	XLogSegNo	max_segno;
+	int			max_advance;
 	int			fd;
 	int			nbytes;
 
@@ -2963,19 +2867,9 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	 * pre-create a future log segment.
 	 */
 	installed_segno = logsegno;
-
-	/*
-	 * XXX: What should we use as max_segno? We used to use XLOGfileslop when
-	 * that was a constant, but that was always a bit dubious: normally, at a
-	 * checkpoint, XLOGfileslop was the offset from the checkpoint record,
-	 * but here, it was the offset from the insert location. We can't do the
-	 * normal XLOGfileslop calculation here because we don't have access to
-	 * the prior checkpoint's redo location. So somewhat arbitrarily, just
-	 * use CheckPointSegments.
-	 */
-	max_segno = logsegno + CheckPointSegments;
+	max_advance = XLOGfileslop;
 	if (!InstallXLogFileSegment(&installed_segno, tmppath,
-								*use_existent, max_segno,
+								*use_existent, &max_advance,
 								use_lock))
 	{
 		/*
@@ -3116,7 +3010,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
 	/*
 	 * Now move the segment into place with its final name.
 	 */
-	if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
+	if (!InstallXLogFileSegment(&destsegno, tmppath, false, NULL, false))
 		elog(ERROR, "InstallXLogFileSegment should not have failed");
 }
 
@@ -3136,21 +3030,22 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
  * number at or after the passed numbers.  If FALSE, install the new segment
  * exactly where specified, deleting any existing segment file there.
  *
- * max_segno: maximum segment number to install the new file as.  Fail if no
- * free slot is found between *segno and max_segno. (Ignored when find_free
- * is FALSE.)
+ * *max_advance: maximum number of segno slots to advance past the starting
+ * point.  Fail if no free slot is found in this range.  On return, reduced
+ * by the number of slots skipped over.  (Irrelevant, and may be NULL,
+ * when find_free is FALSE.)
  *
  * use_lock: if TRUE, acquire ControlFileLock while moving file into
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
  * Returns TRUE if the file was installed successfully.  FALSE indicates that
- * max_segno limit was exceeded, or an error occurred while renaming the
+ * max_advance limit was exceeded, or an error occurred while renaming the
  * file into place.
  */
 static bool
 InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
-					   bool find_free, XLogSegNo max_segno,
+					   bool find_free, int *max_advance,
 					   bool use_lock)
 {
 	char		path[MAXPGPATH];
@@ -3174,7 +3069,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 		/* Find a free slot to put it in */
 		while (stat(path, &stat_buf) == 0)
 		{
-			if ((*segno) >= max_segno)
+			if (*max_advance <= 0)
 			{
 				/* Failed to find a free slot within specified range */
 				if (use_lock)
@@ -3182,6 +3077,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 				return false;
 			}
 			(*segno)++;
+			(*max_advance)--;
 			XLogFilePath(path, ThisTimeLineID, *segno);
 		}
 	}
@@ -3529,15 +3425,14 @@ UpdateLastRemovedPtr(char *filename)
 /*
  * Recycle or remove all log files older or equal to passed segno
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog; this is used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 {
 	XLogSegNo	endlogSegNo;
-	XLogSegNo	recycleSegNo;
+	int			max_advance;
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		lastoff[MAXFNAMELEN];
@@ -3549,10 +3444,11 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	struct stat statbuf;
 
 	/*
-	 * Initialize info about where to try to recycle to.
+	 * Initialize info about where to try to recycle to.  We allow recycling
+	 * segments up to XLOGfileslop segments beyond the current XLOG location.
 	 */
 	XLByteToPrevSeg(endptr, endlogSegNo);
-	recycleSegNo = XLOGfileslop(PriorRedoPtr);
+	max_advance = XLOGfileslop;
 
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
@@ -3601,17 +3497,20 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 				 * for example can create symbolic links pointing to a
 				 * separate archive directory.
 				 */
-				if (endlogSegNo <= recycleSegNo &&
-					lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
+				if (lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
 					InstallXLogFileSegment(&endlogSegNo, path,
-										   true, recycleSegNo, true))
+										   true, &max_advance, true))
 				{
 					ereport(DEBUG2,
 							(errmsg("recycled transaction log file \"%s\"",
 									xlde->d_name)));
 					CheckpointStats.ckpt_segs_recycled++;
 					/* Needn't recheck that slot on future iterations */
-					endlogSegNo++;
+					if (max_advance > 0)
+					{
+						endlogSegNo++;
+						max_advance--;
+					}
 				}
 				else
 				{
@@ -7694,8 +7593,7 @@ LogCheckpointEnd(bool restartpoint)
 	elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
 		 "%d transaction log file(s) added, %d removed, %d recycled; "
 		 "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
-		 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
-		 "distance=%d kB, estimate=%d kB",
+		 "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s",
 		 restartpoint ? "restartpoint" : "checkpoint",
 		 CheckpointStats.ckpt_bufs_written,
 		 (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
@@ -7707,48 +7605,7 @@ LogCheckpointEnd(bool restartpoint)
 		 total_secs, total_usecs / 1000,
 		 CheckpointStats.ckpt_sync_rels,
 		 longest_secs, longest_usecs / 1000,
-		 average_secs, average_usecs / 1000,
-		 (int) (PrevCheckPointDistance / 1024.0),
-		 (int) (CheckPointDistanceEstimate / 1024.0));
-}
-
-/*
- * Update the estimate of distance between checkpoints.
- *
- * The estimate is used to calculate the number of WAL segments to keep
- * preallocated, see XLOGFileSlop().
- */
-static void
-UpdateCheckPointDistanceEstimate(uint64 nbytes)
-{
-	/*
-	 * To estimate the number of segments consumed between checkpoints, keep
-	 * a moving average of the amount of WAL generated in previous checkpoint
-	 * cycles. However, if the load is bursty, with quiet periods and busy
-	 * periods, we want to cater for the peak load. So instead of a plain
-	 * moving average, let the average decline slowly if the previous cycle
-	 * used less WAL than estimated, but bump it up immediately if it used
-	 * more.
-	 *
-	 * When checkpoints are triggered by max_wal_size, this should converge to
-	 * CheckpointSegments * XLOG_SEG_SIZE,
-	 *
-	 * Note: This doesn't pay any attention to what caused the checkpoint.
-	 * Checkpoints triggered manually with CHECKPOINT command, or by e.g.
-	 * starting a base backup, are counted the same as those created
-	 * automatically. The slow-decline will largely mask them out, if they are
-	 * not frequent. If they are frequent, it seems reasonable to count them
-	 * in as any others; if you issue a manual checkpoint every 5 minutes and
-	 * never let a timed checkpoint happen, it makes sense to base the
-	 * preallocation on that 5 minute interval rather than whatever
-	 * checkpoint_timeout is set to.
-	 */
-	PrevCheckPointDistance = nbytes;
-	if (CheckPointDistanceEstimate < nbytes)
-		CheckPointDistanceEstimate = nbytes;
-	else
-		CheckPointDistanceEstimate =
-			(0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
+		 average_secs, average_usecs / 1000);
 }
 
 /*
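
The UpdateCheckPointDistanceEstimate() function removed in this hunk implements an asymmetric moving average: the estimate jumps straight up to the latest cycle's WAL volume when that volume exceeds it, and otherwise decays slowly as 0.90 * estimate + 0.10 * nbytes, so that bursty peak loads keep dominating the preallocation target. A small sketch of that smoothing rule, with toy byte counts:

    /* ckpt_distance_estimate.c -- sketch of the removed smoothing rule */
    #include <stdint.h>
    #include <stdio.h>

    static double estimate = 0;

    static void
    update_estimate(uint64_t nbytes)
    {
        /* rise immediately, decay slowly (90/10 moving average) */
        if (estimate < (double) nbytes)
            estimate = (double) nbytes;
        else
            estimate = 0.90 * estimate + 0.10 * (double) nbytes;
    }

    int
    main(void)
    {
        uint64_t    cycles[] = {100, 400, 100, 100};    /* WAL bytes per cycle */

        for (int i = 0; i < 4; i++)
        {
            update_estimate(cycles[i]);
            printf("after cycle %d: estimate = %.1f\n", i + 1, estimate);
        }
        return 0;
    }
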
@@ -7788,7 +7645,7 @@ CreateCheckPoint(int flags)
 	XLogRecPtr	recptr;
 	XLogCtlInsert *Insert = &XLogCtl->Insert;
 	uint32		freespace;
-	XLogRecPtr	PriorRedoPtr;
+	XLogSegNo	_logSegNo;
 	XLogRecPtr	curInsert;
 	VirtualTransactionId *vxids;
 	int			nvxids;
@@ -8103,10 +7960,10 @@ CreateCheckPoint(int flags)
 				(errmsg("concurrent transaction log activity while database system is shutting down")));
 
 	/*
-	 * Remember the prior checkpoint's redo pointer, used later to determine
-	 * the point where the log can be truncated.
+	 * Select point at which we can truncate the log, which we base on the
+	 * prior checkpoint's earliest info.
 	 */
-	PriorRedoPtr = ControlFile->checkPointCopy.redo;
+	XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo);
 
 	/*
 	 * Update the control file.
@@ -8161,17 +8018,11 @@ CreateCheckPoint(int flags)
 	 * Delete old log files (those no longer needed even for previous
 	 * checkpoint or the standbys in XLOG streaming).
 	 */
-	if (PriorRedoPtr != InvalidXLogRecPtr)
+	if (_logSegNo)
 	{
-		XLogSegNo	_logSegNo;
-
-		/* Update the average distance between checkpoints. */
-		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
-
-		XLByteToSeg(PriorRedoPtr, _logSegNo);
 		KeepLogSeg(recptr, &_logSegNo);
 		_logSegNo--;
-		RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, recptr);
+		RemoveOldXlogFiles(_logSegNo, recptr);
 	}
 
 	/*
@@ -8339,7 +8190,7 @@ CreateRestartPoint(int flags)
 {
 	XLogRecPtr	lastCheckPointRecPtr;
 	CheckPoint	lastCheckPoint;
-	XLogRecPtr	PriorRedoPtr;
+	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
 
 	/*
@@ -8404,14 +8255,14 @@ CreateRestartPoint(int flags)
 	/*
 	 * Update the shared RedoRecPtr so that the startup process can calculate
 	 * the number of segments replayed since last restartpoint, and request a
-	 * restartpoint if it exceeds CheckPointSegments.
+	 * restartpoint if it exceeds checkpoint_segments.
 	 *
 	 * Like in CreateCheckPoint(), hold off insertions to update it, although
 	 * during recovery this is just pro forma, because no WAL insertions are
 	 * happening.
 	 */
 	WALInsertLockAcquireExclusive();
-	RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
+	XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
 	WALInsertLockRelease();
 
 	/* Also update the info_lck-protected copy */
@@ -8435,10 +8286,10 @@ CreateRestartPoint(int flags)
 	CheckPointGuts(lastCheckPoint.redo, flags);
 
 	/*
-	 * Remember the prior checkpoint's redo pointer, used later to determine
-	 * the point at which we can truncate the log.
+	 * Select point at which we can truncate the xlog, which we base on the
+	 * prior checkpoint's earliest info.
 	 */
-	PriorRedoPtr = ControlFile->checkPointCopy.redo;
+	XLByteToSeg(ControlFile->checkPointCopy.redo, _logSegNo);
 
 	/*
 	 * Update pg_control, using current time.  Check that it still shows
@@ -8465,18 +8316,12 @@ CreateRestartPoint(int flags)
 	 * checkpoint/restartpoint) to prevent the disk holding the xlog from
 	 * growing full.
 	 */
-	if (PriorRedoPtr != InvalidXLogRecPtr)
+	if (_logSegNo)
 	{
 		XLogRecPtr	receivePtr;
 		XLogRecPtr	replayPtr;
 		TimeLineID	replayTLI;
 		XLogRecPtr	endptr;
-		XLogSegNo	_logSegNo;
-
-		/* Update the average distance between checkpoints/restartpoints. */
-		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
-
-		XLByteToSeg(PriorRedoPtr, _logSegNo);
 
 		/*
 		 * Get the current end of xlog replayed or received, whichever is
@@ -8505,7 +8350,7 @@ CreateRestartPoint(int flags)
 		if (RecoveryInProgress())
 			ThisTimeLineID = replayTLI;
 
-		RemoveOldXlogFiles(_logSegNo, PriorRedoPtr, endptr);
+		RemoveOldXlogFiles(_logSegNo, endptr);
 
 		/*
 		 * Make more log segments if needed.  (Do this after recycling old log
@@ -10495,8 +10340,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-	static TimestampTz	last_fail_time = 0;
-	TimestampTz	now;
+	static pg_time_t last_fail_time = 0;
+	pg_time_t	now;
 
 	/*-------
 	 * Standby mode is implemented by a state machine:
@@ -10506,7 +10351,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * 2. Check trigger file
 	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
 	 * 4. Rescan timelines
-	 * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.
+	 * 5. Sleep 5 seconds, and loop back to 1.
 	 *
 	 * Failure to read from the current source advances the state machine to
 	 * the next state.
@@ -10645,25 +10490,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 * machine, so we've exhausted all the options for
 					 * obtaining the requested WAL. We're going to loop back
 					 * and retry from the archive, but if it hasn't been long
-					 * since last attempt, sleep wal_retrieve_retry_interval
-					 * milliseconds to avoid busy-waiting.
+					 * since last attempt, sleep 5 seconds to avoid
+					 * busy-waiting.
 					 */
-					now = GetCurrentTimestamp();
-					if (!TimestampDifferenceExceeds(last_fail_time, now,
-													wal_retrieve_retry_interval))
+					now = (pg_time_t) time(NULL);
+					if ((now - last_fail_time) < 5)
 					{
-						long		secs, wait_time;
-						int			usecs;
-
-						TimestampDifference(last_fail_time, now, &secs, &usecs);
-						wait_time = wal_retrieve_retry_interval -
-							(secs * 1000 + usecs / 1000);
-
-						WaitLatch(&XLogCtl->recoveryWakeupLatch,
-								  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
-								  wait_time);
-						ResetLatch(&XLogCtl->recoveryWakeupLatch);
-						now = GetCurrentTimestamp();
+						pg_usleep(1000000L * (5 - (now - last_fail_time)));
+						now = (pg_time_t) time(NULL);
 					}
 					last_fail_time = now;
 					currentSource = XLOG_FROM_ARCHIVE;
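
The branch removed just above converts the elapsed time reported by TimestampDifference() (whole seconds plus leftover microseconds) into the number of milliseconds still to wait out of wal_retrieve_retry_interval. A minimal sketch of that arithmetic, using the 5000 ms default visible earlier in the diff and a hypothetical 3.25 s of elapsed time:

    /* retry_wait.c -- sketch of the millisecond arithmetic above */
    #include <stdio.h>

    static long
    remaining_wait_ms(long interval_ms, long elapsed_secs, int elapsed_usecs)
    {
        /* interval minus elapsed time, both expressed in milliseconds */
        return interval_ms - (elapsed_secs * 1000L + elapsed_usecs / 1000);
    }

    int
    main(void)
    {
        /* 5000 ms interval, 3 s + 250000 us elapsed -> wait 1750 ms more */
        printf("%ld ms\n", remaining_wait_ms(5000, 3, 250000));
        return 0;
    }
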
@@ -10819,11 +10653,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					}
 
 					/*
-					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly.
+					 * Wait for more WAL to arrive. Time out after 5 seconds,
+					 * like when polling the archive, to react to a trigger
+					 * file promptly.
 					 */
 					WaitLatch(&XLogCtl->recoveryWakeupLatch,
-							  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+							  WL_LATCH_SET | WL_TIMEOUT,
 							  5000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 					break;
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index fdb1f7f..56fa1aa 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -107,7 +107,7 @@ static int num_columns_read = 0;
 %type <list>  boot_index_params
 %type <ielem> boot_index_param
 %type <str>   boot_const boot_ident
-%type <ival>  optbootstrap optsharedrelation optwithoutoids boot_column_nullness
+%type <ival>  optbootstrap optsharedrelation optwithoutoids
 %type <oidval> oidspec optoideq optrowtypeoid
 
 %token <str> CONST_P ID
@@ -115,7 +115,6 @@ static int num_columns_read = 0;
 %token XDECLARE INDEX ON USING XBUILD INDICES UNIQUE XTOAST
 %token COMMA EQUALS LPAREN RPAREN
 %token OBJ_ID XBOOTSTRAP XSHARED_RELATION XWITHOUT_OIDS XROWTYPE_OID NULLVAL
-%token XFORCE XNOT XNULL
 
 %start TopLevel
 
@@ -304,9 +303,7 @@ Boot_DeclareIndexStmt:
 					stmt->isconstraint = false;
 					stmt->deferrable = false;
 					stmt->initdeferred = false;
-					stmt->transformed = false;
 					stmt->concurrent = false;
-					stmt->if_not_exists = false;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -347,9 +344,7 @@ Boot_DeclareUniqueIndexStmt:
 					stmt->isconstraint = false;
 					stmt->deferrable = false;
 					stmt->initdeferred = false;
-					stmt->transformed = false;
 					stmt->concurrent = false;
-					stmt->if_not_exists = false;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -432,20 +427,14 @@ boot_column_list:
 		;
 
 boot_column_def:
-		  boot_ident EQUALS boot_ident boot_column_nullness
+		  boot_ident EQUALS boot_ident
 				{
 				   if (++numattr > MAXATTR)
 						elog(FATAL, "too many columns");
-				   DefineAttr($1, $3, numattr-1, $4);
+				   DefineAttr($1, $3, numattr-1);
 				}
 		;
 
-boot_column_nullness:
-			XFORCE XNOT XNULL	{ $$ = BOOTCOL_NULL_FORCE_NOT_NULL; }
-		|	XFORCE XNULL		{  $$ = BOOTCOL_NULL_FORCE_NULL; }
-		| { $$ = BOOTCOL_NULL_AUTO; }
-		;
-
 oidspec:
 			boot_ident							{ $$ = atooid($1); }
 		;
diff --git a/src/backend/bootstrap/bootscanner.l b/src/backend/bootstrap/bootscanner.l
index 72714f4..fa4e2ff 100644
--- a/src/backend/bootstrap/bootscanner.l
+++ b/src/backend/bootstrap/bootscanner.l
@@ -109,9 +109,6 @@ insert			{ return(INSERT_TUPLE); }
 "on"			{ return(ON); }
 "using"			{ return(USING); }
 "toast"			{ return(XTOAST); }
-"FORCE"			{ return(XFORCE); }
-"NOT"			{ return(XNOT); }
-"NULL"			{ return(XNULL); }
 
 {arrayid}		{
 					yylval.str = MapArrayTypeName(yytext);
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index ad49964..bc66eac 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -642,7 +642,7 @@ closerel(char *name)
  * ----------------
  */
 void
-DefineAttr(char *name, char *type, int attnum, int nullness)
+DefineAttr(char *name, char *type, int attnum)
 {
 	Oid			typeoid;
 
@@ -697,44 +697,30 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
 	attrtypes[attnum]->atttypmod = -1;
 	attrtypes[attnum]->attislocal = true;
 
-	if (nullness == BOOTCOL_NULL_FORCE_NOT_NULL)
-	{
-		attrtypes[attnum]->attnotnull = true;
-	}
-	else if (nullness == BOOTCOL_NULL_FORCE_NULL)
-	{
-		attrtypes[attnum]->attnotnull = false;
-	}
-	else
-	{
-		Assert(nullness == BOOTCOL_NULL_AUTO);
-
-		/*
-		 * Mark as "not null" if type is fixed-width and prior columns are
-		 * too.  This corresponds to case where column can be accessed
-		 * directly via C struct declaration.
-		 *
-		 * oidvector and int2vector are also treated as not-nullable, even
-		 * though they are no longer fixed-width.
-		 */
+	/*
+	 * Mark as "not null" if type is fixed-width and prior columns are too.
+	 * This corresponds to case where column can be accessed directly via C
+	 * struct declaration.
+	 *
+	 * oidvector and int2vector are also treated as not-nullable, even though
+	 * they are no longer fixed-width.
+	 */
 #define MARKNOTNULL(att) \
-		((att)->attlen > 0 || \
-		 (att)->atttypid == OIDVECTOROID || \
-		 (att)->atttypid == INT2VECTOROID)
+	((att)->attlen > 0 || \
+	 (att)->atttypid == OIDVECTOROID || \
+	 (att)->atttypid == INT2VECTOROID)
 
-		if (MARKNOTNULL(attrtypes[attnum]))
-		{
-			int			i;
+	if (MARKNOTNULL(attrtypes[attnum]))
+	{
+		int			i;
 
-			/* check earlier attributes */
-			for (i = 0; i < attnum; i++)
-			{
-				if (!attrtypes[i]->attnotnull)
-					break;
-			}
-			if (i == attnum)
-				attrtypes[attnum]->attnotnull = true;
+		for (i = 0; i < attnum; i++)
+		{
+			if (!MARKNOTNULL(attrtypes[i]))
+				break;
 		}
+		if (i == attnum)
+			attrtypes[attnum]->attnotnull = true;
 	}
 }
 
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index c7b1c17..c773eca 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -161,8 +161,7 @@ sub Catalogs
 				}
 				else
 				{
-					my %row;
-					my ($atttype, $attname, $attopt) = split /\s+/, $_;
+					my ($atttype, $attname) = split /\s+/, $_;
 					die "parse error ($input_file)" unless $attname;
 					if (exists $RENAME_ATTTYPE{$atttype})
 					{
@@ -173,26 +172,7 @@ sub Catalogs
 						$attname = $1;
 						$atttype .= '[]';            # variable-length only
 					}
-
-					$row{'type'} = $atttype;
-					$row{'name'} = $attname;
-
-					if (defined $attopt)
-					{
-						if ($attopt eq 'PG_FORCE_NULL')
-						{
-							$row{'forcenull'} = 1;
-						}
-						elsif ($attopt eq 'BKI_FORCE_NOT_NULL')
-						{
-							$row{'forcenotnull'} = 1;
-						}
-						else
-						{
-							die "unknown column option $attopt on column $attname"
-						}
-					}
-					push @{ $catalog{columns} }, \%row;
+					push @{ $catalog{columns} }, { $attname => $atttype };
 				}
 			}
 		}
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index a5c78ee..e1c7fe5 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -118,36 +118,17 @@ foreach my $catname (@{ $catalogs->{names} })
 
 	my %bki_attr;
 	my @attnames;
-	my $first = 1;
-
-	print BKI " (\n";
 	foreach my $column (@{ $catalog->{columns} })
 	{
-		my $attname = $column->{name};
-		my $atttype = $column->{type};
-		$bki_attr{$attname} = $column;
+		my ($attname, $atttype) = %$column;
+		$bki_attr{$attname} = $atttype;
 		push @attnames, $attname;
-
-		if (!$first)
-		{
-			print BKI " ,\n";
-		}
-		$first = 0;
-
-		print BKI " $attname = $atttype";
-
-		if (defined $column->{forcenotnull})
-		{
-			print BKI " FORCE NOT NULL";
-		}
-		elsif (defined $column->{forcenull})
-		{
-			print BKI " FORCE NULL";
-		}
 	}
+	print BKI " (\n";
+	print BKI join " ,\n", map(" $_ = $bki_attr{$_}", @attnames);
 	print BKI "\n )\n";
 
-	# open it, unless bootstrap case (create bootstrap does this automatically)
+   # open it, unless bootstrap case (create bootstrap does this automatically)
 	if ($catalog->{bootstrap} eq '')
 	{
 		print BKI "open $catname\n";
@@ -229,7 +210,7 @@ foreach my $catname (@{ $catalogs->{names} })
 				# Store schemapg entries for later.
 				$row =
 				  emit_schemapg_row($row,
-					grep { $bki_attr{$_}{type} eq 'bool' } @attnames);
+					grep { $bki_attr{$_} eq 'bool' } @attnames);
 				push @{ $schemapg_entries{$table_name} }, '{ '
 				  . join(
 					', ',             grep { defined $_ }
@@ -242,13 +223,13 @@ foreach my $catname (@{ $catalogs->{names} })
 			{
 				$attnum = 0;
 				my @SYS_ATTRS = (
-					{ name => 'ctid', type => 'tid' },
-					{ name => 'oid', type => 'oid' },
-					{ name => 'xmin', type => 'xid' },
-					{ name => 'cmin', type=> 'cid' },
-					{ name => 'xmax', type=> 'xid' },
-					{ name => 'cmax', type => 'cid' },
-					{ name => 'tableoid', type => 'oid' });
+					{ ctid     => 'tid' },
+					{ oid      => 'oid' },
+					{ xmin     => 'xid' },
+					{ cmin     => 'cid' },
+					{ xmax     => 'xid' },
+					{ cmax     => 'cid' },
+					{ tableoid => 'oid' });
 				foreach my $attr (@SYS_ATTRS)
 				{
 					$attnum--;
@@ -345,8 +326,7 @@ exit 0;
 sub emit_pgattr_row
 {
 	my ($table_name, $attr, $priornotnull) = @_;
-	my $attname = $attr->{name};
-	my $atttype = $attr->{type};
+	my ($attname, $atttype) = %$attr;
 	my %row;
 
 	$row{attrelid} = $catalogs->{$table_name}->{relation_oid};
@@ -374,20 +354,11 @@ sub emit_pgattr_row
 			$row{attndims} = $type->{typcategory} eq 'A' ? '1' : '0';
 			$row{attcollation} = $type->{typcollation};
 
-			if (defined $attr->{forcenotnull})
-			{
-				$row{attnotnull} = 't';
-			}
-			elsif (defined $attr->{forcenull})
-			{
-				$row{attnotnull} = 'f';
-			}
-			elsif ($priornotnull)
+			# attnotnull must be set true if the type is fixed-width and
+			# prior columns are too --- compare DefineAttr in bootstrap.c.
+			# oidvector and int2vector are also treated as not-nullable.
+			if ($priornotnull)
 			{
-				# attnotnull will automatically be set if the type is
-				# fixed-width and prior columns are all NOT NULL ---
-				# compare DefineAttr in bootstrap.c. oidvector and
-				# int2vector are also treated as not-nullable.
 				$row{attnotnull} =
 				    $type->{typname} eq 'oidvector'   ? 't'
 				  : $type->{typname} eq 'int2vector'  ? 't'
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 1af977c..bfb4fdc 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -261,9 +261,9 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
 	 * with the answer changing under them, or that they already hold some
 	 * appropriate lock, and therefore return the first answer we get without
 	 * checking for invalidation messages.  Also, if the requested lock is
-	 * already held, LockRelationOid will not AcceptInvalidationMessages, so
-	 * we may fail to notice a change.  We could protect against that case by
-	 * calling AcceptInvalidationMessages() before beginning this loop, but
+	 * already held, LockRelationOid will not AcceptInvalidationMessages,
+	 * so we may fail to notice a change.  We could protect against that case
+	 * by calling AcceptInvalidationMessages() before beginning this loop, but
 	 * that would add a significant amount overhead, so for now we don't.
 	 */
 	for (;;)
@@ -1075,8 +1075,8 @@ FuncnameGetCandidates(List *names, int nargs, List *argnames,
 		 */
 		effective_nargs = Max(pronargs, nargs);
 		newResult = (FuncCandidateList)
-			palloc(offsetof(struct _FuncCandidateList, args) +
-				   effective_nargs * sizeof(Oid));
+			palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
+				   + effective_nargs * sizeof(Oid));
 		newResult->pathpos = pathpos;
 		newResult->oid = HeapTupleGetOid(proctup);
 		newResult->nargs = effective_nargs;
@@ -1597,8 +1597,7 @@ OpernameGetCandidates(List *names, char oprkind, bool missing_schema_ok)
 	 * separate palloc for each operator, but profiling revealed that the
 	 * pallocs used an unreasonably large fraction of parsing time.
 	 */
-#define SPACE_PER_OP MAXALIGN(offsetof(struct _FuncCandidateList, args) + \
-							  2 * sizeof(Oid))
+#define SPACE_PER_OP MAXALIGN(sizeof(struct _FuncCandidateList) + sizeof(Oid))
 
 	if (catlist->n_members > 0)
 		resultSpace = palloc(catlist->n_members * SPACE_PER_OP);
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index d899dd7..825d8b2 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -3415,7 +3415,6 @@ getObjectIdentityParts(const ObjectAddress *object,
 			{
 				HeapTuple	conTup;
 				Form_pg_conversion conForm;
-				char	   *schema;
 
 				conTup = SearchSysCache1(CONVOID,
 										 ObjectIdGetDatum(object->objectId));
@@ -3423,13 +3422,10 @@ getObjectIdentityParts(const ObjectAddress *object,
 					elog(ERROR, "cache lookup failed for conversion %u",
 						 object->objectId);
 				conForm = (Form_pg_conversion) GETSTRUCT(conTup);
-				schema = get_namespace_name(conForm->connamespace);
 				appendStringInfoString(&buffer,
-								quote_qualified_identifier(schema,
-														   NameStr(conForm->conname)));
+								quote_identifier(NameStr(conForm->conname)));
 				if (objname)
 					*objname = list_make1(pstrdup(NameStr(conForm->conname)));
-				pfree(schema);
 				ReleaseSysCache(conTup);
 				break;
 			}
@@ -3533,7 +3529,7 @@ getObjectIdentityParts(const ObjectAddress *object,
 				appendStringInfoString(&buffer,
 									   quote_qualified_identifier(schema,
 												 NameStr(opcForm->opcname)));
-				appendStringInfo(&buffer, " USING %s",
+				appendStringInfo(&buffer, " for %s",
 								 quote_identifier(NameStr(amForm->amname)));
 				if (objname)
 				{
@@ -4070,7 +4066,7 @@ getOpFamilyIdentity(StringInfo buffer, Oid opfid, List **objname, List **objargs
 	amForm = (Form_pg_am) GETSTRUCT(amTup);
 
 	schema = get_namespace_name(opfForm->opfnamespace);
-	appendStringInfo(buffer, "%s USING %s",
+	appendStringInfo(buffer, "%s for %s",
 					 quote_qualified_identifier(schema,
 												NameStr(opfForm->opfname)),
 					 NameStr(amForm->amname));
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index a1efddb..e73252c 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -447,7 +447,7 @@ needs_toast_table(Relation rel)
 		return false;			/* nothing to toast? */
 	if (maxlength_unknown)
 		return true;			/* any unlimited-length attrs? */
-	tuple_length = MAXALIGN(SizeofHeapTupleHeader +
+	tuple_length = MAXALIGN(offsetof(HeapTupleHeaderData, t_bits) +
 							BITMAPLEN(tupdesc->natts)) +
 		MAXALIGN(data_length);
 	return (tuple_length > TOAST_TUPLE_THRESHOLD);
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 2826b7e..d73248c 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -237,8 +237,8 @@ typedef struct AsyncQueueControl
 	QueuePosition tail;			/* the global tail is equivalent to the tail
 								 * of the "slowest" backend */
 	TimestampTz lastQueueFillWarn;		/* time of last queue-full msg */
-	QueueBackendStatus backend[FLEXIBLE_ARRAY_MEMBER];
-	/* backend[0] is not used; used entries are from [1] to [MaxBackends] */
+	QueueBackendStatus backend[1];		/* actually of length MaxBackends+1 */
+	/* DO NOT ADD FURTHER STRUCT MEMBERS HERE */
 } AsyncQueueControl;
 
 static AsyncQueueControl *asyncQueueControl;
@@ -303,7 +303,7 @@ typedef enum
 typedef struct
 {
 	ListenActionKind action;
-	char		channel[FLEXIBLE_ARRAY_MEMBER]; /* nul-terminated string */
+	char		channel[1];		/* actually, as long as needed */
 } ListenAction;
 
 static List *pendingActions = NIL;		/* list of ListenAction */
@@ -417,8 +417,8 @@ AsyncShmemSize(void)
 	Size		size;
 
 	/* This had better match AsyncShmemInit */
-	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
-	size = add_size(size, offsetof(AsyncQueueControl, backend));
+	size = mul_size(MaxBackends, sizeof(QueueBackendStatus));
+	size = add_size(size, sizeof(AsyncQueueControl));
 
 	size = add_size(size, SimpleLruShmemSize(NUM_ASYNC_BUFFERS, 0));
 
@@ -438,11 +438,12 @@ AsyncShmemInit(void)
 	/*
 	 * Create or attach to the AsyncQueueControl structure.
 	 *
-	 * The used entries in the backend[] array run from 1 to MaxBackends; the
-	 * zero'th entry is unused but must be allocated.
+	 * The used entries in the backend[] array run from 1 to MaxBackends.
+	 * sizeof(AsyncQueueControl) already includes space for the unused zero'th
+	 * entry, but we need to add on space for the used entries.
 	 */
-	size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
-	size = add_size(size, offsetof(AsyncQueueControl, backend));
+	size = mul_size(MaxBackends, sizeof(QueueBackendStatus));
+	size = add_size(size, sizeof(AsyncQueueControl));
 
 	asyncQueueControl = (AsyncQueueControl *)
 		ShmemInitStruct("Async Queue Control", size, &found);
@@ -604,8 +605,7 @@ queue_listen(ListenActionKind action, const char *channel)
 	oldcontext = MemoryContextSwitchTo(CurTransactionContext);
 
 	/* space for terminating null is included in sizeof(ListenAction) */
-	actrec = (ListenAction *) palloc(offsetof(ListenAction, channel) +
-									 strlen(channel) + 1);
+	actrec = (ListenAction *) palloc(sizeof(ListenAction) + strlen(channel));
 	actrec->action = action;
 	strcpy(actrec->channel, channel);
 
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index deeb8dc..a33a5ad 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -267,12 +267,8 @@ check_ddl_tag(const char *tag)
 		pg_strcasecmp(tag, "REFRESH MATERIALIZED VIEW") == 0 ||
 		pg_strcasecmp(tag, "ALTER DEFAULT PRIVILEGES") == 0 ||
 		pg_strcasecmp(tag, "ALTER LARGE OBJECT") == 0 ||
-		pg_strcasecmp(tag, "COMMENT") == 0 ||
-		pg_strcasecmp(tag, "GRANT") == 0 ||
-		pg_strcasecmp(tag, "REVOKE") == 0 ||
 		pg_strcasecmp(tag, "DROP OWNED") == 0 ||
-		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0 ||
-		pg_strcasecmp(tag, "SECURITY LABEL") == 0)
+		pg_strcasecmp(tag, "IMPORT FOREIGN SCHEMA") == 0)
 		return EVENT_TRIGGER_COMMAND_TAG_OK;
 
 	/*
@@ -326,8 +322,7 @@ validate_table_rewrite_tags(const char *filtervar, List *taglist)
 static event_trigger_command_tag_check_result
 check_table_rewrite_ddl_tag(const char *tag)
 {
-	if (pg_strcasecmp(tag, "ALTER TABLE") == 0 ||
-		pg_strcasecmp(tag, "ALTER TYPE") == 0)
+	if (pg_strcasecmp(tag, "ALTER TABLE") == 0)
 		return EVENT_TRIGGER_COMMAND_TAG_OK;
 
 	return EVENT_TRIGGER_COMMAND_TAG_NOT_SUPPORTED;
@@ -1154,34 +1149,6 @@ EventTriggerSupportsObjectClass(ObjectClass objclass)
 	return true;
 }
 
-bool
-EventTriggerSupportsGrantObjectType(GrantObjectType objtype)
-{
-	switch (objtype)
-	{
-		case ACL_OBJECT_DATABASE:
-		case ACL_OBJECT_TABLESPACE:
-			/* no support for global objects */
-			return false;
-
-		case ACL_OBJECT_COLUMN:
-		case ACL_OBJECT_RELATION:
-		case ACL_OBJECT_SEQUENCE:
-		case ACL_OBJECT_DOMAIN:
-		case ACL_OBJECT_FDW:
-		case ACL_OBJECT_FOREIGN_SERVER:
-		case ACL_OBJECT_FUNCTION:
-		case ACL_OBJECT_LANGUAGE:
-		case ACL_OBJECT_LARGEOBJECT:
-		case ACL_OBJECT_NAMESPACE:
-		case ACL_OBJECT_TYPE:
-			return true;
-		default:
-			Assert(false);
-			return true;
-	}
-}
-
 /*
  * Prepare event trigger state for a new complete query to run, if necessary;
  * returns whether this was done.  If it was, EventTriggerEndCompleteQuery must
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..0b8de3f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -736,8 +736,9 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 										((Scan *) plan)->scanrelid);
 			break;
 		case T_ModifyTable:
+			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
-									((ModifyTable *) plan)->nominalRelation);
+					  linitial_int(((ModifyTable *) plan)->resultRelations));
 			break;
 		default:
 			break;
@@ -1072,9 +1073,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
@@ -2191,7 +2195,16 @@ ExplainScanTarget(Scan *plan, ExplainState *es)
 static void
 ExplainModifyTarget(ModifyTable *plan, ExplainState *es)
 {
-	ExplainTargetRel((Plan *) plan, plan->nominalRelation, es);
+	Index		rti;
+
+	/*
+	 * We show the name of the first target relation.  In multi-target-table
+	 * cases this should always be the parent of the inheritance tree.
+	 */
+	Assert(plan->resultRelations != NIL);
+	rti = linitial_int(plan->resultRelations);
+
+	ExplainTargetRel((Plan *) plan, rti, es);
 }
 
 /*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index fb33d30..71b08f0 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -383,9 +383,10 @@ EvaluateParams(PreparedStatement *pstmt, List *params,
 	/* Prepare the expressions for execution */
 	exprstates = (List *) ExecPrepareExpr((Expr *) params, estate);
 
+	/* sizeof(ParamListInfoData) includes the first array element */
 	paramLI = (ParamListInfo)
-		palloc(offsetof(ParamListInfoData, params) +
-			   num_params * sizeof(ParamExternData));
+		palloc(sizeof(ParamListInfoData) +
+			   (num_params - 1) * sizeof(ParamExternData));
 	/* we have static list of params, so no hooks needed */
 	paramLI->paramFetch = NULL;
 	paramLI->paramFetchArg = NULL;
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 0070c4f..622ccf7 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -17,7 +17,6 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
@@ -359,10 +358,6 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
 	tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
 	ItemPointerSet(&tuple->t_data->t_ctid, 0, FirstOffsetNumber);
 
-	/* check the comment above nextval_internal()'s equivalent call. */
-	if (RelationNeedsWAL(rel))
-		GetTopTransactionId();
-
 	START_CRIT_SECTION();
 
 	MarkBufferDirty(buf);
@@ -443,10 +438,6 @@ AlterSequence(AlterSeqStmt *stmt)
 	/* Note that we do not change the currval() state */
 	elm->cached = elm->last;
 
-	/* check the comment above nextval_internal()'s equivalent call. */
-	if (RelationNeedsWAL(seqrel))
-		GetTopTransactionId();
-
 	/* Now okay to update the on-disk tuple */
 	START_CRIT_SECTION();
 
@@ -688,16 +679,6 @@ nextval_internal(Oid relid)
 
 	last_used_seq = elm;
 
-	/*
-	 * If something needs to be WAL logged, acquire an xid, so this
-	 * transaction's commit will trigger a WAL flush and wait for
-	 * syncrep. It's sufficient to ensure the toplevel transaction has a xid,
-	 * no need to assign xids subxacts, that'll already trigger a appropriate
-	 * wait.  (Have to do that here, so we're outside the critical section)
-	 */
-	if (logit && RelationNeedsWAL(seqrel))
-		GetTopTransactionId();
-
 	/* ready to change the on-disk (or really, in-buffer) tuple */
 	START_CRIT_SECTION();
 
@@ -886,10 +867,6 @@ do_setval(Oid relid, int64 next, bool iscalled)
 	/* In any case, forget any future cached numbers */
 	elm->cached = elm->last;
 
-	/* check the comment above nextval_internal()'s equivalent call. */
-	if (RelationNeedsWAL(seqrel))
-		GetTopTransactionId();
-
 	/* ready to change the on-disk (or really, in-buffer) tuple */
 	START_CRIT_SECTION();
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7455020..66d5083 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -71,7 +71,6 @@
 #include "parser/parse_type.h"
 #include "parser/parse_utilcmd.h"
 #include "parser/parser.h"
-#include "pgstat.h"
 #include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
@@ -325,7 +324,7 @@ static void ATTypedTableRecursion(List **wqueue, Relation rel, AlterTableCmd *cm
 static List *find_typed_table_dependencies(Oid typeOid, const char *typeName,
 							  DropBehavior behavior);
 static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
-				bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
+				AlterTableCmd *cmd, LOCKMODE lockmode);
 static void ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 				ColumnDef *colDef, bool isOid,
 				bool recurse, bool recursing, LOCKMODE lockmode);
@@ -1221,8 +1220,6 @@ ExecuteTruncate(TruncateStmt *stmt)
 			 */
 			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
 		}
-
-		pgstat_count_truncate(rel);
 	}
 
 	/*
@@ -3085,16 +3082,14 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		case AT_AddColumn:		/* ADD COLUMN */
 			ATSimplePermissions(rel,
 						 ATT_TABLE | ATT_COMPOSITE_TYPE | ATT_FOREIGN_TABLE);
-			ATPrepAddColumn(wqueue, rel, recurse, recursing, false, cmd,
-							lockmode);
+			ATPrepAddColumn(wqueue, rel, recurse, recursing, cmd, lockmode);
 			/* Recursion occurs during execution phase */
 			pass = AT_PASS_ADD_COL;
 			break;
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 										 * VIEW */
 			ATSimplePermissions(rel, ATT_VIEW);
-			ATPrepAddColumn(wqueue, rel, recurse, recursing, true, cmd,
-							lockmode);
+			ATPrepAddColumn(wqueue, rel, recurse, recursing, cmd, lockmode);
 			/* Recursion occurs during execution phase */
 			pass = AT_PASS_ADD_COL;
 			break;
@@ -4578,7 +4573,7 @@ check_of_type(HeapTuple typetuple)
  */
 static void
 ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
-				bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode)
+				AlterTableCmd *cmd, LOCKMODE lockmode)
 {
 	if (rel->rd_rel->reloftype && !recursing)
 		ereport(ERROR,
@@ -4588,7 +4583,7 @@ ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
 	if (rel->rd_rel->relkind == RELKIND_COMPOSITE_TYPE)
 		ATTypedTableRecursion(wqueue, rel, cmd, lockmode);
 
-	if (recurse && !is_view)
+	if (recurse)
 		cmd->subtype = AT_AddColumnRecurse;
 }
 
@@ -4824,7 +4819,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	{
 		defval = (Expr *) build_column_default(rel, attribute.attnum);
 
-		if (!defval && DomainHasConstraints(typeOid))
+		if (!defval && GetDomainConstraints(typeOid) != NIL)
 		{
 			Oid			baseTypeId;
 			int32		baseTypeMod;
@@ -5028,7 +5023,7 @@ ATPrepAddOids(List **wqueue, Relation rel, bool recurse, AlterTableCmd *cmd, LOC
 		cdef->location = -1;
 		cmd->def = (Node *) cdef;
 	}
-	ATPrepAddColumn(wqueue, rel, recurse, false, false, cmd, lockmode);
+	ATPrepAddColumn(wqueue, rel, recurse, false, cmd, lockmode);
 
 	if (recurse)
 		cmd->subtype = AT_AddOidsRecurse;
@@ -5707,9 +5702,6 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 	Assert(IsA(stmt, IndexStmt));
 	Assert(!stmt->concurrent);
 
-	/* The IndexStmt has already been through transformIndexStmt */
-	Assert(stmt->transformed);
-
 	/* suppress schema rights check when rebuilding existing index */
 	check_rights = !is_rebuild;
 	/* skip index build if phase 3 will do it or we're reusing an old one */
@@ -5717,6 +5709,8 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 	/* suppress notices when rebuilding existing index */
 	quiet = is_rebuild;
 
+	/* The IndexStmt has already been through transformIndexStmt */
+
 	new_index = DefineIndex(RelationGetRelid(rel),
 							stmt,
 							InvalidOid, /* no predefined OID */
@@ -7778,7 +7772,7 @@ ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno)
 		{
 			CoerceToDomain *d = (CoerceToDomain *) expr;
 
-			if (DomainHasConstraints(d->resulttype))
+			if (GetDomainConstraints(d->resulttype) != NIL)
 				return true;
 			expr = (Node *) d->arg;
 		}
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 03cc8fe..e098b9f 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1088,7 +1088,7 @@ GetDefaultTablespace(char relpersistence)
 typedef struct
 {
 	int			numSpcs;
-	Oid			tblSpcs[FLEXIBLE_ARRAY_MEMBER];
+	Oid			tblSpcs[1];		/* VARIABLE LENGTH ARRAY */
 } temp_tablespaces_extra;
 
 /* check_hook: validate new temp_tablespaces */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index a84e86e..5c1c1be 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3005,7 +3005,7 @@ typedef struct SetConstraintStateData
 	bool		all_isdeferred;
 	int			numstates;		/* number of trigstates[] entries in use */
 	int			numalloc;		/* allocated size of trigstates[] */
-	SetConstraintTriggerData trigstates[FLEXIBLE_ARRAY_MEMBER];
+	SetConstraintTriggerData trigstates[1];		/* VARIABLE LENGTH ARRAY */
 } SetConstraintStateData;
 
 typedef SetConstraintStateData *SetConstraintState;
@@ -4398,8 +4398,8 @@ SetConstraintStateCreate(int numalloc)
 	 */
 	state = (SetConstraintState)
 		MemoryContextAllocZero(TopTransactionContext,
-							   offsetof(SetConstraintStateData, trigstates) +
-						   numalloc * sizeof(SetConstraintTriggerData));
+							   sizeof(SetConstraintStateData) +
+						   (numalloc - 1) *sizeof(SetConstraintTriggerData));
 
 	state->numalloc = numalloc;
 
@@ -4440,8 +4440,8 @@ SetConstraintStateAddItem(SetConstraintState state,
 		newalloc = Max(newalloc, 8);	/* in case original has size 0 */
 		state = (SetConstraintState)
 			repalloc(state,
-					 offsetof(SetConstraintStateData, trigstates) +
-					 newalloc * sizeof(SetConstraintTriggerData));
+					 sizeof(SetConstraintStateData) +
+					 (newalloc - 1) *sizeof(SetConstraintTriggerData));
 		state->numalloc = newalloc;
 		Assert(state->numstates < state->numalloc);
 	}
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 60ab3aa..b77e1b4 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -31,11 +31,15 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/catalog.h"
+#include "catalog/dependency.h"
 #include "catalog/heap.h"
+#include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_collation.h"
@@ -55,12 +59,14 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "optimizer/planner.h"
 #include "optimizer/var.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
@@ -69,6 +75,7 @@
 #include "utils/ruleutils.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/tqual.h"
 
 
 /* result structure for get_rels_with_domain() */
@@ -3074,6 +3081,126 @@ domainAddConstraint(Oid domainOid, Oid domainNamespace, Oid baseTypeOid,
 	return ccbin;
 }
 
+/*
+ * GetDomainConstraints - get a list of the current constraints of domain
+ *
+ * Returns a possibly-empty list of DomainConstraintState nodes.
+ *
+ * This is called by the executor during plan startup for a CoerceToDomain
+ * expression node.  The given constraints will be checked for each value
+ * passed through the node.
+ *
+ * We allow this to be called for non-domain types, in which case the result
+ * is always NIL.
+ */
+List *
+GetDomainConstraints(Oid typeOid)
+{
+	List	   *result = NIL;
+	bool		notNull = false;
+	Relation	conRel;
+
+	conRel = heap_open(ConstraintRelationId, AccessShareLock);
+
+	for (;;)
+	{
+		HeapTuple	tup;
+		HeapTuple	conTup;
+		Form_pg_type typTup;
+		ScanKeyData key[1];
+		SysScanDesc scan;
+
+		tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typeOid));
+		if (!HeapTupleIsValid(tup))
+			elog(ERROR, "cache lookup failed for type %u", typeOid);
+		typTup = (Form_pg_type) GETSTRUCT(tup);
+
+		if (typTup->typtype != TYPTYPE_DOMAIN)
+		{
+			/* Not a domain, so done */
+			ReleaseSysCache(tup);
+			break;
+		}
+
+		/* Test for NOT NULL Constraint */
+		if (typTup->typnotnull)
+			notNull = true;
+
+		/* Look for CHECK Constraints on this domain */
+		ScanKeyInit(&key[0],
+					Anum_pg_constraint_contypid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(typeOid));
+
+		scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
+								  NULL, 1, key);
+
+		while (HeapTupleIsValid(conTup = systable_getnext(scan)))
+		{
+			Form_pg_constraint c = (Form_pg_constraint) GETSTRUCT(conTup);
+			Datum		val;
+			bool		isNull;
+			Expr	   *check_expr;
+			DomainConstraintState *r;
+
+			/* Ignore non-CHECK constraints (presently, shouldn't be any) */
+			if (c->contype != CONSTRAINT_CHECK)
+				continue;
+
+			/*
+			 * Not expecting conbin to be NULL, but we'll test for it anyway
+			 */
+			val = fastgetattr(conTup, Anum_pg_constraint_conbin,
+							  conRel->rd_att, &isNull);
+			if (isNull)
+				elog(ERROR, "domain \"%s\" constraint \"%s\" has NULL conbin",
+					 NameStr(typTup->typname), NameStr(c->conname));
+
+			check_expr = (Expr *) stringToNode(TextDatumGetCString(val));
+
+			/* ExecInitExpr assumes we've planned the expression */
+			check_expr = expression_planner(check_expr);
+
+			r = makeNode(DomainConstraintState);
+			r->constrainttype = DOM_CONSTRAINT_CHECK;
+			r->name = pstrdup(NameStr(c->conname));
+			r->check_expr = ExecInitExpr(check_expr, NULL);
+
+			/*
+			 * use lcons() here because constraints of lower domains should be
+			 * applied earlier.
+			 */
+			result = lcons(r, result);
+		}
+
+		systable_endscan(scan);
+
+		/* loop to next domain in stack */
+		typeOid = typTup->typbasetype;
+		ReleaseSysCache(tup);
+	}
+
+	heap_close(conRel, AccessShareLock);
+
+	/*
+	 * Only need to add one NOT NULL check regardless of how many domains in
+	 * the stack request it.
+	 */
+	if (notNull)
+	{
+		DomainConstraintState *r = makeNode(DomainConstraintState);
+
+		r->constrainttype = DOM_CONSTRAINT_NOTNULL;
+		r->name = pstrdup("NOT NULL");
+		r->check_expr = NULL;
+
+		/* lcons to apply the nullness check FIRST */
+		result = lcons(r, result);
+	}
+
+	return result;
+}
+
 
 /*
  * Execute ALTER TYPE RENAME
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index d94fe58..0e7400f 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -41,6 +41,7 @@
 #include "access/tupconvert.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_type.h"
+#include "commands/typecmds.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSubplan.h"
 #include "funcapi.h"
@@ -251,6 +252,12 @@ static Datum ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
  *
  * NOTE: if we get a NULL result from a subscript expression, we return NULL
  * when it's an array reference, or raise an error when it's an assignment.
+ *
+ * NOTE: we deliberately refrain from applying DatumGetArrayTypeP() here,
+ * even though that might seem natural, because this code needs to support
+ * both varlena arrays and fixed-length array types.  DatumGetArrayTypeP()
+ * only works for the varlena kind.  The routines we call in arrayfuncs.c
+ * have to know the difference (that's what they need refattrlength for).
  *----------
  */
 static Datum
@@ -260,7 +267,8 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 				 ExprDoneCond *isDone)
 {
 	ArrayRef   *arrayRef = (ArrayRef *) astate->xprstate.expr;
-	Datum		array_source;
+	ArrayType  *array_source;
+	ArrayType  *resultArray;
 	bool		isAssignment = (arrayRef->refassgnexpr != NULL);
 	bool		eisnull;
 	ListCell   *l;
@@ -270,10 +278,11 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 				lower;
 	int		   *lIndex;
 
-	array_source = ExecEvalExpr(astate->refexpr,
-								econtext,
-								isNull,
-								isDone);
+	array_source = (ArrayType *)
+		DatumGetPointer(ExecEvalExpr(astate->refexpr,
+									 econtext,
+									 isNull,
+									 isDone));
 
 	/*
 	 * If refexpr yields NULL, and it's a fetch, then result is NULL. In the
@@ -381,24 +390,23 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 			}
 			else if (lIndex == NULL)
 			{
-				econtext->caseValue_datum =
-					array_get_element(array_source, i,
-									  upper.indx,
-									  astate->refattrlength,
-									  astate->refelemlength,
-									  astate->refelembyval,
-									  astate->refelemalign,
-									  &econtext->caseValue_isNull);
+				econtext->caseValue_datum = array_ref(array_source, i,
+													  upper.indx,
+													  astate->refattrlength,
+													  astate->refelemlength,
+													  astate->refelembyval,
+													  astate->refelemalign,
+												&econtext->caseValue_isNull);
 			}
 			else
 			{
-				econtext->caseValue_datum =
-					array_get_slice(array_source, i,
-									upper.indx, lower.indx,
-									astate->refattrlength,
-									astate->refelemlength,
-									astate->refelembyval,
-									astate->refelemalign);
+				resultArray = array_get_slice(array_source, i,
+											  upper.indx, lower.indx,
+											  astate->refattrlength,
+											  astate->refelemlength,
+											  astate->refelembyval,
+											  astate->refelemalign);
+				econtext->caseValue_datum = PointerGetDatum(resultArray);
 				econtext->caseValue_isNull = false;
 			}
 		}
@@ -427,7 +435,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		 */
 		if (astate->refattrlength > 0)	/* fixed-length array? */
 			if (eisnull || *isNull)
-				return array_source;
+				return PointerGetDatum(array_source);
 
 		/*
 		 * For assignment to varlena arrays, we handle a NULL original array
@@ -437,45 +445,48 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		 */
 		if (*isNull)
 		{
-			array_source = PointerGetDatum(construct_empty_array(arrayRef->refelemtype));
+			array_source = construct_empty_array(arrayRef->refelemtype);
 			*isNull = false;
 		}
 
 		if (lIndex == NULL)
-			return array_set_element(array_source, i,
-									 upper.indx,
-									 sourceData,
-									 eisnull,
-									 astate->refattrlength,
-									 astate->refelemlength,
-									 astate->refelembyval,
-									 astate->refelemalign);
+			resultArray = array_set(array_source, i,
+									upper.indx,
+									sourceData,
+									eisnull,
+									astate->refattrlength,
+									astate->refelemlength,
+									astate->refelembyval,
+									astate->refelemalign);
 		else
-			return array_set_slice(array_source, i,
-								   upper.indx, lower.indx,
-								   sourceData,
-								   eisnull,
-								   astate->refattrlength,
-								   astate->refelemlength,
-								   astate->refelembyval,
-								   astate->refelemalign);
+			resultArray = array_set_slice(array_source, i,
+										  upper.indx, lower.indx,
+								   (ArrayType *) DatumGetPointer(sourceData),
+										  eisnull,
+										  astate->refattrlength,
+										  astate->refelemlength,
+										  astate->refelembyval,
+										  astate->refelemalign);
+		return PointerGetDatum(resultArray);
 	}
 
 	if (lIndex == NULL)
-		return array_get_element(array_source, i,
-								 upper.indx,
-								 astate->refattrlength,
-								 astate->refelemlength,
-								 astate->refelembyval,
-								 astate->refelemalign,
-								 isNull);
+		return array_ref(array_source, i, upper.indx,
+						 astate->refattrlength,
+						 astate->refelemlength,
+						 astate->refelembyval,
+						 astate->refelemalign,
+						 isNull);
 	else
-		return array_get_slice(array_source, i,
-							   upper.indx, lower.indx,
-							   astate->refattrlength,
-							   astate->refelemlength,
-							   astate->refelembyval,
-							   astate->refelemalign);
+	{
+		resultArray = array_get_slice(array_source, i,
+									  upper.indx, lower.indx,
+									  astate->refattrlength,
+									  astate->refelemlength,
+									  astate->refelembyval,
+									  astate->refelemalign);
+		return PointerGetDatum(resultArray);
+	}
 }
 
 /*
@@ -889,9 +900,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 	 * If we can't locate the RTE, assume the column names we've got are OK.
 	 * (As of this writing, the only cases where we can't locate the RTE are
 	 * in execution of trigger WHEN clauses, and then the Var will have the
-	 * trigger's relation's rowtype, so its names are fine.)  Also, if the
-	 * creator of the RTE didn't bother to fill in an eref field, assume our
-	 * column names are OK.  (This happens in COPY, and perhaps other places.)
+	 * trigger's relation's rowtype, so its names are fine.)
 	 */
 	if (econtext->ecxt_estate &&
 		variable->varno <= list_length(econtext->ecxt_estate->es_range_table))
@@ -899,8 +908,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 		RangeTblEntry *rte = rt_fetch(variable->varno,
 									  econtext->ecxt_estate->es_range_table);
 
-		if (rte->eref)
-			ExecTypeSetColNames(output_tupdesc, rte->eref->colnames);
+		ExecTypeSetColNames(output_tupdesc, rte->eref->colnames);
 	}
 
 	/* Bless the tupdesc if needed, and save it in the execution state */
@@ -3928,10 +3936,7 @@ ExecEvalCoerceToDomain(CoerceToDomainState *cstate, ExprContext *econtext,
 	if (isDone && *isDone == ExprEndResult)
 		return result;			/* nothing to check */
 
-	/* Make sure we have up-to-date constraints */
-	UpdateDomainConstraintRef(cstate->constraint_ref);
-
-	foreach(l, cstate->constraint_ref->constraints)
+	foreach(l, cstate->constraints)
 	{
 		DomainConstraintState *con = (DomainConstraintState *) lfirst(l);
 
@@ -5052,12 +5057,7 @@ ExecInitExpr(Expr *node, PlanState *parent)
 
 				cstate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalCoerceToDomain;
 				cstate->arg = ExecInitExpr(ctest->arg, parent);
-				/* We spend an extra palloc to reduce header inclusions */
-				cstate->constraint_ref = (DomainConstraintRef *)
-					palloc(sizeof(DomainConstraintRef));
-				InitDomainConstraintRef(ctest->resulttype,
-										cstate->constraint_ref,
-										CurrentMemoryContext);
+				cstate->constraints = GetDomainConstraints(ctest->resulttype);
 				state = (ExprState *) cstate;
 			}
 			break;
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Likewise for a foreign-/custom-scan on a pseudo relation */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
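As a side note on the INDEX_VAR case added above: a Var that would normally reference one of the joined relations is expected to be rewritten so that varno is INDEX_VAR and varattno is its 1-based position in the pseudo-scan target list, which is what lets ExecAssignScanProjectionInfo() build a projection against ecxt_scantuple. A minimal sketch of that rewrite, assuming the provider keeps the original Vars in list order; remap_var_for_pseudo_scan() is only an illustrative helper name, not something defined by this patch.

#include "postgres.h"

#include "nodes/makefuncs.h"
#include "nodes/primnodes.h"

/*
 * Hypothetical helper: given an original Var and its 1-based position in
 * the pseudo-scan target list (fdw_ps_tlist / custom_ps_tlist), build the
 * Var that the plan's projection should actually reference.
 */
static Var *
remap_var_for_pseudo_scan(Var *origvar, AttrNumber position)
{
	/* keep the type information, but point at the pseudo scan slot */
	return makeVar(INDEX_VAR,			/* varno: refers to ecxt_scantuple */
				   position,			/* varattno: position in ps_tlist */
				   origvar->vartype,
				   origvar->vartypmod,
				   origvar->varcollid,
				   0);					/* varlevelsup */
}
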
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 6c3eff7..84be37c 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -896,9 +896,9 @@ postquel_sub_params(SQLFunctionCachePtr fcache,
 
 		if (fcache->paramLI == NULL)
 		{
-			paramLI = (ParamListInfo)
-				palloc(offsetof(ParamListInfoData, params) +
-					   nargs * sizeof(ParamExternData));
+			/* sizeof(ParamListInfoData) includes the first array element */
+			paramLI = (ParamListInfo) palloc(sizeof(ParamListInfoData) +
+									  (nargs - 1) * sizeof(ParamExternData));
 			/* we have static list of params, so no hooks needed */
 			paramLI->paramFetch = NULL;
 			paramLI->paramFetchArg = NULL;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 9ff0eff..8079d97 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -297,9 +297,9 @@ typedef struct AggHashEntryData *AggHashEntry;
 typedef struct AggHashEntryData
 {
 	TupleHashEntryData shared;	/* common header for hash table entries */
-	/* per-aggregate transition status array */
-	AggStatePerGroupData pergroup[FLEXIBLE_ARRAY_MEMBER];
-}	AggHashEntryData;
+	/* per-aggregate transition status array - must be last! */
+	AggStatePerGroupData pergroup[1];	/* VARIABLE LENGTH ARRAY */
+}	AggHashEntryData;	/* VARIABLE LENGTH STRUCT */
 
 
 static void initialize_aggregates(AggState *aggstate,
@@ -941,8 +941,8 @@ build_hash_table(AggState *aggstate)
 	Assert(node->aggstrategy == AGG_HASHED);
 	Assert(node->numGroups > 0);
 
-	entrysize = offsetof(AggHashEntryData, pergroup) +
-		aggstate->numaggs * sizeof(AggStatePerGroupData);
+	entrysize = sizeof(AggHashEntryData) +
+		(aggstate->numaggs - 1) * sizeof(AggStatePerGroupData);
 
 	aggstate->hashtable = BuildTupleHashTable(node->numCols,
 											  node->grpColIdx,
@@ -1013,8 +1013,8 @@ hash_agg_entry_size(int numAggs)
 	Size		entrysize;
 
 	/* This must match build_hash_table */
-	entrysize = offsetof(AggHashEntryData, pergroup) +
-		numAggs * sizeof(AggStatePerGroupData);
+	entrysize = sizeof(AggHashEntryData) +
+		(numAggs - 1) * sizeof(AggStatePerGroupData);
 	entrysize = MAXALIGN(entrysize);
 	/* Account for hashtable overhead (assuming fill factor = 1) */
 	entrysize += 3 * sizeof(void *);
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..2344129 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * Open the base relation and acquire an appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on an actual relation.
+	 *
+	 * On the other hand, a custom-scan may scan a pseudo relation;
+	 * typically the result set of a join of relations performed by an
+	 * external computing resource. In that case it has to get the scan
+	 * type from the pseudo-scan target-list assigned by the custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
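To make the scanrelid == 0 branch above concrete, the sketch below shows one way a custom-scan provider might assemble the pseudo-scan target list that ExecInitCustomScan() passes to ExecCleanTypeFromTL(). It assumes the list entries are ordinary TargetEntry nodes wrapping the Vars the node will emit, in slot order; build_pseudo_scan_tlist() is an illustrative name rather than part of the patch.

#include "postgres.h"

#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

/*
 * Illustrative sketch: collect the Vars the pushed-down join will return
 * and wrap them as TargetEntry nodes, so the resulting list can be set as
 * custom_ps_tlist (or fdw_ps_tlist) of a scan node with scanrelid == 0.
 */
static List *
build_pseudo_scan_tlist(List *returned_vars)
{
	List	   *ps_tlist = NIL;
	AttrNumber	resno = 1;
	ListCell   *lc;

	foreach(lc, returned_vars)
	{
		Var		   *var = (Var *) lfirst(lc);
		TargetEntry *tle;

		tle = makeTargetEntry((Expr *) copyObject(var),
							  resno++,	/* position == varattno on INDEX_VAR */
							  NULL,		/* no column alias needed here */
							  false);	/* not resjunk */
		ps_tlist = lappend(ps_tlist, tle);
	}

	return ps_tlist;
}
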
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..542d176 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * Open the base relation and acquire an appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on an actual foreign table.
+	 *
+	 * On the other hand, a foreign-scan may scan a pseudo relation;
+	 * typically the result set of a join of remote relations. In that
+	 * case it has to get the scan type from the pseudo-scan target-list
+	 * assigned by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
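At execution time, a ForeignScan with scanrelid == 0 has no underlying relation to read, so the FDW itself has to fill the scan slot (whose descriptor was just built from fdw_ps_tlist) with the columns of the remotely joined row. A rough sketch of what the driver's IterateForeignScan callback could look like is below; fetch_next_joined_row() is a stand-in for whatever remote-fetch machinery the driver actually uses, not part of this patch.

#include "postgres.h"

#include "executor/tuptable.h"
#include "foreign/fdwapi.h"

/* Assumed driver-specific fetch routine; not part of the patch. */
extern bool fetch_next_joined_row(ForeignScanState *node,
								  Datum *values, bool *isnull);

/*
 * Sketch of IterateForeignScan for a join pushed down to the remote side.
 * Each column of the slot corresponds, in order, to one fdw_ps_tlist entry.
 */
static TupleTableSlot *
example_IterateForeignScan(ForeignScanState *node)
{
	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

	ExecClearTuple(slot);

	if (fetch_next_joined_row(node, slot->tts_values, slot->tts_isnull))
		ExecStoreVirtualTuple(slot);	/* materialize the fetched row */

	/* An empty slot signals end of scan to the core executor. */
	return slot;
}
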
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index b1f6c82..abd70b3 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -439,7 +439,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
 	 * don't count palloc overhead either.
 	 */
 	tupsize = HJTUPLE_OVERHEAD +
-		MAXALIGN(SizeofMinimalTupleHeader) +
+		MAXALIGN(sizeof(MinimalTupleData)) +
 		MAXALIGN(tupwidth);
 	inner_rel_bytes = ntuples * tupsize;
 
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 9eb4d63..f3ce1d7 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -262,7 +262,7 @@ ExecScanSubPlan(SubPlanState *node,
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-									CurrentMemoryContext, true);
+									CurrentMemoryContext);
 
 	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch
@@ -964,7 +964,7 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 	/* Initialize ArrayBuildStateAny in caller's context, if needed */
 	if (subLinkType == ARRAY_SUBLINK)
 		astate = initArrayResultAny(subplan->firstColType,
-									CurrentMemoryContext, true);
+									CurrentMemoryContext);
 
 	/*
 	 * Must switch to per-query memory context.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b3c0502..4b86e91 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2290,8 +2290,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
 	{
 		int			i;
 
-		paramLI = (ParamListInfo) palloc(offsetof(ParamListInfoData, params) +
-										 nargs * sizeof(ParamExternData));
+		/* sizeof(ParamListInfoData) includes the first array element */
+		paramLI = (ParamListInfo) palloc(sizeof(ParamListInfoData) +
+									  (nargs - 1) * sizeof(ParamExternData));
 		/* we have static list of params, so no hooks needed */
 		paramLI->paramFetch = NULL;
 		paramLI->paramFetchArg = NULL;
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/lib/pairingheap.c b/src/backend/lib/pairingheap.c
index 17278fd..213c9d3 100644
--- a/src/backend/lib/pairingheap.c
+++ b/src/backend/lib/pairingheap.c
@@ -70,10 +70,6 @@ pairingheap_free(pairingheap *heap)
  *
  * The subheap with smaller value is put as a child of the other one (assuming
  * a max-heap).
- *
- * The next_sibling and prev_or_parent pointers of the input nodes are
- * ignored. On return, the returned node's next_sibling and prev_or_parent
- * pointers are garbage.
  */
 static pairingheap_node *
 merge(pairingheap *heap, pairingheap_node *a, pairingheap_node *b)
@@ -115,8 +111,6 @@ pairingheap_add(pairingheap *heap, pairingheap_node *node)
 
 	/* Link the new node as a new tree */
 	heap->ph_root = merge(heap, heap->ph_root, node);
-	heap->ph_root->prev_or_parent = NULL;
-	heap->ph_root->next_sibling = NULL;
 }
 
 /*
@@ -154,11 +148,6 @@ pairingheap_remove_first(pairingheap *heap)
 	children = result->first_child;
 
 	heap->ph_root = merge_children(heap, children);
-	if (heap->ph_root)
-	{
-		heap->ph_root->prev_or_parent = NULL;
-		heap->ph_root->next_sibling = NULL;
-	}
 
 	return result;
 }
@@ -283,51 +272,3 @@ merge_children(pairingheap *heap, pairingheap_node *children)
 
 	return newroot;
 }
-
-/*
- * A debug function to dump the contents of the heap as a string.
- *
- * The 'dumpfunc' callback appends a string representation of a single node
- * to the StringInfo. 'opaque' can be used to pass more information to the
- * callback.
- */
-#ifdef PAIRINGHEAP_DEBUG
-static void
-pairingheap_dump_recurse(StringInfo buf,
-						 pairingheap_node *node,
-						 void (*dumpfunc) (pairingheap_node *node, StringInfo buf, void *opaque),
-						 void *opaque,
-						 int depth,
-						 pairingheap_node *prev_or_parent)
-{
-	while (node)
-	{
-		Assert(node->prev_or_parent == prev_or_parent);
-
-		appendStringInfoSpaces(buf, depth * 4);
-		dumpfunc(node, buf, opaque);
-		appendStringInfoString(buf, "\n");
-		if (node->first_child)
-			pairingheap_dump_recurse(buf, node->first_child, dumpfunc, opaque, depth + 1, node);
-		prev_or_parent = node;
-		node = node->next_sibling;
-	}
-}
-
-char *
-pairingheap_dump(pairingheap *heap,
-				 void (*dumpfunc) (pairingheap_node *node, StringInfo buf, void *opaque),
-				 void *opaque)
-{
-	StringInfoData buf;
-
-	if (!heap->ph_root)
-		return pstrdup("(empty)");
-
-	initStringInfo(&buf);
-
-	pairingheap_dump_recurse(&buf, heap->ph_root, dumpfunc, opaque, 0, NULL);
-
-	return buf.data;
-}
-#endif
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 28b050a..346f808 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2172,7 +2172,7 @@ typedef struct
 {
 	uint8		attribute;
 	uint8		length;
-	uint8		data[FLEXIBLE_ARRAY_MEMBER];
+	uint8		data[1];
 } radius_attribute;
 
 typedef struct
@@ -2220,6 +2220,7 @@ radius_add_attribute(radius_packet *packet, uint8 type, const unsigned char *dat
 			 "Adding attribute code %d with length %d to radius packet would create oversize packet, ignoring",
 			 type, len);
 		return;
+
 	}
 
 	attr = (radius_attribute *) ((unsigned char *) packet + packet->length);
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index b06f987..d5f9712 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -511,11 +511,14 @@ be_tls_close(Port *port)
  *	Read data from a secure connection.
  */
 ssize_t
-be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
+be_tls_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 	int			err;
+	int			waitfor;
+	int			latchret;
 
+rloop:
 	errno = 0;
 	n = SSL_read(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
@@ -525,15 +528,39 @@ be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 			port->count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
-			*waitfor = WL_SOCKET_READABLE;
-			errno = EWOULDBLOCK;
-			n = -1;
-			break;
 		case SSL_ERROR_WANT_WRITE:
-			*waitfor = WL_SOCKET_WRITEABLE;
-			errno = EWOULDBLOCK;
-			n = -1;
-			break;
+			/* Don't retry if the socket is in nonblocking mode. */
+			if (port->noblock)
+			{
+				errno = EWOULDBLOCK;
+				n = -1;
+				break;
+			}
+
+			waitfor = WL_LATCH_SET;
+
+			if (err == SSL_ERROR_WANT_READ)
+				waitfor |= WL_SOCKET_READABLE;
+			else
+				waitfor |= WL_SOCKET_WRITEABLE;
+
+			latchret = WaitLatchOrSocket(MyLatch, waitfor, port->sock, 0);
+
+			/*
+			 * We'll, among other situations, get here if the low level
+			 * routine doing the actual recv() via the socket got interrupted
+			 * by a signal. That's so we can handle interrupts once outside
+			 * openssl, so we don't jump out from underneath its covers. We
+			 * can check this both, when reading and writing, because even
+			 * when writing that's just openssl's doing, not a 'proper' write
+			 * initiated by postgres.
+			 */
+			if (latchret & WL_LATCH_SET)
+			{
+				ResetLatch(MyLatch);
+				ProcessClientReadInterrupt(true);  /* preserves errno */
+			}
+			goto rloop;
 		case SSL_ERROR_SYSCALL:
 			/* leave it to caller to ereport the value of errno */
 			if (n != -1)
@@ -568,10 +595,12 @@ be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
  *	Write data to a secure connection.
  */
 ssize_t
-be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
+be_tls_write(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 	int			err;
+	int			waitfor;
+	int			latchret;
 
 	/*
 	 * If SSL renegotiations are enabled and we're getting close to the
@@ -595,16 +624,36 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 		 */
 		SSL_clear_num_renegotiations(port->ssl);
 
-		/* without this, renegotiation fails when a client cert is used */
 		SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
 								   sizeof(SSL_context));
-
 		if (SSL_renegotiate(port->ssl) <= 0)
 			ereport(COMMERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
 					 errmsg("SSL failure during renegotiation start")));
+		else
+		{
+			int			retries;
+
+			/*
+			 * A handshake can fail, so be prepared to retry it, but only
+			 * a few times.
+			 */
+			for (retries = 0;; retries++)
+			{
+				if (SSL_do_handshake(port->ssl) > 0)
+					break;	/* done */
+				ereport(COMMERROR,
+						(errcode(ERRCODE_PROTOCOL_VIOLATION),
+						 errmsg("SSL handshake failure on renegotiation, retrying")));
+				if (retries >= 20)
+					ereport(FATAL,
+							(errcode(ERRCODE_PROTOCOL_VIOLATION),
+							 errmsg("could not complete SSL handshake on renegotiation, too many failures")));
+			}
+		}
 	}
 
+wloop:
 	errno = 0;
 	n = SSL_write(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
@@ -614,15 +663,30 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 			port->count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
-			*waitfor = WL_SOCKET_READABLE;
-			errno = EWOULDBLOCK;
-			n = -1;
-			break;
 		case SSL_ERROR_WANT_WRITE:
-			*waitfor = WL_SOCKET_WRITEABLE;
-			errno = EWOULDBLOCK;
-			n = -1;
-			break;
+
+			waitfor = WL_LATCH_SET;
+
+			if (err == SSL_ERROR_WANT_READ)
+				waitfor |= WL_SOCKET_READABLE;
+			else
+				waitfor |= WL_SOCKET_WRITEABLE;
+
+			latchret = WaitLatchOrSocket(MyLatch, waitfor, port->sock, 0);
+
+			/*
+			 * Check for interrupts here, in addition to secure_write(),
+			 * because an interrupted write in secure_raw_write() will return
+			 * here, and we cannot return to secure_write() until we've
+			 * written something.
+			 */
+			if (latchret & WL_LATCH_SET)
+			{
+				ResetLatch(MyLatch);
+				ProcessClientWriteInterrupt(true); /* preserves errno */
+			}
+
+			goto wloop;
 		case SSL_ERROR_SYSCALL:
 			/* leave it to caller to ereport the value of errno */
 			if (n != -1)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4e7acbe..c2c1842 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -127,45 +127,30 @@ ssize_t
 secure_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
-	int			waitfor;
 
 retry:
 #ifdef USE_SSL
-	waitfor = 0;
 	if (port->ssl_in_use)
 	{
-		n = be_tls_read(port, ptr, len, &waitfor);
+		n = be_tls_read(port, ptr, len);
 	}
 	else
 #endif
 	{
 		n = secure_raw_read(port, ptr, len);
-		waitfor = WL_SOCKET_READABLE;
 	}
 
-	/* In blocking mode, wait until the socket is ready */
-	if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
+	/* retry after processing interrupts */
+	if (n < 0 && errno == EINTR)
 	{
-		int		w;
-
-		Assert(waitfor);
-
-		w = WaitLatchOrSocket(MyLatch,
-							  WL_LATCH_SET | waitfor,
-							  port->sock, 0);
-
-		/* Handle interrupt. */
-		if (w & WL_LATCH_SET)
-		{
-			ResetLatch(MyLatch);
-			ProcessClientReadInterrupt(true);
-
-			/*
-			 * We'll retry the read. Most likely it will return immediately
-			 * because there's still no data available, and we'll wait
-			 * for the socket to become ready again.
-			 */
-		}
+		/*
+		 * We tried to read data, the socket was empty, and we were
+		 * interrupted while waiting for readability. We only process
+		 * interrupts if we got interrupted while reading and when in blocking
+		 * mode. In other cases it's better to allow the interrupts to be
+		 * handled at higher layers.
+		 */
+		ProcessClientReadInterrupt(!port->noblock); /* preserves errno */
 		goto retry;
 	}
 
@@ -188,6 +173,7 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 	 * Try to read from the socket without blocking. If it succeeds we're
 	 * done, otherwise we'll wait for the socket using the latch mechanism.
 	 */
+rloop:
 #ifdef WIN32
 	pgwin32_noblock = true;
 #endif
@@ -196,6 +182,37 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 	pgwin32_noblock = false;
 #endif
 
+	if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
+	{
+		int		w;
+		int		save_errno = errno;
+
+		w = WaitLatchOrSocket(MyLatch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE,
+							  port->sock, 0);
+
+		if (w & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			/*
+			 * Force a return, so interrupts can be processed when not
+			 * (possibly) underneath a ssl library.
+			 */
+			errno = EINTR;
+			return -1;
+		}
+		else if (w & WL_SOCKET_READABLE)
+		{
+			goto rloop;
+		}
+
+		/*
+		 * Restore errno, clobbered by WaitLatchOrSocket, so the caller can
+		 * react properly.
+		 */
+		errno = save_errno;
+	}
+
 	return n;
 }
 
@@ -207,54 +224,33 @@ ssize_t
 secure_write(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
-	int			waitfor;
 
 retry:
-	waitfor = 0;
 #ifdef USE_SSL
 	if (port->ssl_in_use)
 	{
-		n = be_tls_write(port, ptr, len, &waitfor);
+		n = be_tls_write(port, ptr, len);
 	}
 	else
 #endif
 	{
 		n = secure_raw_write(port, ptr, len);
-		waitfor = WL_SOCKET_WRITEABLE;
 	}
 
-	if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
+	/* retry after processing interrupts */
+	if (n < 0 && errno == EINTR)
 	{
-		int		w;
-
-		Assert(waitfor);
+		/*
+		 * We tried to send data, the socket was full, and we were interrupted
+		 * while waiting for writability. We only process interrupts if we got
+		 * interrupted while writing and when in blocking mode. In other cases
+		 * it's better to allow the interrupts to be handled at higher layers.
+		 */
+		ProcessClientWriteInterrupt(!port->noblock);
 
-		w = WaitLatchOrSocket(MyLatch,
-							  WL_LATCH_SET | waitfor,
-							  port->sock, 0);
-
-		/* Handle interrupt. */
-		if (w & WL_LATCH_SET)
-		{
-			ResetLatch(MyLatch);
-			ProcessClientWriteInterrupt(true);
-
-			/*
-			 * We'll retry the write. Most likely it will return immediately
-			 * because there's still no data available, and we'll wait
-			 * for the socket to become ready again.
-			 */
-		}
 		goto retry;
 	}
 
-	/*
-	 * Process interrupts that happened while (or before) sending. Note that
-	 * we signal that we're not blocking, which will prevent some types of
-	 * interrupts from being processed.
-	 */
-	ProcessClientWriteInterrupt(false);
-
 	return n;
 }
 
@@ -263,6 +259,8 @@ secure_raw_write(Port *port, const void *ptr, size_t len)
 {
 	ssize_t		n;
 
+wloop:
+
 #ifdef WIN32
 	pgwin32_noblock = true;
 #endif
@@ -271,5 +269,36 @@ secure_raw_write(Port *port, const void *ptr, size_t len)
 	pgwin32_noblock = false;
 #endif
 
+	if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
+	{
+		int		w;
+		int		save_errno = errno;
+
+		w = WaitLatchOrSocket(MyLatch,
+							  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
+							  port->sock, 0);
+
+		if (w & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			/*
+			 * Force a return, so interrupts can be processed when not
+			 * (possibly) underneath a ssl library.
+			 */
+			errno = EINTR;
+			return -1;
+		}
+		else if (w & WL_SOCKET_WRITEABLE)
+		{
+			goto wloop;
+		}
+
+		/*
+		 * Restore errno, clobbered by WaitLatchOrSocket, so the caller can
+		 * react properly.
+		 */
+		errno = save_errno;
+	}
+
 	return n;
 }
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index a0f5396..9cde6a2 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -680,12 +680,42 @@ check_hostname(hbaPort *port, const char *hostname)
 static bool
 check_ip(SockAddr *raddr, struct sockaddr * addr, struct sockaddr * mask)
 {
-	if (raddr->addr.ss_family == addr->sa_family &&
-		pg_range_sockaddr(&raddr->addr,
-						  (struct sockaddr_storage *) addr,
-						  (struct sockaddr_storage *) mask))
-		return true;
-	return false;
+	if (raddr->addr.ss_family == addr->sa_family)
+	{
+		/* Same address family */
+		if (!pg_range_sockaddr(&raddr->addr,
+							   (struct sockaddr_storage *) addr,
+							   (struct sockaddr_storage *) mask))
+			return false;
+	}
+#ifdef HAVE_IPV6
+	else if (addr->sa_family == AF_INET &&
+			 raddr->addr.ss_family == AF_INET6)
+	{
+		/*
+		 * If we're connected on IPv6 but the file specifies an IPv4 address
+		 * to match against, promote the latter to an IPv6 address before
+		 * trying to match the client's address.
+		 */
+		struct sockaddr_storage addrcopy,
+					maskcopy;
+
+		memcpy(&addrcopy, &addr, sizeof(addrcopy));
+		memcpy(&maskcopy, &mask, sizeof(maskcopy));
+		pg_promote_v4_to_v6_addr(&addrcopy);
+		pg_promote_v4_to_v6_mask(&maskcopy);
+
+		if (!pg_range_sockaddr(&raddr->addr, &addrcopy, &maskcopy))
+			return false;
+	}
+#endif   /* HAVE_IPV6 */
+	else
+	{
+		/* Wrong address family, no IPV6 */
+		return false;
+	}
+
+	return true;
 }
 
 /*
diff --git a/src/backend/libpq/ip.c b/src/backend/libpq/ip.c
index db939b5..995a258 100644
--- a/src/backend/libpq/ip.c
+++ b/src/backend/libpq/ip.c
@@ -407,6 +407,79 @@ pg_sockaddr_cidr_mask(struct sockaddr_storage * mask, char *numbits, int family)
 }
 
 
+#ifdef HAVE_IPV6
+
+/*
+ * pg_promote_v4_to_v6_addr --- convert an AF_INET addr to AF_INET6, using
+ *		the standard convention for IPv4 addresses mapped into IPv6 world
+ *
+ * The passed addr is modified in place; be sure it is large enough to
+ * hold the result!  Note that we only worry about setting the fields
+ * that pg_range_sockaddr will look at.
+ */
+void
+pg_promote_v4_to_v6_addr(struct sockaddr_storage * addr)
+{
+	struct sockaddr_in addr4;
+	struct sockaddr_in6 addr6;
+	uint32		ip4addr;
+
+	memcpy(&addr4, addr, sizeof(addr4));
+	ip4addr = ntohl(addr4.sin_addr.s_addr);
+
+	memset(&addr6, 0, sizeof(addr6));
+
+	addr6.sin6_family = AF_INET6;
+
+	addr6.sin6_addr.s6_addr[10] = 0xff;
+	addr6.sin6_addr.s6_addr[11] = 0xff;
+	addr6.sin6_addr.s6_addr[12] = (ip4addr >> 24) & 0xFF;
+	addr6.sin6_addr.s6_addr[13] = (ip4addr >> 16) & 0xFF;
+	addr6.sin6_addr.s6_addr[14] = (ip4addr >> 8) & 0xFF;
+	addr6.sin6_addr.s6_addr[15] = (ip4addr) & 0xFF;
+
+	memcpy(addr, &addr6, sizeof(addr6));
+}
+
+/*
+ * pg_promote_v4_to_v6_mask --- convert an AF_INET netmask to AF_INET6, using
+ *		the standard convention for IPv4 addresses mapped into IPv6 world
+ *
+ * This must be different from pg_promote_v4_to_v6_addr because we want to
+ * set the high-order bits to 1's not 0's.
+ *
+ * The passed addr is modified in place; be sure it is large enough to
+ * hold the result!  Note that we only worry about setting the fields
+ * that pg_range_sockaddr will look at.
+ */
+void
+pg_promote_v4_to_v6_mask(struct sockaddr_storage * addr)
+{
+	struct sockaddr_in addr4;
+	struct sockaddr_in6 addr6;
+	uint32		ip4addr;
+	int			i;
+
+	memcpy(&addr4, addr, sizeof(addr4));
+	ip4addr = ntohl(addr4.sin_addr.s_addr);
+
+	memset(&addr6, 0, sizeof(addr6));
+
+	addr6.sin6_family = AF_INET6;
+
+	for (i = 0; i < 12; i++)
+		addr6.sin6_addr.s6_addr[i] = 0xff;
+
+	addr6.sin6_addr.s6_addr[12] = (ip4addr >> 24) & 0xFF;
+	addr6.sin6_addr.s6_addr[13] = (ip4addr >> 16) & 0xFF;
+	addr6.sin6_addr.s6_addr[14] = (ip4addr >> 8) & 0xFF;
+	addr6.sin6_addr.s6_addr[15] = (ip4addr) & 0xFF;
+
+	memcpy(addr, &addr6, sizeof(addr6));
+}
+#endif   /* HAVE_IPV6 */
+
+
 /*
  * Run the callback function for the addr/mask, after making sure the
  * mask is sane for the addr.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 34efac4..09dea4b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -185,8 +185,7 @@ pq_init(void)
 	/*
 	 * In backends (as soon as forked) we operate the underlying socket in
 	 * nonblocking mode and use latches to implement blocking semantics if
-	 * needed. That allows us to provide safely interruptible reads and
-	 * writes.
+	 * needed. That allows us to provide safely interruptible reads.
 	 *
 	 * Use COMMERROR on failure, because ERROR would try to send the error to
 	 * the client, which might require changing the mode again, leading to
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 2f07a58..582198f 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -261,6 +261,12 @@ startup_hacks(const char *progname)
 
 		/* In case of general protection fault, don't show GUI popup box */
 		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+		/* Figure out which syscall to use to capture timestamp information */
+		init_win32_gettimeofday();
+#endif
+
 	}
 #endif   /* WIN32 */
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..cb85468 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -93,7 +93,6 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_NODE_FIELD(relationOids);
 	COPY_NODE_FIELD(invalItems);
 	COPY_SCALAR_FIELD(nParamExec);
-	COPY_SCALAR_FIELD(hasRowSecurity);
 
 	return newnode;
 }
@@ -176,7 +175,6 @@ _copyModifyTable(const ModifyTable *from)
 	 */
 	COPY_SCALAR_FIELD(operation);
 	COPY_SCALAR_FIELD(canSetTag);
-	COPY_SCALAR_FIELD(nominalRelation);
 	COPY_NODE_FIELD(resultRelations);
 	COPY_SCALAR_FIELD(resultRelIndex);
 	COPY_NODE_FIELD(plans);
@@ -592,7 +590,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -617,6 +617,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
@@ -1694,7 +1695,6 @@ _copyNullTest(const NullTest *from)
 	COPY_NODE_FIELD(arg);
 	COPY_SCALAR_FIELD(nulltesttype);
 	COPY_SCALAR_FIELD(argisrow);
-	COPY_LOCATION_FIELD(location);
 
 	return newnode;
 }
@@ -1709,7 +1709,6 @@ _copyBooleanTest(const BooleanTest *from)
 
 	COPY_NODE_FIELD(arg);
 	COPY_SCALAR_FIELD(booltesttype);
-	COPY_LOCATION_FIELD(location);
 
 	return newnode;
 }
@@ -2940,7 +2939,6 @@ _copyIndexStmt(const IndexStmt *from)
 	COPY_SCALAR_FIELD(isconstraint);
 	COPY_SCALAR_FIELD(deferrable);
 	COPY_SCALAR_FIELD(initdeferred);
-	COPY_SCALAR_FIELD(transformed);
 	COPY_SCALAR_FIELD(concurrent);
 	COPY_SCALAR_FIELD(if_not_exists);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index fe509b0..6e8b308 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -622,7 +622,6 @@ _equalNullTest(const NullTest *a, const NullTest *b)
 	COMPARE_NODE_FIELD(arg);
 	COMPARE_SCALAR_FIELD(nulltesttype);
 	COMPARE_SCALAR_FIELD(argisrow);
-	COMPARE_LOCATION_FIELD(location);
 
 	return true;
 }
@@ -632,7 +631,6 @@ _equalBooleanTest(const BooleanTest *a, const BooleanTest *b)
 {
 	COMPARE_NODE_FIELD(arg);
 	COMPARE_SCALAR_FIELD(booltesttype);
-	COMPARE_LOCATION_FIELD(location);
 
 	return true;
 }
@@ -1211,7 +1209,6 @@ _equalIndexStmt(const IndexStmt *a, const IndexStmt *b)
 	COMPARE_SCALAR_FIELD(isconstraint);
 	COMPARE_SCALAR_FIELD(deferrable);
 	COMPARE_SCALAR_FIELD(initdeferred);
-	COMPARE_SCALAR_FIELD(transformed);
 	COMPARE_SCALAR_FIELD(concurrent);
 	COMPARE_SCALAR_FIELD(if_not_exists);
 
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index d6f1f5b..21dfda7 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1346,22 +1346,12 @@ exprLocation(const Node *expr)
 			}
 			break;
 		case T_NullTest:
-			{
-				const NullTest *nexpr = (const NullTest *) expr;
-
-				/* Much as above */
-				loc = leftmostLoc(nexpr->location,
-								  exprLocation((Node *) nexpr->arg));
-			}
+			/* just use argument's location */
+			loc = exprLocation((Node *) ((const NullTest *) expr)->arg);
 			break;
 		case T_BooleanTest:
-			{
-				const BooleanTest *bexpr = (const BooleanTest *) expr;
-
-				/* Much as above */
-				loc = leftmostLoc(bexpr->location,
-								  exprLocation((Node *) bexpr->arg));
-			}
+			/* just use argument's location */
+			loc = exprLocation((Node *) ((const BooleanTest *) expr)->arg);
 			break;
 		case T_CoerceToDomain:
 			{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..c4a06fc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -255,7 +255,6 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_FIELD(relationOids);
 	WRITE_NODE_FIELD(invalItems);
 	WRITE_INT_FIELD(nParamExec);
-	WRITE_BOOL_FIELD(hasRowSecurity);
 }
 
 /*
@@ -328,7 +327,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
 
 	WRITE_ENUM_FIELD(operation, CmdType);
 	WRITE_BOOL_FIELD(canSetTag);
-	WRITE_UINT_FIELD(nominalRelation);
 	WRITE_NODE_FIELD(resultRelations);
 	WRITE_INT_FIELD(resultRelIndex);
 	WRITE_NODE_FIELD(plans);
@@ -558,7 +556,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -572,6 +572,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
@@ -1371,7 +1372,6 @@ _outNullTest(StringInfo str, const NullTest *node)
 	WRITE_NODE_FIELD(arg);
 	WRITE_ENUM_FIELD(nulltesttype, NullTestType);
 	WRITE_BOOL_FIELD(argisrow);
-	WRITE_LOCATION_FIELD(location);
 }
 
 static void
@@ -1381,7 +1381,6 @@ _outBooleanTest(StringInfo str, const BooleanTest *node)
 
 	WRITE_NODE_FIELD(arg);
 	WRITE_ENUM_FIELD(booltesttype, BoolTestType);
-	WRITE_LOCATION_FIELD(location);
 }
 
 static void
@@ -1720,7 +1719,6 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
 	WRITE_UINT_FIELD(lastPHId);
 	WRITE_UINT_FIELD(lastRowMarkId);
 	WRITE_BOOL_FIELD(transientPlan);
-	WRITE_BOOL_FIELD(hasRowSecurity);
 }
 
 static void
@@ -2082,9 +2080,7 @@ _outIndexStmt(StringInfo str, const IndexStmt *node)
 	WRITE_BOOL_FIELD(isconstraint);
 	WRITE_BOOL_FIELD(deferrable);
 	WRITE_BOOL_FIELD(initdeferred);
-	WRITE_BOOL_FIELD(transformed);
 	WRITE_BOOL_FIELD(concurrent);
-	WRITE_BOOL_FIELD(if_not_exists);
 }
 
 static void
@@ -2518,34 +2514,6 @@ _outAExpr(StringInfo str, const A_Expr *node)
 			appendStringInfoString(str, " IN ");
 			WRITE_NODE_FIELD(name);
 			break;
-		case AEXPR_LIKE:
-			appendStringInfoString(str, " LIKE ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_ILIKE:
-			appendStringInfoString(str, " ILIKE ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_SIMILAR:
-			appendStringInfoString(str, " SIMILAR ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_BETWEEN:
-			appendStringInfoString(str, " BETWEEN ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_NOT_BETWEEN:
-			appendStringInfoString(str, " NOT_BETWEEN ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_BETWEEN_SYM:
-			appendStringInfoString(str, " BETWEEN_SYM ");
-			WRITE_NODE_FIELD(name);
-			break;
-		case AEXPR_NOT_BETWEEN_SYM:
-			appendStringInfoString(str, " NOT_BETWEEN_SYM ");
-			WRITE_NODE_FIELD(name);
-			break;
 		default:
 			appendStringInfoString(str, " ??");
 			break;
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index fb803f8..2f2f5ed 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -40,8 +40,9 @@ copyParamList(ParamListInfo from)
 	if (from == NULL || from->numParams <= 0)
 		return NULL;
 
-	size = offsetof(ParamListInfoData, params) +
-		from->numParams * sizeof(ParamExternData);
+	/* sizeof(ParamListInfoData) includes the first array element */
+	size = sizeof(ParamListInfoData) +
+		(from->numParams - 1) * sizeof(ParamExternData);
 
 	retval = (ParamListInfo) palloc(size);
 	retval->paramFetch = NULL;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 563209c..ae24d05 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1044,7 +1044,6 @@ _readNullTest(void)
 	READ_NODE_FIELD(arg);
 	READ_ENUM_FIELD(nulltesttype, NullTestType);
 	READ_BOOL_FIELD(argisrow);
-	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
 }
@@ -1059,7 +1058,6 @@ _readBooleanTest(void)
 
 	READ_NODE_FIELD(arg);
 	READ_ENUM_FIELD(booltesttype, BoolTestType);
-	READ_LOCATION_FIELD(location);
 
 	READ_DONE();
 }
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 6cae9e8..88cdbc3 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -702,37 +702,17 @@ intermediate layers of joins, for example:
 			-> Index Scan using C_Z_IDX on C
 				Index Condition: C.Z = A.X
 
-If all joins are plain inner joins then this is usually unnecessary,
-because it's possible to reorder the joins so that a parameter is used
+If all joins are plain inner joins then this is unnecessary, because
+it's always possible to reorder the joins so that a parameter is used
 immediately below the nestloop node that provides it.  But in the
-presence of outer joins, such join reordering may not be possible.
-
-Also, the bottom-level scan might require parameters from more than one
-other relation.  In principle we could join the other relations first
-so that all the parameters are supplied from a single nestloop level.
-But if those other relations have no join clause in common (which is
-common in star-schema queries for instance), the planner won't consider
-joining them directly to each other.  In such a case we need to be able
-to create a plan like
-
-    NestLoop
-        -> Seq Scan on SmallTable1 A
-        NestLoop
-            -> Seq Scan on SmallTable2 B
-            NestLoop
-                -> Index Scan using XYIndex on LargeTable C
-                      Index Condition: C.X = A.AID and C.Y = B.BID
-
-so we should be willing to pass down A.AID through a join even though
-there is no join order constraint forcing the plan to look like this.
-
-Before version 9.2, Postgres used ad-hoc methods for planning and
-executing nestloop queries of this kind, and those methods could not
-handle passing parameters down through multiple join levels.
+presence of outer joins, join reordering may not be possible, and then
+this option can be critical.  Before version 9.2, Postgres used ad-hoc
+methods for planning and executing such queries, and those methods could
+not handle passing parameters down through multiple join levels.
 
 To plan such queries, we now use a notion of a "parameterized path",
 which is a path that makes use of a join clause to a relation that's not
-scanned by the path.  In the example two above, we would construct a
+scanned by the path.  In the example just above, we would construct a
 path representing the possibility of doing this:
 
 	-> Index Scan using C_Z_IDX on C
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 78ef229..020558b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4036,11 +4036,11 @@ set_rel_width(PlannerInfo *root, RelOptInfo *rel)
 
 	/*
 	 * If we have a whole-row reference, estimate its width as the sum of
-	 * per-column widths plus heap tuple header overhead.
+	 * per-column widths plus sizeof(HeapTupleHeaderData).
 	 */
 	if (have_wholerow_var)
 	{
-		int32		wholerow_width = MAXALIGN(SizeofHeapTupleHeader);
+		int32		wholerow_width = sizeof(HeapTupleHeaderData);
 
 		if (reloid != InvalidOid)
 		{
@@ -4078,7 +4078,7 @@ set_rel_width(PlannerInfo *root, RelOptInfo *rel)
 static double
 relation_byte_size(double tuples, int width)
 {
-	return tuples * (MAXALIGN(width) + MAXALIGN(SizeofHeapTupleHeader));
+	return tuples * (MAXALIGN(width) + MAXALIGN(sizeof(HeapTupleHeaderData)));
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..e730137 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -117,14 +120,13 @@ add_paths_to_joinrel(PlannerInfo *root,
 	/*
 	 * Decide whether it's sensible to generate parameterized paths for this
 	 * joinrel, and if so, which relations such paths should require.  There
-	 * is usually no need to create a parameterized result path unless there
-	 * is a join order restriction that prevents joining one of our input rels
-	 * directly to the parameter source rel instead of joining to the other
-	 * input rel.  (But see exception in try_nestloop_path.)  This restriction
-	 * reduces the number of parameterized paths we have to deal with at
-	 * higher join levels, without compromising the quality of the resulting
-	 * plan.  We express the restriction as a Relids set that must overlap the
-	 * parameterization of any proposed join path.
+	 * is no need to create a parameterized result path unless there is a join
+	 * order restriction that prevents joining one of our input rels directly
+	 * to the parameter source rel instead of joining to the other input rel.
+	 * This restriction reduces the number of parameterized paths we have to
+	 * deal with at higher join levels, without compromising the quality of
+	 * the resulting plan.  We express the restriction as a Relids set that
+	 * must overlap the parameterization of any proposed join path.
 	 */
 	foreach(lc, root->join_info_list)
 	{
@@ -260,6 +262,37 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In the FDW case, we may be able to skip this invocation based
+	 * on joinrel's fdwhandler (set only when both rels use the same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 6. Consider paths added by FDWs when both outer and inner relations are
+	 * managed by the same foreign-data wrapper.  Matching of the foreign
+	 * server and/or checkAsUser should be checked by the FDW in GetForeignJoinPath.
+	 */
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+	{
+		joinrel->fdwroutine->GetForeignJoinPath(root,
+												joinrel,
+												outerrel,
+												innerrel,
+												jointype,
+												sjinfo,
+												&semifactors,
+												restrictlist,
+												extra_lateral_rels);
+	}
 }
 
 /*
@@ -292,29 +325,9 @@ try_nestloop_path(PlannerInfo *root,
 	if (required_outer &&
 		!bms_overlap(required_outer, param_source_rels))
 	{
-		/*
-		 * We override the param_source_rels heuristic to accept nestloop
-		 * paths in which the outer rel satisfies some but not all of the
-		 * inner path's parameterization.  This is necessary to get good plans
-		 * for star-schema scenarios, in which a parameterized path for a
-		 * large table may require parameters from multiple small tables that
-		 * will not get joined directly to each other.  We can handle that by
-		 * stacking nestloops that have the small tables on the outside; but
-		 * this breaks the rule the param_source_rels heuristic is based on,
-		 * namely that parameters should not be passed down across joins
-		 * unless there's a join-order-constraint-based reason to do so.  So
-		 * ignore the param_source_rels restriction when this case applies.
-		 */
-		Relids		outerrelids = outer_path->parent->relids;
-		Relids		innerparams = PATH_REQ_OUTER(inner_path);
-
-		if (!(bms_overlap(innerparams, outerrelids) &&
-			  bms_nonempty_difference(innerparams, outerrelids)))
-		{
-			/* Waste no memory when we reject a path here */
-			bms_free(required_outer);
-			return;
-		}
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
 	}
 
 	/*
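
As a quick illustration of how an extension is expected to consume the new
hook, here is a minimal sketch.  It assumes the set_join_pathlist_hook_type
typedef matches the call made in add_paths_to_joinrel() above; the function
and variable names are illustrative only, not part of the patch:

    /* Illustrative sketch only; names and the exact hook typedef are assumptions */
    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/pathnode.h"
    #include "optimizer/paths.h"

    PG_MODULE_MAGIC;

    static set_join_pathlist_hook_type prev_set_join_pathlist_hook = NULL;

    static void
    my_join_pathlist_hook(PlannerInfo *root,
                          RelOptInfo *joinrel,
                          RelOptInfo *outerrel,
                          RelOptInfo *innerrel,
                          List *restrictlist,
                          JoinType jointype,
                          SpecialJoinInfo *sjinfo,
                          SemiAntiJoinFactors *semifactors,
                          Relids param_source_rels,
                          Relids extra_lateral_rels)
    {
        /* keep any previously installed hook working */
        if (prev_set_join_pathlist_hook)
            prev_set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                        restrictlist, jointype, sjinfo,
                                        semifactors, param_source_rels,
                                        extra_lateral_rels);

        /*
         * A custom-scan provider would estimate the cost of running this
         * join by itself and, if competitive, build a CustomPath whose
         * parent is joinrel and register it with add_path(joinrel, ...).
         */
    }

    void
    _PG_init(void)
    {
        prev_set_join_pathlist_hook = set_join_pathlist_hook;
        set_join_pathlist_hook = my_join_pathlist_hook;
    }
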
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76ba1bf..7a2c134 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation OID if this foreign-scan node actually scans a
+	 * particular real relation.  Otherwise, InvalidOid is passed to the
+	 * FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,35 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check.  The pseudo-scan tuple descriptor is constructed from
+	 * fdw_ps_tlist, excluding resjunk entries, so we need to ensure that
+	 * all valid TLEs appear before any junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track the FDW handler OID; no need to make the FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2050,12 +2081,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2075,6 +2101,26 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check.  The pseudo-scan tuple descriptor is constructed from
+	 * custom_ps_tlist, excluding resjunk entries, so we need to ensure
+	 * that all valid TLEs appear before any junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
@@ -4809,7 +4855,6 @@ make_result(PlannerInfo *root,
 ModifyTable *
 make_modifytable(PlannerInfo *root,
 				 CmdType operation, bool canSetTag,
-				 Index nominalRelation,
 				 List *resultRelations, List *subplans,
 				 List *withCheckOptionLists, List *returningLists,
 				 List *rowMarks, int epqParam)
@@ -4858,7 +4903,6 @@ make_modifytable(PlannerInfo *root,
 
 	node->operation = operation;
 	node->canSetTag = canSetTag;
-	node->nominalRelation = nominalRelation;
 	node->resultRelations = resultRelations;
 	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
 	node->plans = subplans;
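
As a usage note on the resjunk ordering rule enforced by the sanity checks
above: a driver that builds fdw_ps_tlist (or custom_ps_tlist) for a
scanrelid == 0 plan simply appends its junk entries after all the visible
ones.  A hypothetical helper, sketched with the stock makeTargetEntry() and
lappend() primitives:

    #include "postgres.h"
    #include "nodes/makefuncs.h"

    /* Hypothetical helper: visible columns first, resjunk entries last */
    static List *
    build_pseudo_scan_tlist(List *visible_exprs, List *junk_exprs)
    {
        List       *tlist = NIL;
        AttrNumber  resno = 1;
        ListCell   *lc;

        foreach(lc, visible_exprs)
            tlist = lappend(tlist, makeTargetEntry((Expr *) lfirst(lc),
                                                   resno++, NULL, false));
        foreach(lc, junk_exprs)
            tlist = lappend(tlist, makeTargetEntry((Expr *) lfirst(lc),
                                                   resno++, NULL, true));
        return tlist;
    }
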
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index af772a2..b90c2ef 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -449,7 +449,6 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	ntest->arg = copyObject(mminfo->target);
 	/* we checked it wasn't a rowtype in find_minmax_aggs_walker */
 	ntest->argisrow = false;
-	ntest->location = -1;
 
 	/* User might have had that in WHERE already */
 	if (!list_member((List *) parse->jointree->quals, ntest))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b02a107..9cbbcfb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -607,7 +607,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 			plan = (Plan *) make_modifytable(root,
 											 parse->commandType,
 											 parse->canSetTag,
-											 parse->resultRelation,
 									   list_make1_int(parse->resultRelation),
 											 list_make1(plan),
 											 withCheckOptionLists,
@@ -791,7 +790,6 @@ inheritance_planner(PlannerInfo *root)
 {
 	Query	   *parse = root->parse;
 	int			parentRTindex = parse->resultRelation;
-	int			nominalRelation = -1;
 	List	   *final_rtable = NIL;
 	int			save_rel_array_size = 0;
 	RelOptInfo **save_rel_array = NULL;
@@ -927,20 +925,6 @@ inheritance_planner(PlannerInfo *root)
 		appinfo->child_relid = subroot.parse->resultRelation;
 
 		/*
-		 * We'll use the first child relation (even if it's excluded) as the
-		 * nominal target relation of the ModifyTable node.  Because of the
-		 * way expand_inherited_rtentry works, this should always be the RTE
-		 * representing the parent table in its role as a simple member of the
-		 * inheritance set.  (It would be logically cleaner to use the
-		 * inheritance parent RTE as the nominal target; but since that RTE
-		 * will not be otherwise referenced in the plan, doing so would give
-		 * rise to confusing use of multiple aliases in EXPLAIN output for
-		 * what the user will think is the "same" table.)
-		 */
-		if (nominalRelation < 0)
-			nominalRelation = appinfo->child_relid;
-
-		/*
 		 * If this child rel was excluded by constraint exclusion, exclude it
 		 * from the result plan.
 		 */
@@ -1067,7 +1051,6 @@ inheritance_planner(PlannerInfo *root)
 	return (Plan *) make_modifytable(root,
 									 parse->commandType,
 									 parse->canSetTag,
-									 nominalRelation,
 									 resultRelations,
 									 subplans,
 									 withCheckOptionLists,
@@ -2277,7 +2260,7 @@ preprocess_rowmarks(PlannerInfo *root)
 			newrc->markType = ROW_MARK_REFERENCE;
 		else
 			newrc->markType = ROW_MARK_COPY;
-		newrc->waitPolicy = LockWaitBlock;		/* doesn't matter */
+		newrc->waitPolicy = LockWaitBlock;	/* doesn't matter */
 		newrc->isParent = false;
 
 		prowmarks = lappend(prowmarks, newrc);
@@ -2755,7 +2738,7 @@ choose_hashed_grouping(PlannerInfo *root,
 	 */
 
 	/* Estimate per-hash-entry space at tuple width... */
-	hashentrysize = MAXALIGN(path_width) + MAXALIGN(SizeofMinimalTupleHeader);
+	hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData));
 	/* plus space for pass-by-ref transition values... */
 	hashentrysize += agg_costs->transitionSpace;
 	/* plus the per-hash-entry overhead */
@@ -2923,7 +2906,7 @@ choose_hashed_distinct(PlannerInfo *root,
 	 */
 
 	/* Estimate per-hash-entry space at tuple width... */
-	hashentrysize = MAXALIGN(path_width) + MAXALIGN(SizeofMinimalTupleHeader);
+	hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData));
 	/* plus the per-hash-entry overhead */
 	hashentrysize += hash_agg_entry_size(0);
 
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..52f2361 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -41,8 +41,9 @@ typedef struct
 	int			num_vars;		/* number of plain Var tlist entries */
 	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
 	bool		has_non_vars;	/* are there other entries? */
-	tlist_vinfo vars[FLEXIBLE_ARRAY_MEMBER];	/* has num_vars entries */
-} indexed_tlist;
+	/* array of num_vars entries: */
+	tlist_vinfo vars[1];		/* VARIABLE LENGTH ARRAY */
+} indexed_tlist;				/* VARIABLE LENGTH STRUCT */
 
 typedef struct
 {
@@ -568,6 +569,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +611,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -753,8 +810,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					splan->plan.targetlist = copyObject(linitial(newRL));
 				}
 
-				splan->nominalRelation += rtoffset;
-
 				foreach(l, splan->resultRelations)
 				{
 					lfirst_int(l) += rtoffset;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 5a1d539..78fb6b1 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -974,12 +974,12 @@ subplan_is_hashable(Plan *plan)
 
 	/*
 	 * The estimated size of the subquery result must fit in work_mem. (Note:
-	 * we use heap tuple overhead here even though the tuples will actually be
-	 * stored as MinimalTuples; this provides some fudge factor for hashtable
-	 * overhead.)
+	 * we use sizeof(HeapTupleHeaderData) here even though the tuples will
+	 * actually be stored as MinimalTuples; this provides some fudge factor
+	 * for hashtable overhead.)
 	 */
 	subquery_size = plan->plan_rows *
-		(MAXALIGN(plan->plan_width) + MAXALIGN(SizeofHeapTupleHeader));
+		(MAXALIGN(plan->plan_width) + MAXALIGN(sizeof(HeapTupleHeaderData)));
 	if (subquery_size > work_mem * 1024L)
 		return false;
 
diff --git a/src/backend/optimizer/prep/prepqual.c b/src/backend/optimizer/prep/prepqual.c
index 1176e81..bd50926 100644
--- a/src/backend/optimizer/prep/prepqual.c
+++ b/src/backend/optimizer/prep/prepqual.c
@@ -212,7 +212,6 @@ negate_clause(Node *node)
 					newexpr->nulltesttype = (expr->nulltesttype == IS_NULL ?
 											 IS_NOT_NULL : IS_NULL);
 					newexpr->argisrow = expr->argisrow;
-					newexpr->location = expr->location;
 					return (Node *) newexpr;
 				}
 			}
@@ -248,7 +247,6 @@ negate_clause(Node *node)
 							 (int) expr->booltesttype);
 						break;
 				}
-				newexpr->location = expr->location;
 				return (Node *) newexpr;
 			}
 			break;
diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
index b382f13..af3ee61 100644
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -37,7 +37,7 @@ typedef struct
 } security_barrier_replace_vars_context;
 
 static void expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
-					 RangeTblEntry *rte, Node *qual, bool targetRelation);
+					 RangeTblEntry *rte, Node *qual);
 
 static void security_barrier_replace_vars(Node *node,
 							  security_barrier_replace_vars_context *context);
@@ -73,8 +73,7 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 	rt_index = 0;
 	foreach(cell, parse->rtable)
 	{
-		bool			targetRelation = false;
-		RangeTblEntry  *rte = (RangeTblEntry *) lfirst(cell);
+		RangeTblEntry *rte = (RangeTblEntry *) lfirst(cell);
 
 		rt_index++;
 
@@ -99,15 +98,6 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 		{
 			RangeTblEntry *newrte = copyObject(rte);
 
-			/*
-			 * We need to let expand_security_qual know if this is the target
-			 * relation, as it has additional work to do in that case.
-			 *
-			 * Capture that information here as we're about to replace
-			 * parse->resultRelation.
-			 */
-			targetRelation = true;
-
 			parse->rtable = lappend(parse->rtable, newrte);
 			parse->resultRelation = list_length(parse->rtable);
 
@@ -157,8 +147,7 @@ expand_security_quals(PlannerInfo *root, List *tlist)
 			rte->securityQuals = list_delete_first(rte->securityQuals);
 
 			ChangeVarNodes(qual, rt_index, 1, 0);
-			expand_security_qual(root, tlist, rt_index, rte, qual,
-								 targetRelation);
+			expand_security_qual(root, tlist, rt_index, rte, qual);
 		}
 	}
 }
@@ -171,7 +160,7 @@ expand_security_quals(PlannerInfo *root, List *tlist)
  */
 static void
 expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
-					 RangeTblEntry *rte, Node *qual, bool targetRelation)
+					 RangeTblEntry *rte, Node *qual)
 {
 	Query	   *parse = root->parse;
 	Oid			relid = rte->relid;
@@ -230,11 +219,10 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
 			 * Now deal with any PlanRowMark on this RTE by requesting a lock
 			 * of the same strength on the RTE copied down to the subquery.
 			 *
-			 * Note that we can only push down user-defined quals if they are
-			 * only using leakproof (and therefore trusted) functions and
-			 * operators.  As a result, we may end up locking more rows than
-			 * strictly necessary (and, in the worst case, we could end up
-			 * locking all rows which pass the securityQuals).  This is
+			 * Note that we can't push the user-defined quals down since they
+			 * may include untrusted functions and that means that we will
+			 * end up locking all rows which pass the securityQuals, even if
+			 * those rows don't pass the user-defined quals.  This is
 			 * currently documented behavior, but it'd be nice to come up with
 			 * a better solution some day.
 			 */
@@ -268,15 +256,6 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
 			}
 
 			/*
-			 * When we are replacing the target relation with a subquery, we
-			 * need to make sure to add a locking clause explicitly to the
-			 * generated subquery since there won't be any row marks against
-			 * the target relation itself.
-			 */
-			if (targetRelation)
-				applyLockingClause(subquery, 1, LCS_FORUPDATE,
-								   LockWaitBlock, false);
-			/*
 			 * Replace any variables in the outer query that refer to the
 			 * original relation RTE with references to columns that we will
 			 * expose in the new subquery, building the subquery's targetlist
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b90fee3..05f601e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -832,7 +832,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 	 * Don't do it if it doesn't look like the hashtable will fit into
 	 * work_mem.
 	 */
-	hashentrysize = MAXALIGN(input_plan->plan_width) + MAXALIGN(SizeofMinimalTupleHeader);
+	hashentrysize = MAXALIGN(input_plan->plan_width) + MAXALIGN(sizeof(MinimalTupleData));
 
 	if (hashentrysize * dNumGroups > work_mem * 1024L)
 		return false;
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 84d58ae..b340b01 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3305,7 +3305,6 @@ eval_const_expressions_mutator(Node *node,
 						newntest->arg = (Expr *) relem;
 						newntest->nulltesttype = ntest->nulltesttype;
 						newntest->argisrow = type_is_rowtype(exprType(relem));
-						newntest->location = ntest->location;
 						newargs = lappend(newargs, newntest);
 					}
 					/* If all the inputs were constants, result is TRUE */
@@ -3344,7 +3343,6 @@ eval_const_expressions_mutator(Node *node,
 				newntest->arg = (Expr *) arg;
 				newntest->nulltesttype = ntest->nulltesttype;
 				newntest->argisrow = ntest->argisrow;
-				newntest->location = ntest->location;
 				return (Node *) newntest;
 			}
 		case T_BooleanTest:
@@ -3397,7 +3395,6 @@ eval_const_expressions_mutator(Node *node,
 				newbtest = makeNode(BooleanTest);
 				newbtest->arg = (Expr *) arg;
 				newbtest->booltesttype = btest->booltesttype;
-				newbtest->location = btest->location;
 				return (Node *) newbtest;
 			}
 		case T_PlaceHolderVar:
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..a4a35c3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
@@ -508,7 +513,7 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 				int32		tuple_width;
 
 				tuple_width = get_rel_data_width(rel, attr_widths);
-				tuple_width += MAXALIGN(SizeofHeapTupleHeader);
+				tuple_width += sizeof(HeapTupleHeaderData);
 				tuple_width += sizeof(ItemIdData);
 				/* note: integer division is intentional here */
 				density = (BLCKSZ - SizeOfPageHeaderData) / tuple_width;
@@ -720,7 +725,6 @@ get_relation_constraints(PlannerInfo *root,
 												  0);
 					ntest->nulltesttype = IS_NOT_NULL;
 					ntest->argisrow = type_is_rowtype(att->atttypid);
-					ntest->location = -1;
 					result = lappend(result, ntest);
 				}
 			}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..ca71093 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -427,6 +428,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set the FDW handler and routine if both outer and inner relations
+	 * are managed by the same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
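
On the FDW side, the corresponding callback would be a function whose
signature mirrors the call in add_paths_to_joinrel() above.  This is a
sketch only; the driver name and the exact argument types are assumptions:

    /* Sketch of an FDW join-path callback; signature assumed from the caller above */
    static void
    exampleGetForeignJoinPath(PlannerInfo *root,
                              RelOptInfo *joinrel,
                              RelOptInfo *outerrel,
                              RelOptInfo *innerrel,
                              JoinType jointype,
                              SpecialJoinInfo *sjinfo,
                              SemiAntiJoinFactors *semifactors,
                              List *restrictlist,
                              Relids extra_lateral_rels)
    {
        /*
         * A real driver would confirm that outerrel and innerrel belong to
         * the same foreign server with the same checkAsUser, decide which
         * restrictlist clauses can be evaluated remotely, estimate the cost
         * of the remote join, and then add a ForeignPath whose parent is
         * joinrel (e.g. via create_foreignscan_path() and add_path()).
         */
    }
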
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 581f7a1..36dac29 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -633,9 +633,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 /*
  * The grammar thinks these are keywords, but they are not in the kwlist.h
  * list and so can never be entered directly.  The filter in parser.c
- * creates these tokens when required (based on looking one token ahead).
+ * creates these tokens when required.
  */
-%token			NULLS_LA WITH_LA
+%token			NULLS_FIRST NULLS_LAST WITH_ORDINALITY WITH_TIME
 
 
 /* Precedence: lowest to highest */
@@ -873,7 +873,6 @@ CreateRoleStmt:
 
 
 opt_with:	WITH									{}
-			| WITH_LA								{}
 			| /*EMPTY*/								{}
 		;
 
@@ -6564,7 +6563,6 @@ IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_index_name
 					n->isconstraint = false;
 					n->deferrable = false;
 					n->initdeferred = false;
-					n->transformed = false;
 					n->if_not_exists = false;
 					$$ = (Node *)n;
 				}
@@ -6590,7 +6588,6 @@ IndexStmt:	CREATE opt_unique INDEX opt_concurrently opt_index_name
 					n->isconstraint = false;
 					n->deferrable = false;
 					n->initdeferred = false;
-					n->transformed = false;
 					n->if_not_exists = true;
 					$$ = (Node *)n;
 				}
@@ -6674,8 +6671,8 @@ opt_asc_desc: ASC							{ $$ = SORTBY_ASC; }
 			| /*EMPTY*/						{ $$ = SORTBY_DEFAULT; }
 		;
 
-opt_nulls_order: NULLS_LA FIRST_P			{ $$ = SORTBY_NULLS_FIRST; }
-			| NULLS_LA LAST_P				{ $$ = SORTBY_NULLS_LAST; }
+opt_nulls_order: NULLS_FIRST				{ $$ = SORTBY_NULLS_FIRST; }
+			| NULLS_LAST					{ $$ = SORTBY_NULLS_LAST; }
 			| /*EMPTY*/						{ $$ = SORTBY_NULLS_DEFAULT; }
 		;
 
@@ -8924,7 +8921,7 @@ AlterTSDictionaryStmt:
 		;
 
 AlterTSConfigurationStmt:
-			ALTER TEXT_P SEARCH CONFIGURATION any_name ADD_P MAPPING FOR name_list any_with any_name_list
+			ALTER TEXT_P SEARCH CONFIGURATION any_name ADD_P MAPPING FOR name_list WITH any_name_list
 				{
 					AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
 					n->cfgname = $5;
@@ -8934,7 +8931,7 @@ AlterTSConfigurationStmt:
 					n->replace = false;
 					$$ = (Node*)n;
 				}
-			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list any_with any_name_list
+			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list WITH any_name_list
 				{
 					AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
 					n->cfgname = $5;
@@ -8944,7 +8941,7 @@ AlterTSConfigurationStmt:
 					n->replace = false;
 					$$ = (Node*)n;
 				}
-			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING REPLACE any_name any_with any_name
+			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING REPLACE any_name WITH any_name
 				{
 					AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
 					n->cfgname = $5;
@@ -8954,7 +8951,7 @@ AlterTSConfigurationStmt:
 					n->replace = true;
 					$$ = (Node*)n;
 				}
-			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list REPLACE any_name any_with any_name
+			| ALTER TEXT_P SEARCH CONFIGURATION any_name ALTER MAPPING FOR name_list REPLACE any_name WITH any_name
 				{
 					AlterTSConfigurationStmt *n = makeNode(AlterTSConfigurationStmt);
 					n->cfgname = $5;
@@ -8982,11 +8979,6 @@ AlterTSConfigurationStmt:
 				}
 		;
 
-/* Use this if TIME or ORDINALITY after WITH should be taken as an identifier */
-any_with:	WITH									{}
-			| WITH_LA								{}
-		;
-
 
 /*****************************************************************************
  *
@@ -9897,8 +9889,6 @@ simple_select:
  *		AS (query) [ SEARCH or CYCLE clause ]
  *
  * We don't currently support the SEARCH or CYCLE clause.
- *
- * Recognizing WITH_LA here allows a CTE to be named TIME or ORDINALITY.
  */
 with_clause:
 		WITH cte_list
@@ -9908,13 +9898,6 @@ with_clause:
 				$$->recursive = false;
 				$$->location = @1;
 			}
-		| WITH_LA cte_list
-			{
-				$$ = makeNode(WithClause);
-				$$->ctes = $2;
-				$$->recursive = false;
-				$$->location = @1;
-			}
 		| WITH RECURSIVE cte_list
 			{
 				$$ = makeNode(WithClause);
@@ -10616,7 +10599,7 @@ opt_col_def_list: AS '(' TableFuncElementList ')'	{ $$ = $3; }
 			| /*EMPTY*/								{ $$ = NIL; }
 		;
 
-opt_ordinality: WITH_LA ORDINALITY					{ $$ = true; }
+opt_ordinality: WITH_ORDINALITY						{ $$ = true; }
 			| /*EMPTY*/								{ $$ = false; }
 		;
 
@@ -11072,7 +11055,7 @@ ConstInterval:
 		;
 
 opt_timezone:
-			WITH_LA TIME ZONE						{ $$ = TRUE; }
+			WITH_TIME ZONE							{ $$ = TRUE; }
 			| WITHOUT TIME ZONE						{ $$ = FALSE; }
 			| /*EMPTY*/								{ $$ = FALSE; }
 		;
@@ -11195,7 +11178,7 @@ a_expr:		c_expr									{ $$ = $1; }
 		 * below; and all those operators will have the same precedence.
 		 *
 		 * If you add more explicitly-known operators, be sure to add them
-		 * also to b_expr and to the MathOp list below.
+		 * also to b_expr and to the MathOp list above.
 		 */
 			| '+' a_expr					%prec UMINUS
 				{ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", NULL, $2, @1); }
@@ -11235,56 +11218,40 @@ a_expr:		c_expr									{ $$ = $1; }
 				{ $$ = makeNotExpr($2, @1); }
 
 			| a_expr LIKE a_expr
-				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
-												   $1, $3, @2);
-				}
+				{ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~~", $1, $3, @2); }
 			| a_expr LIKE a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
 											   list_make2($3, $5),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~~", $1, (Node *) n, @2);
 				}
 			| a_expr NOT LIKE a_expr
-				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
-												   $1, $4, @2);
-				}
+				{ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~~", $1, $4, @2); }
 			| a_expr NOT LIKE a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
 											   list_make2($4, $6),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~~", $1, (Node *) n, @2);
 				}
 			| a_expr ILIKE a_expr
-				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
-												   $1, $3, @2);
-				}
+				{ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~~*", $1, $3, @2); }
 			| a_expr ILIKE a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
 											   list_make2($3, $5),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~~*", $1, (Node *) n, @2);
 				}
 			| a_expr NOT ILIKE a_expr
-				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
-												   $1, $4, @2);
-				}
+				{ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~~*", $1, $4, @2); }
 			| a_expr NOT ILIKE a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
 											   list_make2($4, $6),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~~*", $1, (Node *) n, @2);
 				}
 
 			| a_expr SIMILAR TO a_expr				%prec SIMILAR
@@ -11292,32 +11259,28 @@ a_expr:		c_expr									{ $$ = $1; }
 					FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
 											   list_make2($4, makeNullAConst(-1)),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~", $1, (Node *) n, @2);
 				}
 			| a_expr SIMILAR TO a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
 											   list_make2($4, $6),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "~", $1, (Node *) n, @2);
 				}
 			| a_expr NOT SIMILAR TO a_expr			%prec SIMILAR
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
 											   list_make2($5, makeNullAConst(-1)),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~", $1, (Node *) n, @2);
 				}
 			| a_expr NOT SIMILAR TO a_expr ESCAPE a_expr
 				{
 					FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
 											   list_make2($5, $7),
 											   @2);
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
-												   $1, (Node *) n, @2);
+					$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "!~", $1, (Node *) n, @2);
 				}
 
 			/* NullTest clause
@@ -11334,7 +11297,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					NullTest *n = makeNode(NullTest);
 					n->arg = (Expr *) $1;
 					n->nulltesttype = IS_NULL;
-					n->location = @2;
 					$$ = (Node *)n;
 				}
 			| a_expr ISNULL
@@ -11342,7 +11304,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					NullTest *n = makeNode(NullTest);
 					n->arg = (Expr *) $1;
 					n->nulltesttype = IS_NULL;
-					n->location = @2;
 					$$ = (Node *)n;
 				}
 			| a_expr IS NOT NULL_P						%prec IS
@@ -11350,7 +11311,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					NullTest *n = makeNode(NullTest);
 					n->arg = (Expr *) $1;
 					n->nulltesttype = IS_NOT_NULL;
-					n->location = @2;
 					$$ = (Node *)n;
 				}
 			| a_expr NOTNULL
@@ -11358,7 +11318,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					NullTest *n = makeNode(NullTest);
 					n->arg = (Expr *) $1;
 					n->nulltesttype = IS_NOT_NULL;
-					n->location = @2;
 					$$ = (Node *)n;
 				}
 			| row OVERLAPS row
@@ -11382,7 +11341,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_TRUE;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS NOT TRUE_P						%prec IS
@@ -11390,7 +11348,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_NOT_TRUE;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS FALSE_P							%prec IS
@@ -11398,7 +11355,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_FALSE;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS NOT FALSE_P						%prec IS
@@ -11406,7 +11362,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_NOT_FALSE;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS UNKNOWN							%prec IS
@@ -11414,7 +11369,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_UNKNOWN;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS NOT UNKNOWN						%prec IS
@@ -11422,7 +11376,6 @@ a_expr:		c_expr									{ $$ = $1; }
 					BooleanTest *b = makeNode(BooleanTest);
 					b->arg = (Expr *) $1;
 					b->booltesttype = IS_NOT_UNKNOWN;
-					b->location = @2;
 					$$ = (Node *)b;
 				}
 			| a_expr IS DISTINCT FROM a_expr			%prec IS
@@ -11443,37 +11396,51 @@ a_expr:		c_expr									{ $$ = $1; }
 				{
 					$$ = (Node *) makeSimpleA_Expr(AEXPR_OF, "<>", $1, (Node *) $6, @2);
 				}
+			/*
+			 *	Ideally we would not use hard-wired operators below but
+			 *	instead use opclasses.  However, mixed data types and other
+			 *	issues make this difficult:
+			 *	http://archives.postgresql.org/pgsql-hackers/2008-08/msg01142.php
+			 */
 			| a_expr BETWEEN opt_asymmetric b_expr AND b_expr		%prec BETWEEN
 				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_BETWEEN,
-												   "BETWEEN",
-												   $1,
-												   (Node *) list_make2($4, $6),
-												   @2);
+					$$ = makeAndExpr(
+						(Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $4, @2),
+						(Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $6, @2),
+									 @2);
 				}
 			| a_expr NOT BETWEEN opt_asymmetric b_expr AND b_expr	%prec BETWEEN
 				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN,
-												   "NOT BETWEEN",
-												   $1,
-												   (Node *) list_make2($5, $7),
-												   @2);
+					$$ = makeOrExpr(
+						(Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $5, @2),
+						(Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $7, @2),
+									@2);
 				}
 			| a_expr BETWEEN SYMMETRIC b_expr AND b_expr			%prec BETWEEN
 				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_BETWEEN_SYM,
-												   "BETWEEN SYMMETRIC",
-												   $1,
-												   (Node *) list_make2($4, $6),
-												   @2);
+					$$ = makeOrExpr(
+						  makeAndExpr(
+							(Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $4, @2),
+							(Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $6, @2),
+									  @2),
+						  makeAndExpr(
+							(Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $6, @2),
+							(Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $4, @2),
+									  @2),
+									@2);
 				}
 			| a_expr NOT BETWEEN SYMMETRIC b_expr AND b_expr		%prec BETWEEN
 				{
-					$$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN_SYM,
-												   "NOT BETWEEN SYMMETRIC",
-												   $1,
-												   (Node *) list_make2($5, $7),
-												   @2);
+					$$ = makeAndExpr(
+						   makeOrExpr(
+							(Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $5, @2),
+							(Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $7, @2),
+									  @2),
+						   makeOrExpr(
+							(Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $7, @2),
+							(Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $5, @2),
+									  @2),
+									 @2);
 				}
 			| a_expr IN_P in_expr
 				{
@@ -11485,7 +11452,7 @@ a_expr:		c_expr									{ $$ = $1; }
 						n->subLinkType = ANY_SUBLINK;
 						n->subLinkId = 0;
 						n->testexpr = $1;
-						n->operName = NIL;		/* show it's IN not = ANY */
+						n->operName = list_make1(makeString("="));
 						n->location = @2;
 						$$ = (Node *)n;
 					}
@@ -11506,9 +11473,9 @@ a_expr:		c_expr									{ $$ = $1; }
 						n->subLinkType = ANY_SUBLINK;
 						n->subLinkId = 0;
 						n->testexpr = $1;
-						n->operName = NIL;		/* show it's IN not = ANY */
-						n->location = @2;
-						/* Stick a NOT on top; must have same parse location */
+						n->operName = list_make1(makeString("="));
+						n->location = @3;
+						/* Stick a NOT on top */
 						$$ = makeNotExpr((Node *) n, @2);
 					}
 					else
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 8d90b50..654dce6 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -339,11 +339,10 @@ transformJoinUsingClause(ParseState *pstate,
 
 	/*
 	 * We cheat a little bit here by building an untransformed operator tree
-	 * whose leaves are the already-transformed Vars.  This requires collusion
-	 * from transformExpr(), which normally could be expected to complain
-	 * about already-transformed subnodes.  However, this does mean that we
-	 * have to mark the columns as requiring SELECT privilege for ourselves;
-	 * transformExpr() won't do it.
+	 * whose leaves are the already-transformed Vars.  This is OK because
+	 * transformExpr() won't complain about already-transformed subnodes.
+	 * However, this does mean that we have to mark the columns as requiring
+	 * SELECT privilege for ourselves; transformExpr() won't do it.
 	 */
 	forboth(lvars, leftVars, rvars, rightVars)
 	{
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 7829bcb..f0f0488 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -48,7 +48,6 @@ static Node *transformAExprDistinct(ParseState *pstate, A_Expr *a);
 static Node *transformAExprNullIf(ParseState *pstate, A_Expr *a);
 static Node *transformAExprOf(ParseState *pstate, A_Expr *a);
 static Node *transformAExprIn(ParseState *pstate, A_Expr *a);
-static Node *transformAExprBetween(ParseState *pstate, A_Expr *a);
 static Node *transformBoolExpr(ParseState *pstate, BoolExpr *a);
 static Node *transformFuncCall(ParseState *pstate, FuncCall *fn);
 static Node *transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref);
@@ -81,8 +80,28 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 /*
  * transformExpr -
  *	  Analyze and transform expressions. Type checking and type casting is
- *	  done here.  This processing converts the raw grammar output into
- *	  expression trees with fully determined semantics.
+ *	  done here. The optimizer and the executor cannot handle the original
+ *	  (raw) expressions collected by the parse tree. Hence the transformation
+ *	  here.
+ *
+ * NOTE: there are various cases in which this routine will get applied to
+ * an already-transformed expression.  Some examples:
+ *	1. At least one construct (BETWEEN/AND) puts the same nodes
+ *	into two branches of the parse tree; hence, some nodes
+ *	are transformed twice.
+ *	2. Another way it can happen is that coercion of an operator or
+ *	function argument to the required type (via coerce_type())
+ *	can apply transformExpr to an already-transformed subexpression.
+ *	An example here is "SELECT count(*) + 1.0 FROM table".
+ *	3. CREATE TABLE t1 (LIKE t2 INCLUDING INDEXES) can pass in
+ *	already-transformed index expressions.
+ * While it might be possible to eliminate these cases, the path of
+ * least resistance so far has been to ensure that transformExpr() does
+ * no damage if applied to an already-transformed tree.  This is pretty
+ * easy for cases where the transformation replaces one node type with
+ * another, such as A_Const => Const; we just do nothing when handed
+ * a Const.  More care is needed for node types that are used as both
+ * input and output of transformExpr; see SubLink for example.
  */
 Node *
 transformExpr(ParseState *pstate, Node *expr, ParseExprKind exprKind)
@@ -148,8 +167,48 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 			break;
 
 		case T_TypeCast:
-			result = transformTypeCast(pstate, (TypeCast *) expr);
-			break;
+			{
+				TypeCast   *tc = (TypeCast *) expr;
+
+				/*
+				 * If the subject of the typecast is an ARRAY[] construct and
+				 * the target type is an array type, we invoke
+				 * transformArrayExpr() directly so that we can pass down the
+				 * type information.  This avoids some cases where
+				 * transformArrayExpr() might not infer the correct type.
+				 */
+				if (IsA(tc->arg, A_ArrayExpr))
+				{
+					Oid			targetType;
+					Oid			elementType;
+					int32		targetTypmod;
+
+					typenameTypeIdAndMod(pstate, tc->typeName,
+										 &targetType, &targetTypmod);
+
+					/*
+					 * If target is a domain over array, work with the base
+					 * array type here.  transformTypeCast below will cast the
+					 * array type to the domain.  In the usual case that the
+					 * target is not a domain, transformTypeCast is a no-op.
+					 */
+					targetType = getBaseTypeAndTypmod(targetType,
+													  &targetTypmod);
+					elementType = get_element_type(targetType);
+					if (OidIsValid(elementType))
+					{
+						tc = copyObject(tc);
+						tc->arg = transformArrayExpr(pstate,
+													 (A_ArrayExpr *) tc->arg,
+													 targetType,
+													 elementType,
+													 targetTypmod);
+					}
+				}
+
+				result = transformTypeCast(pstate, tc);
+				break;
+			}
 
 		case T_CollateClause:
 			result = transformCollateClause(pstate, (CollateClause *) expr);
@@ -182,18 +241,6 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 					case AEXPR_IN:
 						result = transformAExprIn(pstate, a);
 						break;
-					case AEXPR_LIKE:
-					case AEXPR_ILIKE:
-					case AEXPR_SIMILAR:
-						/* we can transform these just like AEXPR_OP */
-						result = transformAExprOp(pstate, a);
-						break;
-					case AEXPR_BETWEEN:
-					case AEXPR_NOT_BETWEEN:
-					case AEXPR_BETWEEN_SYM:
-					case AEXPR_NOT_BETWEEN_SYM:
-						result = transformAExprBetween(pstate, a);
-						break;
 					default:
 						elog(ERROR, "unrecognized A_Expr kind: %d", a->kind);
 						result = NULL;	/* keep compiler quiet */
@@ -270,19 +317,37 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 			result = transformCurrentOfExpr(pstate, (CurrentOfExpr *) expr);
 			break;
 
-			/*
-			 * CaseTestExpr and SetToDefault don't require any processing;
-			 * they are only injected into parse trees in fully-formed state.
+			/*********************************************
+			 * Quietly accept node types that may be presented when we are
+			 * called on an already-transformed tree.
 			 *
-			 * Ordinarily we should not see a Var here, but it is convenient
-			 * for transformJoinUsingClause() to create untransformed operator
-			 * trees containing already-transformed Vars.  The best
-			 * alternative would be to deconstruct and reconstruct column
-			 * references, which seems expensively pointless.  So allow it.
-			 */
+			 * Do any other node types need to be accepted?  For now we are
+			 * taking a conservative approach, and only accepting node
+			 * types that are demonstrably necessary to accept.
+			 *********************************************/
+		case T_Var:
+		case T_Const:
+		case T_Param:
+		case T_Aggref:
+		case T_WindowFunc:
+		case T_ArrayRef:
+		case T_FuncExpr:
+		case T_OpExpr:
+		case T_DistinctExpr:
+		case T_NullIfExpr:
+		case T_ScalarArrayOpExpr:
+		case T_FieldSelect:
+		case T_FieldStore:
+		case T_RelabelType:
+		case T_CoerceViaIO:
+		case T_ArrayCoerceExpr:
+		case T_ConvertRowtypeExpr:
+		case T_CollateExpr:
 		case T_CaseTestExpr:
+		case T_ArrayExpr:
+		case T_CoerceToDomain:
+		case T_CoerceToDomainValue:
 		case T_SetToDefault:
-		case T_Var:
 			{
 				result = (Node *) expr;
 				break;
@@ -795,7 +860,6 @@ transformAExprOp(ParseState *pstate, A_Expr *a)
 		NullTest   *n = makeNode(NullTest);
 
 		n->nulltesttype = IS_NULL;
-		n->location = a->location;
 
 		if (exprIsNullConstant(lexpr))
 			n->arg = (Expr *) rexpr;
@@ -1132,101 +1196,6 @@ transformAExprIn(ParseState *pstate, A_Expr *a)
 }
 
 static Node *
-transformAExprBetween(ParseState *pstate, A_Expr *a)
-{
-	Node	   *aexpr;
-	Node	   *bexpr;
-	Node	   *cexpr;
-	Node	   *result;
-	Node	   *sub1;
-	Node	   *sub2;
-	List	   *args;
-
-	/* Deconstruct A_Expr into three subexprs */
-	aexpr = a->lexpr;
-	Assert(IsA(a->rexpr, List));
-	args = (List *) a->rexpr;
-	Assert(list_length(args) == 2);
-	bexpr = (Node *) linitial(args);
-	cexpr = (Node *) lsecond(args);
-
-	/*
-	 * Build the equivalent comparison expression.  Make copies of
-	 * multiply-referenced subexpressions for safety.  (XXX this is really
-	 * wrong since it results in multiple runtime evaluations of what may be
-	 * volatile expressions ...)
-	 *
-	 * Ideally we would not use hard-wired operators here but instead use
-	 * opclasses.  However, mixed data types and other issues make this
-	 * difficult:
-	 * http://archives.postgresql.org/pgsql-hackers/2008-08/msg01142.php
-	 */
-	switch (a->kind)
-	{
-		case AEXPR_BETWEEN:
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, ">=",
-											   aexpr, bexpr,
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, "<=",
-											   copyObject(aexpr), cexpr,
-											   a->location));
-			result = (Node *) makeBoolExpr(AND_EXPR, args, a->location);
-			break;
-		case AEXPR_NOT_BETWEEN:
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, "<",
-											   aexpr, bexpr,
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, ">",
-											   copyObject(aexpr), cexpr,
-											   a->location));
-			result = (Node *) makeBoolExpr(OR_EXPR, args, a->location);
-			break;
-		case AEXPR_BETWEEN_SYM:
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, ">=",
-											   aexpr, bexpr,
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, "<=",
-											   copyObject(aexpr), cexpr,
-											   a->location));
-			sub1 = (Node *) makeBoolExpr(AND_EXPR, args, a->location);
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, ">=",
-										copyObject(aexpr), copyObject(cexpr),
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, "<=",
-										copyObject(aexpr), copyObject(bexpr),
-											   a->location));
-			sub2 = (Node *) makeBoolExpr(AND_EXPR, args, a->location);
-			args = list_make2(sub1, sub2);
-			result = (Node *) makeBoolExpr(OR_EXPR, args, a->location);
-			break;
-		case AEXPR_NOT_BETWEEN_SYM:
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, "<",
-											   aexpr, bexpr,
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, ">",
-											   copyObject(aexpr), cexpr,
-											   a->location));
-			sub1 = (Node *) makeBoolExpr(OR_EXPR, args, a->location);
-			args = list_make2(makeSimpleA_Expr(AEXPR_OP, "<",
-										copyObject(aexpr), copyObject(cexpr),
-											   a->location),
-							  makeSimpleA_Expr(AEXPR_OP, ">",
-										copyObject(aexpr), copyObject(bexpr),
-											   a->location));
-			sub2 = (Node *) makeBoolExpr(OR_EXPR, args, a->location);
-			args = list_make2(sub1, sub2);
-			result = (Node *) makeBoolExpr(AND_EXPR, args, a->location);
-			break;
-		default:
-			elog(ERROR, "unrecognized A_Expr kind: %d", a->kind);
-			result = NULL;		/* keep compiler quiet */
-			break;
-	}
-
-	return transformExprRecurse(pstate, result);
-}
-
-static Node *
 transformBoolExpr(ParseState *pstate, BoolExpr *a)
 {
 	List	   *args = NIL;
@@ -1390,6 +1359,10 @@ transformCaseExpr(ParseState *pstate, CaseExpr *c)
 	Node	   *defresult;
 	Oid			ptype;
 
+	/* If we already transformed this node, do nothing */
+	if (OidIsValid(c->casetype))
+		return (Node *) c;
+
 	newc = makeNode(CaseExpr);
 
 	/* transform the test expression, if any */
@@ -1517,6 +1490,10 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 	Query	   *qtree;
 	const char *err;
 
+	/* If we already transformed this node, do nothing */
+	if (IsA(sublink->subselect, Query))
+		return result;
+
 	/*
 	 * Check to see if the sublink is in an invalid place within the query. We
 	 * allow sublinks everywhere in SELECT/INSERT/UPDATE/DELETE, but generally
@@ -1655,12 +1632,6 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
 		ListCell   *l;
 
 		/*
-		 * If the source was "x IN (select)", convert to "x = ANY (select)".
-		 */
-		if (sublink->operName == NIL)
-			sublink->operName = list_make1(makeString("="));
-
-		/*
 		 * Transform lefthand expression, and convert to a list
 		 */
 		lefthand = transformExprRecurse(pstate, sublink->testexpr);
@@ -1891,6 +1862,10 @@ transformRowExpr(ParseState *pstate, RowExpr *r)
 	int			fnum;
 	ListCell   *lc;
 
+	/* If we already transformed this node, do nothing */
+	if (OidIsValid(r->row_typeid))
+		return (Node *) r;
+
 	newr = makeNode(RowExpr);
 
 	/* Transform the field expressions */
@@ -1997,6 +1972,10 @@ transformXmlExpr(ParseState *pstate, XmlExpr *x)
 	ListCell   *lc;
 	int			i;
 
+	/* If we already transformed this node, do nothing */
+	if (OidIsValid(x->type))
+		return (Node *) x;
+
 	newx = makeNode(XmlExpr);
 	newx->op = x->op;
 	if (x->name)
@@ -2300,51 +2279,14 @@ static Node *
 transformTypeCast(ParseState *pstate, TypeCast *tc)
 {
 	Node	   *result;
-	Node	   *expr;
-	Oid			inputType;
+	Node	   *expr = transformExprRecurse(pstate, tc->arg);
+	Oid			inputType = exprType(expr);
 	Oid			targetType;
 	int32		targetTypmod;
 	int			location;
 
 	typenameTypeIdAndMod(pstate, tc->typeName, &targetType, &targetTypmod);
 
-	/*
-	 * If the subject of the typecast is an ARRAY[] construct and the target
-	 * type is an array type, we invoke transformArrayExpr() directly so that
-	 * we can pass down the type information.  This avoids some cases where
-	 * transformArrayExpr() might not infer the correct type.  Otherwise, just
-	 * transform the argument normally.
-	 */
-	if (IsA(tc->arg, A_ArrayExpr))
-	{
-		Oid			targetBaseType;
-		int32		targetBaseTypmod;
-		Oid			elementType;
-
-		/*
-		 * If target is a domain over array, work with the base array type
-		 * here.  Below, we'll cast the array type to the domain.  In the
-		 * usual case that the target is not a domain, the remaining steps
-		 * will be a no-op.
-		 */
-		targetBaseTypmod = targetTypmod;
-		targetBaseType = getBaseTypeAndTypmod(targetType, &targetBaseTypmod);
-		elementType = get_element_type(targetBaseType);
-		if (OidIsValid(elementType))
-		{
-			expr = transformArrayExpr(pstate,
-									  (A_ArrayExpr *) tc->arg,
-									  targetBaseType,
-									  elementType,
-									  targetBaseTypmod);
-		}
-		else
-			expr = transformExprRecurse(pstate, tc->arg);
-	}
-	else
-		expr = transformExprRecurse(pstate, tc->arg);
-
-	inputType = exprType(expr);
 	if (inputType == InvalidOid)
 		return expr;			/* do nothing if NULL input */
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index c29f106..7540043 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1072,9 +1072,7 @@ generateClonedIndexStmt(CreateStmtContext *cxt, Relation source_idx,
 	index->oldNode = InvalidOid;
 	index->unique = idxrec->indisunique;
 	index->primary = idxrec->indisprimary;
-	index->transformed = true;	/* don't need transformIndexStmt */
 	index->concurrent = false;
-	index->if_not_exists = false;
 
 	/*
 	 * We don't try to preserve the name of the source index; instead, just
@@ -1532,9 +1530,7 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 	index->idxcomment = NULL;
 	index->indexOid = InvalidOid;
 	index->oldNode = InvalidOid;
-	index->transformed = false;
 	index->concurrent = false;
-	index->if_not_exists = false;
 
 	/*
 	 * If it's ALTER TABLE ADD CONSTRAINT USING INDEX, look up the index and
@@ -1945,10 +1941,6 @@ transformIndexStmt(Oid relid, IndexStmt *stmt, const char *queryString)
 	ListCell   *l;
 	Relation	rel;
 
-	/* Nothing to do if statement already transformed. */
-	if (stmt->transformed)
-		return stmt;
-
 	/*
 	 * We must not scribble on the passed-in IndexStmt, so copy it.  (This is
 	 * overkill, but easy.)
@@ -2029,9 +2021,6 @@ transformIndexStmt(Oid relid, IndexStmt *stmt, const char *queryString)
 	/* Close relation */
 	heap_close(rel, NoLock);
 
-	/* Mark statement as successfully transformed */
-	stmt->transformed = true;
-
 	return stmt;
 }
 
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index b17771d..db49275 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -64,13 +64,13 @@ raw_parser(const char *str)
 /*
  * Intermediate filter between parser and core lexer (core_yylex in scan.l).
  *
- * This filter is needed because in some cases the standard SQL grammar
+ * The filter is needed because in some cases the standard SQL grammar
  * requires more than one token lookahead.  We reduce these cases to one-token
- * lookahead by replacing tokens here, in order to keep the grammar LALR(1).
+ * lookahead by combining tokens here, in order to keep the grammar LALR(1).
  *
  * Using a filter is simpler than trying to recognize multiword tokens
  * directly in scan.l, because we'd have to allow for comments between the
- * words.  Furthermore it's not clear how to do that without re-introducing
+ * words.  Furthermore it's not clear how to do it without re-introducing
  * scanner backtrack, which would cost more performance than this filter
  * layer does.
  *
@@ -84,7 +84,7 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
 	base_yy_extra_type *yyextra = pg_yyget_extra(yyscanner);
 	int			cur_token;
 	int			next_token;
-	int			cur_token_length;
+	core_YYSTYPE cur_yylval;
 	YYLTYPE		cur_yylloc;
 
 	/* Get next token --- we might already have it */
@@ -93,85 +93,74 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
 		cur_token = yyextra->lookahead_token;
 		lvalp->core_yystype = yyextra->lookahead_yylval;
 		*llocp = yyextra->lookahead_yylloc;
-		*(yyextra->lookahead_end) = yyextra->lookahead_hold_char;
 		yyextra->have_lookahead = false;
 	}
 	else
 		cur_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);
 
-	/*
-	 * If this token isn't one that requires lookahead, just return it.  If it
-	 * does, determine the token length.  (We could get that via strlen(), but
-	 * since we have such a small set of possibilities, hardwiring seems
-	 * feasible and more efficient.)
-	 */
+	/* Do we need to look ahead for a possible multiword token? */
 	switch (cur_token)
 	{
 		case NULLS_P:
-			cur_token_length = 5;
-			break;
-		case WITH:
-			cur_token_length = 4;
-			break;
-		default:
-			return cur_token;
-	}
 
-	/*
-	 * Identify end+1 of current token.  core_yylex() has temporarily stored a
-	 * '\0' here, and will undo that when we call it again.  We need to redo
-	 * it to fully revert the lookahead call for error reporting purposes.
-	 */
-	yyextra->lookahead_end = yyextra->core_yy_extra.scanbuf +
-		*llocp + cur_token_length;
-	Assert(*(yyextra->lookahead_end) == '\0');
-
-	/*
-	 * Save and restore *llocp around the call.  It might look like we could
-	 * avoid this by just passing &lookahead_yylloc to core_yylex(), but that
-	 * does not work because flex actually holds onto the last-passed pointer
-	 * internally, and will use that for error reporting.  We need any error
-	 * reports to point to the current token, not the next one.
-	 */
-	cur_yylloc = *llocp;
-
-	/* Get next token, saving outputs into lookahead variables */
-	next_token = core_yylex(&(yyextra->lookahead_yylval), llocp, yyscanner);
-	yyextra->lookahead_token = next_token;
-	yyextra->lookahead_yylloc = *llocp;
-
-	*llocp = cur_yylloc;
-
-	/* Now revert the un-truncation of the current token */
-	yyextra->lookahead_hold_char = *(yyextra->lookahead_end);
-	*(yyextra->lookahead_end) = '\0';
-
-	yyextra->have_lookahead = true;
-
-	/* Replace cur_token if needed, based on lookahead */
-	switch (cur_token)
-	{
-		case NULLS_P:
-			/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
+			/*
+			 * NULLS FIRST and NULLS LAST must be reduced to one token
+			 */
+			cur_yylval = lvalp->core_yystype;
+			cur_yylloc = *llocp;
+			next_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);
 			switch (next_token)
 			{
 				case FIRST_P:
+					cur_token = NULLS_FIRST;
+					break;
 				case LAST_P:
-					cur_token = NULLS_LA;
+					cur_token = NULLS_LAST;
+					break;
+				default:
+					/* save the lookahead token for next time */
+					yyextra->lookahead_token = next_token;
+					yyextra->lookahead_yylval = lvalp->core_yystype;
+					yyextra->lookahead_yylloc = *llocp;
+					yyextra->have_lookahead = true;
+					/* and back up the output info to cur_token */
+					lvalp->core_yystype = cur_yylval;
+					*llocp = cur_yylloc;
 					break;
 			}
 			break;
 
 		case WITH:
-			/* Replace WITH by WITH_LA if it's followed by TIME or ORDINALITY */
+
+			/*
+			 * WITH TIME and WITH ORDINALITY must each be reduced to one token
+			 */
+			cur_yylval = lvalp->core_yystype;
+			cur_yylloc = *llocp;
+			next_token = core_yylex(&(lvalp->core_yystype), llocp, yyscanner);
 			switch (next_token)
 			{
 				case TIME:
+					cur_token = WITH_TIME;
+					break;
 				case ORDINALITY:
-					cur_token = WITH_LA;
+					cur_token = WITH_ORDINALITY;
+					break;
+				default:
+					/* save the lookahead token for next time */
+					yyextra->lookahead_token = next_token;
+					yyextra->lookahead_yylval = lvalp->core_yystype;
+					yyextra->lookahead_yylloc = *llocp;
+					yyextra->have_lookahead = true;
+					/* and back up the output info to cur_token */
+					lvalp->core_yystype = cur_yylval;
+					*llocp = cur_yylloc;
 					break;
 			}
 			break;
+
+		default:
+			break;
 	}
 
 	return cur_token;
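
(Aside, not part of the patch.)  Whichever way the token names end up (NULLS_LA/WITH_LA versus NULLS_FIRST/NULLS_LAST/WITH_TIME/WITH_ORDINALITY), the filter above is just a one-token lookahead buffer: read one extra token, combine the pair when it forms a multiword keyword, otherwise stash the extra token for the next call.  A stand-alone sketch of that idea over a toy token stream (nothing here is the real scanner API):

#include <stdio.h>
#include <stdbool.h>

/* Toy token codes; the real ones come from gram.h. */
enum { TOK_EOF, TOK_IDENT, TOK_NULLS, TOK_FIRST, TOK_LAST,
	   TOK_NULLS_FIRST, TOK_NULLS_LAST };

/* Toy lexer: walks a fixed token stream. */
static const int stream[] = { TOK_IDENT, TOK_NULLS, TOK_FIRST,
							  TOK_NULLS, TOK_IDENT, TOK_EOF };
static int	pos = 0;

static int
toy_lex(void)
{
	return stream[pos++];
}

/* One-token lookahead buffer, mirroring have_lookahead in base_yylex(). */
static bool have_lookahead = false;
static int	lookahead_token;

static int
filtered_lex(void)
{
	int			cur;

	if (have_lookahead)
	{
		cur = lookahead_token;
		have_lookahead = false;
	}
	else
		cur = toy_lex();

	if (cur == TOK_NULLS)
	{
		int			next = toy_lex();

		if (next == TOK_FIRST)
			cur = TOK_NULLS_FIRST;	/* combine the two words into one token */
		else if (next == TOK_LAST)
			cur = TOK_NULLS_LAST;
		else
		{
			/* not a multiword keyword: keep the extra token for next call */
			lookahead_token = next;
			have_lookahead = true;
		}
	}
	return cur;
}

int
main(void)
{
	int			tok;

	while ((tok = filtered_lex()) != TOK_EOF)
		printf("token %d\n", tok);
	return 0;
}
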
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 0dce6a8..237be12 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -130,7 +130,7 @@ typedef struct
 
 	int			num_requests;	/* current # of requests */
 	int			max_requests;	/* allocated array size */
-	CheckpointerRequest requests[FLEXIBLE_ARRAY_MEMBER];
+	CheckpointerRequest requests[1];	/* VARIABLE LENGTH ARRAY */
 } CheckpointerShmemStruct;
 
 static CheckpointerShmemStruct *CheckpointerShmem;
@@ -471,7 +471,7 @@ CheckpointerMain(void)
 				"checkpoints are occurring too frequently (%d seconds apart)",
 									   elapsed_secs,
 									   elapsed_secs),
-						 errhint("Consider increasing the configuration parameter \"max_wal_size\".")));
+						 errhint("Consider increasing the configuration parameter \"checkpoint_segments\".")));
 
 			/*
 			 * Initialize checkpointer-private variables used during
@@ -749,11 +749,11 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and CheckPointSegments.
+	 * Check progress against WAL segments written and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
 	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when CheckPointSegments is exceeded
+	 * actually triggers a checkpoint when checkpoint_segments is exceeded
 	 * compares against RedoRecptr, so this is not completely accurate.
 	 * However, it's good enough for our purposes, we're only calculating an
 	 * estimate anyway.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1148e29..268bcd5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -197,12 +197,8 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter tuples_inserted;		/* tuples inserted in xact */
 	PgStat_Counter tuples_updated;		/* tuples updated in xact */
 	PgStat_Counter tuples_deleted;		/* tuples deleted in xact */
-	PgStat_Counter inserted_pre_trunc;	/* tuples inserted prior to truncate */
-	PgStat_Counter updated_pre_trunc;	/* tuples updated prior to truncate */
-	PgStat_Counter deleted_pre_trunc;	/* tuples deleted prior to truncate */
 	Oid			t_id;			/* table's OID */
 	bool		t_shared;		/* is it a shared catalog? */
-	bool		t_truncated;	/* was the relation truncated? */
 } TwoPhasePgStatRecord;
 
 /*
@@ -1863,64 +1859,6 @@ pgstat_count_heap_delete(Relation rel)
 }
 
 /*
- * pgstat_truncate_save_counters
- *
- * Whenever a table is truncated, we save its i/u/d counters so that they can
- * be cleared, and if the (sub)xact that executed the truncate later aborts,
- * the counters can be restored to the saved (pre-truncate) values.  Note we do
- * this on the first truncate in any particular subxact level only.
- */
-static void
-pgstat_truncate_save_counters(PgStat_TableXactStatus *trans)
-{
-	if (!trans->truncated)
-	{
-		trans->inserted_pre_trunc = trans->tuples_inserted;
-		trans->updated_pre_trunc = trans->tuples_updated;
-		trans->deleted_pre_trunc = trans->tuples_deleted;
-		trans->truncated = true;
-	}
-}
-
-/*
- * pgstat_truncate_restore_counters - restore counters when a truncate aborts
- */
-static void
-pgstat_truncate_restore_counters(PgStat_TableXactStatus *trans)
-{
-	if (trans->truncated)
-	{
-		trans->tuples_inserted = trans->inserted_pre_trunc;
-		trans->tuples_updated = trans->updated_pre_trunc;
-		trans->tuples_deleted = trans->deleted_pre_trunc;
-	}
-}
-
-/*
- * pgstat_count_truncate - update tuple counters due to truncate
- */
-void
-pgstat_count_truncate(Relation rel)
-{
-	PgStat_TableStatus *pgstat_info = rel->pgstat_info;
-
-	if (pgstat_info != NULL)
-	{
-		/* We have to log the effect at the proper transactional level */
-		int			nest_level = GetCurrentTransactionNestLevel();
-
-		if (pgstat_info->trans == NULL ||
-			pgstat_info->trans->nest_level != nest_level)
-			add_tabstat_xact_level(pgstat_info, nest_level);
-
-		pgstat_truncate_save_counters(pgstat_info->trans);
-		pgstat_info->trans->tuples_inserted = 0;
-		pgstat_info->trans->tuples_updated = 0;
-		pgstat_info->trans->tuples_deleted = 0;
-	}
-}
-
-/*
  * pgstat_update_heap_dead_tuples - update dead-tuples count
  *
  * The semantics of this are that we are reporting the nontransactional
@@ -1978,22 +1916,12 @@ AtEOXact_PgStat(bool isCommit)
 			Assert(trans->upper == NULL);
 			tabstat = trans->parent;
 			Assert(tabstat->trans == trans);
-			/* restore pre-truncate stats (if any) in case of aborted xact */
-			if (!isCommit)
-				pgstat_truncate_restore_counters(trans);
 			/* count attempted actions regardless of commit/abort */
 			tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
 			tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
 			tabstat->t_counts.t_tuples_deleted += trans->tuples_deleted;
 			if (isCommit)
 			{
-				tabstat->t_counts.t_truncated = trans->truncated;
-				if (trans->truncated)
-				{
-					/* forget live/dead stats seen by backend thus far */
-					tabstat->t_counts.t_delta_live_tuples = 0;
-					tabstat->t_counts.t_delta_dead_tuples = 0;
-				}
 				/* insert adds a live tuple, delete removes one */
 				tabstat->t_counts.t_delta_live_tuples +=
 					trans->tuples_inserted - trans->tuples_deleted;
@@ -2058,21 +1986,9 @@ AtEOSubXact_PgStat(bool isCommit, int nestDepth)
 			{
 				if (trans->upper && trans->upper->nest_level == nestDepth - 1)
 				{
-					if (trans->truncated)
-					{
-						/* propagate the truncate status one level up */
-						pgstat_truncate_save_counters(trans->upper);
-						/* replace upper xact stats with ours */
-						trans->upper->tuples_inserted = trans->tuples_inserted;
-						trans->upper->tuples_updated = trans->tuples_updated;
-						trans->upper->tuples_deleted = trans->tuples_deleted;
-					}
-					else
-					{
-						trans->upper->tuples_inserted += trans->tuples_inserted;
-						trans->upper->tuples_updated += trans->tuples_updated;
-						trans->upper->tuples_deleted += trans->tuples_deleted;
-					}
+					trans->upper->tuples_inserted += trans->tuples_inserted;
+					trans->upper->tuples_updated += trans->tuples_updated;
+					trans->upper->tuples_deleted += trans->tuples_deleted;
 					tabstat->trans = trans->upper;
 					pfree(trans);
 				}
@@ -2101,8 +2017,6 @@ AtEOSubXact_PgStat(bool isCommit, int nestDepth)
 				 * subtransaction
 				 */
 
-				/* first restore values obliterated by truncate */
-				pgstat_truncate_restore_counters(trans);
 				/* count attempted actions regardless of commit/abort */
 				tabstat->t_counts.t_tuples_inserted += trans->tuples_inserted;
 				tabstat->t_counts.t_tuples_updated += trans->tuples_updated;
@@ -2151,12 +2065,8 @@ AtPrepare_PgStat(void)
 			record.tuples_inserted = trans->tuples_inserted;
 			record.tuples_updated = trans->tuples_updated;
 			record.tuples_deleted = trans->tuples_deleted;
-			record.inserted_pre_trunc = trans->inserted_pre_trunc;
-			record.updated_pre_trunc = trans->updated_pre_trunc;
-			record.deleted_pre_trunc = trans->deleted_pre_trunc;
 			record.t_id = tabstat->t_id;
 			record.t_shared = tabstat->t_shared;
-			record.t_truncated = trans->truncated;
 
 			RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
 								   &record, sizeof(TwoPhasePgStatRecord));
@@ -2222,8 +2132,6 @@ pgstat_twophase_postcommit(TransactionId xid, uint16 info,
 	pgstat_info->t_counts.t_tuples_inserted += rec->tuples_inserted;
 	pgstat_info->t_counts.t_tuples_updated += rec->tuples_updated;
 	pgstat_info->t_counts.t_tuples_deleted += rec->tuples_deleted;
-	pgstat_info->t_counts.t_truncated = rec->t_truncated;
-
 	pgstat_info->t_counts.t_delta_live_tuples +=
 		rec->tuples_inserted - rec->tuples_deleted;
 	pgstat_info->t_counts.t_delta_dead_tuples +=
@@ -2250,12 +2158,6 @@ pgstat_twophase_postabort(TransactionId xid, uint16 info,
 	pgstat_info = get_tabstat_entry(rec->t_id, rec->t_shared);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
-	if (rec->t_truncated)
-	{
-		rec->tuples_inserted = rec->inserted_pre_trunc;
-		rec->tuples_updated = rec->updated_pre_trunc;
-		rec->tuples_deleted = rec->deleted_pre_trunc;
-	}
 	pgstat_info->t_counts.t_tuples_inserted += rec->tuples_inserted;
 	pgstat_info->t_counts.t_tuples_updated += rec->tuples_updated;
 	pgstat_info->t_counts.t_tuples_deleted += rec->tuples_deleted;
@@ -4756,12 +4658,6 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->tuples_updated += tabmsg->t_counts.t_tuples_updated;
 			tabentry->tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
 			tabentry->tuples_hot_updated += tabmsg->t_counts.t_tuples_hot_updated;
-			/* If table was truncated, first reset the live/dead counters */
-			if (tabmsg->t_counts.t_truncated)
-			{
-				tabentry->n_live_tuples = 0;
-				tabentry->n_dead_tuples = 0;
-			}
 			tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
 			tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
 			tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
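
(Aside, not part of the patch.)  The pgstat hunks above touch the TRUNCATE bookkeeping; the logic involved boils down to "on the first truncate in a (sub)transaction, save the i/u/d counters and zero them; on abort, put the saved values back".  A stand-alone sketch with a toy counter struct (not the real PgStat_TableXactStatus):

#include <stdio.h>
#include <stdbool.h>

/* Toy per-table transaction counters. */
typedef struct
{
	long		inserted,
				updated,
				deleted;
	long		inserted_pre_trunc,
				updated_pre_trunc,
				deleted_pre_trunc;
	bool		truncated;
} ToyXactCounters;

/* First truncate in the (sub)xact: remember the counters, then zero them. */
static void
count_truncate(ToyXactCounters *c)
{
	if (!c->truncated)
	{
		c->inserted_pre_trunc = c->inserted;
		c->updated_pre_trunc = c->updated;
		c->deleted_pre_trunc = c->deleted;
		c->truncated = true;
	}
	c->inserted = c->updated = c->deleted = 0;
}

/* Abort: undo the truncate's effect on the counters. */
static void
restore_on_abort(ToyXactCounters *c)
{
	if (c->truncated)
	{
		c->inserted = c->inserted_pre_trunc;
		c->updated = c->updated_pre_trunc;
		c->deleted = c->deleted_pre_trunc;
	}
}

int
main(void)
{
	ToyXactCounters c = {10, 2, 1, 0, 0, 0, false};

	count_truncate(&c);
	restore_on_abort(&c);
	printf("inserted=%ld updated=%ld deleted=%ld\n",
		   c.inserted, c.updated, c.deleted);
	return 0;
}
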
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 14ff147..41b8dbb 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -785,13 +785,13 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 	int			dest = LOG_DESTINATION_STDERR;
 
 	/* While we have enough for a header, process data... */
-	while (count >= (int) (offsetof(PipeProtoHeader, data) +1))
+	while (count >= (int) sizeof(PipeProtoHeader))
 	{
 		PipeProtoHeader p;
 		int			chunklen;
 
 		/* Do we have a valid header? */
-		memcpy(&p, cursor, offsetof(PipeProtoHeader, data));
+		memcpy(&p, cursor, sizeof(PipeProtoHeader));
 		if (p.nuls[0] == '\0' && p.nuls[1] == '\0' &&
 			p.len > 0 && p.len <= PIPE_MAX_PAYLOAD &&
 			p.pid != 0 &&
diff --git a/src/backend/replication/README b/src/backend/replication/README
index 8e5bf0d..2f5df49 100644
--- a/src/backend/replication/README
+++ b/src/backend/replication/README
@@ -42,7 +42,7 @@ Walreceiver IPC
 ---------------
 
 When the WAL replay in startup process has reached the end of archived WAL,
-restorable using restore_command, it starts up the walreceiver process
+recoverable using recovery_command, it starts up the walreceiver process
 to fetch more WAL (if streaming replication is configured).
 
 Walreceiver is a postmaster subprocess, so the startup process can't fork it
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3563fd9..3058ce9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1258,30 +1258,11 @@ _tarWriteHeader(const char *filename, const char *linktarget,
 				struct stat * statbuf)
 {
 	char		h[512];
-	enum tarError rc;
 
-	rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size,
+	tarCreateHeader(h, filename, linktarget, statbuf->st_size,
 					statbuf->st_mode, statbuf->st_uid, statbuf->st_gid,
 					statbuf->st_mtime);
 
-	switch (rc)
-	{
-		case TAR_OK:
-			break;
-		case TAR_NAME_TOO_LONG:
-			ereport(ERROR,
-					(errmsg("file name too long for tar format: \"%s\"",
-							filename)));
-			break;
-		case TAR_SYMLINK_TOO_LONG:
-			ereport(ERROR,
-					(errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"",
-							filename, linktarget)));
-			break;
-		default:
-			elog(ERROR, "unrecognized tar error: %d", rc);
-	}
-
 	pq_putmessage('d', h, 512);
 }
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e7614bd..77c02ba 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -765,19 +765,21 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			 * transactions.
 			 */
 			tuple->tuple.t_tableOid = InvalidOid;
-			tuple->tuple.t_data = &tuple->t_data.header;
-			tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
+			tuple->tuple.t_data = &tuple->header;
+			tuple->tuple.t_len = datalen
+				+ offsetof(HeapTupleHeaderData, t_bits);
 
-			memset(&tuple->t_data.header, 0, SizeofHeapTupleHeader);
+			memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
 
-			memcpy((char *) &tuple->t_data.header + SizeofHeapTupleHeader,
+			memcpy((char *) &tuple->header
+				   + offsetof(HeapTupleHeaderData, t_bits),
 				   (char *) data,
 				   datalen);
 			data += datalen;
 
-			tuple->t_data.header.t_infomask = xlhdr->t_infomask;
-			tuple->t_data.header.t_infomask2 = xlhdr->t_infomask2;
-			tuple->t_data.header.t_hoff = xlhdr->t_hoff;
+			tuple->header.t_infomask = xlhdr->t_infomask;
+			tuple->header.t_infomask2 = xlhdr->t_infomask2;
+			tuple->header.t_hoff = xlhdr->t_hoff;
 		}
 
 		/*
@@ -813,27 +815,27 @@ DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tuple)
 	Assert(datalen >= 0);
 	Assert(datalen <= MaxHeapTupleSize);
 
-	tuple->tuple.t_len = datalen + SizeofHeapTupleHeader;
+	tuple->tuple.t_len = datalen + offsetof(HeapTupleHeaderData, t_bits);
 
 	/* not a disk based tuple */
 	ItemPointerSetInvalid(&tuple->tuple.t_self);
 
 	/* we can only figure this out after reassembling the transactions */
 	tuple->tuple.t_tableOid = InvalidOid;
-	tuple->tuple.t_data = &tuple->t_data.header;
+	tuple->tuple.t_data = &tuple->header;
 
 	/* data is not stored aligned, copy to aligned storage */
 	memcpy((char *) &xlhdr,
 		   data,
 		   SizeOfHeapHeader);
 
-	memset(&tuple->t_data.header, 0, SizeofHeapTupleHeader);
+	memset(&tuple->header, 0, sizeof(HeapTupleHeaderData));
 
-	memcpy((char *) &tuple->t_data.header + SizeofHeapTupleHeader,
+	memcpy((char *) &tuple->header + offsetof(HeapTupleHeaderData, t_bits),
 		   data + SizeOfHeapHeader,
 		   datalen);
 
-	tuple->t_data.header.t_infomask = xlhdr.t_infomask;
-	tuple->t_data.header.t_infomask2 = xlhdr.t_infomask2;
-	tuple->t_data.header.t_hoff = xlhdr.t_hoff;
+	tuple->header.t_infomask = xlhdr.t_infomask;
+	tuple->header.t_infomask2 = xlhdr.t_infomask2;
+	tuple->header.t_hoff = xlhdr.t_hoff;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 20bb3b7..bcd5896 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2014,12 +2014,14 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				newtup = change->data.tp.newtuple;
 
 				if (oldtup)
-					oldlen = offsetof(ReorderBufferTupleBuf, t_data) +
-						oldtup->tuple.t_len;
+					oldlen = offsetof(ReorderBufferTupleBuf, data)
+						+oldtup->tuple.t_len
+						- offsetof(HeapTupleHeaderData, t_bits);
 
 				if (newtup)
-					newlen = offsetof(ReorderBufferTupleBuf, t_data) +
-						newtup->tuple.t_len;
+					newlen = offsetof(ReorderBufferTupleBuf, data)
+						+newtup->tuple.t_len
+						- offsetof(HeapTupleHeaderData, t_bits);
 
 				sz += oldlen;
 				sz += newlen;
@@ -2260,25 +2262,27 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		case REORDER_BUFFER_CHANGE_DELETE:
 			if (change->data.tp.newtuple)
 			{
-				Size		len = offsetof(ReorderBufferTupleBuf, t_data) +
-				((ReorderBufferTupleBuf *) data)->tuple.t_len;
+				Size		len = offsetof(ReorderBufferTupleBuf, data)
+				+((ReorderBufferTupleBuf *) data)->tuple.t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
 
 				change->data.tp.newtuple = ReorderBufferGetTupleBuf(rb);
 				memcpy(change->data.tp.newtuple, data, len);
 				change->data.tp.newtuple->tuple.t_data =
-					&change->data.tp.newtuple->t_data.header;
+					&change->data.tp.newtuple->header;
 				data += len;
 			}
 
 			if (change->data.tp.oldtuple)
 			{
-				Size		len = offsetof(ReorderBufferTupleBuf, t_data) +
-				((ReorderBufferTupleBuf *) data)->tuple.t_len;
+				Size		len = offsetof(ReorderBufferTupleBuf, data)
+				+((ReorderBufferTupleBuf *) data)->tuple.t_len
+				- offsetof(HeapTupleHeaderData, t_bits);
 
 				change->data.tp.oldtuple = ReorderBufferGetTupleBuf(rb);
 				memcpy(change->data.tp.oldtuple, data, len);
 				change->data.tp.oldtuple->tuple.t_data =
-					&change->data.tp.oldtuple->t_data.header;
+					&change->data.tp.oldtuple->header;
 				data += len;
 			}
 			break;
@@ -2656,7 +2660,7 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	tmphtup = heap_form_tuple(desc, attrs, isnull);
 	Assert(newtup->tuple.t_len <= MaxHeapTupleSize);
-	Assert(&newtup->t_data.header == newtup->tuple.t_data);
+	Assert(&newtup->header == newtup->tuple.t_data);
 
 	memcpy(newtup->tuple.t_data, tmphtup->t_data, tmphtup->t_len);
 	newtup->tuple.t_len = tmphtup->t_len;
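
(Aside, not part of the patch.)  The length arithmetic in these reorderbuffer hunks is easy to misread: the serialized image of a tuple buffer is the fixed prefix of the struct up to its in-line data area, plus however much of the tuple extends past the fixed part of the heap tuple header.  A toy illustration of that arithmetic (invented struct names, not the real ReorderBufferTupleBuf):

#include <stddef.h>
#include <stdio.h>
#include <stdint.h>

typedef struct
{
	uint16_t	infomask;
	uint8_t		hoff;
	char		t_bits[1];		/* variable-length part of the header */
} ToyHeader;

typedef struct
{
	size_t		t_len;			/* total tuple length, fixed header included */
	ToyHeader	header;
	char		data[1];		/* tuple body stored in-line after the header */
} ToyTupleBuf;

int
main(void)
{
	size_t		t_len = 64;		/* pretend total tuple length */

	/* fixed prefix up to 'data', plus the bytes past the fixed header part */
	size_t		serialized = offsetof(ToyTupleBuf, data)
		+ t_len
		- offsetof(ToyHeader, t_bits);

	printf("serialized image is %zu bytes\n", serialized);
	return 0;
}
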
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 9d2c280..b8e6e7a 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -664,7 +664,7 @@ adjustJoinTreeList(Query *parsetree, bool removert, int rt_index)
  *			UPDATE table SET foo[2] = 42, foo[4] = 43;
  * We can merge such operations into a single assignment op.  Essentially,
  * the expression we want to produce in this case is like
- *		foo = array_set_element(array_set_element(foo, 2, 42), 4, 43)
+ *		foo = array_set(array_set(foo, 2, 42), 4, 43)
  *
  * 4. Sort the tlist into standard order: non-junk fields in order by resno,
  * then junk fields (these in no particular order).
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index df45708..75dd41e 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1023,7 +1023,6 @@ AddInvertedQual(Query *parsetree, Node *qual)
 	invqual = makeNode(BooleanTest);
 	invqual->arg = (Expr *) qual;
 	invqual->booltesttype = IS_NOT_TRUE;
-	invqual->location = -1;
 
 	AddQual(parsetree, (Node *) invqual);
 }
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index a68eae8..e1e6240 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3273,20 +3273,6 @@ LockBufferForCleanup(Buffer buffer)
 		else
 			ProcWaitForSignal();
 
-		/*
-		 * Remove flag marking us as waiter. Normally this will not be set
-		 * anymore, but ProcWaitForSignal() can return for other signals as
-		 * well.  We take care to only reset the flag if we're the waiter, as
-		 * theoretically another backend could have started waiting. That's
-		 * impossible with the current usages due to table level locking, but
-		 * better be safe.
-		 */
-		LockBufHdr(bufHdr);
-		if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
-			bufHdr->wait_backend_pid == MyProcPid)
-			bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
-		UnlockBufHdr(bufHdr);
-
 		PinCountWaitBuf = NULL;
 		/* Loop back and try again */
 	}
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index f0d23d6..0d1cbd1 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -93,7 +93,7 @@ typedef struct BufferAccessStrategyData
 	 * simplicity this is palloc'd together with the fixed fields of the
 	 * struct.
 	 */
-	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
+	Buffer		buffers[1];		/* VARIABLE SIZE ARRAY */
 }	BufferAccessStrategyData;
 
 
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index ea3fe20..0c89eb7 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -66,7 +66,7 @@ struct PMSignalData
 	/* per-child-process flags */
 	int			num_child_flags;	/* # of entries in PMChildFlags[] */
 	int			next_child_flag;	/* next slot to try to assign */
-	sig_atomic_t PMChildFlags[FLEXIBLE_ARRAY_MEMBER];
+	sig_atomic_t PMChildFlags[1];		/* VARIABLE LENGTH ARRAY */
 };
 
 NON_EXEC_STATIC volatile PMSignalData *PMSignalState = NULL;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8eaec0c..a1ebc72 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -90,8 +90,11 @@ typedef struct ProcArrayStruct
 	/* oldest catalog xmin of any replication slot */
 	TransactionId replication_slot_catalog_xmin;
 
-	/* indexes into allPgXact[], has PROCARRAY_MAXPROCS entries */
-	int			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
+	/*
+	 * We declare pgprocnos[] as 1 entry because C wants a fixed-size array,
+	 * but actually it is maxProcs entries long.
+	 */
+	int			pgprocnos[1];	/* VARIABLE LENGTH ARRAY */
 } ProcArrayStruct;
 
 static ProcArrayStruct *procArray;
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 81b85c0..bb3e604 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -184,9 +184,12 @@ typedef struct SISeg
 	SharedInvalidationMessage buffer[MAXNUMMESSAGES];
 
 	/*
-	 * Per-backend invalidation state info (has MaxBackends entries).
+	 * Per-backend state info.
+	 *
+	 * We declare procState as 1 entry because C wants a fixed-size array, but
+	 * actually it is maxBackends entries long.
 	 */
-	ProcState	procState[FLEXIBLE_ARRAY_MEMBER];
+	ProcState	procState[1];	/* reflects the invalidation state */
 } SISeg;
 
 static SISeg *shmInvalBuffer;	/* pointer to the shared inval buffer */
@@ -218,12 +221,16 @@ SInvalShmemSize(void)
 void
 CreateSharedInvalidationState(void)
 {
+	Size		size;
 	int			i;
 	bool		found;
 
 	/* Allocate space in shared memory */
+	size = offsetof(SISeg, procState);
+	size = add_size(size, mul_size(sizeof(ProcState), MaxBackends));
+
 	shmInvalBuffer = (SISeg *)
-		ShmemInitStruct("shmInvalBuffer", SInvalShmemSize(), &found);
+		ShmemInitStruct("shmInvalBuffer", size, &found);
 	if (found)
 		return;
 
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index ecd5e61..a19c401 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -562,13 +562,11 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 	bool		neednextpage;
 	bytea	   *datafield;
 	bool		pfreeit;
-	union
+	struct
 	{
 		bytea		hdr;
-		/* this is to make the union big enough for a LO data chunk: */
-		char		data[LOBLKSIZE + VARHDRSZ];
-		/* ensure union is aligned well enough: */
-		int32		align_it;
+		char		data[LOBLKSIZE];	/* make struct big enough */
+		int32		align_it;	/* ensure struct is aligned well enough */
 	}			workbuf;
 	char	   *workb = VARDATA(&workbuf.hdr);
 	HeapTuple	newtup;
@@ -750,13 +748,11 @@ inv_truncate(LargeObjectDesc *obj_desc, int64 len)
 	SysScanDesc sd;
 	HeapTuple	oldtuple;
 	Form_pg_largeobject olddata;
-	union
+	struct
 	{
 		bytea		hdr;
-		/* this is to make the union big enough for a LO data chunk: */
-		char		data[LOBLKSIZE + VARHDRSZ];
-		/* ensure union is aligned well enough: */
-		int32		align_it;
+		char		data[LOBLKSIZE];	/* make struct big enough */
+		int32		align_it;	/* ensure struct is aligned well enough */
 	}			workbuf;
 	char	   *workb = VARDATA(&workbuf.hdr);
 	HeapTuple	newtup;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 33720e8..28af40c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1619,8 +1619,9 @@ exec_bind_message(StringInfo input_message)
 	{
 		int			paramno;
 
-		params = (ParamListInfo) palloc(offsetof(ParamListInfoData, params) +
-										numParams * sizeof(ParamExternData));
+		/* sizeof(ParamListInfoData) includes the first array element */
+		params = (ParamListInfo) palloc(sizeof(ParamListInfoData) +
+								  (numParams - 1) * sizeof(ParamExternData));
 		/* we have static list of params, so no hooks needed */
 		params->paramFetch = NULL;
 		params->paramFetchArg = NULL;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 6d26986..3533cfa 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -513,6 +513,14 @@ standard_ProcessUtility(Node *parsetree,
 			ExecuteTruncate((TruncateStmt *) parsetree);
 			break;
 
+		case T_CommentStmt:
+			CommentObject((CommentStmt *) parsetree);
+			break;
+
+		case T_SecLabelStmt:
+			ExecSecLabelStmt((SecLabelStmt *) parsetree);
+			break;
+
 		case T_CopyStmt:
 			{
 				uint64		processed;
@@ -540,6 +548,11 @@ standard_ProcessUtility(Node *parsetree,
 			DeallocateQuery((DeallocateStmt *) parsetree);
 			break;
 
+		case T_GrantStmt:
+			/* no event triggers for global objects */
+			ExecuteGrantStmt((GrantStmt *) parsetree);
+			break;
+
 		case T_GrantRoleStmt:
 			/* no event triggers for global objects */
 			GrantRole((GrantRoleStmt *) parsetree);
@@ -770,19 +783,6 @@ standard_ProcessUtility(Node *parsetree,
 			 * in some cases, so we "fast path" them in the other cases.
 			 */
 
-		case T_GrantStmt:
-			{
-				GrantStmt  *stmt = (GrantStmt *) parsetree;
-
-				if (EventTriggerSupportsGrantObjectType(stmt->objtype))
-					ProcessUtilitySlow(parsetree, queryString,
-									   context, params,
-									   dest, completionTag);
-				else
-					ExecuteGrantStmt((GrantStmt *) parsetree);
-			}
-			break;
-
 		case T_DropStmt:
 			{
 				DropStmt   *stmt = (DropStmt *) parsetree;
@@ -835,32 +835,6 @@ standard_ProcessUtility(Node *parsetree,
 			}
 			break;
 
-		case T_CommentStmt:
-			{
-				CommentStmt *stmt = (CommentStmt *) parsetree;
-
-				if (EventTriggerSupportsObjectType(stmt->objtype))
-					ProcessUtilitySlow(parsetree, queryString,
-									   context, params,
-									   dest, completionTag);
-				else
-					CommentObject((CommentStmt *) parsetree);
-				break;
-			}
-
-		case T_SecLabelStmt:
-			{
-				SecLabelStmt *stmt = (SecLabelStmt *) parsetree;
-
-				if (EventTriggerSupportsObjectType(stmt->objtype))
-					ProcessUtilitySlow(parsetree, queryString,
-									   context, params,
-									   dest, completionTag);
-				else
-					ExecSecLabelStmt(stmt);
-				break;
-			}
-
 		default:
 			/* All other statement types have event trigger support */
 			ProcessUtilitySlow(parsetree, queryString,
@@ -1341,14 +1315,6 @@ ProcessUtilitySlow(Node *parsetree,
 				ExecAlterOwnerStmt((AlterOwnerStmt *) parsetree);
 				break;
 
-			case T_CommentStmt:
-				CommentObject((CommentStmt *) parsetree);
-				break;
-
-			case T_GrantStmt:
-				ExecuteGrantStmt((GrantStmt *) parsetree);
-				break;
-
 			case T_DropOwnedStmt:
 				DropOwnedObjects((DropOwnedStmt *) parsetree);
 				break;
@@ -1365,10 +1331,6 @@ ProcessUtilitySlow(Node *parsetree,
 				AlterPolicy((AlterPolicyStmt *) parsetree);
 				break;
 
-			case T_SecLabelStmt:
-				ExecSecLabelStmt((SecLabelStmt *) parsetree);
-				break;
-
 			default:
 				elog(ERROR, "unrecognized node type: %d",
 					 (int) nodeTag(parsetree));
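
(Aside, not part of the patch.)  The utility.c hunks fold COMMENT, SECURITY LABEL and GRANT back into the plain switch; the branching they replace routed each such statement either through the event-trigger-aware slow path or straight to its handler, depending on whether the object type supports event triggers.  The shape of that dispatch, as a toy sketch with made-up names:

#include <stdio.h>
#include <stdbool.h>

typedef enum
{
	STMT_COMMENT,
	STMT_GRANT_ON_DATABASE,
	STMT_GRANT_ON_TABLE
} ToyStmtKind;

/* Toy stand-in for the "does this object type support event triggers?" test. */
static bool
supports_event_triggers(ToyStmtKind kind)
{
	return kind != STMT_GRANT_ON_DATABASE;	/* global objects do not */
}

static void
process_slow(ToyStmtKind kind)
{
	printf("slow path (may fire event triggers): %d\n", kind);
}

static void
process_fast(ToyStmtKind kind)
{
	printf("fast path: %d\n", kind);
}

static void
process(ToyStmtKind kind)
{
	if (supports_event_triggers(kind))
		process_slow(kind);
	else
		process_fast(kind);
}

int
main(void)
{
	process(STMT_COMMENT);
	process(STMT_GRANT_ON_DATABASE);
	return 0;
}
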
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index f5cc265..8b71864 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -52,7 +52,7 @@ my @fmgr = ();
 my @attnames;
 foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
 {
-	push @attnames, $column->{name};
+	push @attnames, keys %$column;
 }
 
 my $data = $catalogs->{pg_proc}->{data};
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 6679333..600646e 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -17,157 +17,101 @@
 #include "utils/lsyscache.h"
 
 
-/*
- * fetch_array_arg_replace_nulls
- *
- * Fetch an array-valued argument; if it's null, construct an empty array
- * value of the proper data type.  Also cache basic element type information
- * in fn_extra.
- */
-static ArrayType *
-fetch_array_arg_replace_nulls(FunctionCallInfo fcinfo, int argno)
-{
-	ArrayType  *v;
-	Oid			element_type;
-	ArrayMetaState *my_extra;
-
-	/* First collect the array value */
-	if (!PG_ARGISNULL(argno))
-	{
-		v = PG_GETARG_ARRAYTYPE_P(argno);
-		element_type = ARR_ELEMTYPE(v);
-	}
-	else
-	{
-		/* We have to look up the array type and element type */
-		Oid			arr_typeid = get_fn_expr_argtype(fcinfo->flinfo, argno);
-
-		if (!OidIsValid(arr_typeid))
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not determine input data type")));
-		element_type = get_element_type(arr_typeid);
-		if (!OidIsValid(element_type))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("input data type is not an array")));
-
-		v = construct_empty_array(element_type);
-	}
-
-	/* Now cache required info, which might change from call to call */
-	my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
-	if (my_extra == NULL)
-	{
-		my_extra = (ArrayMetaState *)
-			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   sizeof(ArrayMetaState));
-		my_extra->element_type = InvalidOid;
-		fcinfo->flinfo->fn_extra = my_extra;
-	}
-
-	if (my_extra->element_type != element_type)
-	{
-		get_typlenbyvalalign(element_type,
-							 &my_extra->typlen,
-							 &my_extra->typbyval,
-							 &my_extra->typalign);
-		my_extra->element_type = element_type;
-	}
-
-	return v;
-}
-
 /*-----------------------------------------------------------------------------
- * array_append :
- *		push an element onto the end of a one-dimensional array
+ * array_push :
+ *		push an element onto either end of a one-dimensional array
  *----------------------------------------------------------------------------
  */
 Datum
-array_append(PG_FUNCTION_ARGS)
+array_push(PG_FUNCTION_ARGS)
 {
 	ArrayType  *v;
 	Datum		newelem;
 	bool		isNull;
-	ArrayType  *result;
 	int		   *dimv,
 			   *lb;
+	ArrayType  *result;
 	int			indx;
+	Oid			element_type;
+	int16		typlen;
+	bool		typbyval;
+	char		typalign;
+	Oid			arg0_typeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+	Oid			arg1_typeid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+	Oid			arg0_elemid;
+	Oid			arg1_elemid;
 	ArrayMetaState *my_extra;
 
-	v = fetch_array_arg_replace_nulls(fcinfo, 0);
-	isNull = PG_ARGISNULL(1);
-	if (isNull)
-		newelem = (Datum) 0;
-	else
-		newelem = PG_GETARG_DATUM(1);
+	if (arg0_typeid == InvalidOid || arg1_typeid == InvalidOid)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not determine input data types")));
 
-	if (ARR_NDIM(v) == 1)
-	{
-		/* append newelem */
-		int			ub;
+	arg0_elemid = get_element_type(arg0_typeid);
+	arg1_elemid = get_element_type(arg1_typeid);
 
-		lb = ARR_LBOUND(v);
-		dimv = ARR_DIMS(v);
-		ub = dimv[0] + lb[0] - 1;
-		indx = ub + 1;
-
-		/* overflow? */
-		if (indx < ub)
-			ereport(ERROR,
-					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-					 errmsg("integer out of range")));
+	if (arg0_elemid != InvalidOid)
+	{
+		if (PG_ARGISNULL(0))
+			v = construct_empty_array(arg0_elemid);
+		else
+			v = PG_GETARG_ARRAYTYPE_P(0);
+		isNull = PG_ARGISNULL(1);
+		if (isNull)
+			newelem = (Datum) 0;
+		else
+			newelem = PG_GETARG_DATUM(1);
+	}
+	else if (arg1_elemid != InvalidOid)
+	{
+		if (PG_ARGISNULL(1))
+			v = construct_empty_array(arg1_elemid);
+		else
+			v = PG_GETARG_ARRAYTYPE_P(1);
+		isNull = PG_ARGISNULL(0);
+		if (isNull)
+			newelem = (Datum) 0;
+		else
+			newelem = PG_GETARG_DATUM(0);
 	}
-	else if (ARR_NDIM(v) == 0)
-		indx = 1;
 	else
+	{
+		/* Shouldn't get here given proper type checking in parser */
 		ereport(ERROR,
-				(errcode(ERRCODE_DATA_EXCEPTION),
-				 errmsg("argument must be empty or one-dimensional array")));
-
-	/* Perform element insertion */
-	my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
-
-	result = array_set(v, 1, &indx, newelem, isNull,
-			   -1, my_extra->typlen, my_extra->typbyval, my_extra->typalign);
-
-	PG_RETURN_ARRAYTYPE_P(result);
-}
-
-/*-----------------------------------------------------------------------------
- * array_prepend :
- *		push an element onto the front of a one-dimensional array
- *----------------------------------------------------------------------------
- */
-Datum
-array_prepend(PG_FUNCTION_ARGS)
-{
-	ArrayType  *v;
-	Datum		newelem;
-	bool		isNull;
-	ArrayType  *result;
-	int		   *lb;
-	int			indx;
-	ArrayMetaState *my_extra;
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("neither input type is an array")));
+		PG_RETURN_NULL();		/* keep compiler quiet */
+	}
 
-	isNull = PG_ARGISNULL(0);
-	if (isNull)
-		newelem = (Datum) 0;
-	else
-		newelem = PG_GETARG_DATUM(0);
-	v = fetch_array_arg_replace_nulls(fcinfo, 1);
+	element_type = ARR_ELEMTYPE(v);
 
 	if (ARR_NDIM(v) == 1)
 	{
-		/* prepend newelem */
 		lb = ARR_LBOUND(v);
-		indx = lb[0] - 1;
+		dimv = ARR_DIMS(v);
+
+		if (arg0_elemid != InvalidOid)
+		{
+			/* append newelem */
+			int			ub = dimv[0] + lb[0] - 1;
 
-		/* overflow? */
-		if (indx > lb[0])
-			ereport(ERROR,
-					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-					 errmsg("integer out of range")));
+			indx = ub + 1;
+			/* overflow? */
+			if (indx < ub)
+				ereport(ERROR,
+						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+						 errmsg("integer out of range")));
+		}
+		else
+		{
+			/* prepend newelem */
+			indx = lb[0] - 1;
+			/* overflow? */
+			if (indx > lb[0])
+				ereport(ERROR,
+						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+						 errmsg("integer out of range")));
+		}
 	}
 	else if (ARR_NDIM(v) == 0)
 		indx = 1;
@@ -176,13 +120,39 @@ array_prepend(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_DATA_EXCEPTION),
 				 errmsg("argument must be empty or one-dimensional array")));
 
-	/* Perform element insertion */
+	/*
+	 * We arrange to look up info about element type only once per series of
+	 * calls, assuming the element type doesn't change underneath us.
+	 */
 	my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
+	if (my_extra == NULL)
+	{
+		fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+													  sizeof(ArrayMetaState));
+		my_extra = (ArrayMetaState *) fcinfo->flinfo->fn_extra;
+		my_extra->element_type = ~element_type;
+	}
+
+	if (my_extra->element_type != element_type)
+	{
+		/* Get info about element type */
+		get_typlenbyvalalign(element_type,
+							 &my_extra->typlen,
+							 &my_extra->typbyval,
+							 &my_extra->typalign);
+		my_extra->element_type = element_type;
+	}
+	typlen = my_extra->typlen;
+	typbyval = my_extra->typbyval;
+	typalign = my_extra->typalign;
 
 	result = array_set(v, 1, &indx, newelem, isNull,
-			   -1, my_extra->typlen, my_extra->typbyval, my_extra->typalign);
+					   -1, typlen, typbyval, typalign);
 
-	/* Readjust result's LB to match the input's, as expected for prepend */
+	/*
+	 * Readjust result's LB to match the input's.  This does nothing in the
+	 * append case, but it's the simplest way to implement the prepend case.
+	 */
 	if (ARR_NDIM(v) == 1)
 		ARR_LBOUND(result)[0] = ARR_LBOUND(v)[0];
 
@@ -528,13 +498,8 @@ array_agg_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, "array_agg_transfn called in non-aggregate context");
 	}
 
-	if (PG_ARGISNULL(0))
-		state = initArrayResult(arg1_typeid, aggcontext, false);
-	else
-		state = (ArrayBuildState *) PG_GETARG_POINTER(0);
-
+	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0);
 	elem = PG_ARGISNULL(1) ? (Datum) 0 : PG_GETARG_DATUM(1);
-
 	state = accumArrayResult(state,
 							 elem,
 							 PG_ARGISNULL(1),
@@ -608,12 +573,7 @@ array_agg_array_transfn(PG_FUNCTION_ARGS)
 		elog(ERROR, "array_agg_array_transfn called in non-aggregate context");
 	}
 
-
-	if (PG_ARGISNULL(0))
-		state = initArrayResultArr(arg1_typeid, InvalidOid, aggcontext, false);
-	else
-		state = (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
-
+	state = PG_ARGISNULL(0) ? NULL : (ArrayBuildStateArr *) PG_GETARG_POINTER(0);
 	state = accumArrayResultArr(state,
 								PG_GETARG_DATUM(1),
 								PG_ARGISNULL(1),
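
(Aside, not part of the patch.)  One detail of the restored array_push() worth pointing out: the per-call type cache kept in fn_extra is primed with the bitwise complement of the element type, which guarantees a cache miss on the first call so the typlen/typbyval/typalign lookup always happens exactly once per series of calls.  The caching pattern in a stand-alone sketch (toy names, not the real ArrayMetaState or FunctionCallInfo):

#include <stdio.h>
#include <stdbool.h>

/* Toy cache, standing in for the ArrayMetaState kept in fn_extra. */
typedef struct
{
	unsigned int element_type;	/* cache key */
	int			typlen;			/* cached "catalog" info */
} ToyMetaState;

/* Pretend catalog lookup that we want to do only once per series of calls. */
static int
lookup_typlen(unsigned int element_type)
{
	printf("expensive lookup for type %u\n", element_type);
	return 4;
}

static int
get_typlen_cached(ToyMetaState *cache, unsigned int element_type, bool first_call)
{
	if (first_call)
		cache->element_type = ~element_type;	/* force a miss on first use */

	if (cache->element_type != element_type)
	{
		cache->typlen = lookup_typlen(element_type);
		cache->element_type = element_type;
	}
	return cache->typlen;
}

int
main(void)
{
	ToyMetaState cache;

	get_typlen_cached(&cache, 23, true);	/* does the lookup */
	get_typlen_cached(&cache, 23, false);	/* served from the cache */
	return 0;
}
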
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 54979fa..5591b46 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1795,15 +1795,15 @@ array_cardinality(PG_FUNCTION_ARGS)
 
 
 /*
- * array_get_element :
- *	  This routine takes an array datum and a subscript array and returns
+ * array_ref :
+ *	  This routine takes an array pointer and a subscript array and returns
  *	  the referenced item as a Datum.  Note that for a pass-by-reference
  *	  datatype, the returned Datum is a pointer into the array object.
  *
  * This handles both ordinary varlena arrays and fixed-length arrays.
  *
  * Inputs:
- *	arraydatum: the array object (mustn't be NULL)
+ *	array: the array object (mustn't be NULL)
  *	nSubscripts: number of subscripts supplied
  *	indx[]: the subscript values
  *	arraytyplen: pg_type.typlen for the array type
@@ -1816,16 +1816,15 @@ array_cardinality(PG_FUNCTION_ARGS)
  *	*isNull is set to indicate whether the element is NULL.
  */
 Datum
-array_get_element(Datum arraydatum,
-				  int nSubscripts,
-				  int *indx,
-				  int arraytyplen,
-				  int elmlen,
-				  bool elmbyval,
-				  char elmalign,
-				  bool *isNull)
+array_ref(ArrayType *array,
+		  int nSubscripts,
+		  int *indx,
+		  int arraytyplen,
+		  int elmlen,
+		  bool elmbyval,
+		  char elmalign,
+		  bool *isNull)
 {
-	ArrayType  *array;
 	int			i,
 				ndim,
 			   *dim,
@@ -1847,13 +1846,13 @@ array_get_element(Datum arraydatum,
 		fixedLb[0] = 0;
 		dim = fixedDim;
 		lb = fixedLb;
-		arraydataptr = (char *) DatumGetPointer(arraydatum);
+		arraydataptr = (char *) array;
 		arraynullsptr = NULL;
 	}
 	else
 	{
 		/* detoast input array if necessary */
-		array = DatumGetArrayTypeP(arraydatum);
+		array = DatumGetArrayTypeP(PointerGetDatum(array));
 
 		ndim = ARR_NDIM(array);
 		dim = ARR_DIMS(array);
@@ -1911,7 +1910,7 @@ array_get_element(Datum arraydatum,
  * This handles both ordinary varlena arrays and fixed-length arrays.
  *
  * Inputs:
- *	arraydatum: the array object (mustn't be NULL)
+ *	array: the array object (mustn't be NULL)
  *	nSubscripts: number of subscripts supplied (must be same for upper/lower)
  *	upperIndx[]: the upper subscript values
  *	lowerIndx[]: the lower subscript values
@@ -1926,8 +1925,8 @@ array_get_element(Datum arraydatum,
  * NOTE: we assume it is OK to scribble on the provided subscript arrays
  * lowerIndx[] and upperIndx[].  These are generally just temporaries.
  */
-Datum
-array_get_slice(Datum arraydatum,
+ArrayType *
+array_get_slice(ArrayType *array,
 				int nSubscripts,
 				int *upperIndx,
 				int *lowerIndx,
@@ -1936,7 +1935,6 @@ array_get_slice(Datum arraydatum,
 				bool elmbyval,
 				char elmalign)
 {
-	ArrayType  *array;
 	ArrayType  *newarray;
 	int			i,
 				ndim,
@@ -1975,13 +1973,13 @@ array_get_slice(Datum arraydatum,
 		dim = fixedDim;
 		lb = fixedLb;
 		elemtype = InvalidOid;	/* XXX */
-		arraydataptr = (char *) DatumGetPointer(arraydatum);
+		arraydataptr = (char *) array;
 		arraynullsptr = NULL;
 	}
 	else
 	{
 		/* detoast input array if necessary */
-		array = DatumGetArrayTypeP(arraydatum);
+		array = DatumGetArrayTypeP(PointerGetDatum(array));
 
 		ndim = ARR_NDIM(array);
 		dim = ARR_DIMS(array);
@@ -1997,7 +1995,7 @@ array_get_slice(Datum arraydatum,
 	 * slice, return an empty array.
 	 */
 	if (ndim < nSubscripts || ndim <= 0 || ndim > MAXDIM)
-		return PointerGetDatum(construct_empty_array(elemtype));
+		return construct_empty_array(elemtype);
 
 	for (i = 0; i < nSubscripts; i++)
 	{
@@ -2006,7 +2004,7 @@ array_get_slice(Datum arraydatum,
 		if (upperIndx[i] >= (dim[i] + lb[i]))
 			upperIndx[i] = dim[i] + lb[i] - 1;
 		if (lowerIndx[i] > upperIndx[i])
-			return PointerGetDatum(construct_empty_array(elemtype));
+			return construct_empty_array(elemtype);
 	}
 	/* fill any missing subscript positions with full array range */
 	for (; i < ndim; i++)
@@ -2014,7 +2012,7 @@ array_get_slice(Datum arraydatum,
 		lowerIndx[i] = lb[i];
 		upperIndx[i] = dim[i] + lb[i] - 1;
 		if (lowerIndx[i] > upperIndx[i])
-			return PointerGetDatum(construct_empty_array(elemtype));
+			return construct_empty_array(elemtype);
 	}
 
 	mda_get_range(ndim, span, lowerIndx, upperIndx);
@@ -2060,18 +2058,18 @@ array_get_slice(Datum arraydatum,
 						lowerIndx, upperIndx,
 						elmlen, elmbyval, elmalign);
 
-	return PointerGetDatum(newarray);
+	return newarray;
 }
 
 /*
- * array_set_element :
- *		  This routine sets the value of one array element (specified by
+ * array_set :
+ *		  This routine sets the value of an array element (specified by
  *		  a subscript array) to a new value specified by "dataValue".
  *
  * This handles both ordinary varlena arrays and fixed-length arrays.
  *
  * Inputs:
- *	arraydatum: the initial array object (mustn't be NULL)
+ *	array: the initial array object (mustn't be NULL)
  *	nSubscripts: number of subscripts supplied
  *	indx[]: the subscript values
  *	dataValue: the datum to be inserted at the given position
@@ -2093,18 +2091,17 @@ array_get_slice(Datum arraydatum,
  * NOTE: For assignments, we throw an error for invalid subscripts etc,
  * rather than returning a NULL as the fetch operations do.
  */
-Datum
-array_set_element(Datum arraydatum,
-				  int nSubscripts,
-				  int *indx,
-				  Datum dataValue,
-				  bool isNull,
-				  int arraytyplen,
-				  int elmlen,
-				  bool elmbyval,
-				  char elmalign)
+ArrayType *
+array_set(ArrayType *array,
+		  int nSubscripts,
+		  int *indx,
+		  Datum dataValue,
+		  bool isNull,
+		  int arraytyplen,
+		  int elmlen,
+		  bool elmbyval,
+		  char elmalign)
 {
-	ArrayType  *array;
 	ArrayType  *newarray;
 	int			i,
 				ndim,
@@ -2133,8 +2130,6 @@ array_set_element(Datum arraydatum,
 		 * fixed-length arrays -- these are assumed to be 1-d, 0-based. We
 		 * cannot extend them, either.
 		 */
-		char	   *resultarray;
-
 		if (nSubscripts != 1)
 			ereport(ERROR,
 					(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
@@ -2150,11 +2145,11 @@ array_set_element(Datum arraydatum,
 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 					 errmsg("cannot assign null value to an element of a fixed-length array")));
 
-		resultarray = (char *) palloc(arraytyplen);
-		memcpy(resultarray, DatumGetPointer(arraydatum), arraytyplen);
-		elt_ptr = (char *) resultarray + indx[0] * elmlen;
+		newarray = (ArrayType *) palloc(arraytyplen);
+		memcpy(newarray, array, arraytyplen);
+		elt_ptr = (char *) newarray + indx[0] * elmlen;
 		ArrayCastAndSet(dataValue, elmlen, elmbyval, elmalign, elt_ptr);
-		return PointerGetDatum(resultarray);
+		return newarray;
 	}
 
 	if (nSubscripts <= 0 || nSubscripts > MAXDIM)
@@ -2167,7 +2162,7 @@ array_set_element(Datum arraydatum,
 		dataValue = PointerGetDatum(PG_DETOAST_DATUM(dataValue));
 
 	/* detoast input array if necessary */
-	array = DatumGetArrayTypeP(arraydatum);
+	array = DatumGetArrayTypeP(PointerGetDatum(array));
 
 	ndim = ARR_NDIM(array);
 
@@ -2186,10 +2181,9 @@ array_set_element(Datum arraydatum,
 			lb[i] = indx[i];
 		}
 
-		return PointerGetDatum(construct_md_array(&dataValue, &isNull,
-												  nSubscripts, dim, lb,
-												  elmtype,
-												elmlen, elmbyval, elmalign));
+		return construct_md_array(&dataValue, &isNull, nSubscripts,
+								  dim, lb, elmtype,
+								  elmlen, elmbyval, elmalign);
 	}
 
 	if (ndim != nSubscripts)
@@ -2351,7 +2345,7 @@ array_set_element(Datum arraydatum,
 		}
 	}
 
-	return PointerGetDatum(newarray);
+	return newarray;
 }
 
 /*
@@ -2363,12 +2357,12 @@ array_set_element(Datum arraydatum,
  * This handles both ordinary varlena arrays and fixed-length arrays.
  *
  * Inputs:
- *	arraydatum: the initial array object (mustn't be NULL)
+ *	array: the initial array object (mustn't be NULL)
  *	nSubscripts: number of subscripts supplied (must be same for upper/lower)
  *	upperIndx[]: the upper subscript values
  *	lowerIndx[]: the lower subscript values
- *	srcArrayDatum: the source for the inserted values
- *	isNull: indicates whether srcArrayDatum is NULL
+ *	srcArray: the source for the inserted values
+ *	isNull: indicates whether srcArray is NULL
  *	arraytyplen: pg_type.typlen for the array type
  *	elmlen: pg_type.typlen for the array's element type
  *	elmbyval: pg_type.typbyval for the array's element type
@@ -2389,20 +2383,18 @@ array_set_element(Datum arraydatum,
  * NOTE: For assignments, we throw an error for silly subscripts etc,
  * rather than returning a NULL or empty array as the fetch operations do.
  */
-Datum
-array_set_slice(Datum arraydatum,
+ArrayType *
+array_set_slice(ArrayType *array,
 				int nSubscripts,
 				int *upperIndx,
 				int *lowerIndx,
-				Datum srcArrayDatum,
+				ArrayType *srcArray,
 				bool isNull,
 				int arraytyplen,
 				int elmlen,
 				bool elmbyval,
 				char elmalign)
 {
-	ArrayType  *array;
-	ArrayType  *srcArray;
 	ArrayType  *newarray;
 	int			i,
 				ndim,
@@ -2428,7 +2420,7 @@ array_set_slice(Datum arraydatum,
 
 	/* Currently, assignment from a NULL source array is a no-op */
 	if (isNull)
-		return arraydatum;
+		return array;
 
 	if (arraytyplen > 0)
 	{
@@ -2441,8 +2433,8 @@ array_set_slice(Datum arraydatum,
 	}
 
 	/* detoast arrays if necessary */
-	array = DatumGetArrayTypeP(arraydatum);
-	srcArray = DatumGetArrayTypeP(srcArrayDatum);
+	array = DatumGetArrayTypeP(PointerGetDatum(array));
+	srcArray = DatumGetArrayTypeP(PointerGetDatum(srcArray));
 
 	/* note: we assume srcArray contains no toasted elements */
 
@@ -2475,9 +2467,9 @@ array_set_slice(Datum arraydatum,
 					(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
 					 errmsg("source array too small")));
 
-		return PointerGetDatum(construct_md_array(dvalues, dnulls, nSubscripts,
-												  dim, lb, elmtype,
-												elmlen, elmbyval, elmalign));
+		return construct_md_array(dvalues, dnulls, nSubscripts,
+								  dim, lb, elmtype,
+								  elmlen, elmbyval, elmalign);
 	}
 
 	if (ndim < nSubscripts || ndim <= 0 || ndim > MAXDIM)
@@ -2679,43 +2671,7 @@ array_set_slice(Datum arraydatum,
 		}
 	}
 
-	return PointerGetDatum(newarray);
-}
-
-/*
- * array_ref : backwards compatibility wrapper for array_get_element
- *
- * This only works for detoasted/flattened varlena arrays, since the array
- * argument is declared as "ArrayType *".  However there's enough code like
- * that to justify preserving this API.
- */
-Datum
-array_ref(ArrayType *array, int nSubscripts, int *indx,
-		  int arraytyplen, int elmlen, bool elmbyval, char elmalign,
-		  bool *isNull)
-{
-	return array_get_element(PointerGetDatum(array), nSubscripts, indx,
-							 arraytyplen, elmlen, elmbyval, elmalign,
-							 isNull);
-}
-
-/*
- * array_set : backwards compatibility wrapper for array_set_element
- *
- * This only works for detoasted/flattened varlena arrays, since the array
- * argument and result are declared as "ArrayType *".  However there's enough
- * code like that to justify preserving this API.
- */
-ArrayType *
-array_set(ArrayType *array, int nSubscripts, int *indx,
-		  Datum dataValue, bool isNull,
-		  int arraytyplen, int elmlen, bool elmbyval, char elmalign)
-{
-	return DatumGetArrayTypeP(array_set_element(PointerGetDatum(array),
-												nSubscripts, indx,
-												dataValue, isNull,
-												arraytyplen,
-												elmlen, elmbyval, elmalign));
+	return newarray;
 }
 
 /*
@@ -4650,7 +4606,6 @@ array_insert_slice(ArrayType *destArray,
  *
  *	element_type is the array element type (must be a valid array element type)
  *	rcontext is where to keep working state
- *	subcontext is a flag determining whether to use a separate memory context
  *
  * Note: there are two common schemes for using accumArrayResult().
  * In the older scheme, you start with a NULL ArrayBuildState pointer, and
@@ -4660,39 +4615,24 @@ array_insert_slice(ArrayType *destArray,
  * once per element.  In this scheme you always end with a non-NULL pointer
  * that you can pass to makeArrayResult; you get an empty array if there
  * were no elements.  This is preferred if an empty array is what you want.
- *
- * It's possible to choose whether to create a separate memory context for the
- * array build state, or whether to allocate it directly within rcontext.
- *
- * When there are many concurrent small states (e.g. array_agg() using hash
- * aggregation of many small groups), using a separate memory context for each
- * one may result in severe memory bloat. In such cases, use the same memory
- * context to initialize all such array build states, and pass
- * subcontext=false.
- *
- * In cases when the array build states have different lifetimes, using a
- * single memory context is impractical. Instead, pass subcontext=true so that
- * the array build states can be freed individually.
  */
 ArrayBuildState *
-initArrayResult(Oid element_type, MemoryContext rcontext, bool subcontext)
+initArrayResult(Oid element_type, MemoryContext rcontext)
 {
 	ArrayBuildState *astate;
-	MemoryContext arr_context = rcontext;
+	MemoryContext arr_context;
 
 	/* Make a temporary context to hold all the junk */
-	if (subcontext)
-		arr_context = AllocSetContextCreate(rcontext,
-											"accumArrayResult",
-											ALLOCSET_DEFAULT_MINSIZE,
-											ALLOCSET_DEFAULT_INITSIZE,
-											ALLOCSET_DEFAULT_MAXSIZE);
+	arr_context = AllocSetContextCreate(rcontext,
+										"accumArrayResult",
+										ALLOCSET_DEFAULT_MINSIZE,
+										ALLOCSET_DEFAULT_INITSIZE,
+										ALLOCSET_DEFAULT_MAXSIZE);
 
 	astate = (ArrayBuildState *)
 		MemoryContextAlloc(arr_context, sizeof(ArrayBuildState));
 	astate->mcontext = arr_context;
-	astate->private_cxt = subcontext;
-	astate->alen = (subcontext ? 64 : 8);	/* arbitrary starting array size */
+	astate->alen = 64;			/* arbitrary starting array size */
 	astate->dvalues = (Datum *)
 		MemoryContextAlloc(arr_context, astate->alen * sizeof(Datum));
 	astate->dnulls = (bool *)
@@ -4726,7 +4666,7 @@ accumArrayResult(ArrayBuildState *astate,
 	if (astate == NULL)
 	{
 		/* First time through --- initialize */
-		astate = initArrayResult(element_type, rcontext, true);
+		astate = initArrayResult(element_type, rcontext);
 	}
 	else
 	{
@@ -4773,9 +4713,6 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
- * Note: only releases astate if it was initialized within a separate memory
- * context (i.e. using subcontext=true when calling initArrayResult).
- *
  *	astate is working state (must not be NULL)
  *	rcontext is where to construct result
  */
@@ -4792,8 +4729,7 @@ makeArrayResult(ArrayBuildState *astate,
 	dims[0] = astate->nelems;
 	lbs[0] = 1;
 
-	return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
-							 astate->private_cxt);
+	return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
 }
 
 /*
@@ -4802,11 +4738,6 @@ makeArrayResult(ArrayBuildState *astate,
  * beware: no check that specified dimensions match the number of values
  * accumulated.
  *
- * Note: if the astate was not initialized within a separate memory context
- * (that is, initArrayResult was called with subcontext=false), then using
- * release=true is illegal. Instead, release astate along with the rest of its
- * context when appropriate.
- *
  *	astate is working state (must not be NULL)
  *	rcontext is where to construct result
  *	release is true if okay to release working state
@@ -4839,10 +4770,7 @@ makeMdArrayResult(ArrayBuildState *astate,
 
 	/* Clean up all the junk */
 	if (release)
-	{
-		Assert(astate->private_cxt);
 		MemoryContextDelete(astate->mcontext);
-	}
 
 	return PointerGetDatum(result);
 }
@@ -4859,42 +4787,26 @@ makeMdArrayResult(ArrayBuildState *astate,
  * initArrayResultArr - initialize an empty ArrayBuildStateArr
  *
  *	array_type is the array type (must be a valid varlena array type)
- *	element_type is the type of the array's elements (lookup if InvalidOid)
+ *	element_type is the type of the array's elements
  *	rcontext is where to keep working state
- *	subcontext is a flag determining whether to use a separate memory context
  */
 ArrayBuildStateArr *
-initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext,
-				   bool subcontext)
+initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext)
 {
 	ArrayBuildStateArr *astate;
-	MemoryContext arr_context = rcontext;   /* by default use the parent ctx */
-
-	/* Lookup element type, unless element_type already provided */
-	if (! OidIsValid(element_type))
-	{
-		element_type = get_element_type(array_type);
-
-		if (!OidIsValid(element_type))
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("data type %s is not an array type",
-							format_type_be(array_type))));
-	}
+	MemoryContext arr_context;
 
 	/* Make a temporary context to hold all the junk */
-	if (subcontext)
-		arr_context = AllocSetContextCreate(rcontext,
-											"accumArrayResultArr",
-											ALLOCSET_DEFAULT_MINSIZE,
-											ALLOCSET_DEFAULT_INITSIZE,
-											ALLOCSET_DEFAULT_MAXSIZE);
+	arr_context = AllocSetContextCreate(rcontext,
+										"accumArrayResultArr",
+										ALLOCSET_DEFAULT_MINSIZE,
+										ALLOCSET_DEFAULT_INITSIZE,
+										ALLOCSET_DEFAULT_MAXSIZE);
 
 	/* Note we initialize all fields to zero */
 	astate = (ArrayBuildStateArr *)
 		MemoryContextAllocZero(arr_context, sizeof(ArrayBuildStateArr));
 	astate->mcontext = arr_context;
-	astate->private_cxt = subcontext;
 
 	/* Save relevant datatype information */
 	astate->array_type = array_type;
@@ -4941,9 +4853,21 @@ accumArrayResultArr(ArrayBuildStateArr *astate,
 	arg = DatumGetArrayTypeP(dvalue);
 
 	if (astate == NULL)
-		astate = initArrayResultArr(array_type, InvalidOid, rcontext, true);
+	{
+		/* First time through --- initialize */
+		Oid			element_type = get_element_type(array_type);
+
+		if (!OidIsValid(element_type))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("data type %s is not an array type",
+							format_type_be(array_type))));
+		astate = initArrayResultArr(array_type, element_type, rcontext);
+	}
 	else
+	{
 		Assert(astate->array_type == array_type);
+	}
 
 	oldcontext = MemoryContextSwitchTo(astate->mcontext);
 
@@ -5122,10 +5046,7 @@ makeArrayResultArr(ArrayBuildStateArr *astate,
 
 	/* Clean up all the junk */
 	if (release)
-	{
-		Assert(astate->private_cxt);
 		MemoryContextDelete(astate->mcontext);
-	}
 
 	return PointerGetDatum(result);
 }
@@ -5141,10 +5062,9 @@ makeArrayResultArr(ArrayBuildStateArr *astate,
  *
  *	input_type is the input datatype (either element or array type)
  *	rcontext is where to keep working state
- *	subcontext is a flag determining whether to use a separate memory context
  */
 ArrayBuildStateAny *
-initArrayResultAny(Oid input_type, MemoryContext rcontext, bool subcontext)
+initArrayResultAny(Oid input_type, MemoryContext rcontext)
 {
 	ArrayBuildStateAny *astate;
 	Oid			element_type = get_element_type(input_type);
@@ -5154,7 +5074,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext, bool subcontext)
 		/* Array case */
 		ArrayBuildStateArr *arraystate;
 
-		arraystate = initArrayResultArr(input_type, InvalidOid, rcontext, subcontext);
+		arraystate = initArrayResultArr(input_type, element_type, rcontext);
 		astate = (ArrayBuildStateAny *)
 			MemoryContextAlloc(arraystate->mcontext,
 							   sizeof(ArrayBuildStateAny));
@@ -5169,7 +5089,7 @@ initArrayResultAny(Oid input_type, MemoryContext rcontext, bool subcontext)
 		/* Let's just check that we have a type that can be put into arrays */
 		Assert(OidIsValid(get_array_type(input_type)));
 
-		scalarstate = initArrayResult(input_type, rcontext, subcontext);
+		scalarstate = initArrayResult(input_type, rcontext);
 		astate = (ArrayBuildStateAny *)
 			MemoryContextAlloc(scalarstate->mcontext,
 							   sizeof(ArrayBuildStateAny));
@@ -5195,7 +5115,7 @@ accumArrayResultAny(ArrayBuildStateAny *astate,
 					MemoryContext rcontext)
 {
 	if (astate == NULL)
-		astate = initArrayResultAny(input_type, rcontext, true);
+		astate = initArrayResultAny(input_type, rcontext);
 
 	if (astate->scalarstate)
 		(void) accumArrayResult(astate->scalarstate,
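
The arrayfuncs.c hunks above drop the subcontext argument, so callers go back to the two-argument initArrayResult() form (xpath() in xml.c further down is one such caller). For reference, a minimal sketch of the "newer scheme" described in the initArrayResult comment above, written against the two-argument signature as it appears in this patch; collect_int4_array() and the int4 element type are illustrative assumptions only, not part of the patch.

#include "postgres.h"

#include "catalog/pg_type.h"
#include "utils/array.h"

/* Hypothetical helper: accumulate int4 Datums into a 1-D int4[] */
static Datum
collect_int4_array(Datum *values, bool *nulls, int nvalues)
{
    ArrayBuildState *astate;
    int         i;

    /* "Newer scheme": initialize explicitly, so empty input yields an empty array */
    astate = initArrayResult(INT4OID, CurrentMemoryContext);

    for (i = 0; i < nvalues; i++)
        astate = accumArrayResult(astate, values[i], nulls[i],
                                  INT4OID, CurrentMemoryContext);

    /* Build the final 1-D array in CurrentMemoryContext */
    return makeArrayResult(astate, CurrentMemoryContext);
}
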
diff --git a/src/backend/utils/adt/domains.c b/src/backend/utils/adt/domains.c
index ac8c252..d84d4e8 100644
--- a/src/backend/utils/adt/domains.c
+++ b/src/backend/utils/adt/domains.c
@@ -12,9 +12,10 @@
  * The overhead required for constraint checking can be high, since examining
  * the catalogs to discover the constraints for a given domain is not cheap.
  * We have three mechanisms for minimizing this cost:
- *	1.  We rely on the typcache to keep up-to-date copies of the constraints.
- *	2.  In a nest of domains, we flatten the checking of all the levels
- *		into just one operation (the typcache does this for us).
+ *	1.  In a nest of domains, we flatten the checking of all the levels
+ *		into just one operation.
+ *	2.  We cache the list of constraint items in the FmgrInfo struct
+ *		passed by the caller.
  *	3.  If there are CHECK constraints, we cache a standalone ExprContext
  *		to evaluate them in.
  *
@@ -32,12 +33,12 @@
 
 #include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "commands/typecmds.h"
 #include "executor/executor.h"
 #include "lib/stringinfo.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
-#include "utils/typcache.h"
 
 
 /*
@@ -51,8 +52,8 @@ typedef struct DomainIOData
 	Oid			typioparam;
 	int32		typtypmod;
 	FmgrInfo	proc;
-	/* Reference to cached list of constraint items to check */
-	DomainConstraintRef constraint_ref;
+	/* List of constraint items to check */
+	List	   *constraint_list;
 	/* Context for evaluating CHECK constraints in */
 	ExprContext *econtext;
 	/* Memory context this cache is in */
@@ -62,19 +63,16 @@ typedef struct DomainIOData
 
 /*
  * domain_state_setup - initialize the cache for a new domain type.
- *
- * Note: we can't re-use the same cache struct for a new domain type,
- * since there's no provision for releasing the DomainConstraintRef.
- * If a call site needs to deal with a new domain type, we just leak
- * the old struct for the duration of the query.
  */
-static DomainIOData *
-domain_state_setup(Oid domainType, bool binary, MemoryContext mcxt)
+static void
+domain_state_setup(DomainIOData *my_extra, Oid domainType, bool binary,
+				   MemoryContext mcxt)
 {
-	DomainIOData *my_extra;
 	Oid			baseType;
+	MemoryContext oldcontext;
 
-	my_extra = (DomainIOData *) MemoryContextAlloc(mcxt, sizeof(DomainIOData));
+	/* Mark cache invalid */
+	my_extra->domain_type = InvalidOid;
 
 	/* Find out the base type */
 	my_extra->typtypmod = -1;
@@ -97,7 +95,9 @@ domain_state_setup(Oid domainType, bool binary, MemoryContext mcxt)
 	fmgr_info_cxt(my_extra->typiofunc, &my_extra->proc, mcxt);
 
 	/* Look up constraints for domain */
-	InitDomainConstraintRef(domainType, &my_extra->constraint_ref, mcxt);
+	oldcontext = MemoryContextSwitchTo(mcxt);
+	my_extra->constraint_list = GetDomainConstraints(domainType);
+	MemoryContextSwitchTo(oldcontext);
 
 	/* We don't make an ExprContext until needed */
 	my_extra->econtext = NULL;
@@ -105,8 +105,6 @@ domain_state_setup(Oid domainType, bool binary, MemoryContext mcxt)
 
 	/* Mark cache valid */
 	my_extra->domain_type = domainType;
-
-	return my_extra;
 }
 
 /*
@@ -120,10 +118,7 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra)
 	ExprContext *econtext = my_extra->econtext;
 	ListCell   *l;
 
-	/* Make sure we have up-to-date constraints */
-	UpdateDomainConstraintRef(&my_extra->constraint_ref);
-
-	foreach(l, my_extra->constraint_ref.constraints)
+	foreach(l, my_extra->constraint_list)
 	{
 		DomainConstraintState *con = (DomainConstraintState *) lfirst(l);
 
@@ -220,16 +215,20 @@ domain_in(PG_FUNCTION_ARGS)
 
 	/*
 	 * We arrange to look up the needed info just once per series of calls,
-	 * assuming the domain type doesn't change underneath us (which really
-	 * shouldn't happen, but cope if it does).
+	 * assuming the domain type doesn't change underneath us.
 	 */
 	my_extra = (DomainIOData *) fcinfo->flinfo->fn_extra;
-	if (my_extra == NULL || my_extra->domain_type != domainType)
+	if (my_extra == NULL)
 	{
-		my_extra = domain_state_setup(domainType, false,
-									  fcinfo->flinfo->fn_mcxt);
+		my_extra = (DomainIOData *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+													   sizeof(DomainIOData));
+		domain_state_setup(my_extra, domainType, false,
+						   fcinfo->flinfo->fn_mcxt);
 		fcinfo->flinfo->fn_extra = (void *) my_extra;
 	}
+	else if (my_extra->domain_type != domainType)
+		domain_state_setup(my_extra, domainType, false,
+						   fcinfo->flinfo->fn_mcxt);
 
 	/*
 	 * Invoke the base type's typinput procedure to convert the data.
@@ -276,16 +275,20 @@ domain_recv(PG_FUNCTION_ARGS)
 
 	/*
 	 * We arrange to look up the needed info just once per series of calls,
-	 * assuming the domain type doesn't change underneath us (which really
-	 * shouldn't happen, but cope if it does).
+	 * assuming the domain type doesn't change underneath us.
 	 */
 	my_extra = (DomainIOData *) fcinfo->flinfo->fn_extra;
-	if (my_extra == NULL || my_extra->domain_type != domainType)
+	if (my_extra == NULL)
 	{
-		my_extra = domain_state_setup(domainType, true,
-									  fcinfo->flinfo->fn_mcxt);
+		my_extra = (DomainIOData *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+													   sizeof(DomainIOData));
+		domain_state_setup(my_extra, domainType, true,
+						   fcinfo->flinfo->fn_mcxt);
 		fcinfo->flinfo->fn_extra = (void *) my_extra;
 	}
+	else if (my_extra->domain_type != domainType)
+		domain_state_setup(my_extra, domainType, true,
+						   fcinfo->flinfo->fn_mcxt);
 
 	/*
 	 * Invoke the base type's typreceive procedure to convert the data.
@@ -323,17 +326,20 @@ domain_check(Datum value, bool isnull, Oid domainType,
 
 	/*
 	 * We arrange to look up the needed info just once per series of calls,
-	 * assuming the domain type doesn't change underneath us (which really
-	 * shouldn't happen, but cope if it does).
+	 * assuming the domain type doesn't change underneath us.
 	 */
 	if (extra)
 		my_extra = (DomainIOData *) *extra;
-	if (my_extra == NULL || my_extra->domain_type != domainType)
+	if (my_extra == NULL)
 	{
-		my_extra = domain_state_setup(domainType, true, mcxt);
+		my_extra = (DomainIOData *) MemoryContextAlloc(mcxt,
+													   sizeof(DomainIOData));
+		domain_state_setup(my_extra, domainType, true, mcxt);
 		if (extra)
 			*extra = (void *) my_extra;
 	}
+	else if (my_extra->domain_type != domainType)
+		domain_state_setup(my_extra, domainType, true, mcxt);
 
 	/*
 	 * Do the necessary checks to ensure it's a valid domain value.
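
The domain_in()/domain_recv()/domain_check() hunks all revolve around the same fn_extra idiom: look the expensive per-type state up once per call site and keep it in fn_mcxt, rebuilding only if the type changes underneath us. A generic sketch of that idiom follows, for readers less familiar with it; MyCacheData and my_func are hypothetical names, and a real function would rebuild its derived state where the comment indicates.

#include "postgres.h"

#include "fmgr.h"

typedef struct MyCacheData
{
    Oid         cached_type;    /* type we last set up for */
    /* ... derived state that is expensive to recompute ... */
} MyCacheData;

PG_FUNCTION_INFO_V1(my_func);

Datum
my_func(PG_FUNCTION_ARGS)
{
    Oid         argtype = get_fn_expr_argtype(fcinfo->flinfo, 0);
    MyCacheData *my_extra = (MyCacheData *) fcinfo->flinfo->fn_extra;

    if (my_extra == NULL)
    {
        /* First call at this call site: allocate the cache in fn_mcxt */
        my_extra = (MyCacheData *)
            MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                   sizeof(MyCacheData));
        fcinfo->flinfo->fn_extra = (void *) my_extra;
    }
    if (my_extra->cached_type != argtype)
    {
        /* First call, or the type changed underneath us: (re)build state */
        my_extra->cached_type = argtype;
    }

    /* ... use my_extra to do the real work ... */
    PG_RETURN_NULL();
}
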
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 6cb6be5..6b6510e 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -1390,7 +1390,7 @@ path_in(PG_FUNCTION_ARGS)
 	}
 
 	base_size = sizeof(path->p[0]) * npts;
-	size = offsetof(PATH, p) +base_size;
+	size = offsetof(PATH, p[0]) +base_size;
 
 	/* Check for integer overflow */
 	if (base_size / npts != sizeof(path->p[0]) || size <= base_size)
@@ -1443,12 +1443,12 @@ path_recv(PG_FUNCTION_ARGS)
 
 	closed = pq_getmsgbyte(buf);
 	npts = pq_getmsgint(buf, sizeof(int32));
-	if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(PATH, p)) / sizeof(Point)))
+	if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(PATH, p[0])) / sizeof(Point)))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 			 errmsg("invalid number of points in external \"path\" value")));
 
-	size = offsetof(PATH, p) +sizeof(path->p[0]) * npts;
+	size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * npts;
 	path = (PATH *) palloc(size);
 
 	SET_VARSIZE(path, size);
@@ -3476,7 +3476,7 @@ poly_in(PG_FUNCTION_ARGS)
 			  errmsg("invalid input syntax for type polygon: \"%s\"", str)));
 
 	base_size = sizeof(poly->p[0]) * npts;
-	size = offsetof(POLYGON, p) +base_size;
+	size = offsetof(POLYGON, p[0]) +base_size;
 
 	/* Check for integer overflow */
 	if (base_size / npts != sizeof(poly->p[0]) || size <= base_size)
@@ -3530,12 +3530,12 @@ poly_recv(PG_FUNCTION_ARGS)
 	int			size;
 
 	npts = pq_getmsgint(buf, sizeof(int32));
-	if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(POLYGON, p)) / sizeof(Point)))
+	if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(POLYGON, p[0])) / sizeof(Point)))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 		  errmsg("invalid number of points in external \"polygon\" value")));
 
-	size = offsetof(POLYGON, p) +sizeof(poly->p[0]) * npts;
+	size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * npts;
 	poly = (POLYGON *) palloc0(size);	/* zero any holes */
 
 	SET_VARSIZE(poly, size);
@@ -4251,7 +4251,7 @@ path_add(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 
 	base_size = sizeof(p1->p[0]) * (p1->npts + p2->npts);
-	size = offsetof(PATH, p) +base_size;
+	size = offsetof(PATH, p[0]) +base_size;
 
 	/* Check for integer overflow */
 	if (base_size / sizeof(p1->p[0]) != (p1->npts + p2->npts) ||
@@ -4393,7 +4393,7 @@ path_poly(PG_FUNCTION_ARGS)
 	 * Never overflows: the old size fit in MaxAllocSize, and the new size is
 	 * just a small constant larger.
 	 */
-	size = offsetof(POLYGON, p) +sizeof(poly->p[0]) * path->npts;
+	size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * path->npts;
 	poly = (POLYGON *) palloc(size);
 
 	SET_VARSIZE(poly, size);
@@ -4468,7 +4468,7 @@ box_poly(PG_FUNCTION_ARGS)
 	int			size;
 
 	/* map four corners of the box to a polygon */
-	size = offsetof(POLYGON, p) +sizeof(poly->p[0]) * 4;
+	size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * 4;
 	poly = (POLYGON *) palloc(size);
 
 	SET_VARSIZE(poly, size);
@@ -4502,7 +4502,7 @@ poly_path(PG_FUNCTION_ARGS)
 	 * Never overflows: the old size fit in MaxAllocSize, and the new size is
 	 * smaller by a small constant.
 	 */
-	size = offsetof(PATH, p) +sizeof(path->p[0]) * poly->npts;
+	size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * poly->npts;
 	path = (PATH *) palloc(size);
 
 	SET_VARSIZE(path, size);
@@ -5181,7 +5181,7 @@ circle_poly(PG_FUNCTION_ARGS)
 				 errmsg("must request at least 2 points")));
 
 	base_size = sizeof(poly->p[0]) * npts;
-	size = offsetof(POLYGON, p) +base_size;
+	size = offsetof(POLYGON, p[0]) +base_size;
 
 	/* Check for integer overflow */
 	if (base_size / npts != sizeof(poly->p[0]) || size <= base_size)
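
The offsetof(PATH, p) vs. offsetof(PATH, p[0]) churn above only changes how the header size of these variable-length structs is spelled; the surrounding allocate-and-check-for-overflow pattern stays the same. A self-contained sketch of that pattern, with make_path() as a hypothetical helper (the error code and message are illustrative, mirroring path_in()/path_recv()):

#include "postgres.h"

#include "utils/geo_decls.h"

/* Hypothetical helper: allocate a PATH with room for npts points */
static PATH *
make_path(int npts)
{
    int         base_size = sizeof(Point) * npts;
    int         size = offsetof(PATH, p[0]) + base_size;
    PATH       *path;

    /* Guard against integer overflow in the size computation */
    if (npts <= 0 ||
        base_size / npts != sizeof(Point) ||
        size <= base_size)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("too many points requested")));

    path = (PATH *) palloc0(size);
    SET_VARSIZE(path, size);
    path->npts = npts;
    path->closed = false;
    return path;
}
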
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d0d7206..951b655 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -32,9 +32,6 @@
 #include "utils/typcache.h"
 #include "utils/syscache.h"
 
-/* String to output for infinite dates and timestamps */
-#define DT_INFINITY "\"infinity\""
-
 /*
  * The context of the parser is maintained by the recursive descent
  * mechanism, but is passed explicitly to the error reporting routine
@@ -1439,18 +1436,20 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 
 				date = DatumGetDateADT(val);
 
+				/* XSD doesn't support infinite values */
 				if (DATE_NOT_FINITE(date))
-				{
-					/* we have to format infinity ourselves */
-					appendStringInfoString(result,DT_INFINITY);
-				}
+					ereport(ERROR,
+							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+							 errmsg("date out of range"),
+							 errdetail("JSON does not support infinite date values.")));
 				else
 				{
 					j2date(date + POSTGRES_EPOCH_JDATE,
 						   &(tm.tm_year), &(tm.tm_mon), &(tm.tm_mday));
 					EncodeDateOnly(&tm, USE_XSD_DATES, buf);
-					appendStringInfo(result, "\"%s\"", buf);
 				}
+
+				appendStringInfo(result, "\"%s\"", buf);
 			}
 			break;
 		case JSONTYPE_TIMESTAMP:
@@ -1462,20 +1461,20 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 
 				timestamp = DatumGetTimestamp(val);
 
+				/* XSD doesn't support infinite values */
 				if (TIMESTAMP_NOT_FINITE(timestamp))
-				{
-					/* we have to format infinity ourselves */
-					appendStringInfoString(result,DT_INFINITY);
-				}
+					ereport(ERROR,
+							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+							 errmsg("timestamp out of range"),
+							 errdetail("JSON does not support infinite timestamp values.")));
 				else if (timestamp2tm(timestamp, NULL, &tm, &fsec, NULL, NULL) == 0)
-				{
 					EncodeDateTime(&tm, fsec, false, 0, NULL, USE_XSD_DATES, buf);
-					appendStringInfo(result, "\"%s\"", buf);
-				}
 				else
 					ereport(ERROR,
 							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 							 errmsg("timestamp out of range")));
+
+				appendStringInfo(result, "\"%s\"", buf);
 			}
 			break;
 		case JSONTYPE_TIMESTAMPTZ:
@@ -1489,20 +1488,20 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 
 				timestamp = DatumGetTimestamp(val);
 
+				/* XSD doesn't support infinite values */
 				if (TIMESTAMP_NOT_FINITE(timestamp))
-				{
-					/* we have to format infinity ourselves */
-					appendStringInfoString(result,DT_INFINITY);
-				}
+					ereport(ERROR,
+							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+							 errmsg("timestamp out of range"),
+							 errdetail("JSON does not support infinite timestamp values.")));
 				else if (timestamp2tm(timestamp, &tz, &tm, &fsec, &tzn, NULL) == 0)
-				{
 					EncodeDateTime(&tm, fsec, true, tz, tzn, USE_XSD_DATES, buf);
-					appendStringInfo(result, "\"%s\"", buf);
-				}
 				else
 					ereport(ERROR,
 							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 							 errmsg("timestamp out of range")));
+
+				appendStringInfo(result, "\"%s\"", buf);
 			}
 			break;
 		case JSONTYPE_JSON:
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 5833401..644ea6d 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -28,14 +28,6 @@
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
-/*
- * String to output for infinite dates and timestamps.
- * Note the we don't use embedded quotes, unlike for json, because
- * we store jsonb strings dequoted.
- */
-
-#define DT_INFINITY "infinity"
-
 typedef struct JsonbInState
 {
 	JsonbParseState *parseState;
@@ -424,7 +416,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
 {
 	bool		first = true;
 	JsonbIterator *it;
-	JsonbIteratorToken type = WJB_DONE;
+	int			type = 0;
 	JsonbValue	v;
 	int			level = 0;
 	bool		redo_switch = false;
@@ -506,7 +498,7 @@ JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
 				first = false;
 				break;
 			default:
-				elog(ERROR, "unknown jsonb iterator token type");
+				elog(ERROR, "unknown flag of jsonb iterator");
 		}
 	}
 
@@ -722,21 +714,23 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 				char		buf[MAXDATELEN + 1];
 
 				date = DatumGetDateADT(val);
-				jb.type = jbvString;
 
+				/* XSD doesn't support infinite values */
 				if (DATE_NOT_FINITE(date))
-				{
-					jb.val.string.len = strlen(DT_INFINITY);
-					jb.val.string.val = pstrdup(DT_INFINITY);
-				}
+					ereport(ERROR,
+							(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+							 errmsg("date out of range"),
+							 errdetail("JSON does not support infinite date values.")));
 				else
 				{
 					j2date(date + POSTGRES_EPOCH_JDATE,
 						   &(tm.tm_year), &(tm.tm_mon), &(tm.tm_mday));
 					EncodeDateOnly(&tm, USE_XSD_DATES, buf);
-					jb.val.string.len = strlen(buf);
-					jb.val.string.val = pstrdup(buf);
 				}
+
+				jb.type = jbvString;
+				jb.val.string.len = strlen(buf);
+				jb.val.string.val = pstrdup(buf);
 			}
 			break;
 			case JSONBTYPE_TIMESTAMP:
@@ -747,24 +741,23 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 					char		buf[MAXDATELEN + 1];
 
 					timestamp = DatumGetTimestamp(val);
-					jb.type = jbvString;
 
+					/* XSD doesn't support infinite values */
 					if (TIMESTAMP_NOT_FINITE(timestamp))
-					{
-						jb.val.string.len = strlen(DT_INFINITY);
-						jb.val.string.val = pstrdup(DT_INFINITY);
-					}
+						ereport(ERROR,
+								(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+								 errmsg("timestamp out of range"),
+								 errdetail("JSON does not support infinite timestamp values.")));
 					else if (timestamp2tm(timestamp, NULL, &tm, &fsec, NULL, NULL) == 0)
-					{
-
 						EncodeDateTime(&tm, fsec, false, 0, NULL, USE_XSD_DATES, buf);
-						jb.val.string.len = strlen(buf);
-						jb.val.string.val = pstrdup(buf);
-					}
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 								 errmsg("timestamp out of range")));
+
+					jb.type = jbvString;
+					jb.val.string.len = strlen(buf);
+					jb.val.string.val = pstrdup(buf);
 				}
 				break;
 			case JSONBTYPE_TIMESTAMPTZ:
@@ -777,23 +770,23 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 					char		buf[MAXDATELEN + 1];
 
 					timestamp = DatumGetTimestamp(val);
-					jb.type = jbvString;
 
+					/* XSD doesn't support infinite values */
 					if (TIMESTAMP_NOT_FINITE(timestamp))
-					{
-						jb.val.string.len = strlen(DT_INFINITY);
-						jb.val.string.val = pstrdup(DT_INFINITY);
-					}
+						ereport(ERROR,
+								(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+								 errmsg("timestamp out of range"),
+								 errdetail("JSON does not support infinite timestamp values.")));
 					else if (timestamp2tm(timestamp, &tz, &tm, &fsec, &tzn, NULL) == 0)
-					{
 						EncodeDateTime(&tm, fsec, true, tz, tzn, USE_XSD_DATES, buf);
-						jb.val.string.len = strlen(buf);
-						jb.val.string.val = pstrdup(buf);
-					}
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 								 errmsg("timestamp out of range")));
+
+					jb.type = jbvString;
+					jb.val.string.len = strlen(buf);
+					jb.val.string.val = pstrdup(buf);
 				}
 				break;
 			case JSONBTYPE_JSONCAST:
@@ -824,7 +817,7 @@ datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 			case JSONBTYPE_JSONB:
 				{
 					Jsonb	   *jsonb = DatumGetJsonb(val);
-					JsonbIteratorToken type;
+					int			type;
 					JsonbIterator *it;
 
 					it = JsonbIteratorInit(&jsonb->root);
@@ -1519,7 +1512,7 @@ jsonb_agg_transfn(PG_FUNCTION_ARGS)
 	JsonbIterator *it;
 	Jsonb	   *jbelem;
 	JsonbValue	v;
-	JsonbIteratorToken type;
+	int			type;
 
 	if (val_type == InvalidOid)
 		ereport(ERROR,
@@ -1591,7 +1584,7 @@ jsonb_agg_transfn(PG_FUNCTION_ARGS)
 			case WJB_VALUE:
 				if (v.type == jbvString)
 				{
-					/* copy string values in the aggregate context */
+					/* copy string values in the aggreagate context */
 					char	   *buf = palloc(v.val.string.len + 1);;
 					snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 					v.val.string.val = buf;
@@ -1607,8 +1600,6 @@ jsonb_agg_transfn(PG_FUNCTION_ARGS)
 				result->res = pushJsonbValue(&result->parseState,
 											 type, &v);
 				break;
-			default:
-				elog(ERROR, "unknown jsonb iterator token type");
 		}
 	}
 
@@ -1669,7 +1660,7 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 	Jsonb	   *jbkey,
 			   *jbval;
 	JsonbValue	v;
-	JsonbIteratorToken type;
+	int			type;
 
 	if (!AggCheckCallContext(fcinfo, &aggcontext))
 	{
@@ -1752,7 +1743,7 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 			case WJB_ELEM:
 				if (v.type == jbvString)
 				{
-					/* copy string values in the aggregate context */
+					/* copy string values in the aggreagate context */
 					char	   *buf = palloc(v.val.string.len + 1);;
 					snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 					v.val.string.val = buf;
@@ -1810,7 +1801,7 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 			case WJB_VALUE:
 				if (v.type == jbvString)
 				{
-					/* copy string values in the aggregate context */
+					/* copy string values in the aggreagate context */
 					char	   *buf = palloc(v.val.string.len + 1);;
 					snprintf(buf, v.val.string.len + 1, "%s", v.val.string.val);
 					v.val.string.val = buf;
@@ -1827,8 +1818,6 @@ jsonb_object_agg_transfn(PG_FUNCTION_ARGS)
 											 single_scalar ? WJB_VALUE : type,
 											 &v);
 				break;
-			default:
-				elog(ERROR, "unknown jsonb iterator token type");
 		}
 	}
 
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index a8cdeaa..3688163 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -216,7 +216,7 @@ typedef struct RecordIOData
 	Oid			record_type;
 	int32		record_typmod;
 	int			ncolumns;
-	ColumnIOData columns[FLEXIBLE_ARRAY_MEMBER];
+	ColumnIOData columns[1];	/* VARIABLE LENGTH ARRAY */
 } RecordIOData;
 
 /* state for populate_recordset */
@@ -2148,8 +2148,8 @@ populate_record_worker(FunctionCallInfo fcinfo, const char *funcname,
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -2161,8 +2161,8 @@ populate_record_worker(FunctionCallInfo fcinfo, const char *funcname,
 							my_extra->record_typmod != tupTypmod))
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -2653,8 +2653,8 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -2664,8 +2664,8 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 715917b..1e7a176 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -123,14 +123,14 @@ typedef int16 NumericDigit;
 struct NumericShort
 {
 	uint16		n_header;		/* Sign + display scale + weight */
-	NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
+	NumericDigit n_data[1];		/* Digits */
 };
 
 struct NumericLong
 {
 	uint16		n_sign_dscale;	/* Sign + display scale */
 	int16		n_weight;		/* Weight of 1st digit	*/
-	NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
+	NumericDigit n_data[1];		/* Digits */
 };
 
 union NumericChoice
@@ -1262,7 +1262,7 @@ numeric_floor(PG_FUNCTION_ARGS)
 /*
  * generate_series_numeric() -
  *
- *	Generate series of numeric.
+ *  Generate series of numeric.
  */
 Datum
 generate_series_numeric(PG_FUNCTION_ARGS)
@@ -1297,7 +1297,7 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 		/* see if we were given an explicit step size */
 		if (PG_NARGS() == 3)
 		{
-			Numeric		step_num = PG_GETARG_NUMERIC(2);
+			Numeric	step_num = PG_GETARG_NUMERIC(2);
 
 			if (NUMERIC_IS_NAN(step_num))
 				ereport(ERROR,
@@ -1356,7 +1356,7 @@ generate_series_step_numeric(PG_FUNCTION_ARGS)
 		(fctx->step.sign == NUMERIC_NEG &&
 		 cmp_var(&fctx->current, &fctx->stop) >= 0))
 	{
-		Numeric		result = make_result(&fctx->current);
+		Numeric	result = make_result(&fctx->current);
 
 		/* switch to memory context appropriate for iteration calculation */
 		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9964c5e..389ea49 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -115,7 +115,6 @@ extern Datum pg_stat_get_xact_function_calls(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_xact_function_total_time(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_xact_function_self_time(PG_FUNCTION_ARGS);
 
-extern Datum pg_stat_get_snapshot_timestamp(PG_FUNCTION_ARGS);
 extern Datum pg_stat_clear_snapshot(PG_FUNCTION_ARGS);
 extern Datum pg_stat_reset(PG_FUNCTION_ARGS);
 extern Datum pg_stat_reset_shared(PG_FUNCTION_ARGS);
@@ -1683,13 +1682,6 @@ pg_stat_get_xact_function_self_time(PG_FUNCTION_ARGS)
 }
 
 
-/* Get the timestamp of the current statistics snapshot */
-Datum
-pg_stat_get_snapshot_timestamp(PG_FUNCTION_ARGS)
-{
-	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stats_timestamp);
-}
-
 /* Discard the active statistics snapshot */
 Datum
 pg_stat_clear_snapshot(PG_FUNCTION_ARGS)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index a65e18d..3dc9a84 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -43,7 +43,7 @@ typedef struct RecordIOData
 	Oid			record_type;
 	int32		record_typmod;
 	int			ncolumns;
-	ColumnIOData columns[FLEXIBLE_ARRAY_MEMBER];
+	ColumnIOData columns[1];	/* VARIABLE LENGTH ARRAY */
 } RecordIOData;
 
 /*
@@ -61,7 +61,7 @@ typedef struct RecordCompareData
 	int32		record1_typmod;
 	Oid			record2_type;
 	int32		record2_typmod;
-	ColumnCompareData columns[FLEXIBLE_ARRAY_MEMBER];
+	ColumnCompareData columns[1];		/* VARIABLE LENGTH ARRAY */
 } RecordCompareData;
 
 
@@ -120,8 +120,8 @@ record_in(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -131,8 +131,8 @@ record_in(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -334,8 +334,8 @@ record_out(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -345,8 +345,8 @@ record_out(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -489,8 +489,8 @@ record_recv(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -500,8 +500,8 @@ record_recv(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -677,8 +677,8 @@ record_send(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordIOData, columns) +
-							   ncolumns * sizeof(ColumnIOData));
+							   sizeof(RecordIOData) - sizeof(ColumnIOData)
+							   + ncolumns * sizeof(ColumnIOData));
 		my_extra = (RecordIOData *) fcinfo->flinfo->fn_extra;
 		my_extra->record_type = InvalidOid;
 		my_extra->record_typmod = 0;
@@ -688,8 +688,8 @@ record_send(PG_FUNCTION_ARGS)
 		my_extra->record_typmod != tupTypmod)
 	{
 		MemSet(my_extra, 0,
-			   offsetof(RecordIOData, columns) +
-			   ncolumns * sizeof(ColumnIOData));
+			   sizeof(RecordIOData) - sizeof(ColumnIOData)
+			   + ncolumns * sizeof(ColumnIOData));
 		my_extra->record_type = tupType;
 		my_extra->record_typmod = tupTypmod;
 		my_extra->ncolumns = ncolumns;
@@ -829,8 +829,8 @@ record_cmp(FunctionCallInfo fcinfo)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordCompareData, columns) +
-							   ncols * sizeof(ColumnCompareData));
+						sizeof(RecordCompareData) - sizeof(ColumnCompareData)
+							   + ncols * sizeof(ColumnCompareData));
 		my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
 		my_extra->ncolumns = ncols;
 		my_extra->record1_type = InvalidOid;
@@ -1065,8 +1065,8 @@ record_eq(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordCompareData, columns) +
-							   ncols * sizeof(ColumnCompareData));
+						sizeof(RecordCompareData) - sizeof(ColumnCompareData)
+							   + ncols * sizeof(ColumnCompareData));
 		my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
 		my_extra->ncolumns = ncols;
 		my_extra->record1_type = InvalidOid;
@@ -1324,8 +1324,8 @@ record_image_cmp(FunctionCallInfo fcinfo)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordCompareData, columns) +
-							   ncols * sizeof(ColumnCompareData));
+						sizeof(RecordCompareData) - sizeof(ColumnCompareData)
+							   + ncols * sizeof(ColumnCompareData));
 		my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
 		my_extra->ncolumns = ncols;
 		my_extra->record1_type = InvalidOid;
@@ -1601,8 +1601,8 @@ record_image_eq(PG_FUNCTION_ARGS)
 	{
 		fcinfo->flinfo->fn_extra =
 			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
-							   offsetof(RecordCompareData, columns) +
-							   ncols * sizeof(ColumnCompareData));
+						sizeof(RecordCompareData) - sizeof(ColumnCompareData)
+							   + ncols * sizeof(ColumnCompareData));
 		my_extra = (RecordCompareData *) fcinfo->flinfo->fn_extra;
 		my_extra->ncolumns = ncols;
 		my_extra->record1_type = InvalidOid;
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2fa30be..eb9eaf0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
@@ -4498,7 +4502,10 @@ get_simple_values_rte(Query *query)
 	/*
 	 * We want to return TRUE even if the Query also contains OLD or NEW rule
 	 * RTEs.  So the idea is to scan the rtable and see if there is only one
-	 * inFromCl RTE that is a VALUES RTE.
+	 * inFromCl RTE that is a VALUES RTE.  We don't look at the targetlist at
+	 * all.  This is okay because parser/analyze.c will never generate a
+	 * "bare" VALUES RTE --- they only appear inside auto-generated
+	 * sub-queries with very restricted structure.
 	 */
 	foreach(lc, query->rtable)
 	{
@@ -4515,33 +4522,6 @@ get_simple_values_rte(Query *query)
 		else
 			return NULL;		/* something else -> not simple VALUES */
 	}
-
-	/*
-	 * We don't need to check the targetlist in any great detail, because
-	 * parser/analyze.c will never generate a "bare" VALUES RTE --- they only
-	 * appear inside auto-generated sub-queries with very restricted
-	 * structure.  However, DefineView might have modified the tlist by
-	 * injecting new column aliases; so compare tlist resnames against the
-	 * RTE's names to detect that.
-	 */
-	if (result)
-	{
-		ListCell   *lcn;
-
-		if (list_length(query->targetList) != list_length(result->eref->colnames))
-			return NULL;		/* this probably cannot happen */
-		forboth(lc, query->targetList, lcn, result->eref->colnames)
-		{
-			TargetEntry *tle = (TargetEntry *) lfirst(lc);
-			char	   *cname = strVal(lfirst(lcn));
-
-			if (tle->resjunk)
-				return NULL;	/* this probably cannot happen */
-			if (tle->resname == NULL || strcmp(tle->resname, cname) != 0)
-				return NULL;	/* column name has been changed */
-		}
-	}
-
 	return result;
 }
 
@@ -8541,9 +8521,7 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
 				break;
 			case RTE_VALUES:
 				/* Values list RTE */
-				appendStringInfoChar(buf, '(');
 				get_values_def(rte->values_lists, context);
-				appendStringInfoChar(buf, ')');
 				break;
 			case RTE_CTE:
 				appendStringInfoString(buf, quote_identifier(rte->ctename));
@@ -8585,11 +8563,6 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context)
 			 */
 			printalias = true;
 		}
-		else if (rte->rtekind == RTE_VALUES)
-		{
-			/* Alias is syntactically required for VALUES */
-			printalias = true;
-		}
 		else if (rte->rtekind == RTE_CTE)
 		{
 			/*
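
The set_deparse_planstate() change just above is what lets EXPLAIN resolve Vars with varno = INDEX_VAR against the pseudo targetlist of a join-replacing ForeignScan or CustomScan. As a rough illustration of what a provider would hang on that field, here is a sketch that copies the original Vars into a targetlist whose positions become the INDEX_VAR attribute numbers; build_pseudo_tlist() is a hypothetical helper, and exactly how fdw_ps_tlist/custom_ps_tlist gets populated is up to the extension rather than dictated by this hunk.

#include "postgres.h"

#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

/* Hypothetical helper: turn a list of original Vars into a pseudo tlist */
static List *
build_pseudo_tlist(List *original_vars)
{
    List       *ps_tlist = NIL;
    ListCell   *lc;
    AttrNumber  resno = 1;

    foreach(lc, original_vars)
    {
        Var        *var = (Var *) lfirst(lc);

        /* Position in this list becomes the INDEX_VAR attribute number */
        ps_tlist = lappend(ps_tlist,
                           makeTargetEntry((Expr *) copyObject(var),
                                           resno++,
                                           NULL,    /* no column label */
                                           false)); /* not resjunk */
    }
    return ps_tlist;    /* e.g. assign to cscan->custom_ps_tlist */
}
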
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 723c670..67e0cf9 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -27,7 +27,6 @@
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
-#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/scansup.h"
 #include "utils/array.h"
@@ -4875,87 +4874,6 @@ interval_part(PG_FUNCTION_ARGS)
 }
 
 
-/* timestamp_zone_transform()
- * If the zone argument of a timestamp_zone() or timestamptz_zone() call is a
- * plan-time constant denoting a zone equivalent to UTC, the call will always
- * return its second argument unchanged.  Simplify the expression tree
- * accordingly.  Civil time zones almost never qualify, because jurisdictions
- * that follow UTC today have not done so continuously.
- */
-Datum
-timestamp_zone_transform(PG_FUNCTION_ARGS)
-{
-	Node	   *func_node = (Node *) PG_GETARG_POINTER(0);
-	FuncExpr   *expr = (FuncExpr *) func_node;
-	Node	   *ret = NULL;
-	Node	   *zone_node;
-
-	Assert(IsA(expr, FuncExpr));
-	Assert(list_length(expr->args) == 2);
-
-	zone_node = (Node *) linitial(expr->args);
-
-	if (IsA(zone_node, Const) &&!((Const *) zone_node)->constisnull)
-	{
-		text	   *zone = DatumGetTextPP(((Const *) zone_node)->constvalue);
-		char		tzname[TZ_STRLEN_MAX + 1];
-		char	   *lowzone;
-		int			type,
-					abbrev_offset;
-		pg_tz	   *tzp;
-		bool		noop = false;
-
-		/*
-		 * If the timezone is forever UTC+0, the FuncExpr function call is a
-		 * no-op for all possible timestamps.  This passage mirrors code in
-		 * timestamp_zone().
-		 */
-		text_to_cstring_buffer(zone, tzname, sizeof(tzname));
-		lowzone = downcase_truncate_identifier(tzname,
-											   strlen(tzname),
-											   false);
-		type = DecodeTimezoneAbbrev(0, lowzone, &abbrev_offset, &tzp);
-		if (type == TZ || type == DTZ)
-			noop = (abbrev_offset == 0);
-		else if (type == DYNTZ)
-		{
-			/*
-			 * An abbreviation of a single-offset timezone ought not to be
-			 * configured as a DYNTZ, so don't bother checking.
-			 */
-		}
-		else
-		{
-			long		tzname_offset;
-
-			tzp = pg_tzset(tzname);
-			if (tzp && pg_get_timezone_offset(tzp, &tzname_offset))
-				noop = (tzname_offset == 0);
-		}
-
-		if (noop)
-		{
-			Node	   *timestamp = (Node *) lsecond(expr->args);
-
-			/* Strip any existing RelabelType node(s) */
-			while (timestamp && IsA(timestamp, RelabelType))
-				timestamp = (Node *) ((RelabelType *) timestamp)->arg;
-
-			/*
-			 * Replace the FuncExpr with its timestamp argument, relabeled as
-			 * though the function call had computed it.
-			 */
-			ret = (Node *) makeRelabelType((Expr *) timestamp,
-										   exprType(func_node),
-										   exprTypmod(func_node),
-										   exprCollation(func_node),
-										   COERCE_EXPLICIT_CAST);
-		}
-	}
-
-	PG_RETURN_POINTER(ret);
-}
-
 /*	timestamp_zone()
  *	Encode timestamp type with specified time zone.
  *	This function is just timestamp2timestamptz() except instead of
@@ -5045,52 +4963,6 @@ timestamp_zone(PG_FUNCTION_ARGS)
 	PG_RETURN_TIMESTAMPTZ(result);
 }
 
-/* timestamp_izone_transform()
- * If we deduce at plan time that a particular timestamp_izone() or
- * timestamptz_izone() call can only compute tz=0, the call will always return
- * its second argument unchanged.  Simplify the expression tree accordingly.
- */
-Datum
-timestamp_izone_transform(PG_FUNCTION_ARGS)
-{
-	Node	   *func_node = (Node *) PG_GETARG_POINTER(0);
-	FuncExpr   *expr = (FuncExpr *) func_node;
-	Node	   *ret = NULL;
-	Node	   *zone_node;
-
-	Assert(IsA(expr, FuncExpr));
-	Assert(list_length(expr->args) == 2);
-
-	zone_node = (Node *) linitial(expr->args);
-
-	if (IsA(zone_node, Const) &&!((Const *) zone_node)->constisnull)
-	{
-		Interval   *zone;
-
-		zone = DatumGetIntervalP(((Const *) zone_node)->constvalue);
-		if (zone->month == 0 && zone->day == 0 && zone->time == 0)
-		{
-			Node	   *timestamp = (Node *) lsecond(expr->args);
-
-			/* Strip any existing RelabelType node(s) */
-			while (timestamp && IsA(timestamp, RelabelType))
-				timestamp = (Node *) ((RelabelType *) timestamp)->arg;
-
-			/*
-			 * Replace the FuncExpr with its timestamp argument, relabeled as
-			 * though the function call had computed it.
-			 */
-			ret = (Node *) makeRelabelType((Expr *) timestamp,
-										   exprType(func_node),
-										   exprTypmod(func_node),
-										   exprCollation(func_node),
-										   COERCE_EXPLICIT_CAST);
-		}
-	}
-
-	PG_RETURN_POINTER(ret);
-}
-
 /* timestamp_izone()
  * Encode timestamp type with specified time interval as time zone.
  */
diff --git a/src/backend/utils/adt/trigfuncs.c b/src/backend/utils/adt/trigfuncs.c
index a8a75ef..fb79092 100644
--- a/src/backend/utils/adt/trigfuncs.c
+++ b/src/backend/utils/adt/trigfuncs.c
@@ -84,9 +84,9 @@ suppress_redundant_updates_trigger(PG_FUNCTION_ARGS)
 		 HeapTupleHeaderGetNatts(oldheader)) &&
 		((newheader->t_infomask & ~HEAP_XACT_MASK) ==
 		 (oldheader->t_infomask & ~HEAP_XACT_MASK)) &&
-		memcmp(((char *) newheader) + SizeofHeapTupleHeader,
-			   ((char *) oldheader) + SizeofHeapTupleHeader,
-			   newtuple->t_len - SizeofHeapTupleHeader) == 0)
+		memcmp(((char *) newheader) + offsetof(HeapTupleHeaderData, t_bits),
+			   ((char *) oldheader) + offsetof(HeapTupleHeaderData, t_bits),
+			   newtuple->t_len - offsetof(HeapTupleHeaderData, t_bits)) == 0)
 	{
 		/* ... then suppress the update */
 		rettuple = NULL;
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index 25132be..b56aa91 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -50,7 +50,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];
 } SignTSVector;
 
 #define ARRKEY		0x01
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index 733203e..8952d7f 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -195,12 +195,16 @@ SortAndUniqItems(TSQuery q, int *size)
 	return res;
 }
 
+/* A dummy WordEntryPos array to use when haspos is false */
+static WordEntryPosVector POSNULL = {
+	1,							/* Number of elements that follow */
+	{0}
+};
+
 static float
 calc_rank_and(const float *w, TSVector t, TSQuery q)
 {
 	WordEntryPosVector **pos;
-	WordEntryPosVector1 posnull;
-	WordEntryPosVector *POSNULL;
 	int			i,
 				k,
 				l,
@@ -224,12 +228,7 @@ calc_rank_and(const float *w, TSVector t, TSQuery q)
 		return calc_rank_or(w, t, q);
 	}
 	pos = (WordEntryPosVector **) palloc0(sizeof(WordEntryPosVector *) * q->size);
-
-	/* A dummy WordEntryPos array to use when haspos is false */
-	posnull.npos = 1;
-	posnull.pos[0] = 0;
-	WEP_SETPOS(posnull.pos[0], MAXENTRYPOS - 1);
-	POSNULL = (WordEntryPosVector *) &posnull;
+	WEP_SETPOS(POSNULL.pos[0], MAXENTRYPOS - 1);
 
 	for (i = 0; i < size; i++)
 	{
@@ -242,7 +241,7 @@ calc_rank_and(const float *w, TSVector t, TSQuery q)
 			if (entry->haspos)
 				pos[i] = _POSVECPTR(t, entry);
 			else
-				pos[i] = POSNULL;
+				pos[i] = &POSNULL;
 
 			dimt = pos[i]->npos;
 			post = pos[i]->pos;
@@ -257,7 +256,7 @@ calc_rank_and(const float *w, TSVector t, TSQuery q)
 					for (p = 0; p < lenct; p++)
 					{
 						dist = Abs((int) WEP_GETPOS(post[l]) - (int) WEP_GETPOS(ct[p]));
-						if (dist || (dist == 0 && (pos[i] == POSNULL || pos[k] == POSNULL)))
+						if (dist || (dist == 0 && (pos[i] == &POSNULL || pos[k] == &POSNULL)))
 						{
 							float		curw;
 
@@ -283,7 +282,6 @@ calc_rank_or(const float *w, TSVector t, TSQuery q)
 {
 	WordEntry  *entry,
 			   *firstentry;
-	WordEntryPosVector1 posnull;
 	WordEntryPos *post;
 	int32		dimt,
 				j,
@@ -293,10 +291,6 @@ calc_rank_or(const float *w, TSVector t, TSQuery q)
 	QueryOperand **item;
 	int			size = q->size;
 
-	/* A dummy WordEntryPos array to use when haspos is false */
-	posnull.npos = 1;
-	posnull.pos[0] = 0;
-
 	item = SortAndUniqItems(q, &size);
 
 	for (i = 0; i < size; i++)
@@ -318,8 +312,8 @@ calc_rank_or(const float *w, TSVector t, TSQuery q)
 			}
 			else
 			{
-				dimt = posnull.npos;
-				post = posnull.pos;
+				dimt = POSNULL.npos;
+				post = POSNULL.pos;
 			}
 
 			resj = 0.0;
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 266a728..3ac15f4 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -44,7 +44,7 @@ typedef struct StatEntry
 	struct StatEntry *left;
 	struct StatEntry *right;
 	uint32		lenlexeme;
-	char		lexeme[FLEXIBLE_ARRAY_MEMBER];
+	char		lexeme[1];
 } StatEntry;
 
 #define STATENTRYHDRSZ	(offsetof(StatEntry, lexeme))
diff --git a/src/backend/utils/adt/txid.c b/src/backend/utils/adt/txid.c
index f973ef9..8c7fe70 100644
--- a/src/backend/utils/adt/txid.c
+++ b/src/backend/utils/adt/txid.c
@@ -64,8 +64,7 @@ typedef struct
 	uint32		nxip;			/* number of txids in xip array */
 	txid		xmin;
 	txid		xmax;
-	/* in-progress txids, xmin <= xip[i] < xmax: */
-	txid		xip[FLEXIBLE_ARRAY_MEMBER];
+	txid		xip[1];			/* in-progress txids, xmin <= xip[i] < xmax */
 } TxidSnapshot;
 
 #define TXID_SNAPSHOT_SIZE(nxip) \
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 8bb7144..bfe9447 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3948,7 +3948,7 @@ xpath(PG_FUNCTION_ARGS)
 	ArrayType  *namespaces = PG_GETARG_ARRAYTYPE_P(2);
 	ArrayBuildState *astate;
 
-	astate = initArrayResult(XMLOID, CurrentMemoryContext, true);
+	astate = initArrayResult(XMLOID, CurrentMemoryContext);
 	xpath_internal(xpath_expr_text, data, namespaces,
 				   NULL, astate);
 	PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext));
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 1af43c6..2e4d0b3 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1590,7 +1590,7 @@ SearchCatCacheList(CatCache *cache,
 		oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
 		nmembers = list_length(ctlist);
 		cl = (CatCList *)
-			palloc(offsetof(CatCList, members) +nmembers * sizeof(CatCTup *));
+			palloc(sizeof(CatCList) + nmembers * sizeof(CatCTup *));
 		heap_copytuple_with_tuple(ntp, &cl->tuple);
 		MemoryContextSwitchTo(oldcxt);
 		heap_freetuple(ntp);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 8826a5d..0f2192c 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -122,8 +122,8 @@ typedef struct InvalidationChunk
 	struct InvalidationChunk *next;		/* list link */
 	int			nitems;			/* # items currently stored in chunk */
 	int			maxitems;		/* size of allocated array in this chunk */
-	SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
-} InvalidationChunk;
+	SharedInvalidationMessage msgs[1];	/* VARIABLE LENGTH ARRAY */
+} InvalidationChunk;			/* VARIABLE LENGTH STRUCTURE */
 
 typedef struct InvalidationListHeader
 {
@@ -225,8 +225,8 @@ AddInvalidationMessage(InvalidationChunk **listHdr,
 #define FIRSTCHUNKSIZE 32
 		chunk = (InvalidationChunk *)
 			MemoryContextAlloc(CurTransactionContext,
-							   offsetof(InvalidationChunk, msgs) +
-					FIRSTCHUNKSIZE * sizeof(SharedInvalidationMessage));
+							   sizeof(InvalidationChunk) +
+					(FIRSTCHUNKSIZE - 1) *sizeof(SharedInvalidationMessage));
 		chunk->nitems = 0;
 		chunk->maxitems = FIRSTCHUNKSIZE;
 		chunk->next = *listHdr;
@@ -239,8 +239,8 @@ AddInvalidationMessage(InvalidationChunk **listHdr,
 
 		chunk = (InvalidationChunk *)
 			MemoryContextAlloc(CurTransactionContext,
-							   offsetof(InvalidationChunk, msgs) +
-						 chunksize * sizeof(SharedInvalidationMessage));
+							   sizeof(InvalidationChunk) +
+						 (chunksize - 1) *sizeof(SharedInvalidationMessage));
 		chunk->nitems = 0;
 		chunk->maxitems = chunksize;
 		chunk->next = *listHdr;
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 44b5937..82b6668 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -18,16 +18,15 @@
  *
  * Once created, a type cache entry lives as long as the backend does, so
  * there is no need for a call to release a cache entry.  If the type is
- * dropped, the cache entry simply becomes wasted storage.  This is not
- * expected to happen often, and assuming that typcache entries are good
- * permanently allows caching pointers to them in long-lived places.
+ * dropped, the cache entry simply becomes wasted storage.  (For present uses,
+ * it would be okay to flush type cache entries at the ends of transactions,
+ * if we needed to reclaim space.)
  *
  * We have some provisions for updating cache entries if the stored data
  * becomes obsolete.  Information dependent on opclasses is cleared if we
  * detect updates to pg_opclass.  We also support clearing the tuple
  * descriptor and operator/function parts of a rowtype's cache entry,
  * since those may need to change as a consequence of ALTER TABLE.
- * Domain constraint changes are also tracked properly.
  *
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
@@ -47,20 +46,16 @@
 #include "access/htup_details.h"
 #include "access/nbtree.h"
 #include "catalog/indexing.h"
-#include "catalog/pg_constraint.h"
 #include "catalog/pg_enum.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_range.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
-#include "executor/executor.h"
-#include "optimizer/planner.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
-#include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -70,9 +65,6 @@
 /* The main type cache hashtable searched by lookup_type_cache */
 static HTAB *TypeCacheHash = NULL;
 
-/* List of type cache entries for domain types */
-static TypeCacheEntry *firstDomainTypeEntry = NULL;
-
 /* Private flag bits in the TypeCacheEntry.flags field */
 #define TCFLAGS_CHECKED_BTREE_OPCLASS		0x0001
 #define TCFLAGS_CHECKED_HASH_OPCLASS		0x0002
@@ -88,19 +80,6 @@ static TypeCacheEntry *firstDomainTypeEntry = NULL;
 #define TCFLAGS_CHECKED_FIELD_PROPERTIES	0x0800
 #define TCFLAGS_HAVE_FIELD_EQUALITY			0x1000
 #define TCFLAGS_HAVE_FIELD_COMPARE			0x2000
-#define TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS	0x4000
-
-/*
- * Data stored about a domain type's constraints.  Note that we do not create
- * this struct for the common case of a constraint-less domain; we just set
- * domainData to NULL to indicate that.
- */
-struct DomainConstraintCache
-{
-	List	   *constraints;	/* list of DomainConstraintState nodes */
-	MemoryContext dccContext;	/* memory context holding all associated data */
-	long		dccRefCount;	/* number of references to this struct */
-};
 
 /* Private information to support comparisons of enum values */
 typedef struct
@@ -114,7 +93,7 @@ typedef struct TypeCacheEnumData
 	Oid			bitmap_base;	/* OID corresponding to bit 0 of bitmapset */
 	Bitmapset  *sorted_values;	/* Set of OIDs known to be in order */
 	int			num_values;		/* total number of values in enum */
-	EnumItem	enum_values[FLEXIBLE_ARRAY_MEMBER];
+	EnumItem	enum_values[1]; /* VARIABLE LENGTH ARRAY */
 } TypeCacheEnumData;
 
 /*
@@ -148,9 +127,6 @@ static int32 NextRecordTypmod = 0;		/* number of entries used */
 
 static void load_typcache_tupdesc(TypeCacheEntry *typentry);
 static void load_rangetype_info(TypeCacheEntry *typentry);
-static void load_domaintype_info(TypeCacheEntry *typentry);
-static void decr_dcc_refcount(DomainConstraintCache *dcc);
-static void dccref_deletion_callback(void *arg);
 static bool array_element_has_equality(TypeCacheEntry *typentry);
 static bool array_element_has_compare(TypeCacheEntry *typentry);
 static bool array_element_has_hashing(TypeCacheEntry *typentry);
@@ -160,7 +136,6 @@ static bool record_fields_have_compare(TypeCacheEntry *typentry);
 static void cache_record_field_properties(TypeCacheEntry *typentry);
 static void TypeCacheRelCallback(Datum arg, Oid relid);
 static void TypeCacheOpcCallback(Datum arg, int cacheid, uint32 hashvalue);
-static void TypeCacheConstrCallback(Datum arg, int cacheid, uint32 hashvalue);
 static void load_enum_cache_data(TypeCacheEntry *tcache);
 static EnumItem *find_enumitem(TypeCacheEnumData *enumdata, Oid arg);
 static int	enum_oid_cmp(const void *left, const void *right);
@@ -197,8 +172,6 @@ lookup_type_cache(Oid type_id, int flags)
 		/* Also set up callbacks for SI invalidations */
 		CacheRegisterRelcacheCallback(TypeCacheRelCallback, (Datum) 0);
 		CacheRegisterSyscacheCallback(CLAOID, TypeCacheOpcCallback, (Datum) 0);
-		CacheRegisterSyscacheCallback(CONSTROID, TypeCacheConstrCallback, (Datum) 0);
-		CacheRegisterSyscacheCallback(TYPEOID, TypeCacheConstrCallback, (Datum) 0);
 
 		/* Also make sure CacheMemoryContext exists */
 		if (!CacheMemoryContext)
@@ -244,13 +217,6 @@ lookup_type_cache(Oid type_id, int flags)
 		typentry->typtype = typtup->typtype;
 		typentry->typrelid = typtup->typrelid;
 
-		/* If it's a domain, immediately thread it into the domain cache list */
-		if (typentry->typtype == TYPTYPE_DOMAIN)
-		{
-			typentry->nextDomain = firstDomainTypeEntry;
-			firstDomainTypeEntry = typentry;
-		}
-
 		ReleaseSysCache(tp);
 	}
 
@@ -537,16 +503,6 @@ lookup_type_cache(Oid type_id, int flags)
 		load_rangetype_info(typentry);
 	}
 
-	/*
-	 * If requested, get information about a domain type
-	 */
-	if ((flags & TYPECACHE_DOMAIN_INFO) &&
-		(typentry->flags & TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS) == 0 &&
-		typentry->typtype == TYPTYPE_DOMAIN)
-	{
-		load_domaintype_info(typentry);
-	}
-
 	return typentry;
 }
 
@@ -636,327 +592,6 @@ load_rangetype_info(TypeCacheEntry *typentry)
 
 
 /*
- * load_domaintype_info --- helper routine to set up domain constraint info
- *
- * Note: we assume we're called in a relatively short-lived context, so it's
- * okay to leak data into the current context while scanning pg_constraint.
- * We build the new DomainConstraintCache data in a context underneath
- * CurrentMemoryContext, and reparent it under CacheMemoryContext when
- * complete.
- */
-static void
-load_domaintype_info(TypeCacheEntry *typentry)
-{
-	Oid			typeOid = typentry->type_id;
-	DomainConstraintCache *dcc;
-	bool		notNull = false;
-	Relation	conRel;
-	MemoryContext oldcxt;
-
-	/*
-	 * If we're here, any existing constraint info is stale, so release it.
-	 * For safety, be sure to null the link before trying to delete the data.
-	 */
-	if (typentry->domainData)
-	{
-		dcc = typentry->domainData;
-		typentry->domainData = NULL;
-		decr_dcc_refcount(dcc);
-	}
-
-	/*
-	 * We try to optimize the common case of no domain constraints, so don't
-	 * create the dcc object and context until we find a constraint.
-	 */
-	dcc = NULL;
-
-	/*
-	 * Scan pg_constraint for relevant constraints.  We want to find
-	 * constraints for not just this domain, but any ancestor domains, so the
-	 * outer loop crawls up the domain stack.
-	 */
-	conRel = heap_open(ConstraintRelationId, AccessShareLock);
-
-	for (;;)
-	{
-		HeapTuple	tup;
-		HeapTuple	conTup;
-		Form_pg_type typTup;
-		ScanKeyData key[1];
-		SysScanDesc scan;
-
-		tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typeOid));
-		if (!HeapTupleIsValid(tup))
-			elog(ERROR, "cache lookup failed for type %u", typeOid);
-		typTup = (Form_pg_type) GETSTRUCT(tup);
-
-		if (typTup->typtype != TYPTYPE_DOMAIN)
-		{
-			/* Not a domain, so done */
-			ReleaseSysCache(tup);
-			break;
-		}
-
-		/* Test for NOT NULL Constraint */
-		if (typTup->typnotnull)
-			notNull = true;
-
-		/* Look for CHECK Constraints on this domain */
-		ScanKeyInit(&key[0],
-					Anum_pg_constraint_contypid,
-					BTEqualStrategyNumber, F_OIDEQ,
-					ObjectIdGetDatum(typeOid));
-
-		scan = systable_beginscan(conRel, ConstraintTypidIndexId, true,
-								  NULL, 1, key);
-
-		while (HeapTupleIsValid(conTup = systable_getnext(scan)))
-		{
-			Form_pg_constraint c = (Form_pg_constraint) GETSTRUCT(conTup);
-			Datum		val;
-			bool		isNull;
-			char	   *constring;
-			Expr	   *check_expr;
-			DomainConstraintState *r;
-
-			/* Ignore non-CHECK constraints (presently, shouldn't be any) */
-			if (c->contype != CONSTRAINT_CHECK)
-				continue;
-
-			/* Not expecting conbin to be NULL, but we'll test for it anyway */
-			val = fastgetattr(conTup, Anum_pg_constraint_conbin,
-							  conRel->rd_att, &isNull);
-			if (isNull)
-				elog(ERROR, "domain \"%s\" constraint \"%s\" has NULL conbin",
-					 NameStr(typTup->typname), NameStr(c->conname));
-
-			/* Convert conbin to C string in caller context */
-			constring = TextDatumGetCString(val);
-
-			/* Create the DomainConstraintCache object and context if needed */
-			if (dcc == NULL)
-			{
-				MemoryContext cxt;
-
-				cxt = AllocSetContextCreate(CurrentMemoryContext,
-											"Domain constraints",
-											ALLOCSET_SMALL_MINSIZE,
-											ALLOCSET_SMALL_INITSIZE,
-											ALLOCSET_SMALL_MAXSIZE);
-				dcc = (DomainConstraintCache *)
-					MemoryContextAlloc(cxt, sizeof(DomainConstraintCache));
-				dcc->constraints = NIL;
-				dcc->dccContext = cxt;
-				dcc->dccRefCount = 0;
-			}
-
-			/* Create node trees in DomainConstraintCache's context */
-			oldcxt = MemoryContextSwitchTo(dcc->dccContext);
-
-			check_expr = (Expr *) stringToNode(constring);
-
-			/* ExecInitExpr assumes we've planned the expression */
-			check_expr = expression_planner(check_expr);
-
-			r = makeNode(DomainConstraintState);
-			r->constrainttype = DOM_CONSTRAINT_CHECK;
-			r->name = pstrdup(NameStr(c->conname));
-			r->check_expr = ExecInitExpr(check_expr, NULL);
-
-			/*
-			 * Use lcons() here because constraints of parent domains should
-			 * be applied earlier.
-			 */
-			dcc->constraints = lcons(r, dcc->constraints);
-
-			MemoryContextSwitchTo(oldcxt);
-		}
-
-		systable_endscan(scan);
-
-		/* loop to next domain in stack */
-		typeOid = typTup->typbasetype;
-		ReleaseSysCache(tup);
-	}
-
-	heap_close(conRel, AccessShareLock);
-
-	/*
-	 * Only need to add one NOT NULL check regardless of how many domains in
-	 * the stack request it.
-	 */
-	if (notNull)
-	{
-		DomainConstraintState *r;
-
-		/* Create the DomainConstraintCache object and context if needed */
-		if (dcc == NULL)
-		{
-			MemoryContext cxt;
-
-			cxt = AllocSetContextCreate(CurrentMemoryContext,
-										"Domain constraints",
-										ALLOCSET_SMALL_MINSIZE,
-										ALLOCSET_SMALL_INITSIZE,
-										ALLOCSET_SMALL_MAXSIZE);
-			dcc = (DomainConstraintCache *)
-				MemoryContextAlloc(cxt, sizeof(DomainConstraintCache));
-			dcc->constraints = NIL;
-			dcc->dccContext = cxt;
-			dcc->dccRefCount = 0;
-		}
-
-		/* Create node trees in DomainConstraintCache's context */
-		oldcxt = MemoryContextSwitchTo(dcc->dccContext);
-
-		r = makeNode(DomainConstraintState);
-
-		r->constrainttype = DOM_CONSTRAINT_NOTNULL;
-		r->name = pstrdup("NOT NULL");
-		r->check_expr = NULL;
-
-		/* lcons to apply the nullness check FIRST */
-		dcc->constraints = lcons(r, dcc->constraints);
-
-		MemoryContextSwitchTo(oldcxt);
-	}
-
-	/*
-	 * If we made a constraint object, move it into CacheMemoryContext and
-	 * attach it to the typcache entry.
-	 */
-	if (dcc)
-	{
-		MemoryContextSetParent(dcc->dccContext, CacheMemoryContext);
-		typentry->domainData = dcc;
-		dcc->dccRefCount++;		/* count the typcache's reference */
-	}
-
-	/* Either way, the typcache entry's domain data is now valid. */
-	typentry->flags |= TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS;
-}
-
-/*
- * decr_dcc_refcount --- decrement a DomainConstraintCache's refcount,
- * and free it if no references remain
- */
-static void
-decr_dcc_refcount(DomainConstraintCache *dcc)
-{
-	Assert(dcc->dccRefCount > 0);
-	if (--(dcc->dccRefCount) <= 0)
-		MemoryContextDelete(dcc->dccContext);
-}
-
-/*
- * Context reset/delete callback for a DomainConstraintRef
- */
-static void
-dccref_deletion_callback(void *arg)
-{
-	DomainConstraintRef *ref = (DomainConstraintRef *) arg;
-	DomainConstraintCache *dcc = ref->dcc;
-
-	/* Paranoia --- be sure link is nulled before trying to release */
-	if (dcc)
-	{
-		ref->constraints = NIL;
-		ref->dcc = NULL;
-		decr_dcc_refcount(dcc);
-	}
-}
-
-/*
- * InitDomainConstraintRef --- initialize a DomainConstraintRef struct
- *
- * Caller must tell us the MemoryContext in which the DomainConstraintRef
- * lives.  The ref will be cleaned up when that context is reset/deleted.
- */
-void
-InitDomainConstraintRef(Oid type_id, DomainConstraintRef *ref,
-						MemoryContext refctx)
-{
-	/* Look up the typcache entry --- we assume it survives indefinitely */
-	ref->tcache = lookup_type_cache(type_id, TYPECACHE_DOMAIN_INFO);
-	/* For safety, establish the callback before acquiring a refcount */
-	ref->dcc = NULL;
-	ref->callback.func = dccref_deletion_callback;
-	ref->callback.arg = (void *) ref;
-	MemoryContextRegisterResetCallback(refctx, &ref->callback);
-	/* Acquire refcount if there are constraints, and set up exported list */
-	if (ref->tcache->domainData)
-	{
-		ref->dcc = ref->tcache->domainData;
-		ref->dcc->dccRefCount++;
-		ref->constraints = ref->dcc->constraints;
-	}
-	else
-		ref->constraints = NIL;
-}
-
-/*
- * UpdateDomainConstraintRef --- recheck validity of domain constraint info
- *
- * If the domain's constraint set changed, ref->constraints is updated to
- * point at a new list of cached constraints.
- *
- * In the normal case where nothing happened to the domain, this is cheap
- * enough that it's reasonable (and expected) to check before *each* use
- * of the constraint info.
- */
-void
-UpdateDomainConstraintRef(DomainConstraintRef *ref)
-{
-	TypeCacheEntry *typentry = ref->tcache;
-
-	/* Make sure typcache entry's data is up to date */
-	if ((typentry->flags & TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS) == 0 &&
-		typentry->typtype == TYPTYPE_DOMAIN)
-		load_domaintype_info(typentry);
-
-	/* Transfer to ref object if there's new info, adjusting refcounts */
-	if (ref->dcc != typentry->domainData)
-	{
-		/* Paranoia --- be sure link is nulled before trying to release */
-		DomainConstraintCache *dcc = ref->dcc;
-
-		if (dcc)
-		{
-			ref->constraints = NIL;
-			ref->dcc = NULL;
-			decr_dcc_refcount(dcc);
-		}
-		dcc = typentry->domainData;
-		if (dcc)
-		{
-			ref->dcc = dcc;
-			dcc->dccRefCount++;
-			ref->constraints = dcc->constraints;
-		}
-	}
-}
-
-/*
- * DomainHasConstraints --- utility routine to check if a domain has constraints
- *
- * This is defined to return false, not fail, if type is not a domain.
- */
-bool
-DomainHasConstraints(Oid type_id)
-{
-	TypeCacheEntry *typentry;
-
-	/*
-	 * Note: a side effect is to cause the typcache's domain data to become
-	 * valid.  This is fine since we'll likely need it soon if there is any.
-	 */
-	typentry = lookup_type_cache(type_id, TYPECACHE_DOMAIN_INFO);
-
-	return (typentry->domainData != NULL);
-}
-
-
-/*
  * array_element_has_equality and friends are helper routines to check
  * whether we should believe that array_eq and related functions will work
  * on the given array type or composite type.
@@ -1368,40 +1003,6 @@ TypeCacheOpcCallback(Datum arg, int cacheid, uint32 hashvalue)
 	}
 }
 
-/*
- * TypeCacheConstrCallback
- *		Syscache inval callback function
- *
- * This is called when a syscache invalidation event occurs for any
- * pg_constraint or pg_type row.  We flush information about domain
- * constraints when this happens.
- *
- * It's slightly annoying that we can't tell whether the inval event was for a
- * domain constraint/type record or not; there's usually more update traffic
- * for table constraints/types than domain constraints, so we'll do a lot of
- * useless flushes.  Still, this is better than the old no-caching-at-all
- * approach to domain constraints.
- */
-static void
-TypeCacheConstrCallback(Datum arg, int cacheid, uint32 hashvalue)
-{
-	TypeCacheEntry *typentry;
-
-	/*
-	 * Because this is called very frequently, and typically very few of the
-	 * typcache entries are for domains, we don't use hash_seq_search here.
-	 * Instead we thread all the domain-type entries together so that we can
-	 * visit them cheaply.
-	 */
-	for (typentry = firstDomainTypeEntry;
-		 typentry != NULL;
-		 typentry = typentry->nextDomain)
-	{
-		/* Reset domain constraint validity information */
-		typentry->flags &= ~TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS;
-	}
-}
-
 
 /*
  * Check if given OID is part of the subset that's sortable by comparisons
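
For context, the typcache API deleted above is what domain-coercion callers
build on; a minimal sketch of the caller side, using only the functions and
fields visible in this hunk (the expression-evaluation plumbing for CHECK
constraints is assumed to exist in the caller):

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/typcache.h"

/*
 * Sketch only.  "ref" must have been set up earlier with
 * InitDomainConstraintRef(domaintypeid, ref, some_long_lived_context).
 */
static bool
domain_notnull_is_violated(DomainConstraintRef *ref, bool isnull)
{
	ListCell   *lc;

	/* cheap revalidation, in case the domain's constraint set changed */
	UpdateDomainConstraintRef(ref);

	foreach(lc, ref->constraints)
	{
		DomainConstraintState *con = (DomainConstraintState *) lfirst(lc);

		if (con->constrainttype == DOM_CONSTRAINT_NOTNULL && isnull)
			return true;

		/*
		 * DOM_CONSTRAINT_CHECK entries carry con->check_expr; the caller
		 * evaluates that against the datum in a short-lived context.
		 */
	}
	return false;
}
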
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 7476a26..1b69322 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -51,7 +51,12 @@ typedef struct df_files
 	ino_t		inode;			/* Inode number of file */
 #endif
 	void	   *handle;			/* a handle for pg_dl* functions */
-	char		filename[FLEXIBLE_ARRAY_MEMBER];		/* Full pathname of file */
+	char		filename[1];	/* Full pathname of file */
+
+	/*
+	 * we allocate the block big enough for actual length of pathname.
+	 * filename[] must be last item in struct!
+	 */
 } DynamicFileList;
 
 static DynamicFileList *file_list = NULL;
@@ -212,13 +217,13 @@ internal_load_library(const char *libname)
 		 * File not loaded yet.
 		 */
 		file_scanner = (DynamicFileList *)
-			malloc(offsetof(DynamicFileList, filename) +strlen(libname) + 1);
+			malloc(sizeof(DynamicFileList) + strlen(libname));
 		if (file_scanner == NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_OUT_OF_MEMORY),
 					 errmsg("out of memory")));
 
-		MemSet(file_scanner, 0, offsetof(DynamicFileList, filename));
+		MemSet(file_scanner, 0, sizeof(DynamicFileList));
 		strcpy(file_scanner->filename, libname);
 		file_scanner->device = stat_buf.st_dev;
 #ifndef WIN32
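
The dfmgr.c change above is the same pattern as the enum_values[] change
earlier in this patch: a FLEXIBLE_ARRAY_MEMBER declaration plus an
offsetof()-based allocation versus a one-element array plus a sizeof()-based
over-allocation.  Both reserve room for the string and its terminating NUL;
a stand-alone illustration (not PostgreSQL code, error handling omitted):

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* "old" layout: one-element array, block deliberately over-allocated */
struct item_old
{
	int			handle;
	char		filename[1];	/* must be the last member */
};

/*
 * "new" layout: C99 flexible array member, which is roughly what
 * FLEXIBLE_ARRAY_MEMBER expands to on C99 compilers.
 */
struct item_new
{
	int			handle;
	char		filename[];
};

static struct item_old *
make_old(const char *name)
{
	/* sizeof() already counts filename[1], which covers the NUL byte */
	struct item_old *p = malloc(sizeof(struct item_old) + strlen(name));

	strcpy(p->filename, name);
	return p;
}

static struct item_new *
make_new(const char *name)
{
	/* offsetof() counts nothing for filename[], so add 1 for the NUL */
	struct item_new *p = malloc(offsetof(struct item_new, filename) +
								strlen(name) + 1);

	strcpy(p->filename, name);
	return p;
}
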
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d84dba7..9572777 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -97,6 +97,20 @@
 #define CONFIG_EXEC_PARAMS_NEW "global/config_exec_params.new"
 #endif
 
+#define KB_PER_MB (1024)
+#define KB_PER_GB (1024*1024)
+#define KB_PER_TB (1024*1024*1024)
+
+#define MS_PER_S 1000
+#define S_PER_MIN 60
+#define MS_PER_MIN (1000 * 60)
+#define MIN_PER_H 60
+#define S_PER_H (60 * 60)
+#define MS_PER_H (1000 * 60 * 60)
+#define MIN_PER_D (60 * 24)
+#define S_PER_D (60 * 60 * 24)
+#define MS_PER_D (1000 * 60 * 60 * 24)
+
 /*
  * Precision with which REAL type guc values are to be printed for GUC
  * serialization.
@@ -652,96 +666,6 @@ const char *const config_type_names[] =
 	 /* PGC_ENUM */ "enum"
 };
 
-/*
- * Unit conversion tables.
- *
- * There are two tables, one for memory units, and another for time units.
- * For each supported conversion from one unit to another, we have an entry
- * in the table.
- *
- * To keep things simple, and to avoid intermediate-value overflows,
- * conversions are never chained.  There needs to be a direct conversion
- * between all units (of the same type).
- *
- * The conversions from each base unit must be kept in order from greatest
- * to smallest unit; convert_from_base_unit() relies on that.  (The order of
- * the base units does not matter.)
- */
-#define MAX_UNIT_LEN		3	/* length of longest recognized unit string */
-
-typedef struct
-{
-	char	unit[MAX_UNIT_LEN + 1];	/* unit, as a string, like "kB" or "min" */
-	int		base_unit;		/* GUC_UNIT_XXX */
-	int		multiplier;		/* If positive, multiply the value with this for
-							 * unit -> base_unit conversion.  If negative,
-							 * divide (with the absolute value) */
-} unit_conversion;
-
-/* Ensure that the constants in the tables don't overflow or underflow */
-#if BLCKSZ < 1024 || BLCKSZ > (1024*1024)
-#error BLCKSZ must be between 1KB and 1MB
-#endif
-#if XLOG_BLCKSZ < 1024 || XLOG_BLCKSZ > (1024*1024)
-#error XLOG_BLCKSZ must be between 1KB and 1MB
-#endif
-#if XLOG_SEG_SIZE < (1024*1024) || XLOG_BLCKSZ > (1024*1024*1024)
-#error XLOG_SEG_SIZE must be between 1MB and 1GB
-#endif
-
-static const char *memory_units_hint =
-	gettext_noop("Valid units for this parameter are \"kB\", \"MB\", \"GB\", and \"TB\".");
-
-static const unit_conversion memory_unit_conversion_table[] =
-{
-	{ "TB",		GUC_UNIT_KB,	 	1024*1024*1024 },
-	{ "GB",		GUC_UNIT_KB,	 	1024*1024 },
-	{ "MB",		GUC_UNIT_KB,	 	1024 },
-	{ "kB",		GUC_UNIT_KB,	 	1 },
-
-	{ "TB",		GUC_UNIT_BLOCKS,	(1024*1024*1024) / (BLCKSZ / 1024) },
-	{ "GB",		GUC_UNIT_BLOCKS,	(1024*1024) / (BLCKSZ / 1024) },
-	{ "MB",		GUC_UNIT_BLOCKS,	1024 / (BLCKSZ / 1024) },
-	{ "kB",		GUC_UNIT_BLOCKS,	-(BLCKSZ / 1024) },
-
-	{ "TB",		GUC_UNIT_XBLOCKS,	(1024*1024*1024) / (XLOG_BLCKSZ / 1024) },
-	{ "GB",		GUC_UNIT_XBLOCKS,	(1024*1024) / (XLOG_BLCKSZ / 1024) },
-	{ "MB",		GUC_UNIT_XBLOCKS,	1024 / (XLOG_BLCKSZ / 1024) },
-	{ "kB",		GUC_UNIT_XBLOCKS,	-(XLOG_BLCKSZ / 1024) },
-
-	{ "TB",		GUC_UNIT_XSEGS,		(1024*1024*1024) / (XLOG_SEG_SIZE / 1024) },
-	{ "GB",		GUC_UNIT_XSEGS,		(1024*1024) / (XLOG_SEG_SIZE / 1024) },
-	{ "MB",		GUC_UNIT_XSEGS,		-(XLOG_SEG_SIZE / (1024 * 1024)) },
-	{ "kB",		GUC_UNIT_XSEGS,		-(XLOG_SEG_SIZE / 1024) },
-
-	{ "" }		/* end of table marker */
-};
-
-static const char *time_units_hint =
-	gettext_noop("Valid units for this parameter are \"ms\", \"s\", \"min\", \"h\", and \"d\".");
-
-static const unit_conversion time_unit_conversion_table[] =
-{
-	{ "d",		GUC_UNIT_MS,	1000 * 60 * 60 * 24 },
-	{ "h",		GUC_UNIT_MS,	1000 * 60 * 60 },
-	{ "min", 	GUC_UNIT_MS,	1000 * 60},
-	{ "s",		GUC_UNIT_MS,	1000 },
-	{ "ms",		GUC_UNIT_MS,	1 },
-
-	{ "d",		GUC_UNIT_S,		60 * 60 * 24 },
-	{ "h",		GUC_UNIT_S,		60 * 60 },
-	{ "min", 	GUC_UNIT_S,		60 },
-	{ "s",		GUC_UNIT_S,		1 },
-	{ "ms", 	GUC_UNIT_S,	 	-1000 },
-
-	{ "d", 		GUC_UNIT_MIN,	60 * 24 },
-	{ "h", 		GUC_UNIT_MIN,	60 },
-	{ "min", 	GUC_UNIT_MIN,	1 },
-	{ "s", 		GUC_UNIT_MIN,	-60 },
-	{ "ms", 	GUC_UNIT_MIN,	-1000 * 60 },
-
-	{ "" }		/* end of table marker */
-};
 
 /*
  * Contents of GUC tables
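
To see what the deleted tables encode, take the "MB" row for GUC_UNIT_BLOCKS
with the default BLCKSZ of 8192: the multiplier is 1024 / (8192/1024) = 128,
so "512MB" becomes 512 * 128 = 65536 blocks, while the negative "kB"
multiplier (-(BLCKSZ/1024) = -8) means divide, so "8kB" becomes one block.
Inside guc.c, where the (removed) helper shown further down in this diff is
visible, that corresponds to:

	int64		nblocks;

	/* e.g. shared_buffers = '512MB', with BLCKSZ = 8192 */
	if (convert_to_base_unit(512, "MB", GUC_UNIT_BLOCKS, &nblocks))
		Assert(nblocks == 65536);
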
@@ -2154,28 +2078,16 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"min_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
-			gettext_noop("Sets the minimum size to shrink the WAL to."),
-			NULL,
-			GUC_UNIT_XSEGS
+		{"checkpoint_segments", PGC_SIGHUP, WAL_CHECKPOINTS,
+			gettext_noop("Sets the maximum distance in log segments between automatic WAL checkpoints."),
+			NULL
 		},
-		&min_wal_size,
-		5, 2, INT_MAX,
+		&CheckPointSegments,
+		3, 1, INT_MAX,
 		NULL, NULL, NULL
 	},
 
 	{
-		{"max_wal_size", PGC_SIGHUP, WAL_CHECKPOINTS,
-			gettext_noop("Sets the WAL size that triggers a checkpoint."),
-			NULL,
-			GUC_UNIT_XSEGS
-		},
-		&max_wal_size,
-		8, 2, INT_MAX,
-		NULL, assign_max_wal_size, NULL
-	},
-
-	{
 		{"checkpoint_timeout", PGC_SIGHUP, WAL_CHECKPOINTS,
 			gettext_noop("Sets the maximum time between automatic WAL checkpoints."),
 			NULL,
@@ -2452,18 +2364,6 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
-		{"wal_retrieve_retry_interval", PGC_SIGHUP, REPLICATION_STANDBY,
-			gettext_noop("Sets the time to wait before retrying to retrieve WAL"
-						 "after a failed attempt."),
-			NULL,
-			GUC_UNIT_MS
-		},
-		&wal_retrieve_retry_interval,
-		5000, 1, INT_MAX,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows the number of pages per write ahead log segment."),
 			NULL,
@@ -5118,88 +5018,6 @@ ReportGUCOption(struct config_generic * record)
 }
 
 /*
- * Convert a value from one of the human-friendly units ("kB", "min" etc.)
- * to the given base unit.  'value' and 'unit' are the input value and unit
- * to convert from.  The converted value is stored in *base_value.
- *
- * Returns true on success, false if the input unit is not recognized.
- */
-static bool
-convert_to_base_unit(int64 value, const char *unit,
-					 int base_unit, int64 *base_value)
-{
-	const unit_conversion *table;
-	int 		i;
-
-	if (base_unit & GUC_UNIT_MEMORY)
-		table = memory_unit_conversion_table;
-	else
-		table = time_unit_conversion_table;
-
-	for (i = 0; *table[i].unit; i++)
-	{
-		if (base_unit == table[i].base_unit &&
-			strcmp(unit, table[i].unit) == 0)
-		{
-			if (table[i].multiplier < 0)
-				*base_value = value / (-table[i].multiplier);
-			else
-				*base_value = value * table[i].multiplier;
-			return true;
-		}
-	}
-	return false;
-}
-
-/*
- * Convert a value in some base unit to a human-friendly unit.  The output
- * unit is chosen so that it's the greatest unit that can represent the value
- * without loss.  For example, if the base unit is GUC_UNIT_KB, 1024 is
- * converted to 1 MB, but 1025 is represented as 1025 kB.
- */
-static void
-convert_from_base_unit(int64 base_value, int base_unit,
-					   int64 *value, const char **unit)
-{
-	const unit_conversion *table;
-	int			i;
-
-	*unit = NULL;
-
-	if (base_unit & GUC_UNIT_MEMORY)
-		table = memory_unit_conversion_table;
-	else
-		table = time_unit_conversion_table;
-
-	for (i = 0; *table[i].unit; i++)
-	{
-		if (base_unit == table[i].base_unit)
-		{
-			/*
-			 * Accept the first conversion that divides the value evenly.
-			 * We assume that the conversions for each base unit are ordered
-			 * from greatest unit to the smallest!
-			 */
-			if (table[i].multiplier < 0)
-			{
-				*value = base_value * (-table[i].multiplier);
-				*unit = table[i].unit;
-				break;
-			}
-			else if (base_value % table[i].multiplier == 0)
-			{
-				*value = base_value / table[i].multiplier;
-				*unit = table[i].unit;
-				break;
-			}
-		}
-	}
-
-	Assert(*unit != NULL);
-}
-
-
-/*
  * Try to parse value as an integer.  The accepted formats are the
  * usual decimal, octal, or hexadecimal formats, optionally followed by
  * a unit name if "flags" indicates a unit is allowed.
@@ -5242,38 +5060,171 @@ parse_int(const char *value, int *result, int flags, const char **hintmsg)
 	/* Handle possible unit */
 	if (*endptr != '\0')
 	{
-		char		unit[MAX_UNIT_LEN + 1];
-		int			unitlen;
-		bool		converted = false;
-
-		if ((flags & GUC_UNIT) == 0)
-			return false;	/* this setting does not accept a unit */
+		/*
+		 * Note: the multiple-switch coding technique here is a bit tedious,
+		 * but seems necessary to avoid intermediate-value overflows.
+		 */
+		if (flags & GUC_UNIT_MEMORY)
+		{
+			/* Set hint for use if no match or trailing garbage */
+			if (hintmsg)
+				*hintmsg = gettext_noop("Valid units for this parameter are \"kB\", \"MB\", \"GB\", and \"TB\".");
 
-		unitlen = 0;
-		while (*endptr != '\0' && !isspace((unsigned char) *endptr) &&
-			   unitlen < MAX_UNIT_LEN)
-			unit[unitlen++] = *(endptr++);
-		unit[unitlen] = '\0';
-		/* allow whitespace after unit */
-		while (isspace((unsigned char) *endptr))
-			endptr++;
+#if BLCKSZ < 1024 || BLCKSZ > (1024*1024)
+#error BLCKSZ must be between 1KB and 1MB
+#endif
+#if XLOG_BLCKSZ < 1024 || XLOG_BLCKSZ > (1024*1024)
+#error XLOG_BLCKSZ must be between 1KB and 1MB
+#endif
 
-		if (*endptr == '\0')
-			converted = convert_to_base_unit(val, unit, (flags & GUC_UNIT),
-											 &val);
-		if (!converted)
+			if (strncmp(endptr, "kB", 2) == 0)
+			{
+				endptr += 2;
+				switch (flags & GUC_UNIT_MEMORY)
+				{
+					case GUC_UNIT_BLOCKS:
+						val /= (BLCKSZ / 1024);
+						break;
+					case GUC_UNIT_XBLOCKS:
+						val /= (XLOG_BLCKSZ / 1024);
+						break;
+				}
+			}
+			else if (strncmp(endptr, "MB", 2) == 0)
+			{
+				endptr += 2;
+				switch (flags & GUC_UNIT_MEMORY)
+				{
+					case GUC_UNIT_KB:
+						val *= KB_PER_MB;
+						break;
+					case GUC_UNIT_BLOCKS:
+						val *= KB_PER_MB / (BLCKSZ / 1024);
+						break;
+					case GUC_UNIT_XBLOCKS:
+						val *= KB_PER_MB / (XLOG_BLCKSZ / 1024);
+						break;
+				}
+			}
+			else if (strncmp(endptr, "GB", 2) == 0)
+			{
+				endptr += 2;
+				switch (flags & GUC_UNIT_MEMORY)
+				{
+					case GUC_UNIT_KB:
+						val *= KB_PER_GB;
+						break;
+					case GUC_UNIT_BLOCKS:
+						val *= KB_PER_GB / (BLCKSZ / 1024);
+						break;
+					case GUC_UNIT_XBLOCKS:
+						val *= KB_PER_GB / (XLOG_BLCKSZ / 1024);
+						break;
+				}
+			}
+			else if (strncmp(endptr, "TB", 2) == 0)
+			{
+				endptr += 2;
+				switch (flags & GUC_UNIT_MEMORY)
+				{
+					case GUC_UNIT_KB:
+						val *= KB_PER_TB;
+						break;
+					case GUC_UNIT_BLOCKS:
+						val *= KB_PER_TB / (BLCKSZ / 1024);
+						break;
+					case GUC_UNIT_XBLOCKS:
+						val *= KB_PER_TB / (XLOG_BLCKSZ / 1024);
+						break;
+				}
+			}
+		}
+		else if (flags & GUC_UNIT_TIME)
 		{
-			/* invalid unit, or garbage after the unit; set hint and fail. */
+			/* Set hint for use if no match or trailing garbage */
 			if (hintmsg)
+				*hintmsg = gettext_noop("Valid units for this parameter are \"ms\", \"s\", \"min\", \"h\", and \"d\".");
+
+			if (strncmp(endptr, "ms", 2) == 0)
 			{
-				if (flags & GUC_UNIT_MEMORY)
-					*hintmsg = memory_units_hint;
-				else
-					*hintmsg = time_units_hint;
+				endptr += 2;
+				switch (flags & GUC_UNIT_TIME)
+				{
+					case GUC_UNIT_S:
+						val /= MS_PER_S;
+						break;
+					case GUC_UNIT_MIN:
+						val /= MS_PER_MIN;
+						break;
+				}
+			}
+			else if (strncmp(endptr, "s", 1) == 0)
+			{
+				endptr += 1;
+				switch (flags & GUC_UNIT_TIME)
+				{
+					case GUC_UNIT_MS:
+						val *= MS_PER_S;
+						break;
+					case GUC_UNIT_MIN:
+						val /= S_PER_MIN;
+						break;
+				}
+			}
+			else if (strncmp(endptr, "min", 3) == 0)
+			{
+				endptr += 3;
+				switch (flags & GUC_UNIT_TIME)
+				{
+					case GUC_UNIT_MS:
+						val *= MS_PER_MIN;
+						break;
+					case GUC_UNIT_S:
+						val *= S_PER_MIN;
+						break;
+				}
+			}
+			else if (strncmp(endptr, "h", 1) == 0)
+			{
+				endptr += 1;
+				switch (flags & GUC_UNIT_TIME)
+				{
+					case GUC_UNIT_MS:
+						val *= MS_PER_H;
+						break;
+					case GUC_UNIT_S:
+						val *= S_PER_H;
+						break;
+					case GUC_UNIT_MIN:
+						val *= MIN_PER_H;
+						break;
+				}
+			}
+			else if (strncmp(endptr, "d", 1) == 0)
+			{
+				endptr += 1;
+				switch (flags & GUC_UNIT_TIME)
+				{
+					case GUC_UNIT_MS:
+						val *= MS_PER_D;
+						break;
+					case GUC_UNIT_S:
+						val *= S_PER_D;
+						break;
+					case GUC_UNIT_MIN:
+						val *= MIN_PER_D;
+						break;
+				}
 			}
-			return false;
 		}
 
+		/* allow whitespace after unit */
+		while (isspace((unsigned char) *endptr))
+			endptr++;
+
+		if (*endptr != '\0')
+			return false;		/* appropriate hint, if any, already set */
+
 		/* Check for overflow due to units conversion */
 		if (val != (int64) ((int32) val))
 		{
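
Either way the unit handling is spelled, the contract of parse_int() itself
is unchanged; a small sketch of a hypothetical caller, using the flag of a
seconds-based setting such as checkpoint_timeout:

	int			result;
	const char *hint = NULL;

	/* "10min" converts to 600 for a GUC whose base unit is seconds */
	if (parse_int("10min", &result, GUC_UNIT_S, &hint))
		Assert(result == 600);

	/* an unrecognized unit fails and leaves a hint naming the valid units */
	if (!parse_int("10fortnights", &result, GUC_UNIT_S, &hint))
		elog(LOG, "rejected: %s", hint ? hint : "(no hint)");
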
@@ -8145,10 +8096,76 @@ _ShowOption(struct config_generic * record, bool use_units)
 					int64		result = *conf->variable;
 					const char *unit;
 
-					if (use_units && result > 0 && (record->flags & GUC_UNIT))
+					if (use_units && result > 0 &&
+						(record->flags & GUC_UNIT_MEMORY))
+					{
+						switch (record->flags & GUC_UNIT_MEMORY)
+						{
+							case GUC_UNIT_BLOCKS:
+								result *= BLCKSZ / 1024;
+								break;
+							case GUC_UNIT_XBLOCKS:
+								result *= XLOG_BLCKSZ / 1024;
+								break;
+						}
+
+						if (result % KB_PER_TB == 0)
+						{
+							result /= KB_PER_TB;
+							unit = "TB";
+						}
+						else if (result % KB_PER_GB == 0)
+						{
+							result /= KB_PER_GB;
+							unit = "GB";
+						}
+						else if (result % KB_PER_MB == 0)
+						{
+							result /= KB_PER_MB;
+							unit = "MB";
+						}
+						else
+						{
+							unit = "kB";
+						}
+					}
+					else if (use_units && result > 0 &&
+							 (record->flags & GUC_UNIT_TIME))
 					{
-						convert_from_base_unit(result, record->flags & GUC_UNIT,
-											   &result, &unit);
+						switch (record->flags & GUC_UNIT_TIME)
+						{
+							case GUC_UNIT_S:
+								result *= MS_PER_S;
+								break;
+							case GUC_UNIT_MIN:
+								result *= MS_PER_MIN;
+								break;
+						}
+
+						if (result % MS_PER_D == 0)
+						{
+							result /= MS_PER_D;
+							unit = "d";
+						}
+						else if (result % MS_PER_H == 0)
+						{
+							result /= MS_PER_H;
+							unit = "h";
+						}
+						else if (result % MS_PER_MIN == 0)
+						{
+							result /= MS_PER_MIN;
+							unit = "min";
+						}
+						else if (result % MS_PER_S == 0)
+						{
+							result /= MS_PER_S;
+							unit = "s";
+						}
+						else
+						{
+							unit = "ms";
+						}
 					}
 					else
 						unit = "";
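
Both versions of this SHOW-side code pick the largest unit that divides the
stored value exactly, so the displayed form is unchanged.  For a GUC_UNIT_KB
setting, for example:

	/*
	 * 1048576 kB  ->  "1GB"    (divisible by KB_PER_GB)
	 * 1047552 kB  ->  "1023MB" (only divisible by KB_PER_MB)
	 * 1025 kB     ->  "1025kB" (falls through to the base unit)
	 */
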
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f8f9ce1..b053659 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -197,9 +197,8 @@
 
 # - Checkpoints -
 
+#checkpoint_segments = 3		# in logfile segments, min 1, 16MB each
 #checkpoint_timeout = 5min		# range 30s-1h
-#max_wal_size = 128MB			# in logfile segments
-#min_wal_size = 80MB
 #checkpoint_completion_target = 0.5	# checkpoint target duration, 0.0 - 1.0
 #checkpoint_warning = 30s		# 0 disables
 
@@ -261,8 +260,6 @@
 #wal_receiver_timeout = 60s		# time that receiver waits for
 					# communication from master
 					# in milliseconds; 0 disables
-#wal_retrieve_retry_interval = 5s	# time to wait before retrying to
-					# retrieve WAL after a failed attempt
 
 
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index 80a7b6a..45e610d 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -14,10 +14,10 @@ memory management system implemented in 7.1.
 Background
 ----------
 
-We do most of our memory allocation in "memory contexts", which are usually
-AllocSets as implemented by src/backend/utils/mmgr/aset.c.  The key to
-successful memory management without lots of overhead is to define a useful
-set of contexts with appropriate lifespans.
+We already do most of our memory allocation in "memory contexts", which
+are usually AllocSets as implemented by backend/utils/mmgr/aset.c.  What
+we need to do is create more contexts and define proper rules about when
+they can be freed.
 
 The basic operations on a memory context are:
 
@@ -32,7 +32,7 @@ The basic operations on a memory context are:
   context object itself)
 
 Given a chunk of memory previously allocated from a context, one can
-free it or reallocate it larger or smaller (corresponding to standard C
+free it or reallocate it larger or smaller (corresponding to standard
 library's free() and realloc() routines).  These operations return memory
 to or get more memory from the same context the chunk was originally
 allocated in.
@@ -46,11 +46,11 @@ so that the caller can restore the previous context before exiting).
 The main advantage of memory contexts over plain use of malloc/free is
 that the entire contents of a memory context can be freed easily, without
 having to request freeing of each individual chunk within it.  This is
-both faster and more reliable than per-chunk bookkeeping.  We use this
-fact to clean up at transaction end: by resetting all the active contexts
-of transaction or shorter lifespan, we can reclaim all transient memory.
-Similarly, we can clean up at the end of each query, or after each tuple
-is processed during a query.
+both faster and more reliable than per-chunk bookkeeping.  We already use
+this fact to clean up at transaction end: by resetting all the active
+contexts, we reclaim all memory.  What we need are additional contexts
+that can be reset or deleted at strategic times within a query, such as
+after each tuple.
 
 
 Some Notes About the palloc API Versus Standard C Library
@@ -64,9 +64,11 @@ are some notes to clarify the behavior.
 return NULL, and it is not necessary or useful to test for such a result.
 
 * palloc(0) is explicitly a valid operation.  It does not return a NULL
-pointer, but a valid chunk of which no bytes may be used.  However, the
+pointer, but a valid chunk of which no bytes may be used.  (However, the
 chunk might later be repalloc'd larger; it can also be pfree'd without
-error.  Similarly, repalloc allows realloc'ing to zero size.
+error.)  (Note: this behavior is new in Postgres 8.0; earlier versions
+disallowed palloc(0).  It seems more consistent to allow it, however.)
+Similarly, repalloc allows realloc'ing to zero size.
 
 * pfree and repalloc do not accept a NULL pointer.  This is intentional.
 
@@ -74,16 +76,20 @@ error.  Similarly, repalloc allows realloc'ing to zero size.
 pfree/repalloc No Longer Depend On CurrentMemoryContext
 -------------------------------------------------------
 
-Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk
+In this proposal, pfree() and repalloc() can be applied to any chunk
 whether it belongs to CurrentMemoryContext or not --- the chunk's owning
 context will be invoked to handle the operation, regardless.  This is a
 change from the old requirement that CurrentMemoryContext must be set
 to the same context the memory was allocated from before one can use
-pfree() or repalloc().
+pfree() or repalloc().  The old coding requirement is obviously fairly
+error-prone, and will become more so the more context-switching we do;
+so I think it's essential to use CurrentMemoryContext only for palloc.
+We can avoid needing it for pfree/repalloc by putting restrictions on
+context managers as discussed below.
 
-There was some consideration of getting rid of CurrentMemoryContext entirely,
+We could even consider getting rid of CurrentMemoryContext entirely,
 instead requiring the target memory context for allocation to be specified
-explicitly.  But we decided that would be too much notational overhead ---
+explicitly.  But I think that would be too much notational overhead ---
 we'd have to pass an appropriate memory context to called routines in
 many places.  For example, the copyObject routines would need to be passed
 a context, as would function execution routines that return a
@@ -94,27 +100,18 @@ a context to use for any temporary memory allocation you might want to
 do".  So there'd still need to be a global variable specifying a suitable
 temporary-allocation context.  That might as well be CurrentMemoryContext.
 
-The upshot of that reasoning, though, is that CurrentMemoryContext should
-generally point at a short-lifespan context if at all possible.  During
-query execution it usually points to a context that gets reset after each
-tuple.  Only in *very* circumscribed code should it ever point at a
-context having greater than transaction lifespan, since doing so risks
-permanent memory leaks.
-
 
 Additions to the Memory-Context Mechanism
 -----------------------------------------
 
-Before 7.1 memory contexts were all independent, but it was too hard to
-keep track of them; with lots of contexts there needs to be explicit
-mechanism for that.
+If we are going to have more contexts, we need more mechanism for keeping
+track of them; else we risk leaking whole contexts under error conditions.
 
-We solved this by creating a tree of "parent" and "child" contexts.  When
+We can do this by creating trees of "parent" and "child" contexts.  When
 creating a memory context, the new context can be specified to be a child
 of some existing context.  A context can have many children, but only one
 parent.  In this way the contexts form a forest (not necessarily a single
-tree, since there could be more than one top-level context; although in
-current practice there is only one top context, TopMemoryContext).
+tree, since there could be more than one top-level context).
 
 We then say that resetting or deleting any particular context resets or
 deletes all its direct and indirect children as well.  This feature allows
@@ -128,24 +125,19 @@ lifetimes that only partially overlap can be handled by allocating
 from different trees of the context forest (there are some examples
 in the next section).
 
-Actually, it turns out that resetting a given context should almost
-always imply deleting, not just resetting, any child contexts it has.
-So MemoryContextReset() means that, and if you really do want a tree of
-empty contexts you need to call MemoryContextResetOnly() plus
-MemoryContextResetChildren().
-
-For convenience we also provide operations like "reset/delete all children
-of a given context, but don't reset or delete that context itself".
+For convenience we will also want operations like "reset/delete all
+children of a given context, but don't reset or delete that context
+itself".
 
 
 Globally Known Contexts
 -----------------------
 
-There are a few widely-known contexts that are typically referenced
-through global variables.  At any instant the system may contain many
-additional contexts, but all other contexts should be direct or indirect
-children of one of these contexts to ensure they are not leaked in event
-of an error.
+There will be several widely-known contexts that will typically be
+referenced through global variables.  At any instant the system may
+contain many additional contexts, but all other contexts should be direct
+or indirect children of one of these contexts to ensure they are not
+leaked in event of an error.
 
 TopMemoryContext --- this is the actual top level of the context tree;
 every other context is a direct or indirect child of this one.  Allocating
@@ -159,17 +151,17 @@ running with CurrentMemoryContext pointing here.
 
 PostmasterContext --- this is the postmaster's normal working context.
 After a backend is spawned, it can delete PostmasterContext to free its
-copy of memory the postmaster was using that it doesn't need.
-(Anything that has to be passed from postmaster to backends is passed
-in TopMemoryContext.  The postmaster has only TopMemoryContext,
+copy of memory the postmaster was using that it doesn't need.  (Anything
+that has to be passed from postmaster to backends will be passed in
+TopMemoryContext.  The postmaster will have only TopMemoryContext,
 PostmasterContext, and ErrorContext --- the remaining top-level contexts
-are set up in each backend during startup.)
+will be set up in each backend during startup.)
 
 CacheMemoryContext --- permanent storage for relcache, catcache, and
 related modules.  This will never be reset or deleted, either, so it's
 not truly necessary to distinguish it from TopMemoryContext.  But it
 seems worthwhile to maintain the distinction for debugging purposes.
-(Note: CacheMemoryContext has child contexts with shorter lifespans.
+(Note: CacheMemoryContext will have child-contexts with shorter lifespans.
 For example, a child context is the best place to keep the subsidiary
 storage associated with a relcache entry; that way we can free rule
 parsetrees and so forth easily, without having to depend on constructing
@@ -214,12 +206,12 @@ global variable pointing to the per-portal context of the currently active
 execution portal.  This can be used if it's necessary to allocate storage
 that will live just as long as the execution of the current portal requires.
 
-ErrorContext --- this permanent context is switched into for error
-recovery processing, and then reset on completion of recovery.  We arrange
-to have a few KB of memory available in it at all times.  In this way, we
-can ensure that some memory is available for error recovery even if the
-backend has run out of memory otherwise.  This allows out-of-memory to be
-treated as a normal ERROR condition, not a FATAL error.
+ErrorContext --- this permanent context will be switched into for error
+recovery processing, and then reset on completion of recovery.  We'll
+arrange to have, say, 8K of memory available in it at all times.  In this
+way, we can ensure that some memory is available for error recovery even
+if the backend has run out of memory otherwise.  This allows out-of-memory
+to be treated as a normal ERROR condition, not a FATAL error.
 
 
 Contexts For Prepared Statements And Portals
@@ -235,7 +227,7 @@ PortalContext when the portal is active.  In the case of a portal created
 by DECLARE CURSOR, this private context contains the query parse and plan
 trees (there being no other object that can hold them).  Portals created
 from prepared statements simply reference the prepared statements' trees,
-and don't actually need any storage allocated in their private contexts.
+and won't actually need any storage allocated in their private contexts.
 
 
 Transient Contexts During Execution
@@ -246,7 +238,7 @@ in a temporary context that's a child of MessageContext (so that it will
 go away automatically upon error).  On success, the finished plan is
 copied to the prepared statement's private context, and the temp context
 is released; this allows planner temporary space to be recovered before
-execution begins.  (In simple-Query mode we don't bother with the extra
+execution begins.  (In simple-Query mode we'll not bother with the extra
 copy step, so the planner temp space stays around till end of query.)
 
 The top-level executor routines, as well as most of the "plan node"
@@ -258,13 +250,13 @@ so this is appropriate for those purposes.  The executor's top context
 is a child of PortalContext, that is, the per-portal context of the
 portal that represents the query's execution.
 
-The main memory-management consideration in the executor is that
-expression evaluation --- both for qual testing and for computation of
-targetlist entries --- needs to not leak memory.  To do this, each
-ExprContext (expression-eval context) created in the executor has a
-private memory context associated with it, and we switch into that context
-when evaluating expressions in that ExprContext.  The plan node that owns
-the ExprContext is responsible for resetting the private context to empty
+The main improvement needed in the executor is that expression evaluation
+--- both for qual testing and for computation of targetlist entries ---
+needs to not leak memory.  To do this, each ExprContext (expression-eval
+context) created in the executor will now have a private memory context
+associated with it, and we'll arrange to switch into that context when
+evaluating expressions in that ExprContext.  The plan node that owns the
+ExprContext is responsible for resetting the private context to empty
 when it no longer needs the results of expression evaluations.  Typically
 the reset is done at the start of each tuple-fetch cycle in the plan node.
 
@@ -284,17 +276,13 @@ and if the comparators leak any memory then that memory won't be recovered
 till end of query.  The comparator functions all return bool or int32,
 so there's no problem with their result data, but there can be a problem
 with leakage of internal temporary data.  In particular, comparator
-functions that operate on TOAST-able data types need to be careful
+functions that operate on TOAST-able data types will need to be careful
 not to leak detoasted versions of their inputs.  This is annoying, but
-it appeared a lot easier to make the comparators conform than to fix the
-index and sort routines, so that's what was done for 7.1.  This remains
-the state of affairs in btree and hash indexes, so btree and hash support
-functions still need to not leak memory.  Most of the other index AMs
-have been modified to run opclass support functions in short-lived
-contexts, so that leakage is not a problem; this is necessary in view
-of the fact that their support functions tend to be far more complex.
-
-There are some special cases, such as aggregate functions.  nodeAgg.c
+it appears a lot easier to make the comparators conform than to fix the
+index and sort routines, so that's what I propose to do for 7.1.  Further
+cleanup can be left for another day.
+
+There will be some special cases, such as aggregate functions.  nodeAgg.c
 needs to remember the results of evaluation of aggregate transition
 functions from one tuple cycle to the next, so it can't just discard
 all per-tuple state in each cycle.  The easiest way to handle this seems
@@ -305,20 +293,30 @@ transition function.
 
 Executor routines that switch the active CurrentMemoryContext may need
 to copy data into their caller's current memory context before returning.
-However, we have minimized the need for that, because of the convention
-of resetting the per-tuple context at the *start* of an execution cycle
-rather than at its end.  With that rule, an execution node can return a
-tuple that is palloc'd in its per-tuple context, and the tuple will remain
-good until the node is called for another tuple or told to end execution.
-This parallels the situation with pass-by-reference values at the table
-scan level, since a scan node can return a direct pointer to a tuple in a
-disk buffer that is only guaranteed to remain good that long.
-
-A more common reason for copying data is to transfer a result from
-per-tuple context to per-query context; for example, a Unique node will
-save the last distinct tuple value in its per-query context, requiring a
+I think there will be relatively little need for that, because of the
+convention of resetting the per-tuple context at the *start* of an
+execution cycle rather than at its end.  With that rule, an execution
+node can return a tuple that is palloc'd in its per-tuple context, and
+the tuple will remain good until the node is called for another tuple
+or told to end execution.  This is pretty much the same state of affairs
+that exists now, since a scan node can return a direct pointer to a tuple
+in a disk buffer that is only guaranteed to remain good that long.
+
+A more common reason for copying data will be to transfer a result from
+per-tuple context to per-run context; for example, a Unique node will
+save the last distinct tuple value in its per-run context, requiring a
 copy step.
 
+Another interesting special case is VACUUM, which needs to allocate
+working space that will survive its forced transaction commits, yet
+be released on error.  Currently it does that through a "portal",
+which is essentially a child context of TopMemoryContext.  While that
+way still works, it's ugly since xact abort needs special processing
+to delete the portal.  Better would be to use a context that's a child
+of PortalContext and hence is certain to go away as part of normal
+processing.  (Eventually we might have an even better solution from
+nested transactions, but this'll do fine for now.)
+
 
 Mechanisms to Allow Multiple Types of Contexts
 ----------------------------------------------
@@ -327,10 +325,9 @@ We may want several different types of memory contexts with different
 allocation policies but similar external behavior.  To handle this,
 memory allocation functions will be accessed via function pointers,
 and we will require all context types to obey the conventions given here.
-(As of 2015, there's actually still just one context type; but interest in
-creating other types has never gone away entirely, so we retain this API.)
+(This is not very far different from the existing code.)
 
-A memory context is represented by an object like
+A memory context will be represented by an object like
 
 typedef struct MemoryContextData
 {
@@ -346,7 +343,7 @@ This is essentially an abstract superclass, and the "methods" pointer is
 its virtual function table.  Specific memory context types will use
 derived structs having these fields as their first fields.  All the
 contexts of a specific type will have methods pointers that point to the
-same static table of function pointers, which look like
+same static table of function pointers, which will look like
 
 typedef struct MemoryContextMethodsData
 {
@@ -359,7 +356,7 @@ typedef struct MemoryContextMethodsData
 
 Alloc, reset, and delete requests will take a MemoryContext pointer
 as parameter, so they'll have no trouble finding the method pointer
-to call.  Free and realloc are trickier.  To make those work, we
+to call.  Free and realloc are trickier.  To make those work, we will
 require all memory context types to produce allocated chunks that
 are immediately preceded by a standard chunk header, which has the
 layout
@@ -370,7 +367,7 @@ typedef struct StandardChunkHeader
     Size          size;              /* Allocated size of chunk */
 };
 
-It turns out that the pre-existing aset.c memory context type did this
+It turns out that the existing aset.c memory context type does this
 already, and probably any other kind of context would need to have the
 same data available to support realloc, so this is not really creating
 any additional overhead.  (Note that if a context type needs more per-
@@ -378,30 +375,36 @@ allocated-chunk information than this, it can make an additional
 nonstandard header that precedes the standard header.  So we're not
 constraining context-type designers very much.)
 
-Given this, the pfree routine looks something like
+Given this, the pfree routine will look something like
 
     StandardChunkHeader * header =
         (StandardChunkHeader *) ((char *) p - sizeof(StandardChunkHeader));
 
     (*header->mycontext->methods->free_p) (p);
 
+We could do it as a macro, but the macro would have to evaluate its
+argument twice, which seems like a bad idea (the current pfree macro
+does not do that).  This is already saving two levels of function call
+compared to the existing code, so I think we're doing fine without
+squeezing out that last little bit ...
+
 
 More Control Over aset.c Behavior
 ---------------------------------
 
-Previously, aset.c always allocated an 8K block upon the first allocation
-in a context, and doubled that size for each successive block request.
+Currently, aset.c allocates an 8K block upon the first allocation in
+a context, and doubles that size for each successive block request.
 That's good behavior for a context that might hold *lots* of data, and
 the overhead wasn't bad when we had only a few contexts in existence.
-With dozens if not hundreds of smaller contexts in the system, we need
-to be able to fine-tune things a little better.
+With dozens if not hundreds of smaller contexts in the system, we will
+want to be able to fine-tune things a little better.
 
-The creator of a context is now able to specify an initial block size
-and a maximum block size.  Selecting smaller values can prevent wastage
+The creator of a context will be able to specify an initial block size
+and a maximum block size.  Selecting smaller values will prevent wastage
 of space in contexts that aren't expected to hold very much (an example is
 the relcache's per-relation contexts).
 
-Also, it is possible to specify a minimum context size.  If this
+Also, it will be possible to specify a minimum context size.  If this
 value is greater than zero then a block of that size will be grabbed
 immediately upon context creation, and cleared but not released during
 context resets.  This feature is needed for ErrorContext (see above),
@@ -414,35 +417,15 @@ back to malloc() during reset, but just cleared.  This avoids malloc
 thrashing.
 
 
-Memory Context Reset/Delete Callbacks
--------------------------------------
-
-A feature introduced in Postgres 9.5 allows memory contexts to be used
-for managing more resources than just plain palloc'd memory.  This is
-done by registering a "reset callback function" for a memory context.
-Such a function will be called, once, just before the context is next
-reset or deleted.  It can be used to give up resources that are in some
-sense associated with an object allocated within the context.  Possible
-use-cases include
-* closing open files associated with a tuplesort object;
-* releasing reference counts on long-lived cache objects that are held
-  by some object within the context being reset;
-* freeing malloc-managed memory associated with some palloc'd object.
-That last case would just represent bad programming practice for pure
-Postgres code; better to have made all the allocations using palloc,
-in the target context or some child context.  However, it could well
-come in handy for code that interfaces to non-Postgres libraries.
-
-Any number of reset callbacks can be established for a memory context;
-they are called in reverse order of registration.  Also, callbacks
-attached to child contexts are called before callbacks attached to
-parent contexts, if a tree of contexts is being reset or deleted.
-
-The API for this requires the caller to provide a MemoryContextCallback
-memory chunk to hold the state for a callback.  Typically this should be
-allocated in the same context it is logically attached to, so that it
-will be released automatically after use.  The reason for asking the
-caller to provide this memory is that in most usage scenarios, the caller
-will be creating some larger struct within the target context, and the
-MemoryContextCallback struct can be made "for free" without a separate
-palloc() call by including it in this larger struct.
+Other Notes
+-----------
+
+The original version of this proposal suggested that functions returning
+pass-by-reference datatypes should be required to return a value freshly
+palloc'd in their caller's memory context, never a pointer to an input
+value.  I've abandoned that notion since it clearly is prone to error.
+In the current proposal, it is possible to discover which context a
+chunk of memory is allocated in (by checking the required standard chunk
+header), so nodeAgg can determine whether or not it's safe to reset
+its working context; it doesn't have to rely on the transition function
+to do what it's expecting.
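
For reference, the reset-callback mechanism that this README section (and the
matching mcxt.c code further down) removes is used roughly as follows.  A
minimal sketch of the "close an open file" use-case named in the deleted
text; the struct and function names here are invented for illustration:

#include "postgres.h"
#include "utils/memutils.h"

/* hypothetical state whose non-palloc resource must die with the context */
typedef struct MyExtState
{
	FILE	   *extfile;
	MemoryContextCallback cb;	/* lives in the same context as the state */
} MyExtState;

static void
my_ext_cleanup(void *arg)
{
	MyExtState *st = (MyExtState *) arg;

	if (st->extfile)
	{
		fclose(st->extfile);
		st->extfile = NULL;
	}
}

static MyExtState *
attach_ext_state(MemoryContext cxt, const char *path)
{
	MyExtState *st = MemoryContextAllocZero(cxt, sizeof(MyExtState));

	st->extfile = fopen(path, "r");
	st->cb.func = my_ext_cleanup;
	st->cb.arg = st;
	/* called once, just before cxt is next reset or deleted */
	MemoryContextRegisterResetCallback(cxt, &st->cb);
	return st;
}
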
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0cfb934..0759e39 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -438,14 +438,14 @@ AllocSetContextCreate(MemoryContext parent,
 					  Size initBlockSize,
 					  Size maxBlockSize)
 {
-	AllocSet	set;
+	AllocSet	context;
 
 	/* Do the type-independent part of context creation */
-	set = (AllocSet) MemoryContextCreate(T_AllocSetContext,
-										 sizeof(AllocSetContext),
-										 &AllocSetMethods,
-										 parent,
-										 name);
+	context = (AllocSet) MemoryContextCreate(T_AllocSetContext,
+											 sizeof(AllocSetContext),
+											 &AllocSetMethods,
+											 parent,
+											 name);
 
 	/*
 	 * Make sure alloc parameters are reasonable, and save them.
@@ -459,9 +459,9 @@ AllocSetContextCreate(MemoryContext parent,
 	if (maxBlockSize < initBlockSize)
 		maxBlockSize = initBlockSize;
 	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
-	set->initBlockSize = initBlockSize;
-	set->maxBlockSize = maxBlockSize;
-	set->nextBlockSize = initBlockSize;
+	context->initBlockSize = initBlockSize;
+	context->maxBlockSize = maxBlockSize;
+	context->nextBlockSize = initBlockSize;
 
 	/*
 	 * Compute the allocation chunk size limit for this context.  It can't be
@@ -477,10 +477,10 @@ AllocSetContextCreate(MemoryContext parent,
 	 * and actually-allocated sizes of any chunk must be on the same side of
 	 * the limit, else we get confused about whether the chunk is "big".
 	 */
-	set->allocChunkLimit = ALLOC_CHUNK_LIMIT;
-	while ((Size) (set->allocChunkLimit + ALLOC_CHUNKHDRSZ) >
+	context->allocChunkLimit = ALLOC_CHUNK_LIMIT;
+	while ((Size) (context->allocChunkLimit + ALLOC_CHUNKHDRSZ) >
 		   (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION))
-		set->allocChunkLimit >>= 1;
+		context->allocChunkLimit >>= 1;
 
 	/*
 	 * Grab always-allocated space, if requested
@@ -500,20 +500,20 @@ AllocSetContextCreate(MemoryContext parent,
 					 errdetail("Failed while creating memory context \"%s\".",
 							   name)));
 		}
-		block->aset = set;
+		block->aset = context;
 		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
 		block->endptr = ((char *) block) + blksize;
-		block->next = set->blocks;
-		set->blocks = block;
+		block->next = context->blocks;
+		context->blocks = block;
 		/* Mark block as not to be released at reset time */
-		set->keeper = block;
+		context->keeper = block;
 
 		/* Mark unallocated space NOACCESS; leave the block header alone. */
 		VALGRIND_MAKE_MEM_NOACCESS(block->freeptr,
 								   blksize - ALLOC_BLOCKHDRSZ);
 	}
 
-	return (MemoryContext) set;
+	return (MemoryContext) context;
 }
 
 /*
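
The set -> context rename above is purely cosmetic.  For completeness, this
is how callers create such a context (argument order per the signature in
this hunk; the ALLOCSET_SMALL_* macros are the same ones used by the
typcache hunk earlier in this patch):

#include "postgres.h"
#include "utils/memutils.h"

static void
demo_small_context(void)
{
	MemoryContext cxt;
	char	   *buf;

	cxt = AllocSetContextCreate(CurrentMemoryContext,
								"example context",
								ALLOCSET_SMALL_MINSIZE,
								ALLOCSET_SMALL_INITSIZE,
								ALLOCSET_SMALL_MAXSIZE);

	/* allocate in it without switching CurrentMemoryContext */
	buf = MemoryContextAlloc(cxt, 128);
	snprintf(buf, 128, "scratch data");

	/* one call releases everything allocated in cxt */
	MemoryContextDelete(cxt);
}
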
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index e2fbfd4..202bc78 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -54,7 +54,6 @@ MemoryContext CurTransactionContext = NULL;
 /* This is a transient link to the active portal's memory context: */
 MemoryContext PortalContext = NULL;
 
-static void MemoryContextCallResetCallbacks(MemoryContext context);
 static void MemoryContextStatsInternal(MemoryContext context, int level);
 
 /*
@@ -116,8 +115,9 @@ MemoryContextInit(void)
 	 * where retained memory in a context is *essential* --- we want to be
 	 * sure ErrorContext still has some memory even if we've run out
 	 * elsewhere! Also, allow allocations in ErrorContext within a critical
-	 * section. Otherwise a PANIC will cause an assertion failure in the error
-	 * reporting code, before printing out the real cause of the failure.
+	 * section. Otherwise a PANIC will cause an assertion failure in the
+	 * error reporting code, before printing out the real cause of the
+	 * failure.
 	 *
 	 * This should be the last step in this function, as elog.c assumes memory
 	 * management works once ErrorContext is non-null.
@@ -132,8 +132,11 @@ MemoryContextInit(void)
 
 /*
  * MemoryContextReset
- *		Release all space allocated within a context and delete all its
- *		descendant contexts (but not the named context itself).
+ *		Release all space allocated within a context and its descendants,
+ *		but don't delete the contexts themselves.
+ *
+ * The type-specific reset routine handles the context itself, but we
+ * have to do the recursion for the children.
  */
 void
 MemoryContextReset(MemoryContext context)
@@ -142,27 +145,11 @@ MemoryContextReset(MemoryContext context)
 
 	/* save a function call in common case where there are no children */
 	if (context->firstchild != NULL)
-		MemoryContextDeleteChildren(context);
-
-	/* save a function call if no pallocs since startup or last reset */
-	if (!context->isReset)
-		MemoryContextResetOnly(context);
-}
-
-/*
- * MemoryContextResetOnly
- *		Release all space allocated within a context.
- *		Nothing is done to the context's descendant contexts.
- */
-void
-MemoryContextResetOnly(MemoryContext context)
-{
-	AssertArg(MemoryContextIsValid(context));
+		MemoryContextResetChildren(context);
 
 	/* Nothing to do if no pallocs since startup or last reset */
 	if (!context->isReset)
 	{
-		MemoryContextCallResetCallbacks(context);
 		(*context->methods->reset) (context);
 		context->isReset = true;
 		VALGRIND_DESTROY_MEMPOOL(context);
@@ -184,10 +171,7 @@ MemoryContextResetChildren(MemoryContext context)
 	AssertArg(MemoryContextIsValid(context));
 
 	for (child = context->firstchild; child != NULL; child = child->nextchild)
-	{
-		MemoryContextResetChildren(child);
-		MemoryContextResetOnly(child);
-	}
+		MemoryContextReset(child);
 }
 
 /*
@@ -212,14 +196,6 @@ MemoryContextDelete(MemoryContext context)
 	MemoryContextDeleteChildren(context);
 
 	/*
-	 * It's not entirely clear whether 'tis better to do this before or after
-	 * delinking the context; but an error in a callback will likely result in
-	 * leaking the whole context (if it's not a root context) if we do it
-	 * after, so let's do it before.
-	 */
-	MemoryContextCallResetCallbacks(context);
-
-	/*
 	 * We delink the context from its parent before deleting it, so that if
 	 * there's an error we won't have deleted/busted contexts still attached
 	 * to the context tree.  Better a leak than a crash.
@@ -250,53 +226,20 @@ MemoryContextDeleteChildren(MemoryContext context)
 }
 
 /*
- * MemoryContextRegisterResetCallback
- *		Register a function to be called before next context reset/delete.
- *		Such callbacks will be called in reverse order of registration.
+ * MemoryContextResetAndDeleteChildren
+ *		Release all space allocated within a context and delete all
+ *		its descendants.
  *
- * The caller is responsible for allocating a MemoryContextCallback struct
- * to hold the info about this callback request, and for filling in the
- * "func" and "arg" fields in the struct to show what function to call with
- * what argument.  Typically the callback struct should be allocated within
- * the specified context, since that means it will automatically be freed
- * when no longer needed.
- *
- * There is no API for deregistering a callback once registered.  If you
- * want it to not do anything anymore, adjust the state pointed to by its
- * "arg" to indicate that.
+ * This is a common combination case where we want to preserve the
+ * specific context but get rid of absolutely everything under it.
  */
 void
-MemoryContextRegisterResetCallback(MemoryContext context,
-								   MemoryContextCallback *cb)
+MemoryContextResetAndDeleteChildren(MemoryContext context)
 {
 	AssertArg(MemoryContextIsValid(context));
 
-	/* Push onto head so this will be called before older registrants. */
-	cb->next = context->reset_cbs;
-	context->reset_cbs = cb;
-	/* Mark the context as non-reset (it probably is already). */
-	context->isReset = false;
-}
-
-/*
- * MemoryContextCallResetCallbacks
- *		Internal function to call all registered callbacks for context.
- */
-static void
-MemoryContextCallResetCallbacks(MemoryContext context)
-{
-	MemoryContextCallback *cb;
-
-	/*
-	 * We pop each callback from the list before calling.  That way, if an
-	 * error occurs inside the callback, we won't try to call it a second time
-	 * in the likely event that we reset or delete the context later.
-	 */
-	while ((cb = context->reset_cbs) != NULL)
-	{
-		context->reset_cbs = cb->next;
-		(*cb->func) (cb->arg);
-	}
+	MemoryContextDeleteChildren(context);
+	MemoryContextReset(context);
 }
 
 /*
@@ -375,8 +318,9 @@ void
 MemoryContextAllowInCriticalSection(MemoryContext context, bool allow)
 {
 	AssertArg(MemoryContextIsValid(context));
-
+#ifdef USE_ASSERT_CHECKING
 	context->allowInCritSection = allow;
+#endif
 }
 
 /*
@@ -645,8 +589,11 @@ MemoryContextCreate(NodeTag tag, Size size,
 		node->parent = parent;
 		node->nextchild = parent->firstchild;
 		parent->firstchild = node;
+
+#ifdef USE_ASSERT_CHECKING
 		/* inherit allowInCritSection flag from parent */
 		node->allowInCritSection = parent->allowInCritSection;
+#endif
 	}
 
 	VALGRIND_CREATE_MEMPOOL(node, 0, false);
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 252ba22..2103042 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -166,9 +166,12 @@ struct LogicalTapeSet
 	int			nFreeBlocks;	/* # of currently free blocks */
 	int			freeBlocksLen;	/* current allocated length of freeBlocks[] */
 
-	/* The array of logical tapes. */
+	/*
+	 * tapes[] is declared size 1 since C wants a fixed size, but actually it
+	 * is of length nTapes.
+	 */
 	int			nTapes;			/* # of logical tapes in set */
-	LogicalTape tapes[FLEXIBLE_ARRAY_MEMBER];	/* has nTapes nentries */
+	LogicalTape tapes[1];		/* must be last in struct! */
 };
 
 static void ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer);
@@ -516,11 +519,12 @@ LogicalTapeSetCreate(int ntapes)
 	int			i;
 
 	/*
-	 * Create top-level struct including per-tape LogicalTape structs.
+	 * Create top-level struct including per-tape LogicalTape structs. First
+	 * LogicalTape struct is already counted in sizeof(LogicalTapeSet).
 	 */
 	Assert(ntapes > 0);
-	lts = (LogicalTapeSet *) palloc(offsetof(LogicalTapeSet, tapes) +
-									ntapes * sizeof(LogicalTape));
+	lts = (LogicalTapeSet *) palloc(sizeof(LogicalTapeSet) +
+									(ntapes - 1) *sizeof(LogicalTape));
 	lts->pfile = BufFileCreateTemp(false);
 	lts->nFileBlocks = 0L;
 	lts->forgetFreeSpace = false;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 7e9a776..c966de0 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,7 +2,7 @@ use strict;
 use warnings;
 use Cwd;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => 33;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -49,13 +49,6 @@ command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
 	'tar format');
 ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
 
-my $superlongname = "superlongname_" . ("x"x100);
-
-system_or_bail 'touch', "$tempdir/pgdata/$superlongname";
-command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ],
-			  'pg_basebackup tar with long name fails');
-unlink "$tempdir/pgdata/$superlongname";
-
 # Create a temporary directory in the system location and symlink it
 # to our physical temp location.  That way we can use shorter names
 # for the tablespace directories, which hopefully won't run afoul of
@@ -124,9 +117,3 @@ command_fails(
 command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ],
 	'-T with invalid format fails');
-
-mkdir "$tempdir/$superlongname";
-psql 'postgres', "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';";
-command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ],
-			  'pg_basebackup tar with long symlink target fails');
-psql 'postgres', "DROP TABLESPACE tblspc3;";
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 912fc2f..7a18e77 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -453,16 +453,6 @@ struct cfp
 static int	hasSuffix(const char *filename, const char *suffix);
 #endif
 
-/* free() without changing errno; useful in several places below */
-static void
-free_keep_errno(void *p)
-{
-	int			save_errno = errno;
-
-	free(p);
-	errno = save_errno;
-}
-
 /*
  * Open a file for reading. 'path' is the file to open, and 'mode' should
  * be either "r" or "rb".
@@ -470,8 +460,6 @@ free_keep_errno(void *p)
  * If the file at 'path' does not exist, we append the ".gz" suffix (if 'path'
  * doesn't already have it) and try again. So if you pass "foo" as 'path',
  * this will open either "foo" or "foo.gz".
- *
- * On failure, return NULL with an error code in errno.
  */
 cfp *
 cfopen_read(const char *path, const char *mode)
@@ -492,7 +480,7 @@ cfopen_read(const char *path, const char *mode)
 
 			fname = psprintf("%s.gz", path);
 			fp = cfopen(fname, mode, 1);
-			free_keep_errno(fname);
+			free(fname);
 		}
 #endif
 	}
@@ -505,10 +493,8 @@ cfopen_read(const char *path, const char *mode)
  * ("w", "wb", "a", or "ab").
  *
  * If 'compression' is non-zero, a gzip compressed stream is opened, and
- * 'compression' indicates the compression level used. The ".gz" suffix
+ * and 'compression' indicates the compression level used. The ".gz" suffix
  * is automatically added to 'path' in that case.
- *
- * On failure, return NULL with an error code in errno.
  */
 cfp *
 cfopen_write(const char *path, const char *mode, int compression)
@@ -523,8 +509,8 @@ cfopen_write(const char *path, const char *mode, int compression)
 		char	   *fname;
 
 		fname = psprintf("%s.gz", path);
-		fp = cfopen(fname, mode, compression);
-		free_keep_errno(fname);
+		fp = cfopen(fname, mode, 1);
+		free(fname);
 #else
 		exit_horribly(modulename, "not built with zlib support\n");
 		fp = NULL;				/* keep compiler quiet */
@@ -535,9 +521,7 @@ cfopen_write(const char *path, const char *mode, int compression)
 
 /*
  * Opens file 'path' in 'mode'. If 'compression' is non-zero, the file
- * is opened with libz gzopen(), otherwise with plain fopen().
- *
- * On failure, return NULL with an error code in errno.
+ * is opened with libz gzopen(), otherwise with plain fopen()
  */
 cfp *
 cfopen(const char *path, const char *mode, int compression)
@@ -547,15 +531,11 @@ cfopen(const char *path, const char *mode, int compression)
 	if (compression != 0)
 	{
 #ifdef HAVE_LIBZ
-		char		mode_compression[32];
-
-		snprintf(mode_compression, sizeof(mode_compression), "%s%d",
-				 mode, compression);
-		fp->compressedfp = gzopen(path, mode_compression);
+		fp->compressedfp = gzopen(path, mode);
 		fp->uncompressedfp = NULL;
 		if (fp->compressedfp == NULL)
 		{
-			free_keep_errno(fp);
+			free(fp);
 			fp = NULL;
 		}
 #else
@@ -570,7 +550,7 @@ cfopen(const char *path, const char *mode, int compression)
 		fp->uncompressedfp = fopen(path, mode);
 		if (fp->uncompressedfp == NULL)
 		{
-			free_keep_errno(fp);
+			free(fp);
 			fp = NULL;
 		}
 	}
@@ -679,7 +659,7 @@ cfclose(cfp *fp)
 		result = fclose(fp->uncompressedfp);
 		fp->uncompressedfp = NULL;
 	}
-	free_keep_errno(fp);
+	free(fp);
 
 	return result;
 }
diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index d7506e1..095c507 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -1216,8 +1216,9 @@ simple_string_list_append(SimpleStringList *list, const char *val)
 {
 	SimpleStringListCell *cell;
 
+	/* this calculation correctly accounts for the null trailing byte */
 	cell = (SimpleStringListCell *)
-		pg_malloc(offsetof(SimpleStringListCell, val) +strlen(val) + 1);
+		pg_malloc(sizeof(SimpleStringListCell) + strlen(val));
 
 	cell->next = NULL;
 	strcpy(cell->val, val);
diff --git a/src/bin/pg_dump/dumputils.h b/src/bin/pg_dump/dumputils.h
index b176746..a39c1b6 100644
--- a/src/bin/pg_dump/dumputils.h
+++ b/src/bin/pg_dump/dumputils.h
@@ -38,7 +38,7 @@ typedef struct SimpleOidList
 typedef struct SimpleStringListCell
 {
 	struct SimpleStringListCell *next;
-	char		val[FLEXIBLE_ARRAY_MEMBER];		/* null-terminated string here */
+	char		val[1];			/* VARIABLE LENGTH FIELD */
 } SimpleStringListCell;
 
 typedef struct SimpleStringList
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index ca427de..f461393 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -435,6 +435,17 @@ RestoreArchive(Archive *AHX)
 	}
 
 	/*
+	 * Enable row-security if necessary.
+	 */
+	if (PQserverVersion(AH->connection) >= 90500)
+	{
+		if (!ropt->enable_row_security)
+			ahprintf(AH, "SET row_security = off;\n");
+		else
+			ahprintf(AH, "SET row_security = on;\n");
+	}
+
+	/*
 	 * Establish important parameter values right away.
 	 */
 	_doSetFixedOutputState(AH);
@@ -2793,12 +2804,6 @@ _doSetFixedOutputState(ArchiveHandle *AH)
 	if (!AH->public.std_strings)
 		ahprintf(AH, "SET escape_string_warning = off;\n");
 
-	/* Adjust row-security state */
-	if (AH->ropt && AH->ropt->enable_row_security)
-		ahprintf(AH, "SET row_security = on;\n");
-	else
-		ahprintf(AH, "SET row_security = off;\n");
-
 	ahprintf(AH, "\n");
 }
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2b53c72..7e92b74 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1006,17 +1006,6 @@ setup_connection(Archive *AH, DumpOptions *dopt, const char *dumpencoding,
 		ExecuteSqlStatement(AH, "SET quote_all_identifiers = true");
 
 	/*
-	 * Adjust row-security mode, if supported.
-	 */
-	if (AH->remoteVersion >= 90500)
-	{
-		if (dopt->enable_row_security)
-			ExecuteSqlStatement(AH, "SET row_security = on");
-		else
-			ExecuteSqlStatement(AH, "SET row_security = off");
-	}
-
-	/*
 	 * Start transaction-snapshot mode transaction to dump consistent data.
 	 */
 	ExecuteSqlStatement(AH, "BEGIN");
@@ -1069,6 +1058,14 @@ setup_connection(Archive *AH, DumpOptions *dopt, const char *dumpencoding,
 			 AH->remoteVersion >= 90200 &&
 			 !dopt->no_synchronized_snapshots)
 		AH->sync_snapshot_id = get_synchronized_snapshot(AH);
+
+	if (AH->remoteVersion >= 90500)
+	{
+		if (dopt->enable_row_security)
+			ExecuteSqlStatement(AH, "SET row_security TO ON");
+		else
+			ExecuteSqlStatement(AH, "SET row_security TO OFF");
+	}
 }
 
 static void
diff --git a/src/common/Makefile b/src/common/Makefile
index c71415e..372a21b 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -43,7 +43,6 @@ uninstall:
 	rm -f '$(DESTDIR)$(libdir)/libpgcommon.a'
 
 libpgcommon.a: $(OBJS_FRONTEND)
-	rm -f $@
 	$(AR) $(AROPT) $@ $^
 
 #
@@ -51,7 +50,6 @@ libpgcommon.a: $(OBJS_FRONTEND)
 #
 
 libpgcommon_srv.a: $(OBJS_SRV)
-	rm -f $@
 	$(AR) $(AROPT) $@ $^
 
 # Because this uses its own compilation rule, it doesn't use the
diff --git a/src/include/access/brin_page.h b/src/include/access/brin_page.h
index 44ce5f6..d8fa190 100644
--- a/src/include/access/brin_page.h
+++ b/src/include/access/brin_page.h
@@ -56,12 +56,7 @@ typedef struct BrinMetaPageData
 /* Definitions for revmap pages */
 typedef struct RevmapContents
 {
-	/*
-	 * This array will fill all available space on the page.  It should be
-	 * declared [FLEXIBLE_ARRAY_MEMBER], but for some reason you can't do that
-	 * in an otherwise-empty struct.
-	 */
-	ItemPointerData rm_tids[1];
+	ItemPointerData rm_tids[1]; /* really REVMAP_PAGE_MAXITEMS */
 } RevmapContents;
 
 #define REVMAP_CONTENT_SIZE \
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index c1a2049..bda7c28 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -322,7 +322,7 @@ typedef struct GinOptions
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	bool		useFastUpdate;	/* use fast updates? */
-	int			pendingListCleanupSize; /* maximum size of pending list */
+	int			pendingListCleanupSize;	/* maximum size of pending list */
 } GinOptions;
 
 #define GIN_DEFAULT_USE_FASTUPDATE	true
@@ -389,7 +389,7 @@ typedef struct
 {
 	ItemPointerData first;		/* first item in this posting list (unpacked) */
 	uint16		nbytes;			/* number of bytes that follow */
-	unsigned char bytes[FLEXIBLE_ARRAY_MEMBER]; /* varbyte encoded items */
+	unsigned char bytes[1];		/* varbyte encoded items (variable length) */
 } GinPostingList;
 
 #define SizeOfGinPostingList(plist) (offsetof(GinPostingList, bytes) + SHORTALIGN((plist)->nbytes) )
@@ -511,6 +511,34 @@ typedef struct ginxlogSplit
 #define GIN_INSERT_ISLEAF	0x02	/* .. */
 #define GIN_SPLIT_ROOT		0x04	/* only for split records */
 
+typedef struct
+{
+	OffsetNumber separator;
+	OffsetNumber nitem;
+
+	/* FOLLOWS: IndexTuples */
+} ginxlogSplitEntry;
+
+typedef struct
+{
+	uint16		lsize;
+	uint16		rsize;
+	ItemPointerData lrightbound;	/* new right bound of left page */
+	ItemPointerData rrightbound;	/* new right bound of right page */
+
+	/* FOLLOWS: new compressed posting lists of left and right page */
+	char		newdata[1];
+} ginxlogSplitDataLeaf;
+
+typedef struct
+{
+	OffsetNumber separator;
+	OffsetNumber nitem;
+	ItemPointerData rightbound;
+
+	/* FOLLOWS: array of PostingItems */
+} ginxlogSplitDataInternal;
+
 /*
  * Vacuum simply WAL-logs the whole page, when anything is modified. This
  * functionally identical heap_newpage records, but is kept separate for
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index ce83042..382826e 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -47,7 +47,7 @@ typedef struct
 {
 	BlockNumber prev;
 	uint32		freespace;
-	char		tupledata[FLEXIBLE_ARRAY_MEMBER];
+	char		tupledata[1];
 } GISTNodeBufferPage;
 
 #define BUFFER_PAGE_DATA_OFFSET MAXALIGN(offsetof(GISTNodeBufferPage, tupledata))
@@ -131,8 +131,7 @@ typedef struct GISTSearchItem
 		/* we must store parentlsn to detect whether a split occurred */
 		GISTSearchHeapItem heap;	/* heap info, if heap tuple */
 	}			data;
-	double		distances[FLEXIBLE_ARRAY_MEMBER];		/* numberOfOrderBys
-														 * entries */
+	double		distances[1];	/* array with numberOfOrderBys entries */
 } GISTSearchItem;
 
 #define GISTSearchItemIsHeap(item)	((item).blkno == InvalidBlockNumber)
@@ -145,7 +144,7 @@ typedef struct GISTSearchItem
 typedef struct GISTScanOpaqueData
 {
 	GISTSTATE  *giststate;		/* index information, see above */
-	pairingheap *queue;			/* queue of unvisited items */
+	pairingheap *queue;		/* queue of unvisited items */
 	MemoryContext queueCxt;		/* context holding the queue */
 	bool		qual_ok;		/* false if qual can never be satisfied */
 	bool		firstCall;		/* true until first gistgettuple call */
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f0f89de..a2ed2a0 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -132,7 +132,7 @@ typedef struct xl_heap_multi_insert
 {
 	uint8		flags;
 	uint16		ntuples;
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } xl_heap_multi_insert;
 
 #define SizeOfHeapMultiInsert	offsetof(xl_heap_multi_insert, offsets)
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 0a673cd..d2ad910 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -150,15 +150,13 @@ struct HeapTupleHeaderData
 
 	/* ^ - 23 bytes - ^ */
 
-	bits8		t_bits[FLEXIBLE_ARRAY_MEMBER];	/* bitmap of NULLs */
+	bits8		t_bits[1];		/* bitmap of NULLs -- VARIABLE LENGTH */
 
 	/* MORE DATA FOLLOWS AT END OF STRUCT */
 };
 
 /* typedef appears in tupbasics.h */
 
-#define SizeofHeapTupleHeader offsetof(HeapTupleHeaderData, t_bits)
-
 /*
  * information stored in t_infomask:
  */
@@ -500,7 +498,7 @@ do { \
  * you can, say, fit 2 tuples of size MaxHeapTupleSize/2 on the same page.
  */
 #define MaxHeapTupleSize  (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)))
-#define MinHeapTupleSize  MAXALIGN(SizeofHeapTupleHeader)
+#define MinHeapTupleSize  MAXALIGN(offsetof(HeapTupleHeaderData, t_bits))
 
 /*
  * MaxHeapTuplesPerPage is an upper bound on the number of tuples that can
@@ -515,7 +513,7 @@ do { \
  */
 #define MaxHeapTuplesPerPage	\
 	((int) ((BLCKSZ - SizeOfPageHeaderData) / \
-			(MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))
+			(MAXALIGN(offsetof(HeapTupleHeaderData, t_bits)) + sizeof(ItemIdData))))
 
 /*
  * MaxAttrSize is a somewhat arbitrary upper limit on the declared size of
@@ -581,15 +579,13 @@ struct MinimalTupleData
 
 	/* ^ - 23 bytes - ^ */
 
-	bits8		t_bits[FLEXIBLE_ARRAY_MEMBER];	/* bitmap of NULLs */
+	bits8		t_bits[1];		/* bitmap of NULLs -- VARIABLE LENGTH */
 
 	/* MORE DATA FOLLOWS AT END OF STRUCT */
 };
 
 /* typedef appears in htup.h */
 
-#define SizeofMinimalTupleHeader offsetof(MinimalTupleData, t_bits)
-
 
 /*
  * GETSTRUCT - given a HeapTuple pointer, return address of the user data
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 0492ef6..f11d8ef 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -426,7 +426,7 @@ typedef struct spgxlogMoveLeafs
 	 * the dead tuple from the source
 	 *----------
 	 */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } spgxlogMoveLeafs;
 
 #define SizeOfSpgxlogMoveLeafs	offsetof(spgxlogMoveLeafs, offsets)
@@ -534,7 +534,7 @@ typedef struct spgxlogPickSplit
 	 *		list of leaf tuples, length nInsert (unaligned!)
 	 *----------
 	 */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } spgxlogPickSplit;
 
 #define SizeOfSpgxlogPickSplit offsetof(spgxlogPickSplit, offsets)
@@ -558,7 +558,7 @@ typedef struct spgxlogVacuumLeaf
 	 *		tuple numbers to insert in nextOffset links
 	 *----------
 	 */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } spgxlogVacuumLeaf;
 
 #define SizeOfSpgxlogVacuumLeaf offsetof(spgxlogVacuumLeaf, offsets)
@@ -571,7 +571,7 @@ typedef struct spgxlogVacuumRoot
 	spgxlogState stateSrc;
 
 	/* offsets of tuples to delete follow */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } spgxlogVacuumRoot;
 
 #define SizeOfSpgxlogVacuumRoot offsetof(spgxlogVacuumRoot, offsets)
@@ -583,7 +583,7 @@ typedef struct spgxlogVacuumRedirect
 	TransactionId newestRedirectXid;	/* newest XID of removed redirects */
 
 	/* offsets of redirect tuples to make placeholders follow */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
+	OffsetNumber offsets[1];
 } spgxlogVacuumRedirect;
 
 #define SizeOfSpgxlogVacuumRedirect offsetof(spgxlogVacuumRedirect, offsets)
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 7d18535..331dd25 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -90,7 +90,7 @@
 
 #define TOAST_MAX_CHUNK_SIZE	\
 	(EXTERN_TUPLE_MAX_SIZE -							\
-	 MAXALIGN(SizeofHeapTupleHeader) -					\
+	 MAXALIGN(offsetof(HeapTupleHeaderData, t_bits)) -	\
 	 sizeof(Oid) -										\
 	 sizeof(int32) -									\
 	 VARHDRSZ)
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index d7e5f64..8205504 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -118,7 +118,7 @@ typedef struct xl_xact_assignment
 {
 	TransactionId xtop;			/* assigned XID's top-level XID */
 	int			nsubxacts;		/* number of subtransaction XIDs */
-	TransactionId xsub[FLEXIBLE_ARRAY_MEMBER];	/* assigned subxids */
+	TransactionId xsub[1];		/* assigned subxids */
 } xl_xact_assignment;
 
 #define MinSizeOfXactAssignment offsetof(xl_xact_assignment, xsub)
@@ -128,7 +128,7 @@ typedef struct xl_xact_commit_compact
 	TimestampTz xact_time;		/* time of commit */
 	int			nsubxacts;		/* number of subtransaction XIDs */
 	/* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */
-	TransactionId subxacts[FLEXIBLE_ARRAY_MEMBER];
+	TransactionId subxacts[1];	/* VARIABLE LENGTH ARRAY */
 } xl_xact_commit_compact;
 
 #define MinSizeOfXactCommitCompact offsetof(xl_xact_commit_compact, subxacts)
@@ -143,7 +143,7 @@ typedef struct xl_xact_commit
 	Oid			dbId;			/* MyDatabaseId */
 	Oid			tsId;			/* MyDatabaseTableSpace */
 	/* Array of RelFileNode(s) to drop at commit */
-	RelFileNode xnodes[FLEXIBLE_ARRAY_MEMBER];
+	RelFileNode xnodes[1];		/* VARIABLE LENGTH ARRAY */
 	/* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */
 	/* ARRAY OF SHARED INVALIDATION MESSAGES FOLLOWS */
 } xl_xact_commit;
@@ -171,7 +171,7 @@ typedef struct xl_xact_abort
 	int			nrels;			/* number of RelFileNodes */
 	int			nsubxacts;		/* number of subtransaction XIDs */
 	/* Array of RelFileNode(s) to drop at abort */
-	RelFileNode xnodes[FLEXIBLE_ARRAY_MEMBER];
+	RelFileNode xnodes[1];		/* VARIABLE LENGTH ARRAY */
 	/* ARRAY OF ABORTED SUBTRANSACTION XIDs FOLLOWS */
 } xl_xact_abort;
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0e8e587..138deaf 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -89,12 +89,10 @@ extern XLogRecPtr XactLastRecEnd;
 extern bool reachedConsistency;
 
 /* these variables are GUC parameters related to XLOG */
-extern int	min_wal_size;
-extern int	max_wal_size;
+extern int	CheckPointSegments;
 extern int	wal_keep_segments;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
-extern int	wal_retrieve_retry_interval;
 extern bool XLogArchiveMode;
 extern char *XLogArchiveCommand;
 extern bool EnableHotStandby;
@@ -102,8 +100,6 @@ extern bool fullPageWrites;
 extern bool wal_log_hints;
 extern bool log_checkpoints;
 
-extern int	CheckPointSegments;
-
 /* WAL levels */
 typedef enum WalLevel
 {
@@ -249,9 +245,6 @@ extern bool CheckPromoteSignal(void);
 extern void WakeupRecovery(void);
 extern void SetWalWriterSleeping(bool sleeping);
 
-extern void assign_max_wal_size(int newval, void *extra);
-extern void assign_checkpoint_completion_target(double newval, void *extra);
-
 /*
  * Starting/stopping a base backup
  */
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index f9cbc13..be4430a 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -23,10 +23,6 @@
  */
 #define MAXATTR 40
 
-#define BOOTCOL_NULL_AUTO			1
-#define BOOTCOL_NULL_FORCE_NULL		2
-#define BOOTCOL_NULL_FORCE_NOT_NULL	3
-
 extern Relation boot_reldesc;
 extern Form_pg_attribute attrtypes[MAXATTR];
 extern int	numattr;
@@ -39,7 +35,7 @@ extern void err_out(void);
 extern void closerel(char *name);
 extern void boot_openrel(char *name);
 
-extern void DefineAttr(char *name, char *type, int attnum, int nullness);
+extern void DefineAttr(char *name, char *type, int attnum);
 extern void InsertOneTuple(Oid objectid);
 extern void InsertOneValue(char *value, int i);
 extern void InsertOneNull(int i);
diff --git a/src/include/c.h b/src/include/c.h
index ee615ee..b187520 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -391,7 +391,7 @@ typedef struct
 struct varlena
 {
 	char		vl_len_[4];		/* Do not touch this field directly! */
-	char		vl_dat[FLEXIBLE_ARRAY_MEMBER];	/* Data content is here */
+	char		vl_dat[1];
 };
 
 #define VARHDRSZ		((int32) sizeof(int32))
@@ -424,8 +424,8 @@ typedef struct
 	Oid			elemtype;
 	int			dim1;
 	int			lbound1;
-	int16		values[FLEXIBLE_ARRAY_MEMBER];
-} int2vector;
+	int16		values[1];		/* VARIABLE LENGTH ARRAY */
+} int2vector;					/* VARIABLE LENGTH STRUCT */
 
 typedef struct
 {
@@ -435,8 +435,8 @@ typedef struct
 	Oid			elemtype;
 	int			dim1;
 	int			lbound1;
-	Oid			values[FLEXIBLE_ARRAY_MEMBER];
-} oidvector;
+	Oid			values[1];		/* VARIABLE LENGTH ARRAY */
+} oidvector;					/* VARIABLE LENGTH STRUCT */
 
 /*
  * Representation of a Name: effectively just a C string, but null-padded to
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 508741f..2b7a0bb 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	201503011
+#define CATALOG_VERSION_NO	201501281
 
 #endif
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index cebf51d..5d6039d 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -28,8 +28,6 @@
 #define BKI_WITHOUT_OIDS
 #define BKI_ROWTYPE_OID(oid)
 #define BKI_SCHEMA_MACRO
-#define BKI_FORCE_NULL
-#define BKI_FORCE_NOT_NULL
 
 /*
  * This is never defined; it's here only for documentation.
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index cf5f7d0..d2e5198 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -34,8 +34,8 @@ typedef struct _FuncCandidateList
 	int			nvargs;			/* number of args to become variadic array */
 	int			ndargs;			/* number of defaulted args */
 	int		   *argnumbers;		/* args' positional indexes, if named call */
-	Oid			args[FLEXIBLE_ARRAY_MEMBER];	/* arg types */
-}	*FuncCandidateList;
+	Oid			args[1];		/* arg types --- VARIABLE LENGTH ARRAY */
+}	*FuncCandidateList;	/* VARIABLE LENGTH STRUCT */
 
 /*
  *	Structure for xxxOverrideSearchPath functions
diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
index b3f43e1..e01e6aa 100644
--- a/src/include/catalog/pg_authid.h
+++ b/src/include/catalog/pg_authid.h
@@ -56,10 +56,8 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MAC
 	int32		rolconnlimit;	/* max connections allowed (-1=no limit) */
 
 	/* remaining fields may be null; use heap_getattr to read them! */
-#ifdef CATALOG_VARLEN			/* variable-length fields start here */
 	text		rolpassword;	/* password, if any */
 	timestamptz rolvaliduntil;	/* password expiration time, if any */
-#endif
 } FormData_pg_authid;
 
 #undef timestamptz
diff --git a/src/include/catalog/pg_description.h b/src/include/catalog/pg_description.h
index 692455f..5a936e8 100644
--- a/src/include/catalog/pg_description.h
+++ b/src/include/catalog/pg_description.h
@@ -52,7 +52,7 @@ CATALOG(pg_description,2609) BKI_WITHOUT_OIDS
 	int32		objsubid;		/* column number, or 0 if not used */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		description BKI_FORCE_NOT_NULL;	/* description of object */
+	text		description;	/* description of object */
 #endif
 } FormData_pg_description;
 
diff --git a/src/include/catalog/pg_extension.h b/src/include/catalog/pg_extension.h
index 99ab35b..f45d6cb 100644
--- a/src/include/catalog/pg_extension.h
+++ b/src/include/catalog/pg_extension.h
@@ -36,8 +36,8 @@ CATALOG(pg_extension,3079)
 	bool		extrelocatable; /* if true, allow ALTER EXTENSION SET SCHEMA */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	/* extversion may never be null, but the others can be. */
-	text		extversion BKI_FORCE_NOT_NULL;		/* extension version name */
+	/* extversion should never be null, but the others can be. */
+	text		extversion;		/* extension version name */
 	Oid			extconfig[1];	/* dumpable configuration tables */
 	text		extcondition[1];	/* WHERE clauses for config tables */
 #endif
diff --git a/src/include/catalog/pg_largeobject.h b/src/include/catalog/pg_largeobject.h
index 4a33752..6a8d0cc 100644
--- a/src/include/catalog/pg_largeobject.h
+++ b/src/include/catalog/pg_largeobject.h
@@ -34,7 +34,7 @@ CATALOG(pg_largeobject,2613) BKI_WITHOUT_OIDS
 	int32		pageno;			/* Page number (starting from 0) */
 
 	/* data has variable length, but we allow direct access; see inv_api.c */
-	bytea		data BKI_FORCE_NOT_NULL; /* Data for page (may be zero-length) */
+	bytea		data;			/* Data for page (may be zero-length) */
 } FormData_pg_largeobject;
 
 /* ----------------
diff --git a/src/include/catalog/pg_pltemplate.h b/src/include/catalog/pg_pltemplate.h
index 569d724..c5e6554 100644
--- a/src/include/catalog/pg_pltemplate.h
+++ b/src/include/catalog/pg_pltemplate.h
@@ -35,10 +35,10 @@ CATALOG(pg_pltemplate,1136) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 	bool		tmpldbacreate;	/* PL is installable by db owner? */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		tmplhandler BKI_FORCE_NOT_NULL;	/* name of call handler function */
+	text		tmplhandler;	/* name of call handler function */
 	text		tmplinline;		/* name of anonymous-block handler, or NULL */
 	text		tmplvalidator;	/* name of validator function, or NULL */
-	text		tmpllibrary BKI_FORCE_NOT_NULL;	/* path of shared library */
+	text		tmpllibrary;	/* path of shared library */
 	aclitem		tmplacl[1];		/* access privileges for template */
 #endif
 } FormData_pg_pltemplate;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b8a3660..9edfdb8 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -66,7 +66,7 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
 	text		proargnames[1]; /* parameter names (NULL if no names) */
 	pg_node_tree proargdefaults;/* list of expression trees for argument
 								 * defaults (NULL if none) */
-	text		prosrc BKI_FORCE_NOT_NULL; /* procedure source text */
+	text		prosrc;			/* procedure source text */
 	text		probin;			/* secondary procedure info (can be NULL) */
 	text		proconfig[1];	/* procedure-local GUC settings */
 	aclitem		proacl[1];		/* access permissions */
@@ -878,9 +878,9 @@ DATA(insert OID = 2176 (  array_length	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2
 DESCR("array length");
 DATA(insert OID = 3179 (  cardinality	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 23 "2277" _null_ _null_ _null_ _null_ array_cardinality _null_ _null_ _null_ ));
 DESCR("array cardinality");
-DATA(insert OID = 378 (  array_append	   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ array_append _null_ _null_ _null_ ));
+DATA(insert OID = 378 (  array_append	   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ array_push _null_ _null_ _null_ ));
 DESCR("append element onto end of array");
-DATA(insert OID = 379 (  array_prepend	   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2283 2277" _null_ _null_ _null_ _null_ array_prepend _null_ _null_ _null_ ));
+DATA(insert OID = 379 (  array_prepend	   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2283 2277" _null_ _null_ _null_ _null_ array_push _null_ _null_ _null_ ));
 DESCR("prepend element onto front of array");
 DATA(insert OID = 383 (  array_cat		   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_cat _null_ _null_ _null_ ));
 DATA(insert OID = 394 (  string_to_array   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 1009 "25 25" _null_ _null_ _null_ _null_ text_to_array _null_ _null_ _null_ ));
@@ -1155,9 +1155,7 @@ DATA(insert OID = 999 (  lseg_eq		   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16
 
 /* OIDS 1000 - 1999 */
 
-DATA(insert OID = 3994 (  timestamp_izone_transform PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ timestamp_izone_transform _null_ _null_ _null_ ));
-DESCR("transform a time zone adjustment");
-DATA(insert OID = 1026 (  timezone		   PGNSP PGUID 12 1 0 0 timestamp_izone_transform f f f f t f i 2 0 1114 "1186 1184" _null_ _null_ _null_ _null_ timestamptz_izone _null_ _null_ _null_ ));
+DATA(insert OID = 1026 (  timezone		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1114 "1186 1184" _null_ _null_ _null_ _null_ timestamptz_izone _null_ _null_ _null_ ));
 DESCR("adjust timestamp to new time zone");
 
 DATA(insert OID = 1031 (  aclitemin		   PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 1033 "2275" _null_ _null_ _null_ _null_ aclitemin _null_ _null_ _null_ ));
@@ -1271,9 +1269,7 @@ DATA(insert OID = 1156 (  timestamptz_ge   PGNSP PGUID 12 1 0 0 0 f f f t t f i
 DATA(insert OID = 1157 (  timestamptz_gt   PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "1184 1184" _null_ _null_ _null_ _null_ timestamp_gt _null_ _null_ _null_ ));
 DATA(insert OID = 1158 (  to_timestamp	   PGNSP PGUID 14 1 0 0 0 f f f f t f i 1 0 1184 "701" _null_ _null_ _null_ _null_ "select (''epoch''::pg_catalog.timestamptz + $1 * ''1 second''::pg_catalog.interval)" _null_ _null_ _null_ ));
 DESCR("convert UNIX epoch to timestamptz");
-DATA(insert OID = 3995 (  timestamp_zone_transform PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ timestamp_zone_transform _null_ _null_ _null_ ));
-DESCR("transform a time zone adjustment");
-DATA(insert OID = 1159 (  timezone		   PGNSP PGUID 12 1 0 0 timestamp_zone_transform f f f f t f i 2 0 1114 "25 1184" _null_ _null_ _null_ _null_	timestamptz_zone _null_ _null_ _null_ ));
+DATA(insert OID = 1159 (  timezone		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1114 "25 1184" _null_ _null_ _null_ _null_	timestamptz_zone _null_ _null_ _null_ ));
 DESCR("adjust timestamp to new time zone");
 
 DATA(insert OID = 1160 (  interval_in	   PGNSP PGUID 12 1 0 0 0 f f f f t f s 3 0 1186 "2275 26 23" _null_ _null_ _null_ _null_ interval_in _null_ _null_ _null_ ));
@@ -2856,8 +2852,6 @@ DESCR("statistics: total execution time of function in current transaction, in m
 DATA(insert OID = 3048 (  pg_stat_get_xact_function_self_time	PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 701 "26" _null_ _null_ _null_ _null_ pg_stat_get_xact_function_self_time _null_ _null_ _null_ ));
 DESCR("statistics: self execution time of function in current transaction, in msec");
 
-DATA(insert OID = 3788 (  pg_stat_get_snapshot_timestamp PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_snapshot_timestamp _null_ _null_ _null_ ));
-DESCR("statistics: timestamp of the current statistics snapshot");
 DATA(insert OID = 2230 (  pg_stat_clear_snapshot		PGNSP PGUID 12 1 0 0 0 f f f f f f v 0 0 2278 "" _null_ _null_ _null_ _null_	pg_stat_clear_snapshot _null_ _null_ _null_ ));
 DESCR("statistics: discard current transaction's statistics snapshot");
 DATA(insert OID = 2274 (  pg_stat_reset					PGNSP PGUID 12 1 0 0 0 f f f f f f v 0 0 2278 "" _null_ _null_ _null_ _null_	pg_stat_reset _null_ _null_ _null_ ));
@@ -3000,9 +2994,9 @@ DESCR("date difference preserving months and years");
 DATA(insert OID = 2059 (  age				PGNSP PGUID 14 1 0 0 0 f f f f t f s 1 0 1186 "1114" _null_ _null_ _null_ _null_ "select pg_catalog.age(cast(current_date as timestamp without time zone), $1)" _null_ _null_ _null_ ));
 DESCR("date difference from today preserving months and years");
 
-DATA(insert OID = 2069 (  timezone			PGNSP PGUID 12 1 0 0 timestamp_zone_transform f f f f t f i 2 0 1184 "25 1114" _null_ _null_ _null_ _null_ timestamp_zone _null_ _null_ _null_ ));
+DATA(insert OID = 2069 (  timezone			PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1184 "25 1114" _null_ _null_ _null_ _null_ timestamp_zone _null_ _null_ _null_ ));
 DESCR("adjust timestamp to new time zone");
-DATA(insert OID = 2070 (  timezone			PGNSP PGUID 12 1 0 0 timestamp_izone_transform f f f f t f i 2 0 1184 "1186 1114" _null_ _null_ _null_ _null_	timestamp_izone _null_ _null_ _null_ ));
+DATA(insert OID = 2070 (  timezone			PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1184 "1186 1114" _null_ _null_ _null_ _null_	timestamp_izone _null_ _null_ _null_ ));
 DESCR("adjust timestamp to new time zone");
 DATA(insert OID = 2071 (  date_pl_interval	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1114 "1082 1186" _null_ _null_ _null_ _null_	date_pl_interval _null_ _null_ _null_ ));
 DATA(insert OID = 2072 (  date_mi_interval	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 1114 "1082 1186" _null_ _null_ _null_ _null_	date_mi_interval _null_ _null_ _null_ ));
diff --git a/src/include/catalog/pg_seclabel.h b/src/include/catalog/pg_seclabel.h
index c9f5b0c..d54e699 100644
--- a/src/include/catalog/pg_seclabel.h
+++ b/src/include/catalog/pg_seclabel.h
@@ -27,8 +27,8 @@ CATALOG(pg_seclabel,3596) BKI_WITHOUT_OIDS
 	int32		objsubid;		/* column number, or 0 if not used */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		provider BKI_FORCE_NOT_NULL; /* name of label provider */
-	text		label BKI_FORCE_NOT_NULL; /* security label of the object */
+	text		provider;		/* name of label provider */
+	text		label;			/* security label of the object */
 #endif
 } FormData_pg_seclabel;
 
diff --git a/src/include/catalog/pg_shdescription.h b/src/include/catalog/pg_shdescription.h
index c524099..723f984 100644
--- a/src/include/catalog/pg_shdescription.h
+++ b/src/include/catalog/pg_shdescription.h
@@ -44,7 +44,7 @@ CATALOG(pg_shdescription,2396) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 	Oid			classoid;		/* OID of table containing object */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		description BKI_FORCE_NOT_NULL; /* description of object */
+	text		description;	/* description of object */
 #endif
 } FormData_pg_shdescription;
 
diff --git a/src/include/catalog/pg_shseclabel.h b/src/include/catalog/pg_shseclabel.h
index 3977b42..f0b9952 100644
--- a/src/include/catalog/pg_shseclabel.h
+++ b/src/include/catalog/pg_shseclabel.h
@@ -26,8 +26,8 @@ CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 	Oid			classoid;		/* OID of table containing the shared object */
 
 #ifdef CATALOG_VARLEN			/* variable-length fields start here */
-	text		provider BKI_FORCE_NOT_NULL; /* name of label provider */
-	text		label BKI_FORCE_NOT_NULL; /* security label of the object */
+	text		provider;		/* name of label provider */
+	text		label;			/* security label of the object */
 #endif
 } FormData_pg_shseclabel;
 
diff --git a/src/include/catalog/pg_trigger.h b/src/include/catalog/pg_trigger.h
index bff8fcf..40c8c0f 100644
--- a/src/include/catalog/pg_trigger.h
+++ b/src/include/catalog/pg_trigger.h
@@ -57,7 +57,7 @@ CATALOG(pg_trigger,2620)
 	int2vector	tgattr;			/* column numbers, if trigger is on columns */
 
 #ifdef CATALOG_VARLEN
-	bytea		tgargs BKI_FORCE_NOT_NULL; /* first\000second\000tgnargs\000 */
+	bytea		tgargs;			/* first\000second\000tgnargs\000 */
 	pg_node_tree tgqual;		/* WHEN expression, or NULL if none */
 #endif
 } FormData_pg_trigger;
diff --git a/src/include/commands/dbcommands.h b/src/include/commands/dbcommands.h
index 4b60cdb..cb7cc0e 100644
--- a/src/include/commands/dbcommands.h
+++ b/src/include/commands/dbcommands.h
@@ -22,6 +22,21 @@
 #define XLOG_DBASE_CREATE		0x00
 #define XLOG_DBASE_DROP			0x10
 
+typedef struct xl_dbase_create_rec_old
+{
+	/* Records copying of a single subdirectory incl. contents */
+	Oid			db_id;
+	char		src_path[1];	/* VARIABLE LENGTH STRING */
+	/* dst_path follows src_path */
+}	xl_dbase_create_rec_old;
+
+typedef struct xl_dbase_drop_rec_old
+{
+	/* Records dropping of a single subdirectory incl. contents */
+	Oid			db_id;
+	char		dir_path[1];	/* VARIABLE LENGTH STRING */
+}	xl_dbase_drop_rec_old;
+
 typedef struct xl_dbase_create_rec
 {
 	/* Records copying of a single subdirectory incl. contents */
diff --git a/src/include/commands/event_trigger.h b/src/include/commands/event_trigger.h
index 9ac9fc3..e807e65 100644
--- a/src/include/commands/event_trigger.h
+++ b/src/include/commands/event_trigger.h
@@ -48,7 +48,6 @@ extern void AlterEventTriggerOwner_oid(Oid, Oid newOwnerId);
 
 extern bool EventTriggerSupportsObjectType(ObjectType obtype);
 extern bool EventTriggerSupportsObjectClass(ObjectClass objclass);
-extern bool EventTriggerSupportsGrantObjectType(GrantObjectType objtype);
 extern void EventTriggerDDLCommandStart(Node *parsetree);
 extern void EventTriggerDDLCommandEnd(Node *parsetree);
 extern void EventTriggerSQLDrop(Node *parsetree);
diff --git a/src/include/commands/tablespace.h b/src/include/commands/tablespace.h
index 70734d6..e8b9bc4 100644
--- a/src/include/commands/tablespace.h
+++ b/src/include/commands/tablespace.h
@@ -25,7 +25,7 @@
 typedef struct xl_tblspc_create_rec
 {
 	Oid			ts_id;
-	char		ts_path[FLEXIBLE_ARRAY_MEMBER]; /* null-terminated string */
+	char		ts_path[1];		/* VARIABLE LENGTH STRING */
 } xl_tblspc_create_rec;
 
 typedef struct xl_tblspc_drop_rec
diff --git a/src/include/commands/typecmds.h b/src/include/commands/typecmds.h
index 0a63800..e18a714 100644
--- a/src/include/commands/typecmds.h
+++ b/src/include/commands/typecmds.h
@@ -39,6 +39,8 @@ extern Oid AlterDomainDropConstraint(List *names, const char *constrName,
 
 extern void checkDomainOwner(HeapTuple tup);
 
+extern List *GetDomainConstraints(Oid typeOid);
+
 extern Oid	RenameType(RenameStmt *stmt);
 extern Oid	AlterTypeOwner(List *names, Oid newOwnerId, ObjectType objecttype);
 extern void AlterTypeOwnerInternal(Oid typeOid, Oid newOwnerId,
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 71099b1..e79df71 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -114,7 +114,7 @@ typedef struct HashMemoryChunkData
 
 	struct HashMemoryChunkData *next; /* pointer to the next chunk (linked list) */
 
-	char		data[FLEXIBLE_ARRAY_MEMBER];	/* buffer allocated at the end */
+	char		data[1];	/* buffer allocated at the end */
 } HashMemoryChunkData;
 
 typedef struct HashMemoryChunkData *HashMemoryChunk;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..d4ab71a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPath_function) (PlannerInfo *root,
+											  RelOptInfo *joinrel,
+											  RelOptInfo *outerrel,
+											  RelOptInfo *innerrel,
+											  JoinType jointype,
+											  SpecialJoinInfo *sjinfo,
+											  SemiAntiJoinFactors *semifactors,
+											  List *restrictlist,
+											  Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPath_function GetForeignJoinPath;
+
 } FdwRoutine;
 
 
@@ -157,6 +171,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
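
To illustrate how an FDW driver would hook into the new callback (this sketch
is not part of the patch; my_fdw_handler and myGetForeignJoinPath are
hypothetical names, and the path construction itself is elided because it is
driver-specific):

/*
 * Illustrative sketch only, assuming the GetForeignJoinPath callback
 * signature added to FdwRoutine above.
 */
#include "postgres.h"
#include "fmgr.h"
#include "foreign/fdwapi.h"

PG_FUNCTION_INFO_V1(my_fdw_handler);

static void
myGetForeignJoinPath(PlannerInfo *root,
					 RelOptInfo *joinrel,
					 RelOptInfo *outerrel,
					 RelOptInfo *innerrel,
					 JoinType jointype,
					 SpecialJoinInfo *sjinfo,
					 SemiAntiJoinFactors *semifactors,
					 List *restrictlist,
					 Relids extra_lateral_rels)
{
	/* Bail out for join types the remote side cannot execute. */
	if (jointype != JOIN_INNER)
		return;

	/*
	 * A real driver would estimate the cost of running the join remotely,
	 * build a ForeignPath on joinrel (leading to a ForeignScan with
	 * scanrelid == 0 and a pseudo-scan target list) and hand it to
	 * add_path().  Those details are omitted here.
	 */
}

Datum
my_fdw_handler(PG_FUNCTION_ARGS)
{
	FdwRoutine *routine = makeNode(FdwRoutine);

	/* ... set the usual scan (and optionally modify) callbacks here ... */

	routine->GetForeignJoinPath = myGetForeignJoinPath;

	PG_RETURN_POINTER(routine);
}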
 
diff --git a/src/include/lib/pairingheap.h b/src/include/lib/pairingheap.h
index eb1856a..e3e320f 100644
--- a/src/include/lib/pairingheap.h
+++ b/src/include/lib/pairingheap.h
@@ -11,11 +11,6 @@
 #ifndef PAIRINGHEAP_H
 #define PAIRINGHEAP_H
 
-#include "lib/stringinfo.h"
-
-/* Enable if you need the pairingheap_dump() debug function */
-/* #define PAIRINGHEAP_DEBUG */
-
 /*
  * This represents an element stored in the heap. Embed this in a larger
  * struct containing the actual data you're storing.
@@ -83,12 +78,6 @@ extern pairingheap_node *pairingheap_first(pairingheap *heap);
 extern pairingheap_node *pairingheap_remove_first(pairingheap *heap);
 extern void pairingheap_remove(pairingheap *heap, pairingheap_node *node);
 
-#ifdef PAIRINGHEAP_DEBUG
-extern char *pairingheap_dump(pairingheap *heap,
-							  void (*dumpfunc) (pairingheap_node *node, StringInfo buf, void *opaque),
-							  void *opaque);
-#endif
-
 /* Resets the heap to be empty. */
 #define pairingheap_reset(h)			((h)->ph_root = NULL)
 
diff --git a/src/include/libpq/ip.h b/src/include/libpq/ip.h
index 796dd41..23051c0 100644
--- a/src/include/libpq/ip.h
+++ b/src/include/libpq/ip.h
@@ -46,6 +46,11 @@ extern int pg_range_sockaddr(const struct sockaddr_storage * addr,
 extern int pg_sockaddr_cidr_mask(struct sockaddr_storage * mask,
 					  char *numbits, int family);
 
+#ifdef HAVE_IPV6
+extern void pg_promote_v4_to_v6_addr(struct sockaddr_storage * addr);
+extern void pg_promote_v4_to_v6_mask(struct sockaddr_storage * addr);
+#endif
+
 extern int	pg_foreach_ifaddr(PgIfAddrCallback callback, void *cb_data);
 
 #endif   /* IP_H */
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index cf520f5..ccd7021 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -209,8 +209,8 @@ typedef struct Port
 extern void be_tls_init(void);
 extern int be_tls_open_server(Port *port);
 extern void be_tls_close(Port *port);
-extern ssize_t be_tls_read(Port *port, void *ptr, size_t len, int *waitfor);
-extern ssize_t be_tls_write(Port *port, void *ptr, size_t len, int *waitfor);
+extern ssize_t be_tls_read(Port *port, void *ptr, size_t len);
+extern ssize_t be_tls_write(Port *port, void *ptr, size_t len);
 
 #endif
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 3a556ee..5f45f4d 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -32,8 +32,8 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
 typedef struct Bitmapset
 {
 	int			nwords;			/* number of words in array */
-	bitmapword	words[FLEXIBLE_ARRAY_MEMBER];	/* really [nwords] */
-} Bitmapset;
+	bitmapword	words[1];		/* really [nwords] */
+} Bitmapset;					/* VARIABLE LENGTH STRUCT */
 
 
 /* result of bms_subset_compare */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 59b17f3..41288ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -942,9 +942,8 @@ typedef struct CoerceToDomainState
 {
 	ExprState	xprstate;
 	ExprState  *arg;			/* input expression */
-	/* Cached set of constraints that need to be checked */
-	/* use struct pointer to avoid including typcache.h here */
-	struct DomainConstraintRef *constraint_ref;
+	/* Cached list of constraints that need to be checked */
+	List	   *constraints;	/* list of DomainConstraintState nodes */
 } CoerceToDomainState;
 
 /*
diff --git a/src/include/nodes/memnodes.h b/src/include/nodes/memnodes.h
index 5e036b9..ca9c3de 100644
--- a/src/include/nodes/memnodes.h
+++ b/src/include/nodes/memnodes.h
@@ -54,15 +54,15 @@ typedef struct MemoryContextMethods
 typedef struct MemoryContextData
 {
 	NodeTag		type;			/* identifies exact kind of context */
-	/* these two fields are placed here to minimize alignment wastage: */
-	bool		isReset;		/* T = no space alloced since last reset */
-	bool		allowInCritSection;		/* allow palloc in critical section */
 	MemoryContextMethods *methods;		/* virtual function table */
 	MemoryContext parent;		/* NULL if no parent (toplevel context) */
 	MemoryContext firstchild;	/* head of linked list of children */
 	MemoryContext nextchild;	/* next child of same parent */
 	char	   *name;			/* context name (just for debugging) */
-	MemoryContextCallback *reset_cbs;	/* list of reset/delete callbacks */
+	bool		isReset;		/* T = no space alloced since last reset */
+#ifdef USE_ASSERT_CHECKING
+	bool		allowInCritSection;	/* allow palloc in critical section */
+#endif
 } MemoryContextData;
 
 /* utils/palloc.h contains typedef struct MemoryContextData *MemoryContext */
diff --git a/src/include/nodes/params.h b/src/include/nodes/params.h
index a0f7dd0..5b096c5 100644
--- a/src/include/nodes/params.h
+++ b/src/include/nodes/params.h
@@ -71,7 +71,7 @@ typedef struct ParamListInfoData
 	ParserSetupHook parserSetup;	/* parser setup hook */
 	void	   *parserSetupArg;
 	int			numParams;		/* number of ParamExternDatas following */
-	ParamExternData params[FLEXIBLE_ARRAY_MEMBER];
+	ParamExternData params[1];	/* VARIABLE LENGTH ARRAY */
 }	ParamListInfoData;
 
 
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index ac13302..b1dfa85 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -232,14 +232,7 @@ typedef enum A_Expr_Kind
 	AEXPR_DISTINCT,				/* IS DISTINCT FROM - name must be "=" */
 	AEXPR_NULLIF,				/* NULLIF - name must be "=" */
 	AEXPR_OF,					/* IS [NOT] OF - name must be "=" or "<>" */
-	AEXPR_IN,					/* [NOT] IN - name must be "=" or "<>" */
-	AEXPR_LIKE,					/* [NOT] LIKE - name must be "~~" or "!~~" */
-	AEXPR_ILIKE,				/* [NOT] ILIKE - name must be "~~*" or "!~~*" */
-	AEXPR_SIMILAR,				/* [NOT] SIMILAR - name must be "~" or "!~" */
-	AEXPR_BETWEEN,				/* name must be "BETWEEN" */
-	AEXPR_NOT_BETWEEN,			/* name must be "NOT BETWEEN" */
-	AEXPR_BETWEEN_SYM,			/* name must be "BETWEEN SYMMETRIC" */
-	AEXPR_NOT_BETWEEN_SYM		/* name must be "NOT BETWEEN SYMMETRIC" */
+	AEXPR_IN					/* [NOT] IN - name must be "=" or "<>" */
 } A_Expr_Kind;
 
 typedef struct A_Expr
@@ -2264,7 +2257,6 @@ typedef struct IndexStmt
 	bool		isconstraint;	/* is it for a pkey/unique constraint? */
 	bool		deferrable;		/* is the constraint DEFERRABLE? */
 	bool		initdeferred;	/* is the constraint INITIALLY DEFERRED? */
-	bool		transformed;	/* true when transformIndexStmt is finished */
 	bool		concurrent;		/* should this be a concurrent index build? */
 	bool		if_not_exists;	/* just do nothing if index already exists? */
 } IndexStmt;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..6717c6d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,7 +70,7 @@ typedef struct PlannedStmt
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
-	bool		hasRowSecurity; /* row security applied? */
+	bool		hasRowSecurity;	/* row security applied? */
 
 } PlannedStmt;
 
@@ -174,7 +174,6 @@ typedef struct ModifyTable
 	Plan		plan;
 	CmdType		operation;		/* INSERT, UPDATE, or DELETE */
 	bool		canSetTag;		/* do we set the command tag/es_processed? */
-	Index		nominalRelation;	/* Parent RT index for use of EXPLAIN */
 	List	   *resultRelations;	/* integer list of RT indexes */
 	int			resultRelIndex; /* index of first resultRel in plan's list */
 	List	   *plans;			/* plan(s) producing source data */
@@ -471,7 +470,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist maps references to attributes of the underlying
+ * relation(s) onto pairs of INDEX_VAR and an alternative varattno.  Such a
+ * node behaves like a scan of a pseudo relation, typically the result of a
+ * join executed on the remote data source, and the FDW driver is
+ * responsible for setting the target list it expects to return.  If the FDW
+ * returns records according to the foreign-table definition, set this to NIL.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,7 +485,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -488,10 +495,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,6 +520,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
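
As an aside (not part of the patch), a provider would typically build the
pseudo-scan target list from the Vars it intends to emit; position N in the
list then corresponds to varno = INDEX_VAR, varattno = N at execution time,
matching the mapping described earlier in this thread.  A minimal sketch,
with build_pseudo_scan_tlist as a hypothetical helper:

#include "postgres.h"
#include "nodes/makefuncs.h"

static List *
build_pseudo_scan_tlist(List *vars_to_emit)
{
	List	   *ps_tlist = NIL;
	AttrNumber	resno = 1;
	ListCell   *lc;

	foreach(lc, vars_to_emit)
	{
		Var		   *var = (Var *) lfirst(lc);

		/* entry at position 'resno' is referenced as (INDEX_VAR, resno) */
		ps_tlist = lappend(ps_tlist,
						   makeTargetEntry((Expr *) copyObject(var),
										   resno++,
										   NULL,
										   false));
	}
	return ps_tlist;
}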
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index dbc5a35..1d06f42 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -994,6 +994,7 @@ typedef struct MinMaxExpr
  * Note: result type/typmod/collation are not stored, but can be deduced
  * from the XmlExprOp.  The type/typmod fields are just used for display
  * purposes, and are NOT necessarily the true result type of the node.
+ * (We also use type == InvalidOid to mark a not-yet-parse-analyzed XmlExpr.)
  */
 typedef enum XmlExprOp
 {
@@ -1049,7 +1050,6 @@ typedef struct NullTest
 	Expr	   *arg;			/* input expression */
 	NullTestType nulltesttype;	/* IS NULL, IS NOT NULL */
 	bool		argisrow;		/* T if input is of a composite type */
-	int			location;		/* token location, or -1 if unknown */
 } NullTest;
 
 /*
@@ -1071,7 +1071,6 @@ typedef struct BooleanTest
 	Expr		xpr;
 	Expr	   *arg;			/* input expression */
 	BoolTestType booltesttype;	/* test type */
-	int			location;		/* token location, or -1 if unknown */
 } BooleanTest;
 
 /*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9ef0b56 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index cfba64b..fb62c9e 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -41,8 +41,8 @@ typedef struct
 	int			ntuples;		/* -1 indicates lossy result */
 	bool		recheck;		/* should the tuples be rechecked? */
 	/* Note: recheck is always true if ntuples < 0 */
-	OffsetNumber offsets[FLEXIBLE_ARRAY_MEMBER];
-} TBMIterateResult;
+	OffsetNumber offsets[1];	/* VARIABLE LENGTH ARRAY */
+} TBMIterateResult;				/* VARIABLE LENGTH STRUCT */
 
 /* function prototypes in nodes/tidbitmap.c */
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
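
For reference, the expected usage of the new hook follows the same pattern as
the existing planner hooks: save the previous value and chain to it.  Below is
a minimal sketch with hypothetical names (not part of the patch):

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

static void
my_join_pathlist(PlannerInfo *root,
				 RelOptInfo *joinrel,
				 RelOptInfo *outerrel,
				 RelOptInfo *innerrel,
				 List *restrictlist,
				 JoinType jointype,
				 SpecialJoinInfo *sjinfo,
				 SemiAntiJoinFactors *semifactors,
				 Relids param_source_rels,
				 Relids extra_lateral_rels)
{
	/* give other extensions a chance to add their paths first */
	if (prev_join_pathlist_hook)
		(*prev_join_pathlist_hook) (root, joinrel, outerrel, innerrel,
									restrictlist, jointype, sjinfo,
									semifactors, param_source_rels,
									extra_lateral_rels);

	/*
	 * Estimate the cost of running this join externally; if it wins,
	 * construct a CustomScanPath (or ForeignPath) and add_path() it
	 * to joinrel here.
	 */
}

void
_PG_init(void)
{
	prev_join_pathlist_hook = set_join_pathlist_hook;
	set_join_pathlist_hook = my_join_pathlist;
}
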
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..e66eaa5 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
@@ -82,7 +83,6 @@ extern Result *make_result(PlannerInfo *root, List *tlist,
 			Node *resconstantqual, Plan *subplan);
 extern ModifyTable *make_modifytable(PlannerInfo *root,
 				 CmdType operation, bool canSetTag,
-				 Index nominalRelation,
 				 List *resultRelations, List *subplans,
 				 List *withCheckOptionLists, List *returningLists,
 				 List *rowMarks, int epqParam);
diff --git a/src/include/parser/gramparse.h b/src/include/parser/gramparse.h
index 100fdfb..d9df303 100644
--- a/src/include/parser/gramparse.h
+++ b/src/include/parser/gramparse.h
@@ -46,8 +46,6 @@ typedef struct base_yy_extra_type
 	int			lookahead_token;	/* one-token lookahead */
 	core_YYSTYPE lookahead_yylval;		/* yylval for lookahead token */
 	YYLTYPE		lookahead_yylloc;		/* yylloc for lookahead token */
-	char	   *lookahead_end;	/* end of current token */
-	char		lookahead_hold_char;	/* to be put back at *lookahead_end */
 
 	/*
 	 * State variables that belong to the grammar.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 30934d3..416769a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -104,7 +104,6 @@ typedef struct PgStat_TableCounts
 	PgStat_Counter t_tuples_updated;
 	PgStat_Counter t_tuples_deleted;
 	PgStat_Counter t_tuples_hot_updated;
-	bool		t_truncated;
 
 	PgStat_Counter t_delta_live_tuples;
 	PgStat_Counter t_delta_dead_tuples;
@@ -166,10 +165,6 @@ typedef struct PgStat_TableXactStatus
 	PgStat_Counter tuples_inserted;		/* tuples inserted in (sub)xact */
 	PgStat_Counter tuples_updated;		/* tuples updated in (sub)xact */
 	PgStat_Counter tuples_deleted;		/* tuples deleted in (sub)xact */
-	bool		truncated;		/* relation truncated in this (sub)xact */
-	PgStat_Counter inserted_pre_trunc;	/* tuples inserted prior to truncate */
-	PgStat_Counter updated_pre_trunc;	/* tuples updated prior to truncate */
-	PgStat_Counter deleted_pre_trunc;	/* tuples deleted prior to truncate */
 	int			nest_level;		/* subtransaction nest level */
 	/* links to other structs for same relation: */
 	struct PgStat_TableXactStatus *upper;		/* next higher subxact if any */
@@ -551,7 +546,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9D
+#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9C
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -965,7 +960,6 @@ extern void pgstat_initstats(Relation rel);
 extern void pgstat_count_heap_insert(Relation rel, int n);
 extern void pgstat_count_heap_update(Relation rel, bool hot);
 extern void pgstat_count_heap_delete(Relation rel);
-extern void pgstat_count_truncate(Relation rel);
 extern void pgstat_update_heap_dead_tuples(Relation rel, int delta);
 
 extern void pgstat_init_function_usage(FunctionCallInfoData *fcinfo,
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 906db7c..20e4610 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,13 +11,5 @@
  *
  *-------------------------------------------------------------------------
  */
-
-enum tarError
-{
-	TAR_OK = 0,
-	TAR_NAME_TOO_LONG,
-	TAR_SYMLINK_TOO_LONG
-};
-
-extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
 extern int	tarChecksum(char *header);
diff --git a/src/include/port.h b/src/include/port.h
index a1ab42e..26d7fcd 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On windows we need to call some backend start setup for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else							/* !WIN32 */
 
diff --git a/src/include/postgres.h b/src/include/postgres.h
index cbb7f79..082c75b 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -117,20 +117,20 @@ typedef union
 	struct						/* Normal varlena (4-byte length) */
 	{
 		uint32		va_header;
-		char		va_data[FLEXIBLE_ARRAY_MEMBER];
+		char		va_data[1];
 	}			va_4byte;
 	struct						/* Compressed-in-line format */
 	{
 		uint32		va_header;
 		uint32		va_rawsize; /* Original data size (excludes header) */
-		char		va_data[FLEXIBLE_ARRAY_MEMBER];		/* Compressed data */
+		char		va_data[1]; /* Compressed data */
 	}			va_compressed;
 } varattrib_4b;
 
 typedef struct
 {
 	uint8		va_header;
-	char		va_data[FLEXIBLE_ARRAY_MEMBER]; /* Data begins here */
+	char		va_data[1];		/* Data begins here */
 } varattrib_1b;
 
 /* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */
@@ -138,7 +138,7 @@ typedef struct
 {
 	uint8		va_header;		/* Always 0x80 or 0x01 */
 	uint8		va_tag;			/* Type of datum */
-	char		va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
+	char		va_data[1];		/* Data (of the type indicated by va_tag) */
 } varattrib_1b_e;
 
 /*
diff --git a/src/include/postmaster/syslogger.h b/src/include/postmaster/syslogger.h
index 89a535c..602b13c 100644
--- a/src/include/postmaster/syslogger.h
+++ b/src/include/postmaster/syslogger.h
@@ -48,7 +48,7 @@ typedef struct
 	int32		pid;			/* writer's pid */
 	char		is_last;		/* last chunk of message? 't' or 'f' ('T' or
 								 * 'F' for CSV case) */
-	char		data[FLEXIBLE_ARRAY_MEMBER];	/* data payload starts here */
+	char		data[1];		/* data payload starts here */
 } PipeProtoHeader;
 
 typedef union
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index f1e0f57..5a1d9a0 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -28,12 +28,8 @@ typedef struct ReorderBufferTupleBuf
 
 	/* tuple, stored sequentially */
 	HeapTupleData tuple;
-	union
-	{
-		HeapTupleHeaderData header;
-		char		data[MaxHeapTupleSize];
-		double		align_it;	/* ensure t_data is MAXALIGN'd */
-	}			t_data;
+	HeapTupleHeaderData header;
+	char		data[MaxHeapTupleSize];
 } ReorderBufferTupleBuf;
 
 /*
@@ -81,7 +77,7 @@ typedef struct ReorderBufferChange
 			RelFileNode relnode;
 
 			/* no previously reassembled toast chunks are necessary anymore */
-			bool		clear_toast_afterwards;
+			bool clear_toast_afterwards;
 
 			/* valid for DELETE || UPDATE */
 			ReorderBufferTupleBuf *oldtuple;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a400136..d22963d 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -130,10 +130,6 @@ typedef struct ReplicationSlot
  */
 typedef struct ReplicationSlotCtlData
 {
-	/*
-	 * This array should be declared [FLEXIBLE_ARRAY_MEMBER], but for some
-	 * reason you can't do that in an otherwise-empty struct.
-	 */
 	ReplicationSlot replication_slots[1];
 } ReplicationSlotCtlData;
 
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 40351da..8867750 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -54,7 +54,7 @@ typedef struct WalSnd
 	 * Pointer to the walsender's latch. Used by backends to wake up this
 	 * walsender when it has work to do. NULL if the walsender isn't active.
 	 */
-	Latch	   *latch;
+	Latch		*latch;
 
 	/*
 	 * The priority order of the standby managed by this WALSender, as listed
@@ -88,7 +88,7 @@ typedef struct
 	 */
 	bool		sync_standbys_defined;
 
-	WalSnd		walsnds[FLEXIBLE_ARRAY_MEMBER];
+	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
 } WalSndCtlData;
 
 extern WalSndCtlData *WalSndCtl;
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index c2fbffc..f693032 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -156,7 +156,7 @@ typedef struct PageHeaderData
 	LocationIndex pd_special;	/* offset to start of special space */
 	uint16		pd_pagesize_version;
 	TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
-	ItemIdData	pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
+	ItemIdData	pd_linp[1];		/* beginning of line pointer array */
 } PageHeaderData;
 
 typedef PageHeaderData *PageHeader;
diff --git a/src/include/storage/fsm_internals.h b/src/include/storage/fsm_internals.h
index 26340b4..1decd90 100644
--- a/src/include/storage/fsm_internals.h
+++ b/src/include/storage/fsm_internals.h
@@ -39,7 +39,7 @@ typedef struct
 	 * NonLeafNodesPerPage elements are upper nodes, and the following
 	 * LeafNodesPerPage elements are leaf nodes. Unused nodes are zero.
 	 */
-	uint8		fp_nodes[FLEXIBLE_ARRAY_MEMBER];
+	uint8		fp_nodes[1];
 } FSMPageData;
 
 typedef FSMPageData *FSMPage;
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index f4dc0db..52b86e3 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -404,7 +404,7 @@ tas(volatile slock_t *lock)
  * requires a barrier.  We fall through to the default gcc definition of
  * S_UNLOCK in this case.
  */
-#elif defined(__sparcv8)
+#elif  __sparcv8
 /* stbar is available (and required for both PSO, RMO), membar isn't */
 #define S_UNLOCK(lock)	\
 do \
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 7626c4c..c32c963 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -60,7 +60,7 @@ extern void StandbyReleaseOldLocks(int nxids, TransactionId *xids);
 typedef struct xl_standby_locks
 {
 	int			nlocks;			/* number of entries in locks array */
-	xl_standby_lock locks[FLEXIBLE_ARRAY_MEMBER];
+	xl_standby_lock locks[1];	/* VARIABLE LENGTH ARRAY */
 } xl_standby_locks;
 
 /*
@@ -75,7 +75,7 @@ typedef struct xl_running_xacts
 	TransactionId oldestRunningXid;		/* *not* oldestXmin */
 	TransactionId latestCompletedXid;	/* so we can set xmax */
 
-	TransactionId xids[FLEXIBLE_ARRAY_MEMBER];
+	TransactionId xids[1];		/* VARIABLE LENGTH ARRAY */
 } xl_running_xacts;
 
 #define MinSizeOfXactRunningXacts offsetof(xl_running_xacts, xids)
diff --git a/src/include/tsearch/dicts/regis.h b/src/include/tsearch/dicts/regis.h
index ddf5b60..081a502 100644
--- a/src/include/tsearch/dicts/regis.h
+++ b/src/include/tsearch/dicts/regis.h
@@ -21,7 +21,7 @@ typedef struct RegisNode
 				len:16,
 				unused:14;
 	struct RegisNode *next;
-	unsigned char data[FLEXIBLE_ARRAY_MEMBER];
+	unsigned char data[1];
 } RegisNode;
 
 #define  RNHDRSZ	(offsetof(RegisNode,data))
diff --git a/src/include/tsearch/dicts/spell.h b/src/include/tsearch/dicts/spell.h
index e512532..a75552b 100644
--- a/src/include/tsearch/dicts/spell.h
+++ b/src/include/tsearch/dicts/spell.h
@@ -49,7 +49,7 @@ typedef struct
 typedef struct SPNode
 {
 	uint32		length;
-	SPNodeData	data[FLEXIBLE_ARRAY_MEMBER];
+	SPNodeData	data[1];
 } SPNode;
 
 #define SPNHDRSZ	(offsetof(SPNode,data))
@@ -70,7 +70,7 @@ typedef struct spell_struct
 			int			len;
 		}			d;
 	}			p;
-	char		word[FLEXIBLE_ARRAY_MEMBER];
+	char		word[1];		/* variable length, null-terminated */
 } SPELL;
 
 #define SPELLHDRSZ	(offsetof(SPELL, word))
@@ -120,7 +120,7 @@ typedef struct AffixNode
 {
 	uint32		isvoid:1,
 				length:31;
-	AffixNodeData data[FLEXIBLE_ARRAY_MEMBER];
+	AffixNodeData data[1];
 } AffixNode;
 
 #define ANHRDSZ		   (offsetof(AffixNode, data))
diff --git a/src/include/tsearch/ts_type.h b/src/include/tsearch/ts_type.h
index 281cdd6..1cdfa82 100644
--- a/src/include/tsearch/ts_type.h
+++ b/src/include/tsearch/ts_type.h
@@ -63,16 +63,9 @@ typedef uint16 WordEntryPos;
 typedef struct
 {
 	uint16		npos;
-	WordEntryPos pos[FLEXIBLE_ARRAY_MEMBER];
+	WordEntryPos pos[1];		/* variable length */
 } WordEntryPosVector;
 
-/* WordEntryPosVector with exactly 1 entry */
-typedef struct
-{
-	uint16		npos;
-	WordEntryPos pos[1];
-} WordEntryPosVector1;
-
 
 #define WEP_GETWEIGHT(x)	( (x) >> 14 )
 #define WEP_GETPOS(x)		( (x) & 0x3fff )
@@ -89,7 +82,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;
-	WordEntry	entries[FLEXIBLE_ARRAY_MEMBER];
+	WordEntry	entries[1];		/* variable length */
 	/* lexemes follow the entries[] array */
 } TSVectorData;
 
@@ -240,7 +233,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;			/* number of QueryItems */
-	char		data[FLEXIBLE_ARRAY_MEMBER];	/* data starts here */
+	char		data[1];		/* data starts here */
 } TSQueryData;
 
 typedef TSQueryData *TSQuery;
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 649688c..694bce7 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -89,7 +89,6 @@ typedef struct ArrayBuildState
 	int16		typlen;			/* needed info about datatype */
 	bool		typbyval;
 	char		typalign;
-	bool		private_cxt;	/* use private memory context */
 } ArrayBuildState;
 
 /*
@@ -110,7 +109,6 @@ typedef struct ArrayBuildStateArr
 	int			lbs[MAXDIM];
 	Oid			array_type;		/* data type of the arrays */
 	Oid			element_type;	/* data type of the array elements */
-	bool		private_cxt;	/* use private memory context */
 } ArrayBuildStateArr;
 
 /*
@@ -250,26 +248,19 @@ extern Datum array_remove(PG_FUNCTION_ARGS);
 extern Datum array_replace(PG_FUNCTION_ARGS);
 extern Datum width_bucket_array(PG_FUNCTION_ARGS);
 
-extern Datum array_get_element(Datum arraydatum, int nSubscripts, int *indx,
-				  int arraytyplen, int elmlen, bool elmbyval, char elmalign,
-				  bool *isNull);
-extern Datum array_set_element(Datum arraydatum, int nSubscripts, int *indx,
-				  Datum dataValue, bool isNull,
-				  int arraytyplen, int elmlen, bool elmbyval, char elmalign);
-extern Datum array_get_slice(Datum arraydatum, int nSubscripts,
-				int *upperIndx, int *lowerIndx,
-				int arraytyplen, int elmlen, bool elmbyval, char elmalign);
-extern Datum array_set_slice(Datum arraydatum, int nSubscripts,
-				int *upperIndx, int *lowerIndx,
-				Datum srcArrayDatum, bool isNull,
-				int arraytyplen, int elmlen, bool elmbyval, char elmalign);
-
 extern Datum array_ref(ArrayType *array, int nSubscripts, int *indx,
 		  int arraytyplen, int elmlen, bool elmbyval, char elmalign,
 		  bool *isNull);
 extern ArrayType *array_set(ArrayType *array, int nSubscripts, int *indx,
 		  Datum dataValue, bool isNull,
 		  int arraytyplen, int elmlen, bool elmbyval, char elmalign);
+extern ArrayType *array_get_slice(ArrayType *array, int nSubscripts,
+				int *upperIndx, int *lowerIndx,
+				int arraytyplen, int elmlen, bool elmbyval, char elmalign);
+extern ArrayType *array_set_slice(ArrayType *array, int nSubscripts,
+				int *upperIndx, int *lowerIndx,
+				ArrayType *srcArray, bool isNull,
+				int arraytyplen, int elmlen, bool elmbyval, char elmalign);
 
 extern Datum array_map(FunctionCallInfo fcinfo, Oid inpType, Oid retType,
 		  ArrayMapState *amstate);
@@ -295,7 +286,7 @@ extern void deconstruct_array(ArrayType *array,
 extern bool array_contains_nulls(ArrayType *array);
 
 extern ArrayBuildState *initArrayResult(Oid element_type,
-				MemoryContext rcontext, bool subcontext);
+				MemoryContext rcontext);
 extern ArrayBuildState *accumArrayResult(ArrayBuildState *astate,
 				 Datum dvalue, bool disnull,
 				 Oid element_type,
@@ -306,7 +297,7 @@ extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims,
 				  int *dims, int *lbs, MemoryContext rcontext, bool release);
 
 extern ArrayBuildStateArr *initArrayResultArr(Oid array_type, Oid element_type,
-				   MemoryContext rcontext, bool subcontext);
+				   MemoryContext rcontext);
 extern ArrayBuildStateArr *accumArrayResultArr(ArrayBuildStateArr *astate,
 					Datum dvalue, bool disnull,
 					Oid array_type,
@@ -315,7 +306,7 @@ extern Datum makeArrayResultArr(ArrayBuildStateArr *astate,
 				   MemoryContext rcontext, bool release);
 
 extern ArrayBuildStateAny *initArrayResultAny(Oid input_type,
-				   MemoryContext rcontext, bool subcontext);
+				   MemoryContext rcontext);
 extern ArrayBuildStateAny *accumArrayResultAny(ArrayBuildStateAny *astate,
 					Datum dvalue, bool disnull,
 					Oid input_type,
@@ -343,8 +334,7 @@ extern int32 *ArrayGetIntegerTypmods(ArrayType *arr, int *n);
 /*
  * prototypes for functions defined in array_userfuncs.c
  */
-extern Datum array_append(PG_FUNCTION_ARGS);
-extern Datum array_prepend(PG_FUNCTION_ARGS);
+extern Datum array_push(PG_FUNCTION_ARGS);
 extern Datum array_cat(PG_FUNCTION_ARGS);
 
 extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index a3a699c..8084785 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -147,8 +147,8 @@ typedef struct catclist
 	uint32		hash_value;		/* hash value for lookup keys */
 	HeapTupleData tuple;		/* header for tuple holding keys */
 	int			n_members;		/* number of member tuples */
-	CatCTup    *members[FLEXIBLE_ARRAY_MEMBER]; /* members */
-} CatCList;
+	CatCTup    *members[1];		/* members --- VARIABLE LENGTH ARRAY */
+} CatCList;						/* VARIABLE LENGTH STRUCT */
 
 
 typedef struct catcacheheader
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index 6b8ab3c..8912ba5 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -219,7 +219,7 @@ typedef struct TimeZoneAbbrevTable
 {
 	Size		tblsize;		/* size in bytes of TimeZoneAbbrevTable */
 	int			numabbrevs;		/* number of entries in abbrevs[] array */
-	datetkn		abbrevs[FLEXIBLE_ARRAY_MEMBER];
+	datetkn		abbrevs[1];		/* VARIABLE LENGTH ARRAY */
 	/* DynamicZoneAbbrev(s) may follow the abbrevs[] array */
 } TimeZoneAbbrevTable;
 
@@ -227,7 +227,7 @@ typedef struct TimeZoneAbbrevTable
 typedef struct DynamicZoneAbbrev
 {
 	pg_tz	   *tz;				/* NULL if not yet looked up */
-	char		zone[FLEXIBLE_ARRAY_MEMBER];	/* NUL-terminated zone name */
+	char		zone[1];		/* zone name (var length, NUL-terminated) */
 } DynamicZoneAbbrev;
 
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index 8da6c6c..0b6d3c3 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -80,7 +80,7 @@ typedef struct
 	int32		npts;
 	int32		closed;			/* is this a closed polygon? */
 	int32		dummy;			/* padding to make it double align */
-	Point		p[FLEXIBLE_ARRAY_MEMBER];
+	Point		p[1];			/* variable length array of POINTs */
 } PATH;
 
 
@@ -115,7 +115,7 @@ typedef struct
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		npts;
 	BOX			boundbox;
-	Point		p[FLEXIBLE_ARRAY_MEMBER];
+	Point		p[1];			/* variable length array of POINTs */
 } POLYGON;
 
 /*---------------------------------------------------------------------
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index d3100d1..717f46b 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -201,22 +201,20 @@ typedef enum
 #define GUC_CUSTOM_PLACEHOLDER	0x0080	/* placeholder for custom variable */
 #define GUC_SUPERUSER_ONLY		0x0100	/* show only to superusers */
 #define GUC_IS_NAME				0x0200	/* limit string to NAMEDATALEN-1 */
-#define GUC_NOT_WHILE_SEC_REST	0x0400	/* can't set if security restricted */
-#define GUC_DISALLOW_IN_AUTO_FILE 0x0800 /* can't set in PG_AUTOCONF_FILENAME */
 
-#define GUC_UNIT_KB				0x1000	/* value is in kilobytes */
-#define GUC_UNIT_BLOCKS			0x2000	/* value is in blocks */
-#define GUC_UNIT_XBLOCKS		0x3000	/* value is in xlog blocks */
-#define GUC_UNIT_XSEGS			0x4000	/* value is in xlog segments */
-#define GUC_UNIT_MEMORY			0xF000	/* mask for KB, BLOCKS, XBLOCKS */
+#define GUC_UNIT_KB				0x0400	/* value is in kilobytes */
+#define GUC_UNIT_BLOCKS			0x0800	/* value is in blocks */
+#define GUC_UNIT_XBLOCKS		0x0C00	/* value is in xlog blocks */
+#define GUC_UNIT_MEMORY			0x0C00	/* mask for KB, BLOCKS, XBLOCKS */
 
-#define GUC_UNIT_MS			   0x10000	/* value is in milliseconds */
-#define GUC_UNIT_S			   0x20000	/* value is in seconds */
-#define GUC_UNIT_MIN		   0x30000	/* value is in minutes */
-#define GUC_UNIT_TIME		   0xF0000	/* mask for MS, S, MIN */
-
-#define GUC_UNIT				(GUC_UNIT_MEMORY | GUC_UNIT_TIME)
+#define GUC_UNIT_MS				0x1000	/* value is in milliseconds */
+#define GUC_UNIT_S				0x2000	/* value is in seconds */
+#define GUC_UNIT_MIN			0x4000	/* value is in minutes */
+#define GUC_UNIT_TIME			0x7000	/* mask for MS, S, MIN */
 
+#define GUC_NOT_WHILE_SEC_REST	0x8000	/* can't set if security restricted */
+#define GUC_DISALLOW_IN_AUTO_FILE	0x00010000	/* can't set in
+												 * PG_AUTOCONF_FILENAME */
 
 /* GUC vars that are actually declared in guc.c, rather than elsewhere */
 extern bool log_duration;
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 9d1770e..887eb9b 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -194,7 +194,7 @@ typedef struct JsonbContainer
 {
 	uint32		header;			/* number of elements or key/value pairs, and
 								 * flags */
-	JEntry		children[FLEXIBLE_ARRAY_MEMBER];
+	JEntry		children[1];	/* variable length */
 
 	/* the data for each child node follows. */
 } JsonbContainer;
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 9e84d01..85aba7a 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -84,9 +84,6 @@ extern PGDLLIMPORT MemoryContext CurTransactionContext;
 /* This is a transient link to the active portal's memory context: */
 extern PGDLLIMPORT MemoryContext PortalContext;
 
-/* Backwards compatibility macro */
-#define MemoryContextResetAndDeleteChildren(ctx) MemoryContextReset(ctx)
-
 
 /*
  * Memory-context-type-independent functions in mcxt.c
@@ -94,9 +91,9 @@ extern PGDLLIMPORT MemoryContext PortalContext;
 extern void MemoryContextInit(void);
 extern void MemoryContextReset(MemoryContext context);
 extern void MemoryContextDelete(MemoryContext context);
-extern void MemoryContextResetOnly(MemoryContext context);
 extern void MemoryContextResetChildren(MemoryContext context);
 extern void MemoryContextDeleteChildren(MemoryContext context);
+extern void MemoryContextResetAndDeleteChildren(MemoryContext context);
 extern void MemoryContextSetParent(MemoryContext context,
 					   MemoryContext new_parent);
 extern Size GetMemoryChunkSpace(void *pointer);
diff --git a/src/include/utils/palloc.h b/src/include/utils/palloc.h
index 39b318d..f586fd5 100644
--- a/src/include/utils/palloc.h
+++ b/src/include/utils/palloc.h
@@ -36,22 +36,6 @@
 typedef struct MemoryContextData *MemoryContext;
 
 /*
- * A memory context can have callback functions registered on it.  Any such
- * function will be called once just before the context is next reset or
- * deleted.  The MemoryContextCallback struct describing such a callback
- * typically would be allocated within the context itself, thereby avoiding
- * any need to manage it explicitly (the reset/delete action will free it).
- */
-typedef void (*MemoryContextCallbackFunction) (void *arg);
-
-typedef struct MemoryContextCallback
-{
-	MemoryContextCallbackFunction func; /* function to call */
-	void	   *arg;			/* argument to pass it */
-	struct MemoryContextCallback *next; /* next in list of callbacks */
-} MemoryContextCallback;
-
-/*
  * CurrentMemoryContext is the default allocation context for palloc().
  * Avoid accessing it directly!  Instead, use MemoryContextSwitchTo()
  * to change the setting.
@@ -123,10 +107,6 @@ MemoryContextSwitchTo(MemoryContext context)
 #endif   /* PG_USE_INLINE || MCXT_INCLUDE_DEFINITIONS */
 #endif   /* FRONTEND */
 
-/* Registration of memory context reset/delete callbacks */
-extern void MemoryContextRegisterResetCallback(MemoryContext context,
-								   MemoryContextCallback *cb);
-
 /*
  * These are like standard strdup() except the copied string is
  * allocated in a context, not with malloc().
diff --git a/src/include/utils/relmapper.h b/src/include/utils/relmapper.h
index 73b4905..420310d 100644
--- a/src/include/utils/relmapper.h
+++ b/src/include/utils/relmapper.h
@@ -29,7 +29,7 @@ typedef struct xl_relmap_update
 	Oid			dbid;			/* database ID, or 0 for shared map */
 	Oid			tsid;			/* database's tablespace, or pg_global */
 	int32		nbytes;			/* size of relmap data */
-	char		data[FLEXIBLE_ARRAY_MEMBER];
+	char		data[1];		/* VARIABLE LENGTH ARRAY */
 } xl_relmap_update;
 
 #define MinSizeOfRelmapUpdate offsetof(xl_relmap_update, data)
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index 530fef1..70118f5 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -161,9 +161,7 @@ extern Datum timestamp_trunc(PG_FUNCTION_ARGS);
 extern Datum interval_trunc(PG_FUNCTION_ARGS);
 extern Datum timestamp_part(PG_FUNCTION_ARGS);
 extern Datum interval_part(PG_FUNCTION_ARGS);
-extern Datum timestamp_zone_transform(PG_FUNCTION_ARGS);
 extern Datum timestamp_zone(PG_FUNCTION_ARGS);
-extern Datum timestamp_izone_transform(PG_FUNCTION_ARGS);
 extern Datum timestamp_izone(PG_FUNCTION_ARGS);
 extern Datum timestamp_timestamptz(PG_FUNCTION_ARGS);
 
diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h
index 1a9befb..b544180 100644
--- a/src/include/utils/typcache.h
+++ b/src/include/utils/typcache.h
@@ -20,9 +20,6 @@
 #include "fmgr.h"
 
 
-/* DomainConstraintCache is an opaque struct known only within typcache.c */
-typedef struct DomainConstraintCache DomainConstraintCache;
-
 /* TypeCacheEnumData is an opaque struct known only within typcache.c */
 struct TypeCacheEnumData;
 
@@ -87,12 +84,6 @@ typedef struct TypeCacheEntry
 	FmgrInfo	rng_canonical_finfo;	/* canonicalization function, if any */
 	FmgrInfo	rng_subdiff_finfo;		/* difference function, if any */
 
-	/*
-	 * Domain constraint data if it's a domain type.  NULL if not domain, or
-	 * if domain has no constraints, or if information hasn't been requested.
-	 */
-	DomainConstraintCache *domainData;
-
 	/* Private data, for internal use of typcache.c only */
 	int			flags;			/* flags about what we've computed */
 
@@ -101,9 +92,6 @@ typedef struct TypeCacheEntry
 	 * information hasn't been requested.
 	 */
 	struct TypeCacheEnumData *enumData;
-
-	/* We also maintain a list of all known domain-type cache entries */
-	struct TypeCacheEntry *nextDomain;
 } TypeCacheEntry;
 
 /* Bit flags to indicate which fields a given caller needs to have set */
@@ -119,34 +107,9 @@ typedef struct TypeCacheEntry
 #define TYPECACHE_BTREE_OPFAMILY	0x0200
 #define TYPECACHE_HASH_OPFAMILY		0x0400
 #define TYPECACHE_RANGE_INFO		0x0800
-#define TYPECACHE_DOMAIN_INFO		0x1000
-
-/*
- * Callers wishing to maintain a long-lived reference to a domain's constraint
- * set must store it in one of these.  Use InitDomainConstraintRef() and
- * UpdateDomainConstraintRef() to manage it.  Note: DomainConstraintState is
- * considered an executable expression type, so it's defined in execnodes.h.
- */
-typedef struct DomainConstraintRef
-{
-	List	   *constraints;	/* list of DomainConstraintState nodes */
-
-	/* Management data --- treat these fields as private to typcache.c */
-	TypeCacheEntry *tcache;		/* owning typcache entry */
-	DomainConstraintCache *dcc; /* current constraints, or NULL if none */
-	MemoryContextCallback callback;		/* used to release refcount when done */
-} DomainConstraintRef;
-
 
 extern TypeCacheEntry *lookup_type_cache(Oid type_id, int flags);
 
-extern void InitDomainConstraintRef(Oid type_id, DomainConstraintRef *ref,
-						MemoryContext refctx);
-
-extern void UpdateDomainConstraintRef(DomainConstraintRef *ref);
-
-extern bool DomainHasConstraints(Oid type_id);
-
 extern TupleDesc lookup_rowtype_tupdesc(Oid type_id, int32 typmod);
 
 extern TupleDesc lookup_rowtype_tupdesc_noerror(Oid type_id, int32 typmod,
diff --git a/src/include/utils/varbit.h b/src/include/utils/varbit.h
index da55e7d..8afc3b1 100644
--- a/src/include/utils/varbit.h
+++ b/src/include/utils/varbit.h
@@ -26,8 +26,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		bit_len;		/* number of valid bits */
-	bits8		bit_dat[FLEXIBLE_ARRAY_MEMBER]; /* bit string, most sig. byte
-												 * first */
+	bits8		bit_dat[1];		/* bit string, most sig. byte first */
 } VarBit;
 
 /*
diff --git a/src/interfaces/ecpg/ecpglib/data.c b/src/interfaces/ecpg/ecpglib/data.c
index 2dcb915..2d0c118 100644
--- a/src/interfaces/ecpg/ecpglib/data.c
+++ b/src/interfaces/ecpg/ecpglib/data.c
@@ -575,7 +575,7 @@ ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
 					if (nres == NULL)
 					{
 						ecpg_log("ecpg_get_data on line %d: RESULT %s; errno %d\n",
-								 lineno, pval, errno);
+								 lineno, pval ? pval : "", errno);
 
 						if (INFORMIX_MODE(compat))
 						{
@@ -634,7 +634,7 @@ ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
 					if (ires == NULL)
 					{
 						ecpg_log("ecpg_get_data on line %d: RESULT %s; errno %d\n",
-								 lineno, pval, errno);
+								 lineno, pval ? pval : "", errno);
 
 						if (INFORMIX_MODE(compat))
 						{
@@ -688,7 +688,7 @@ ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
 					if (errno != 0)
 					{
 						ecpg_log("ecpg_get_data on line %d: RESULT %s; errno %d\n",
-								 lineno, pval, errno);
+								 lineno, pval ? pval : "", errno);
 
 						if (INFORMIX_MODE(compat))
 						{
@@ -736,7 +736,7 @@ ecpg_get_data(const PGresult *results, int act_tuple, int act_field, int lineno,
 					if (errno != 0)
 					{
 						ecpg_log("ecpg_get_data on line %d: RESULT %s; errno %d\n",
-								 lineno, pval, errno);
+								 lineno, pval ? pval : "", errno);
 
 						if (INFORMIX_MODE(compat))
 						{
diff --git a/src/interfaces/ecpg/ecpglib/extern.h b/src/interfaces/ecpg/ecpglib/extern.h
index ca3bf05..2b670e0 100644
--- a/src/interfaces/ecpg/ecpglib/extern.h
+++ b/src/interfaces/ecpg/ecpglib/extern.h
@@ -33,7 +33,7 @@ enum ARRAY_TYPE
 struct ECPGgeneric_varchar
 {
 	int			len;
-	char		arr[FLEXIBLE_ARRAY_MEMBER];
+	char		arr[1];
 };
 
 /*
diff --git a/src/interfaces/ecpg/preproc/parse.pl b/src/interfaces/ecpg/preproc/parse.pl
index 36dce80..f998310 100644
--- a/src/interfaces/ecpg/preproc/parse.pl
+++ b/src/interfaces/ecpg/preproc/parse.pl
@@ -42,8 +42,10 @@ my %replace_token = (
 
 # or in the block
 my %replace_string = (
-	'NULLS_LA'        => 'nulls',
-	'WITH_LA'         => 'with',
+	'WITH_TIME'       => 'with time',
+	'WITH_ORDINALITY' => 'with ordinality',
+	'NULLS_FIRST'     => 'nulls first',
+	'NULLS_LAST'      => 'nulls last',
 	'TYPECAST'        => '::',
 	'DOT_DOT'         => '..',
 	'COLON_EQUALS'    => ':=',);
diff --git a/src/interfaces/ecpg/preproc/parser.c b/src/interfaces/ecpg/preproc/parser.c
index 099a213..f118826 100644
--- a/src/interfaces/ecpg/preproc/parser.c
+++ b/src/interfaces/ecpg/preproc/parser.c
@@ -3,8 +3,11 @@
  * parser.c
  *		Main entry point/driver for PostgreSQL grammar
  *
- * This should match src/backend/parser/parser.c, except that we do not
- * need to bother with re-entrant interfaces.
+ * Note that the grammar is not allowed to perform any table access
+ * (since we need to be able to do basic parsing even while inside an
+ * aborted transaction).  Therefore, the data structures returned by
+ * the grammar are "raw" parsetrees that still need to be analyzed by
+ * analyze.c and related files.
  *
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
@@ -26,21 +29,18 @@ static bool have_lookahead;		/* is lookahead info valid? */
 static int	lookahead_token;	/* one-token lookahead */
 static YYSTYPE lookahead_yylval;	/* yylval for lookahead token */
 static YYLTYPE lookahead_yylloc;	/* yylloc for lookahead token */
-static char *lookahead_yytext;	/* start current token */
-static char *lookahead_end;		/* end of current token */
-static char lookahead_hold_char;	/* to be put back at *lookahead_end */
 
 
 /*
  * Intermediate filter between parser and base lexer (base_yylex in scan.l).
  *
- * This filter is needed because in some cases the standard SQL grammar
+ * The filter is needed because in some cases the standard SQL grammar
  * requires more than one token lookahead.  We reduce these cases to one-token
- * lookahead by replacing tokens here, in order to keep the grammar LALR(1).
+ * lookahead by combining tokens here, in order to keep the grammar LALR(1).
  *
  * Using a filter is simpler than trying to recognize multiword tokens
  * directly in scan.l, because we'd have to allow for comments between the
- * words.  Furthermore it's not clear how to do that without re-introducing
+ * words.  Furthermore it's not clear how to do it without re-introducing
  * scanner backtrack, which would cost more performance than this filter
  * layer does.
  */
@@ -49,10 +49,8 @@ filtered_base_yylex(void)
 {
 	int			cur_token;
 	int			next_token;
-	int			cur_token_length;
 	YYSTYPE		cur_yylval;
 	YYLTYPE		cur_yylloc;
-	char	   *cur_yytext;
 
 	/* Get next token --- we might already have it */
 	if (have_lookahead)
@@ -60,86 +58,74 @@ filtered_base_yylex(void)
 		cur_token = lookahead_token;
 		base_yylval = lookahead_yylval;
 		base_yylloc = lookahead_yylloc;
-		yytext = lookahead_yytext;
-		*lookahead_end = lookahead_hold_char;
 		have_lookahead = false;
 	}
 	else
 		cur_token = base_yylex();
 
-	/*
-	 * If this token isn't one that requires lookahead, just return it.  If it
-	 * does, determine the token length.  (We could get that via strlen(), but
-	 * since we have such a small set of possibilities, hardwiring seems
-	 * feasible and more efficient.)
-	 */
+	/* Do we need to look ahead for a possible multiword token? */
 	switch (cur_token)
 	{
 		case NULLS_P:
-			cur_token_length = 5;
-			break;
-		case WITH:
-			cur_token_length = 4;
-			break;
-		default:
-			return cur_token;
-	}
-
-	/*
-	 * Identify end+1 of current token.  base_yylex() has temporarily stored a
-	 * '\0' here, and will undo that when we call it again.  We need to redo
-	 * it to fully revert the lookahead call for error reporting purposes.
-	 */
-	lookahead_end = yytext + cur_token_length;
-	Assert(*lookahead_end == '\0');
-
-	/* Save and restore lexer output variables around the call */
-	cur_yylval = base_yylval;
-	cur_yylloc = base_yylloc;
-	cur_yytext = yytext;
-
-	/* Get next token, saving outputs into lookahead variables */
-	next_token = base_yylex();
-
-	lookahead_token = next_token;
-	lookahead_yylval = base_yylval;
-	lookahead_yylloc = base_yylloc;
-	lookahead_yytext = yytext;
-
-	base_yylval = cur_yylval;
-	base_yylloc = cur_yylloc;
-	yytext = cur_yytext;
-
-	/* Now revert the un-truncation of the current token */
-	lookahead_hold_char = *lookahead_end;
-	*lookahead_end = '\0';
-
-	have_lookahead = true;
 
-	/* Replace cur_token if needed, based on lookahead */
-	switch (cur_token)
-	{
-		case NULLS_P:
-			/* Replace NULLS_P by NULLS_LA if it's followed by FIRST or LAST */
+			/*
+			 * NULLS FIRST and NULLS LAST must be reduced to one token
+			 */
+			cur_yylval = base_yylval;
+			cur_yylloc = base_yylloc;
+			next_token = base_yylex();
 			switch (next_token)
 			{
 				case FIRST_P:
+					cur_token = NULLS_FIRST;
+					break;
 				case LAST_P:
-					cur_token = NULLS_LA;
+					cur_token = NULLS_LAST;
+					break;
+				default:
+					/* save the lookahead token for next time */
+					lookahead_token = next_token;
+					lookahead_yylval = base_yylval;
+					lookahead_yylloc = base_yylloc;
+					have_lookahead = true;
+					/* and back up the output info to cur_token */
+					base_yylval = cur_yylval;
+					base_yylloc = cur_yylloc;
 					break;
 			}
 			break;
 
 		case WITH:
-			/* Replace WITH by WITH_LA if it's followed by TIME or ORDINALITY */
+
+			/*
+			 * WITH TIME must be reduced to one token
+			 */
+			cur_yylval = base_yylval;
+			cur_yylloc = base_yylloc;
+			next_token = base_yylex();
 			switch (next_token)
 			{
 				case TIME:
+					cur_token = WITH_TIME;
+					break;
 				case ORDINALITY:
-					cur_token = WITH_LA;
+					cur_token = WITH_ORDINALITY;
+					break;
+				default:
+					/* save the lookahead token for next time */
+					lookahead_token = next_token;
+					lookahead_yylval = base_yylval;
+					lookahead_yylloc = base_yylloc;
+					have_lookahead = true;
+					/* and back up the output info to cur_token */
+					base_yylval = cur_yylval;
+					base_yylloc = cur_yylloc;
 					break;
 			}
 			break;
+
+		default:
+			break;
 	}
 
 	return cur_token;
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index e2a06b3..25961b1 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4934,34 +4934,41 @@ conninfo_uri_parse_params(char *params,
 				{
 					printfPQExpBuffer(errorMessage,
 									  libpq_gettext("extra key/value separator \"=\" in URI query parameter: \"%s\"\n"),
-									  keyword);
+									  params);
 					return false;
 				}
 				/* Cut off keyword, advance to value */
-				*p++ = '\0';
-				value = p;
+				*p = '\0';
+				value = ++p;
 			}
 			else if (*p == '&' || *p == '\0')
 			{
-				/*
-				 * If not at the end, cut off value and advance; leave p
-				 * pointing to start of the next parameter, if any.
-				 */
-				if (*p != '\0')
-					*p++ = '\0';
+				char		prevchar;
+
+				/* Cut off value, remember old value */
+				prevchar = *p;
+				*p = '\0';
+
 				/* Was there '=' at all? */
 				if (value == NULL)
 				{
 					printfPQExpBuffer(errorMessage,
 									  libpq_gettext("missing key/value separator \"=\" in URI query parameter: \"%s\"\n"),
-									  keyword);
+									  params);
 					return false;
 				}
-				/* Got keyword and value, go process them. */
+
+				/*
+				 * If not at the end, advance; now pointing to start of the
+				 * next parameter, if any.
+				 */
+				if (prevchar != '\0')
+					++p;
 				break;
 			}
-			else
-				++p;			/* Advance over all other bytes. */
+
+			/* Advance, NUL is checked in the 'if' above */
+			++p;
 		}
 
 		keyword = conninfo_uri_decode(keyword, errorMessage);
@@ -5001,12 +5008,24 @@ conninfo_uri_parse_params(char *params,
 		if (!conninfo_storeval(connOptions, keyword, value,
 							   errorMessage, true, false))
 		{
-			/* Insert generic message if conninfo_storeval didn't give one. */
-			if (errorMessage->len == 0)
-				printfPQExpBuffer(errorMessage,
-								  libpq_gettext("invalid URI query parameter: \"%s\"\n"),
-								  keyword);
-			/* And fail. */
+			/*
+			 * Check if there was a hard error when decoding or storing the
+			 * option.
+			 */
+			if (errorMessage->len != 0)
+			{
+				if (malloced)
+				{
+					free(keyword);
+					free(value);
+				}
+				return false;
+			}
+
+			printfPQExpBuffer(errorMessage,
+							  libpq_gettext(
+									"invalid URI query parameter: \"%s\"\n"),
+							  keyword);
 			if (malloced)
 			{
 				free(keyword);
@@ -5014,14 +5033,13 @@ conninfo_uri_parse_params(char *params,
 			}
 			return false;
 		}
-
 		if (malloced)
 		{
 			free(keyword);
 			free(value);
 		}
 
-		/* Proceed to next key=value pair, if any */
+		/* Proceed to next key=value pair */
 		params = p;
 	}
 
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 3d46e15..6912028 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -892,8 +892,7 @@ pqSaveMessageField(PGresult *res, char code, const char *value)
 
 	pfield = (PGMessageField *)
 		pqResultAlloc(res,
-					  offsetof(PGMessageField, contents) +
-					  strlen(value) + 1,
+					  sizeof(PGMessageField) + strlen(value),
 					  TRUE);
 	if (!pfield)
 		return;					/* out of memory? */
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 25aecc2..945e283 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -905,6 +905,16 @@ pqSendSome(PGconn *conn, int len)
 			/*
 			 * We didn't send it all, wait till we can send more.
 			 *
+			 * If the connection is in non-blocking mode we don't wait, but
+			 * return 1 to indicate that data is still pending.
+			 */
+			if (pqIsnonblocking(conn))
+			{
+				result = 1;
+				break;
+			}
+
+			/*
 			 * There are scenarios in which we can't send data because the
 			 * communications channel is full, but we cannot expect the server
 			 * to clear the channel eventually because it's blocked trying to
@@ -915,29 +925,12 @@ pqSendSome(PGconn *conn, int len)
 			 * again.  Furthermore, it is possible that such incoming data
 			 * might not arrive until after we've gone to sleep.  Therefore,
 			 * we wait for either read ready or write ready.
-			 *
-			 * In non-blocking mode, we don't wait here directly, but return
-			 * 1 to indicate that data is still pending.  The caller should
-			 * wait for both read and write ready conditions, and call
-			 * PQconsumeInput() on read ready, but just in case it doesn't, we
-			 * call pqReadData() ourselves before returning.  That's not
-			 * enough if the data has not arrived yet, but it's the best we
-			 * can do, and works pretty well in practice.  (The documentation
-			 * used to say that you only need to wait for write-ready, so
-			 * there are still plenty of applications like that out there.)
 			 */
 			if (pqReadData(conn) < 0)
 			{
 				result = -1;	/* error message already set up */
 				break;
 			}
-
-			if (pqIsnonblocking(conn))
-			{
-				result = 1;
-				break;
-			}
-
 			if (pqWait(TRUE, TRUE, conn))
 			{
 				result = -1;
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 1b9f3a4..a32af34 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1569,9 +1569,12 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
 }
 
 /*
- * Private substitute BIO: this does the sending and receiving using
- * pqsecure_raw_write() and pqsecure_raw_read() instead, to allow those
- * functions to disable SIGPIPE and give better error messages on I/O errors.
+ * Private substitute BIO: this does the sending and receiving using send() and
+ * recv() instead. This is so that we can enable and disable interrupts
+ * just while calling recv(). We cannot have interrupts occurring while
+ * the bulk of openssl runs, because it uses malloc() and possibly other
+ * non-reentrant libc facilities. We also need to call send() and recv()
+ * directly so it gets passed through the socket/signals layer on Win32.
  *
  * These functions are closely modelled on the standard socket BIO in OpenSSL;
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 64579d2..008fd67 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -145,7 +145,7 @@ typedef struct pgMessageField
 {
 	struct pgMessageField *next;	/* list link */
 	char		code;			/* field code */
-	char		contents[FLEXIBLE_ARRAY_MEMBER];		/* value, nul-terminated */
+	char		contents[1];	/* field value (VARIABLE LENGTH) */
 } PGMessageField;
 
 /* Fields needed for notice handling */
@@ -637,7 +637,7 @@ extern void pq_reset_sigpipe(sigset_t *osigset, bool sigpipe_pending,
  * The SSL implementatation provides these functions (fe-secure-openssl.c)
  */
 extern void pgtls_init_library(bool do_ssl, int do_crypto);
-extern int	pgtls_init(PGconn *conn);
+extern int pgtls_init(PGconn *conn);
 extern PostgresPollingStatusType pgtls_open_client(PGconn *conn);
 extern void pgtls_close(PGconn *conn);
 extern ssize_t pgtls_read(PGconn *conn, void *ptr, size_t len);
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index e3dda5d..492c1ef 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -1218,7 +1218,7 @@ plperl_array_to_datum(SV *src, Oid typid, int32 typmod)
 				 errmsg("cannot convert Perl array to non-array type %s",
 						format_type_be(typid))));
 
-	astate = initArrayResult(elemtypid, CurrentMemoryContext, true);
+	astate = initArrayResult(elemtypid, CurrentMemoryContext);
 
 	_sv_to_datum_finfo(elemtypid, &finfo, &typioparam);
 
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 41a68f8..ae5421f 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -158,8 +158,7 @@ static bool exec_eval_simple_expr(PLpgSQL_execstate *estate,
 					  PLpgSQL_expr *expr,
 					  Datum *result,
 					  bool *isNull,
-					  Oid *rettype,
-					  int32 *rettypmod);
+					  Oid *rettype);
 
 static void exec_assign_expr(PLpgSQL_execstate *estate,
 				 PLpgSQL_datum *target,
@@ -169,8 +168,7 @@ static void exec_assign_c_string(PLpgSQL_execstate *estate,
 					 const char *str);
 static void exec_assign_value(PLpgSQL_execstate *estate,
 				  PLpgSQL_datum *target,
-				  Datum value, bool isNull,
-				  Oid valtype, int32 valtypmod);
+				  Datum value, Oid valtype, bool *isNull);
 static void exec_eval_datum(PLpgSQL_execstate *estate,
 				PLpgSQL_datum *datum,
 				Oid *typeid,
@@ -186,8 +184,7 @@ static bool exec_eval_boolean(PLpgSQL_execstate *estate,
 static Datum exec_eval_expr(PLpgSQL_execstate *estate,
 			   PLpgSQL_expr *expr,
 			   bool *isNull,
-			   Oid *rettype,
-			   int32 *rettypmod);
+			   Oid *rettype);
 static int exec_run_select(PLpgSQL_execstate *estate,
 				PLpgSQL_expr *expr, long maxtuples, Portal *portalP);
 static int exec_for_query(PLpgSQL_execstate *estate, PLpgSQL_stmt_forq *stmt,
@@ -211,15 +208,16 @@ static void exec_move_row_from_datum(PLpgSQL_execstate *estate,
 static char *convert_value_to_string(PLpgSQL_execstate *estate,
 						Datum value, Oid valtype);
 static Datum exec_cast_value(PLpgSQL_execstate *estate,
-				Datum value, bool isnull,
-				Oid valtype, int32 valtypmod,
-				Oid reqtype, int32 reqtypmod,
+				Datum value, Oid valtype,
+				Oid reqtype,
 				FmgrInfo *reqinput,
-				Oid reqtypioparam);
+				Oid reqtypioparam,
+				int32 reqtypmod,
+				bool isnull);
 static Datum exec_simple_cast_value(PLpgSQL_execstate *estate,
-					   Datum value, bool isnull,
-					   Oid valtype, int32 valtypmod,
-					   Oid reqtype, int32 reqtypmod);
+					   Datum value, Oid valtype,
+					   Oid reqtype, int32 reqtypmod,
+					   bool isnull);
 static void exec_init_tuple_store(PLpgSQL_execstate *estate);
 static void exec_set_found(PLpgSQL_execstate *estate, bool state);
 static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
@@ -454,13 +452,12 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
 			/* Cast value to proper type */
 			estate.retval = exec_cast_value(&estate,
 											estate.retval,
-											fcinfo->isnull,
 											estate.rettype,
-											-1,
 											func->fn_rettype,
-											-1,
 											&(func->fn_retinput),
-											func->fn_rettypioparam);
+											func->fn_rettypioparam,
+											-1,
+											fcinfo->isnull);
 
 			/*
 			 * If the function's return type isn't by value, copy the value
@@ -1080,13 +1077,15 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
 						 * exec_assign_value.)
 						 */
 						if (!var->datatype->typinput.fn_strict)
+						{
+							bool		valIsNull = true;
+
 							exec_assign_value(estate,
 											  (PLpgSQL_datum *) var,
 											  (Datum) 0,
-											  true,
 											  UNKNOWNOID,
-											  -1);
-
+											  &valIsNull);
+						}
 						if (var->notnull)
 							ereport(ERROR,
 									(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -1244,9 +1243,8 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
 				{
 					/*
 					 * Initialize the magic SQLSTATE and SQLERRM variables for
-					 * the exception block; this also frees values from any
-					 * prior use of the same exception. We needn't do this
-					 * until we have found a matching exception.
+					 * the exception block. We needn't do this until we have
+					 * found a matching exception.
 					 */
 					PLpgSQL_var *state_var;
 					PLpgSQL_var *errm_var;
@@ -1270,6 +1268,13 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
 
 					rc = exec_stmts(estate, exception->action);
 
+					free_var(state_var);
+					state_var->value = (Datum) 0;
+					state_var->isnull = true;
+					free_var(errm_var);
+					errm_var->value = (Datum) 0;
+					errm_var->isnull = true;
+
 					break;
 				}
 			}
@@ -1557,19 +1562,20 @@ exec_stmt_getdiag(PLpgSQL_execstate *estate, PLpgSQL_stmt_getdiag *stmt)
 	{
 		PLpgSQL_diag_item *diag_item = (PLpgSQL_diag_item *) lfirst(lc);
 		PLpgSQL_datum *var = estate->datums[diag_item->target];
+		bool		isnull = false;
 
 		switch (diag_item->kind)
 		{
 			case PLPGSQL_GETDIAG_ROW_COUNT:
 				exec_assign_value(estate, var,
 								  UInt32GetDatum(estate->eval_processed),
-								  false, INT4OID, -1);
+								  INT4OID, &isnull);
 				break;
 
 			case PLPGSQL_GETDIAG_RESULT_OID:
 				exec_assign_value(estate, var,
 								  ObjectIdGetDatum(estate->eval_lastoid),
-								  false, OIDOID, -1);
+								  OIDOID, &isnull);
 				break;
 
 			case PLPGSQL_GETDIAG_ERROR_CONTEXT:
@@ -1688,11 +1694,9 @@ exec_stmt_case(PLpgSQL_execstate *estate, PLpgSQL_stmt_case *stmt)
 	{
 		/* simple case */
 		Datum		t_val;
-		Oid			t_typoid;
-		int32		t_typmod;
+		Oid			t_oid;
 
-		t_val = exec_eval_expr(estate, stmt->t_expr,
-							   &isnull, &t_typoid, &t_typmod);
+		t_val = exec_eval_expr(estate, stmt->t_expr, &isnull, &t_oid);
 
 		t_var = (PLpgSQL_var *) estate->datums[stmt->t_varno];
 
@@ -1701,19 +1705,17 @@ exec_stmt_case(PLpgSQL_execstate *estate, PLpgSQL_stmt_case *stmt)
 		 * what we're modifying here is an execution copy of the datum, so
 		 * this doesn't affect the originally stored function parse tree.
 		 */
-		if (t_var->datatype->typoid != t_typoid ||
-			t_var->datatype->atttypmod != t_typmod)
-			t_var->datatype = plpgsql_build_datatype(t_typoid,
-													 t_typmod,
+		if (t_var->datatype->typoid != t_oid)
+			t_var->datatype = plpgsql_build_datatype(t_oid,
+													 -1,
 										   estate->func->fn_input_collation);
 
 		/* now we can assign to the variable */
 		exec_assign_value(estate,
 						  (PLpgSQL_datum *) t_var,
 						  t_val,
-						  isnull,
-						  t_typoid,
-						  t_typmod);
+						  t_oid,
+						  &isnull);
 
 		exec_eval_cleanup(estate);
 	}
@@ -1889,7 +1891,6 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 	Datum		value;
 	bool		isnull;
 	Oid			valtype;
-	int32		valtypmod;
 	int32		loop_value;
 	int32		end_value;
 	int32		step_value;
@@ -1901,14 +1902,11 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 	/*
 	 * Get the value of the lower bound
 	 */
-	value = exec_eval_expr(estate, stmt->lower,
-						   &isnull, &valtype, &valtypmod);
-	value = exec_cast_value(estate, value, isnull,
-							valtype, valtypmod,
-							var->datatype->typoid,
-							var->datatype->atttypmod,
+	value = exec_eval_expr(estate, stmt->lower, &isnull, &valtype);
+	value = exec_cast_value(estate, value, valtype, var->datatype->typoid,
 							&(var->datatype->typinput),
-							var->datatype->typioparam);
+							var->datatype->typioparam,
+							var->datatype->atttypmod, isnull);
 	if (isnull)
 		ereport(ERROR,
 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -1919,14 +1917,11 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 	/*
 	 * Get the value of the upper bound
 	 */
-	value = exec_eval_expr(estate, stmt->upper,
-						   &isnull, &valtype, &valtypmod);
-	value = exec_cast_value(estate, value, isnull,
-							valtype, valtypmod,
-							var->datatype->typoid,
-							var->datatype->atttypmod,
+	value = exec_eval_expr(estate, stmt->upper, &isnull, &valtype);
+	value = exec_cast_value(estate, value, valtype, var->datatype->typoid,
 							&(var->datatype->typinput),
-							var->datatype->typioparam);
+							var->datatype->typioparam,
+							var->datatype->atttypmod, isnull);
 	if (isnull)
 		ereport(ERROR,
 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -1939,14 +1934,11 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 	 */
 	if (stmt->step)
 	{
-		value = exec_eval_expr(estate, stmt->step,
-							   &isnull, &valtype, &valtypmod);
-		value = exec_cast_value(estate, value, isnull,
-								valtype, valtypmod,
-								var->datatype->typoid,
-								var->datatype->atttypmod,
+		value = exec_eval_expr(estate, stmt->step, &isnull, &valtype);
+		value = exec_cast_value(estate, value, valtype, var->datatype->typoid,
 								&(var->datatype->typinput),
-								var->datatype->typioparam);
+								var->datatype->typioparam,
+								var->datatype->atttypmod, isnull);
 		if (isnull)
 			ereport(ERROR,
 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -2241,19 +2233,17 @@ exec_stmt_foreach_a(PLpgSQL_execstate *estate, PLpgSQL_stmt_foreach_a *stmt)
 {
 	ArrayType  *arr;
 	Oid			arrtype;
-	int32		arrtypmod;
 	PLpgSQL_datum *loop_var;
 	Oid			loop_var_elem_type;
 	bool		found = false;
 	int			rc = PLPGSQL_RC_OK;
 	ArrayIterator array_iterator;
 	Oid			iterator_result_type;
-	int32		iterator_result_typmod;
 	Datum		value;
 	bool		isnull;
 
 	/* get the value of the array expression */
-	value = exec_eval_expr(estate, stmt->expr, &isnull, &arrtype, &arrtypmod);
+	value = exec_eval_expr(estate, stmt->expr, &isnull, &arrtype);
 	if (isnull)
 		ereport(ERROR,
 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -2321,13 +2311,11 @@ exec_stmt_foreach_a(PLpgSQL_execstate *estate, PLpgSQL_stmt_foreach_a *stmt)
 	{
 		/* When slicing, nominal type of result is same as array type */
 		iterator_result_type = arrtype;
-		iterator_result_typmod = arrtypmod;
 	}
 	else
 	{
 		/* Without slicing, results are individual array elements */
 		iterator_result_type = ARR_ELEMTYPE(arr);
-		iterator_result_typmod = arrtypmod;
 	}
 
 	/* Iterate over the array elements or slices */
@@ -2336,8 +2324,8 @@ exec_stmt_foreach_a(PLpgSQL_execstate *estate, PLpgSQL_stmt_foreach_a *stmt)
 		found = true;			/* looped at least once */
 
 		/* Assign current element/slice to the loop variable */
-		exec_assign_value(estate, loop_var, value, isnull,
-						  iterator_result_type, iterator_result_typmod);
+		exec_assign_value(estate, loop_var, value, iterator_result_type,
+						  &isnull);
 
 		/* In slice case, value is temporary; must free it to avoid leakage */
 		if (stmt->slice > 0)
@@ -2464,9 +2452,8 @@ exec_stmt_return(PLpgSQL_execstate *estate, PLpgSQL_stmt_return *stmt)
 	estate->retisnull = true;
 
 	/*
-	 * Special case path when the RETURN expression is a simple variable
-	 * reference; in particular, this path is always taken in functions with
-	 * one or more OUT parameters.
+	 * This special-case path covers record/row variables in fn_retistuple
+	 * functions, as well as functions with one or more OUT parameters.
 	 */
 	if (stmt->retvarno >= 0)
 	{
@@ -2521,12 +2508,9 @@ exec_stmt_return(PLpgSQL_execstate *estate, PLpgSQL_stmt_return *stmt)
 
 	if (stmt->expr != NULL)
 	{
-		int32		rettypmod;
-
 		estate->retval = exec_eval_expr(estate, stmt->expr,
 										&(estate->retisnull),
-										&(estate->rettype),
-										&rettypmod);
+										&(estate->rettype));
 
 		if (estate->retistuple && !estate->retisnull)
 		{
@@ -2592,9 +2576,8 @@ exec_stmt_return_next(PLpgSQL_execstate *estate,
 	natts = tupdesc->natts;
 
 	/*
-	 * Special case path when the RETURN NEXT expression is a simple variable
-	 * reference; in particular, this path is always taken in functions with
-	 * one or more OUT parameters.
+	 * This special-case path covers record/row variables in fn_retistuple
+	 * functions, as well as functions with one or more OUT parameters.
 	 */
 	if (stmt->retvarno >= 0)
 	{
@@ -2616,11 +2599,10 @@ exec_stmt_return_next(PLpgSQL_execstate *estate,
 					/* coerce type if needed */
 					retval = exec_simple_cast_value(estate,
 													retval,
-													isNull,
 													var->datatype->typoid,
-													var->datatype->atttypmod,
 												 tupdesc->attrs[0]->atttypid,
-											   tupdesc->attrs[0]->atttypmod);
+												tupdesc->attrs[0]->atttypmod,
+													isNull);
 
 					tuplestore_putvalues(estate->tuple_store, tupdesc,
 										 &retval, &isNull);
@@ -2676,13 +2658,11 @@ exec_stmt_return_next(PLpgSQL_execstate *estate,
 		Datum		retval;
 		bool		isNull;
 		Oid			rettype;
-		int32		rettypmod;
 
 		retval = exec_eval_expr(estate,
 								stmt->expr,
 								&isNull,
-								&rettype,
-								&rettypmod);
+								&rettype);
 
 		if (estate->retistuple)
 		{
@@ -2742,11 +2722,10 @@ exec_stmt_return_next(PLpgSQL_execstate *estate,
 			/* coerce type if needed */
 			retval = exec_simple_cast_value(estate,
 											retval,
-											isNull,
 											rettype,
-											rettypmod,
 											tupdesc->attrs[0]->atttypid,
-											tupdesc->attrs[0]->atttypmod);
+											tupdesc->attrs[0]->atttypmod,
+											isNull);
 
 			tuplestore_putvalues(estate->tuple_store, tupdesc,
 								 &retval, &isNull);
@@ -2949,7 +2928,6 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 			if (cp[0] == '%')
 			{
 				Oid			paramtypeid;
-				int32		paramtypmod;
 				Datum		paramvalue;
 				bool		paramisnull;
 				char	   *extval;
@@ -2968,8 +2946,7 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 				paramvalue = exec_eval_expr(estate,
 									  (PLpgSQL_expr *) lfirst(current_param),
 											&paramisnull,
-											&paramtypeid,
-											&paramtypmod);
+											&paramtypeid);
 
 				if (paramisnull)
 					extval = "<NULL>";
@@ -2999,13 +2976,11 @@ exec_stmt_raise(PLpgSQL_execstate *estate, PLpgSQL_stmt_raise *stmt)
 		Datum		optionvalue;
 		bool		optionisnull;
 		Oid			optiontypeid;
-		int32		optiontypmod;
 		char	   *extval;
 
 		optionvalue = exec_eval_expr(estate, opt->expr,
 									 &optionisnull,
-									 &optiontypeid,
-									 &optiontypmod);
+									 &optiontypeid);
 		if (optionisnull)
 			ereport(ERROR,
 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -3507,9 +3482,8 @@ exec_stmt_dynexecute(PLpgSQL_execstate *estate,
 					 PLpgSQL_stmt_dynexecute *stmt)
 {
 	Datum		query;
-	bool		isnull;
+	bool		isnull = false;
 	Oid			restype;
-	int32		restypmod;
 	char	   *querystr;
 	int			exec_res;
 	PreparedParamsData *ppd = NULL;
@@ -3518,7 +3492,7 @@ exec_stmt_dynexecute(PLpgSQL_execstate *estate,
 	 * First we evaluate the string expression after the EXECUTE keyword. Its
 	 * result is the querystring we have to execute.
 	 */
-	query = exec_eval_expr(estate, stmt->query, &isnull, &restype, &restypmod);
+	query = exec_eval_expr(estate, stmt->query, &isnull, &restype);
 	if (isnull)
 		ereport(ERROR,
 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
@@ -4012,12 +3986,11 @@ exec_assign_expr(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 				 PLpgSQL_expr *expr)
 {
 	Datum		value;
-	bool		isnull;
 	Oid			valtype;
-	int32		valtypmod;
+	bool		isnull = false;
 
-	value = exec_eval_expr(estate, expr, &isnull, &valtype, &valtypmod);
-	exec_assign_value(estate, target, value, isnull, valtype, valtypmod);
+	value = exec_eval_expr(estate, expr, &isnull, &valtype);
+	exec_assign_value(estate, target, value, valtype, &isnull);
 	exec_eval_cleanup(estate);
 }
 
@@ -4033,13 +4006,14 @@ exec_assign_c_string(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 					 const char *str)
 {
 	text	   *value;
+	bool		isnull = false;
 
 	if (str != NULL)
 		value = cstring_to_text(str);
 	else
 		value = cstring_to_text("");
-	exec_assign_value(estate, target, PointerGetDatum(value), false,
-					  TEXTOID, -1);
+	exec_assign_value(estate, target, PointerGetDatum(value),
+					  TEXTOID, &isnull);
 	pfree(value);
 }
 
@@ -4055,8 +4029,7 @@ exec_assign_c_string(PLpgSQL_execstate *estate, PLpgSQL_datum *target,
 static void
 exec_assign_value(PLpgSQL_execstate *estate,
 				  PLpgSQL_datum *target,
-				  Datum value, bool isNull,
-				  Oid valtype, int32 valtypmod)
+				  Datum value, Oid valtype, bool *isNull)
 {
 	switch (target->dtype)
 	{
@@ -4070,15 +4043,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
 
 				newvalue = exec_cast_value(estate,
 										   value,
-										   isNull,
 										   valtype,
-										   valtypmod,
 										   var->datatype->typoid,
-										   var->datatype->atttypmod,
 										   &(var->datatype->typinput),
-										   var->datatype->typioparam);
+										   var->datatype->typioparam,
+										   var->datatype->atttypmod,
+										   *isNull);
 
-				if (isNull && var->notnull)
+				if (*isNull && var->notnull)
 					ereport(ERROR,
 							(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 							 errmsg("null value cannot be assigned to variable \"%s\" declared NOT NULL",
@@ -4089,7 +4061,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 * probably in the eval_econtext) into the procedure's memory
 				 * context.
 				 */
-				if (!var->datatype->typbyval && !isNull)
+				if (!var->datatype->typbyval && !*isNull)
 					newvalue = datumCopy(newvalue,
 										 false,
 										 var->datatype->typlen);
@@ -4104,8 +4076,8 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				free_var(var);
 
 				var->value = newvalue;
-				var->isnull = isNull;
-				if (!var->datatype->typbyval && !isNull)
+				var->isnull = *isNull;
+				if (!var->datatype->typbyval && !*isNull)
 					var->freeval = true;
 				break;
 			}
@@ -4117,7 +4089,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 */
 				PLpgSQL_row *row = (PLpgSQL_row *) target;
 
-				if (isNull)
+				if (*isNull)
 				{
 					/* If source is null, just assign nulls to the row */
 					exec_move_row(estate, NULL, row, NULL, NULL);
@@ -4141,7 +4113,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 */
 				PLpgSQL_rec *rec = (PLpgSQL_rec *) target;
 
-				if (isNull)
+				if (*isNull)
 				{
 					/* If source is null, just assign nulls to the record */
 					exec_move_row(estate, rec, NULL, NULL, NULL);
@@ -4171,6 +4143,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				Datum	   *values;
 				bool	   *nulls;
 				bool	   *replaces;
+				bool		attisnull;
 				Oid			atttype;
 				int32		atttypmod;
 
@@ -4218,16 +4191,16 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 * Now insert the new value, being careful to cast it to the
 				 * right type.
 				 */
-				atttype = rec->tupdesc->attrs[fno]->atttypid;
+				atttype = SPI_gettypeid(rec->tupdesc, fno + 1);
 				atttypmod = rec->tupdesc->attrs[fno]->atttypmod;
+				attisnull = *isNull;
 				values[fno] = exec_simple_cast_value(estate,
 													 value,
-													 isNull,
 													 valtype,
-													 valtypmod,
 													 atttype,
-													 atttypmod);
-				nulls[fno] = isNull;
+													 atttypmod,
+													 attisnull);
+				nulls[fno] = attisnull;
 
 				/*
 				 * Now call heap_modify_tuple() to create a new tuple that
@@ -4260,11 +4233,12 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				PLpgSQL_expr *subscripts[MAXDIM];
 				int			subscriptvals[MAXDIM];
 				Datum		oldarraydatum,
-							newarraydatum,
 							coerced_value;
 				bool		oldarrayisnull;
 				Oid			parenttypoid;
 				int32		parenttypmod;
+				ArrayType  *oldarrayval;
+				ArrayType  *newarrayval;
 				SPITupleTable *save_eval_tuptable;
 				MemoryContext oldcontext;
 
@@ -4385,11 +4359,10 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				/* Coerce source value to match array element type. */
 				coerced_value = exec_simple_cast_value(estate,
 													   value,
-													   isNull,
 													   valtype,
-													   valtypmod,
 													   arrayelem->elemtypoid,
-													 arrayelem->arraytypmod);
+													   arrayelem->arraytypmod,
+													   *isNull);
 
 				/*
 				 * If the original array is null, cons up an empty array so
@@ -4402,27 +4375,29 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 * corresponds to the current behavior of ExecEvalArrayRef().
 				 */
 				if (arrayelem->arraytyplen > 0 &&		/* fixed-length array? */
-					(oldarrayisnull || isNull))
+					(oldarrayisnull || *isNull))
 					return;
 
-				/* empty array, if any, and newarraydatum are short-lived */
+				/* oldarrayval and newarrayval should be short-lived */
 				oldcontext = MemoryContextSwitchTo(estate->eval_econtext->ecxt_per_tuple_memory);
 
 				if (oldarrayisnull)
-					oldarraydatum = PointerGetDatum(construct_empty_array(arrayelem->elemtypoid));
+					oldarrayval = construct_empty_array(arrayelem->elemtypoid);
+				else
+					oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
 
 				/*
 				 * Build the modified array value.
 				 */
-				newarraydatum = array_set_element(oldarraydatum,
-												  nsubscripts,
-												  subscriptvals,
-												  coerced_value,
-												  isNull,
-												  arrayelem->arraytyplen,
-												  arrayelem->elemtyplen,
-												  arrayelem->elemtypbyval,
-												  arrayelem->elemtypalign);
+				newarrayval = array_set(oldarrayval,
+										nsubscripts,
+										subscriptvals,
+										coerced_value,
+										*isNull,
+										arrayelem->arraytyplen,
+										arrayelem->elemtyplen,
+										arrayelem->elemtypbyval,
+										arrayelem->elemtypalign);
 
 				MemoryContextSwitchTo(oldcontext);
 
@@ -4432,11 +4407,10 @@ exec_assign_value(PLpgSQL_execstate *estate,
 				 * coercing the base array type back up to the domain will
 				 * happen within exec_assign_value.
 				 */
+				*isNull = false;
 				exec_assign_value(estate, target,
-								  newarraydatum,
-								  false,
-								  arrayelem->arraytypoid,
-								  arrayelem->arraytypmod);
+								  PointerGetDatum(newarrayval),
+								  arrayelem->arraytypoid, isNull);
 				break;
 			}
 
@@ -4757,12 +4731,11 @@ exec_eval_integer(PLpgSQL_execstate *estate,
 {
 	Datum		exprdatum;
 	Oid			exprtypeid;
-	int32		exprtypmod;
 
-	exprdatum = exec_eval_expr(estate, expr, isNull, &exprtypeid, &exprtypmod);
-	exprdatum = exec_simple_cast_value(estate, exprdatum, *isNull,
-									   exprtypeid, exprtypmod,
-									   INT4OID, -1);
+	exprdatum = exec_eval_expr(estate, expr, isNull, &exprtypeid);
+	exprdatum = exec_simple_cast_value(estate, exprdatum, exprtypeid,
+									   INT4OID, -1,
+									   *isNull);
 	return DatumGetInt32(exprdatum);
 }
 
@@ -4780,18 +4753,17 @@ exec_eval_boolean(PLpgSQL_execstate *estate,
 {
 	Datum		exprdatum;
 	Oid			exprtypeid;
-	int32		exprtypmod;
 
-	exprdatum = exec_eval_expr(estate, expr, isNull, &exprtypeid, &exprtypmod);
-	exprdatum = exec_simple_cast_value(estate, exprdatum, *isNull,
-									   exprtypeid, exprtypmod,
-									   BOOLOID, -1);
+	exprdatum = exec_eval_expr(estate, expr, isNull, &exprtypeid);
+	exprdatum = exec_simple_cast_value(estate, exprdatum, exprtypeid,
+									   BOOLOID, -1,
+									   *isNull);
 	return DatumGetBool(exprdatum);
 }
 
 /* ----------
  * exec_eval_expr			Evaluate an expression and return
- *					the result Datum, along with data type/typmod.
+ *					the result Datum.
  *
  * NOTE: caller must do exec_eval_cleanup when done with the Datum.
  * ----------
@@ -4800,8 +4772,7 @@ static Datum
 exec_eval_expr(PLpgSQL_execstate *estate,
 			   PLpgSQL_expr *expr,
 			   bool *isNull,
-			   Oid *rettype,
-			   int32 *rettypmod)
+			   Oid *rettype)
 {
 	Datum		result = 0;
 	int			rc;
@@ -4816,8 +4787,7 @@ exec_eval_expr(PLpgSQL_execstate *estate,
 	 * If this is a simple expression, bypass SPI and use the executor
 	 * directly
 	 */
-	if (exec_eval_simple_expr(estate, expr,
-							  &result, isNull, rettype, rettypmod))
+	if (exec_eval_simple_expr(estate, expr, &result, isNull, rettype))
 		return result;
 
 	/*
@@ -4844,8 +4814,7 @@ exec_eval_expr(PLpgSQL_execstate *estate,
 	/*
 	 * ... and get the column's datatype.
 	 */
-	*rettype = estate->eval_tuptable->tupdesc->attrs[0]->atttypid;
-	*rettypmod = estate->eval_tuptable->tupdesc->attrs[0]->atttypmod;
+	*rettype = SPI_gettypeid(estate->eval_tuptable->tupdesc, 1);
 
 	/*
 	 * If there are no rows selected, the result is a NULL of that type.
@@ -5098,8 +5067,8 @@ loop_exit:
  * exec_eval_simple_expr -		Evaluate a simple expression returning
  *								a Datum by directly calling ExecEvalExpr().
  *
- * If successful, store results into *result, *isNull, *rettype, *rettypmod
- * and return TRUE.  If the expression cannot be handled by simple evaluation,
+ * If successful, store results into *result, *isNull, *rettype and return
+ * TRUE.  If the expression cannot be handled by simple evaluation,
  * return FALSE.
  *
  * Because we only store one execution tree for a simple expression, we
@@ -5130,8 +5099,7 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 					  PLpgSQL_expr *expr,
 					  Datum *result,
 					  bool *isNull,
-					  Oid *rettype,
-					  int32 *rettypmod)
+					  Oid *rettype)
 {
 	ExprContext *econtext = estate->eval_econtext;
 	LocalTransactionId curlxid = MyProc->lxid;
@@ -5181,7 +5149,6 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 	 * Pass back previously-determined result type.
 	 */
 	*rettype = expr->expr_simple_type;
-	*rettypmod = expr->expr_simple_typmod;
 
 	/*
 	 * Prepare the expression for execution, if it's not been done already in
@@ -5497,7 +5464,6 @@ exec_move_row(PLpgSQL_execstate *estate,
 			Datum		value;
 			bool		isnull;
 			Oid			valtype;
-			int32		valtypmod;
 
 			if (row->varnos[fnum] < 0)
 				continue;		/* skip dropped column in row struct */
@@ -5516,20 +5482,23 @@ exec_move_row(PLpgSQL_execstate *estate,
 					value = (Datum) 0;
 					isnull = true;
 				}
-				valtype = tupdesc->attrs[anum]->atttypid;
-				valtypmod = tupdesc->attrs[anum]->atttypmod;
+				valtype = SPI_gettypeid(tupdesc, anum + 1);
 				anum++;
 			}
 			else
 			{
 				value = (Datum) 0;
 				isnull = true;
-				valtype = UNKNOWNOID;
-				valtypmod = -1;
+
+				/*
+				 * InvalidOid is OK because exec_assign_value doesn't care
+				 * about the type of a source NULL
+				 */
+				valtype = InvalidOid;
 			}
 
 			exec_assign_value(estate, (PLpgSQL_datum *) var,
-							  value, isnull, valtype, valtypmod);
+							  value, valtype, &isnull);
 		}
 
 		return;
@@ -5713,17 +5682,17 @@ convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype)
  */
 static Datum
 exec_cast_value(PLpgSQL_execstate *estate,
-				Datum value, bool isnull,
-				Oid valtype, int32 valtypmod,
-				Oid reqtype, int32 reqtypmod,
+				Datum value, Oid valtype,
+				Oid reqtype,
 				FmgrInfo *reqinput,
-				Oid reqtypioparam)
+				Oid reqtypioparam,
+				int32 reqtypmod,
+				bool isnull)
 {
 	/*
 	 * If the type of the given value isn't what's requested, convert it.
 	 */
-	if (valtype != reqtype ||
-		(valtypmod != reqtypmod && reqtypmod != -1))
+	if (valtype != reqtype || reqtypmod != -1)
 	{
 		MemoryContext oldcontext;
 
@@ -5757,12 +5726,11 @@ exec_cast_value(PLpgSQL_execstate *estate,
  */
 static Datum
 exec_simple_cast_value(PLpgSQL_execstate *estate,
-					   Datum value, bool isnull,
-					   Oid valtype, int32 valtypmod,
-					   Oid reqtype, int32 reqtypmod)
+					   Datum value, Oid valtype,
+					   Oid reqtype, int32 reqtypmod,
+					   bool isnull)
 {
-	if (valtype != reqtype ||
-		(valtypmod != reqtypmod && reqtypmod != -1))
+	if (valtype != reqtype || reqtypmod != -1)
 	{
 		Oid			typinput;
 		Oid			typioparam;
@@ -5774,13 +5742,12 @@ exec_simple_cast_value(PLpgSQL_execstate *estate,
 
 		value = exec_cast_value(estate,
 								value,
-								isnull,
 								valtype,
-								valtypmod,
 								reqtype,
-								reqtypmod,
 								&finfo_input,
-								typioparam);
+								typioparam,
+								reqtypmod,
+								isnull);
 	}
 
 	return value;
@@ -6211,7 +6178,6 @@ exec_simple_recheck_plan(PLpgSQL_expr *expr, CachedPlan *cplan)
 	expr->expr_simple_lxid = InvalidLocalTransactionId;
 	/* Also stash away the expression result type */
 	expr->expr_simple_type = exprType((Node *) tle->expr);
-	expr->expr_simple_typmod = exprTypmod((Node *) tle->expr);
 }
 
 /* ----------
@@ -6412,12 +6378,10 @@ exec_eval_using_params(PLpgSQL_execstate *estate, List *params)
 	{
 		PLpgSQL_expr *param = (PLpgSQL_expr *) lfirst(lc);
 		bool		isnull;
-		int32		ppdtypmod;
 
 		ppd->values[i] = exec_eval_expr(estate, param,
 										&isnull,
-										&ppd->types[i],
-										&ppdtypmod);
+										&ppd->types[i]);
 		ppd->nulls[i] = isnull ? 'n' : ' ';
 		ppd->freevals[i] = false;
 
@@ -6495,14 +6459,13 @@ exec_dynquery_with_params(PLpgSQL_execstate *estate,
 	Datum		query;
 	bool		isnull;
 	Oid			restype;
-	int32		restypmod;
 	char	   *querystr;
 
 	/*
 	 * Evaluate the string expression after the EXECUTE keyword. Its result is
 	 * the querystring we have to execute.
 	 */
-	query = exec_eval_expr(estate, dynquery, &isnull, &restype, &restypmod);
+	query = exec_eval_expr(estate, dynquery, &isnull, &restype);
 	if (isnull)
 		ereport(ERROR,
 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
diff --git a/src/pl/plpgsql/src/pl_funcs.c b/src/pl/plpgsql/src/pl_funcs.c
index b6023cc..1dcea73 100644
--- a/src/pl/plpgsql/src/pl_funcs.c
+++ b/src/pl/plpgsql/src/pl_funcs.c
@@ -97,7 +97,7 @@ plpgsql_ns_additem(int itemtype, int itemno, const char *name)
 	/* first item added must be a label */
 	Assert(ns_top != NULL || itemtype == PLPGSQL_NSTYPE_LABEL);
 
-	nse = palloc(offsetof(PLpgSQL_nsitem, name) +strlen(name) + 1);
+	nse = palloc(sizeof(PLpgSQL_nsitem) + strlen(name));
 	nse->itemtype = itemtype;
 	nse->itemno = itemno;
 	nse->prev = ns_top;
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 506a313..590aac5 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3036,17 +3036,16 @@ make_return_stmt(int location)
 					 errmsg("RETURN cannot have a parameter in function returning void"),
 					 parser_errposition(yylloc)));
 	}
-	else
+	else if (plpgsql_curr_compile->fn_retistuple)
 	{
 		/*
-		 * We want to special-case simple variable references for efficiency.
-		 * So peek ahead to see if that's what we have.
+		 * We want to special-case simple row or record references for
+		 * efficiency.  So peek ahead to see if that's what we have.
 		 */
 		int		tok = yylex();
 
 		if (tok == T_DATUM && plpgsql_peek() == ';' &&
-			(yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_VAR ||
-			 yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_ROW ||
+			(yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_ROW ||
 			 yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_REC))
 		{
 			new->retvarno = yylval.wdatum.datum->dno;
@@ -3056,16 +3055,19 @@ make_return_stmt(int location)
 		}
 		else
 		{
-			/*
-			 * Not (just) a variable name, so treat as expression.
-			 *
-			 * Note that a well-formed expression is _required_ here;
-			 * anything else is a compile-time error.
-			 */
+			/* Not (just) a row/record name, so treat as expression */
 			plpgsql_push_back_token(tok);
 			new->expr = read_sql_expression(';', ";");
 		}
 	}
+	else
+	{
+		/*
+		 * Note that a well-formed expression is _required_ here;
+		 * anything else is a compile-time error.
+		 */
+		new->expr = read_sql_expression(';', ";");
+	}
 
 	return (PLpgSQL_stmt *) new;
 }
@@ -3097,17 +3099,16 @@ make_return_next_stmt(int location)
 					 parser_errposition(yylloc)));
 		new->retvarno = plpgsql_curr_compile->out_param_varno;
 	}
-	else
+	else if (plpgsql_curr_compile->fn_retistuple)
 	{
 		/*
-		 * We want to special-case simple variable references for efficiency.
-		 * So peek ahead to see if that's what we have.
+		 * We want to special-case simple row or record references for
+		 * efficiency.  So peek ahead to see if that's what we have.
 		 */
 		int		tok = yylex();
 
 		if (tok == T_DATUM && plpgsql_peek() == ';' &&
-			(yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_VAR ||
-			 yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_ROW ||
+			(yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_ROW ||
 			 yylval.wdatum.datum->dtype == PLPGSQL_DTYPE_REC))
 		{
 			new->retvarno = yylval.wdatum.datum->dno;
@@ -3117,16 +3118,13 @@ make_return_next_stmt(int location)
 		}
 		else
 		{
-			/*
-			 * Not (just) a variable name, so treat as expression.
-			 *
-			 * Note that a well-formed expression is _required_ here;
-			 * anything else is a compile-time error.
-			 */
+			/* Not (just) a row/record name, so treat as expression */
 			plpgsql_push_back_token(tok);
 			new->expr = read_sql_expression(';', ";");
 		}
 	}
+	else
+		new->expr = read_sql_expression(';', ";");
 
 	return (PLpgSQL_stmt *) new;
 }
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 624c91e..00f2f77 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -226,7 +226,6 @@ typedef struct PLpgSQL_expr
 	Expr	   *expr_simple_expr;		/* NULL means not a simple expr */
 	int			expr_simple_generation; /* plancache generation we checked */
 	Oid			expr_simple_type;		/* result type Oid, if simple */
-	int32		expr_simple_typmod;		/* result typmod, if simple */
 
 	/*
 	 * if expr is simple AND prepared in current transaction,
@@ -330,7 +329,7 @@ typedef struct PLpgSQL_nsitem
 	int			itemtype;
 	int			itemno;
 	struct PLpgSQL_nsitem *prev;
-	char		name[FLEXIBLE_ARRAY_MEMBER];	/* nul-terminated string */
+	char		name[1];		/* actually, as long as needed */
 } PLpgSQL_nsitem;
 
 
diff --git a/src/port/Makefile b/src/port/Makefile
index a862d51..a0908bf 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -51,7 +51,6 @@ uninstall:
 	rm -f '$(DESTDIR)$(libdir)/libpgport.a'
 
 libpgport.a: $(OBJS)
-	rm -f $@
 	$(AR) $(AROPT) $@ $^
 
 # thread.o needs PTHREAD_CFLAGS (but thread_srv.o does not)
@@ -62,7 +61,6 @@ thread.o: CFLAGS+=$(PTHREAD_CFLAGS)
 #
 
 libpgport_srv.a: $(OBJS_SRV)
-	rm -f $@
 	$(AR) $(AROPT) $@ $^
 
 # Because this uses its own compilation rule, it doesn't use the
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 0d8b8a8..6187a0a 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -143,7 +143,7 @@ typedef struct
 	WORD		SubstituteNameLength;
 	WORD		PrintNameOffset;
 	WORD		PrintNameLength;
-	WCHAR		PathBuffer[FLEXIBLE_ARRAY_MEMBER];
+	WCHAR		PathBuffer[1];
 } REPARSE_JUNCTION_DATA_BUFFER;
 
 #define REPARSE_JUNCTION_DATA_BUFFER_HEADER_SIZE   \
@@ -160,7 +160,7 @@ pgsymlink(const char *oldpath, const char *newpath)
 {
 	HANDLE		dirhandle;
 	DWORD		len;
-	char		buffer[MAX_PATH * sizeof(WCHAR) + offsetof(REPARSE_JUNCTION_DATA_BUFFER, PathBuffer)];
+	char		buffer[MAX_PATH * sizeof(WCHAR) + sizeof(REPARSE_JUNCTION_DATA_BUFFER)];
 	char		nativeTarget[MAX_PATH];
 	char	   *p = nativeTarget;
 	REPARSE_JUNCTION_DATA_BUFFER *reparseBuf = (REPARSE_JUNCTION_DATA_BUFFER *) buffer;
@@ -174,10 +174,10 @@ pgsymlink(const char *oldpath, const char *newpath)
 		return -1;
 
 	/* make sure we have an unparsed native win32 path */
-	if (memcmp("\\??\\", oldpath, 4) != 0)
-		snprintf(nativeTarget, sizeof(nativeTarget), "\\??\\%s", oldpath);
+	if (memcmp("\\??\\", oldpath, 4))
+		sprintf(nativeTarget, "\\??\\%s", oldpath);
 	else
-		strlcpy(nativeTarget, oldpath, sizeof(nativeTarget));
+		strcpy(nativeTarget, oldpath);
 
 	while ((p = strchr(p, '/')) != NULL)
 		*p++ = '\\';
@@ -239,7 +239,7 @@ pgreadlink(const char *path, char *buf, size_t size)
 {
 	DWORD		attr;
 	HANDLE		h;
-	char		buffer[MAX_PATH * sizeof(WCHAR) + offsetof(REPARSE_JUNCTION_DATA_BUFFER, PathBuffer)];
+	char		buffer[MAX_PATH * sizeof(WCHAR) + sizeof(REPARSE_JUNCTION_DATA_BUFFER)];
 	REPARSE_JUNCTION_DATA_BUFFER *reparseBuf = (REPARSE_JUNCTION_DATA_BUFFER *) buffer;
 	DWORD		len;
 	int			r;
diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 3c60238..eabf161 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -30,6 +30,7 @@
 
 #include <sys/time.h>
 
+
 /* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
 static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
@@ -47,19 +48,15 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
  */
 typedef VOID (WINAPI *PgGetSystemTimeFn)(LPFILETIME);
 
-/* One-time initializer function, must match that signature. */
-static void WINAPI init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime);
-
 /* Storage for the function we pick at runtime */
-static PgGetSystemTimeFn pg_get_system_time = &init_gettimeofday;
+static PgGetSystemTimeFn pg_get_system_time = NULL;
 
 /*
- * One time initializer.  Determine whether GetSystemTimePreciseAsFileTime
- * is available and if so, plan to use it; if not, fall back to
- * GetSystemTimeAsFileTime.
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not, fall back to GetSystemTimeAsFileTime.
  */
-static void WINAPI
-init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime)
+void
+init_win32_gettimeofday(void)
 {
 	/*
 	 * Because it's guaranteed that kernel32.dll will be linked into our
@@ -83,16 +80,14 @@ init_gettimeofday(LPFILETIME lpSystemTimeAsFileTime)
 		 * The expected error from GetLastError() is ERROR_PROC_NOT_FOUND, if
 		 * the function isn't present. No other error should occur.
 		 *
-		 * We can't report an error here because this might be running in
-		 * frontend code; and even if we're in the backend, it's too early
-		 * to elog(...) if we get some unexpected error.  Also, it's not a
-		 * serious problem, so just silently fall back to
+		 * It's too early in startup to elog(...) if we get some unexpected
+		 * error, and not serious enough to warrant a fprintf to stderr about
+		 * it or save the error and report it later. So silently fall back to
 		 * GetSystemTimeAsFileTime irrespective of why the failure occurred.
 		 */
 		pg_get_system_time = &GetSystemTimeAsFileTime;
 	}
 
-	(*pg_get_system_time)(lpSystemTimeAsFileTime);
 }
 
 /*
diff --git a/src/port/pgcheckdir.c b/src/port/pgcheckdir.c
index 7102f2c..7061893 100644
--- a/src/port/pgcheckdir.c
+++ b/src/port/pgcheckdir.c
@@ -22,9 +22,7 @@
  * Returns:
  *		0 if nonexistent
  *		1 if exists and empty
- *		2 if exists and contains _only_ dot files
- *		3 if exists and contains a mount point
- *		4 if exists and not empty
+ *		2 if exists and not empty
  *		-1 if trouble accessing directory (errno reflects the error)
  */
 int
@@ -34,8 +32,6 @@ pg_check_dir(const char *dir)
 	DIR		   *chkdir;
 	struct dirent *file;
 	bool		dot_found = false;
-	bool		mount_found = false;
-	int			readdir_errno;
 
 	chkdir = opendir(dir);
 	if (chkdir == NULL)
@@ -55,10 +51,10 @@ pg_check_dir(const char *dir)
 		{
 			dot_found = true;
 		}
-		/* lost+found directory */
 		else if (strcmp("lost+found", file->d_name) == 0)
 		{
-			mount_found = true;
+			result = 3;			/* not empty, mount point */
+			break;
 		}
 #endif
 		else
@@ -68,20 +64,9 @@ pg_check_dir(const char *dir)
 		}
 	}
 
-	if (errno)
+	if (errno || closedir(chkdir))
 		result = -1;			/* some kind of I/O error? */
 
-	/* Close chkdir and avoid overwriting the readdir errno on success */
-	readdir_errno = errno;
-	if (closedir(chkdir))
-		result = -1;			/* error executing closedir */
-	else
-		errno = readdir_errno;
-
-	/* We report on mount point if we find a lost+found directory */
-	if (result == 1 && mount_found)
-		result = 3;
-
 	/* We report on dot-files if we _only_ find dot files */
 	if (result == 1 && dot_found)
 		result = 2;
diff --git a/src/port/tar.c b/src/port/tar.c
index 4721df3..8ef4f9c 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -49,16 +49,10 @@ tarChecksum(char *header)
  * must always have space for 512 characters, which is a requirement by
  * the tar format.
  */
-enum tarError
+void
 tarCreateHeader(char *h, const char *filename, const char *linktarget,
 				size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime)
 {
-	if (strlen(filename) > 99)
-		return TAR_NAME_TOO_LONG;
-
-	if (linktarget && strlen(linktarget) > 99)
-		return TAR_SYMLINK_TOO_LONG;
-
 	/*
 	 * Note: most of the fields in a tar header are not supposed to be
 	 * null-terminated.  We use sprintf, which will write a null after the
@@ -147,6 +141,4 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget,
 	 * 6 digits, a space, and a null, which is legal per POSIX.
 	 */
 	sprintf(&h[148], "%06o ", tarChecksum(h));
-
-	return TAR_OK;
 }
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 5e31737..035d843 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -429,40 +429,6 @@ COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b));
 ERROR:  FORCE NULL column "b" not referenced by COPY
 ROLLBACK;
 \pset null ''
--- test case with whole-row Var in a check constraint
-create table check_con_tbl (f1 int);
-create function check_con_function(check_con_tbl) returns bool as $$
-begin
-  raise notice 'input = %', row_to_json($1);
-  return $1.f1 > 0;
-end $$ language plpgsql immutable;
-alter table check_con_tbl add check (check_con_function(check_con_tbl.*));
-\d+ check_con_tbl
-                    Table "public.check_con_tbl"
- Column |  Type   | Modifiers | Storage | Stats target | Description 
---------+---------+-----------+---------+--------------+-------------
- f1     | integer |           | plain   |              | 
-Check constraints:
-    "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
-
-copy check_con_tbl from stdin;
-NOTICE:  input = {"f1":1}
-CONTEXT:  COPY check_con_tbl, line 1: "1"
-NOTICE:  input = {"f1":null}
-CONTEXT:  COPY check_con_tbl, line 2: "\N"
-copy check_con_tbl from stdin;
-NOTICE:  input = {"f1":0}
-CONTEXT:  COPY check_con_tbl, line 1: "0"
-ERROR:  new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
-DETAIL:  Failing row contains (0).
-CONTEXT:  COPY check_con_tbl, line 1: "0"
-select * from check_con_tbl;
- f1 
-----
-  1
-   
-(2 rows)
-
 DROP TABLE forcetest;
 DROP TABLE vistest;
 DROP FUNCTION truncate_in_subxact();
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index c107d37..78e7704 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -652,36 +652,6 @@ ERROR:  value for domain orderedpair violates check constraint "orderedpair_chec
 CONTEXT:  PL/pgSQL function array_elem_check(integer) line 5 at assignment
 drop function array_elem_check(int);
 --
--- Check enforcement of changing constraints in plpgsql
---
-create domain di as int;
-create function dom_check(int) returns di as $$
-declare d di;
-begin
-  d := $1;
-  return d;
-end
-$$ language plpgsql immutable;
-select dom_check(0);
- dom_check 
------------
-         0
-(1 row)
-
-alter domain di add constraint pos check (value > 0);
-select dom_check(0); -- fail
-ERROR:  value for domain di violates check constraint "pos"
-CONTEXT:  PL/pgSQL function dom_check(integer) line 4 at assignment
-alter domain di drop constraint pos;
-select dom_check(0);
- dom_check 
------------
-         0
-(1 row)
-
-drop function dom_check(int);
-drop domain di;
---
 -- Renaming
 --
 create domain testdomain1 as int;
diff --git a/src/test/regress/expected/event_trigger.out b/src/test/regress/expected/event_trigger.out
index eaf47f0..2095794 100644
--- a/src/test/regress/expected/event_trigger.out
+++ b/src/test/regress/expected/event_trigger.out
@@ -370,25 +370,6 @@ alter table rewriteme
 NOTICE:  Table 'rewriteme' is being rewritten (reason = 6)
 -- shouldn't trigger a table_rewrite event
 alter table rewriteme alter column foo type numeric(12,4);
--- typed tables are rewritten when their type changes.  Don't emit table
--- name, because firing order is not stable.
-CREATE OR REPLACE FUNCTION test_evtrig_no_rewrite() RETURNS event_trigger
-LANGUAGE plpgsql AS $$
-BEGIN
-  RAISE NOTICE 'Table is being rewritten (reason = %)',
-               pg_event_trigger_table_rewrite_reason();
-END;
-$$;
-create type rewritetype as (a int);
-create table rewritemetoo1 of rewritetype;
-create table rewritemetoo2 of rewritetype;
-alter type rewritetype alter attribute a type text cascade;
-NOTICE:  Table is being rewritten (reason = 4)
-NOTICE:  Table is being rewritten (reason = 4)
--- but this doesn't work
-create table rewritemetoo3 (a rewritetype);
-alter type rewritetype alter attribute a type varchar cascade;
-ERROR:  cannot alter type "rewritetype" because column "rewritemetoo3.a" uses it
 drop table rewriteme;
 drop event trigger no_rewrite_allowed;
 drop function test_evtrig_no_rewrite();
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 632b7e5..4795c83 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -663,7 +663,7 @@ LINE 1: CREATE FOREIGN TABLE ft1 ();
 CREATE FOREIGN TABLE ft1 () SERVER no_server;                   -- ERROR
 ERROR:  server "no_server" does not exist
 CREATE FOREIGN TABLE ft1 () SERVER s0 WITH OIDS;                -- ERROR
-ERROR:  syntax error at or near "WITH"
+ERROR:  syntax error at or near "WITH OIDS"
 LINE 1: CREATE FOREIGN TABLE ft1 () SERVER s0 WITH OIDS;
                                               ^
 CREATE FOREIGN TABLE ft1 (
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index ca3a17b..2501184 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2781,25 +2781,6 @@ where thousand = (q1 + q2);
 (12 rows)
 
 --
--- test ability to generate a suitable plan for a star-schema query
---
-explain (costs off)
-select * from
-  tenk1, int8_tbl a, int8_tbl b
-where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
-                             QUERY PLAN                              
----------------------------------------------------------------------
- Nested Loop
-   ->  Seq Scan on int8_tbl b
-         Filter: (q2 = 2)
-   ->  Nested Loop
-         ->  Seq Scan on int8_tbl a
-               Filter: (q2 = 1)
-         ->  Index Scan using tenk1_thous_tenthous on tenk1
-               Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
-(8 rows)
-
---
 -- test extraction of restriction OR clauses from join OR clause
 -- (we used to only do this for indexable clauses)
 --
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index 3942c3b..1670436 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -426,30 +426,6 @@ select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
-select to_json(date '2014-05-28');
-   to_json    
---------------
- "2014-05-28"
-(1 row)
-
-select to_json(date 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
-select to_json(timestamp 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
-select to_json(timestamptz 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
 --json_agg
 SELECT json_agg(q)
   FROM ( SELECT $$a$$ || x AS b, y AS c,
diff --git a/src/test/regress/expected/json_1.out b/src/test/regress/expected/json_1.out
index 38f1526..8078146 100644
--- a/src/test/regress/expected/json_1.out
+++ b/src/test/regress/expected/json_1.out
@@ -426,30 +426,6 @@ select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
-select to_json(date '2014-05-28');
-   to_json    
---------------
- "2014-05-28"
-(1 row)
-
-select to_json(date 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
-select to_json(timestamp 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
-select to_json(timestamptz 'Infinity');
-  to_json   
-------------
- "infinity"
-(1 row)
-
 --json_agg
 SELECT json_agg(q)
   FROM ( SELECT $$a$$ || x AS b, y AS c,
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index 0d55890..6c6ed95 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -330,30 +330,6 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
-select to_jsonb(date '2014-05-28');
-   to_jsonb   
---------------
- "2014-05-28"
-(1 row)
-
-select to_jsonb(date 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
-select to_jsonb(timestamp 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
-select to_jsonb(timestamptz 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
 --jsonb_agg
 CREATE TEMP TABLE rows AS
 SELECT x, 'txt' || x as y
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 694b6ea..f30148d 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -330,30 +330,6 @@ select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 (1 row)
 
 COMMIT;
-select to_jsonb(date '2014-05-28');
-   to_jsonb   
---------------
- "2014-05-28"
-(1 row)
-
-select to_jsonb(date 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
-select to_jsonb(timestamp 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
-select to_jsonb(timestamptz 'Infinity');
-  to_jsonb  
-------------
- "infinity"
-(1 row)
-
 --jsonb_agg
 CREATE TEMP TABLE rows AS
 SELECT x, 'txt' || x as y
diff --git a/src/test/regress/expected/object_address.out b/src/test/regress/expected/object_address.out
index dcf1b46..8e11b42 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -370,14 +370,14 @@ SELECT (pg_identify_object(addr1.classid, addr1.objid, addr1.subobjid)).*,
  cast                      |            |                   | (bigint AS integer)                                                  | t
  table constraint          | addr_nsp   |                   | a_chk on addr_nsp.gentable                                           | t
  domain constraint         | addr_nsp   |                   | domconstr on addr_nsp.gendomain                                      | t
- conversion                | pg_catalog | ascii_to_mic      | pg_catalog.ascii_to_mic                                              | t
+ conversion                | pg_catalog | ascii_to_mic      | ascii_to_mic                                                         | t
  language                  |            | plpgsql           | plpgsql                                                              | t
  schema                    |            | addr_nsp          | addr_nsp                                                             | t
- operator class            | pg_catalog | int4_ops          | pg_catalog.int4_ops USING btree                                      | t
+ operator class            | pg_catalog | int4_ops          | pg_catalog.int4_ops for btree                                        | t
  operator                  | pg_catalog |                   | pg_catalog.+(integer,integer)                                        | t
  rule                      |            |                   | "_RETURN" on addr_nsp.genview                                        | t
  trigger                   |            |                   | t on addr_nsp.gentable                                               | t
- operator family           | pg_catalog | integer_ops       | pg_catalog.integer_ops USING btree                                   | t
+ operator family           | pg_catalog | integer_ops       | pg_catalog.integer_ops for btree                                     | t
  policy                    |            |                   | genpol on addr_nsp.gentable                                          | t
  collation                 | pg_catalog | "default"         | pg_catalog."default"                                                 | t
  text search dictionary    | addr_nsp   | addr_ts_dict      | addr_nsp.addr_ts_dict                                                | t
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6b248f2..9870bfa 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -176,7 +176,8 @@ ORDER BY 1, 2;
           25 |        1043
         1114 |        1184
         1560 |        1562
-(4 rows)
+        2277 |        2283
+(5 rows)
 
 SELECT DISTINCT p1.proargtypes[1], p2.proargtypes[1]
 FROM pg_proc AS p1, pg_proc AS p2
@@ -193,7 +194,8 @@ ORDER BY 1, 2;
           23 |          28
         1114 |        1184
         1560 |        1562
-(3 rows)
+        2277 |        2283
+(4 rows)
 
 SELECT DISTINCT p1.proargtypes[2], p2.proargtypes[2]
 FROM pg_proc AS p1, pg_proc AS p2
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 2c0b2e5..daf3447 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -2655,21 +2655,9 @@ NOTICE:  P0001 user exception
  
 (1 row)
 
-create function excpt_test4() returns text as $$
-begin
-	begin perform 1/0;
-	exception when others then return sqlerrm; end;
-end; $$ language plpgsql;
-select excpt_test4();
-   excpt_test4    
-------------------
- division by zero
-(1 row)
-
 drop function excpt_test1();
 drop function excpt_test2();
 drop function excpt_test3();
-drop function excpt_test4();
 -- parameters of raise stmt can be expressions
 create function raise_exprs() returns void as $$
 declare
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index f41bef1..21817d8 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1034,25 +1034,22 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2);
 --
 SET SESSION AUTHORIZATION rls_regress_user1;
 EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b);
-                QUERY PLAN                 
--------------------------------------------
+             QUERY PLAN              
+-------------------------------------
  Update on t1 t1_3
    ->  Subquery Scan on t1
          Filter: f_leak(t1.b)
-         ->  LockRows
-               ->  Seq Scan on t1 t1_4
-                     Filter: ((a % 2) = 0)
+         ->  Seq Scan on t1 t1_4
+               Filter: ((a % 2) = 0)
    ->  Subquery Scan on t1_1
          Filter: f_leak(t1_1.b)
-         ->  LockRows
-               ->  Seq Scan on t2
-                     Filter: ((a % 2) = 0)
+         ->  Seq Scan on t2
+               Filter: ((a % 2) = 0)
    ->  Subquery Scan on t1_2
          Filter: f_leak(t1_2.b)
-         ->  LockRows
-               ->  Seq Scan on t3
-                     Filter: ((a % 2) = 0)
-(16 rows)
+         ->  Seq Scan on t3
+               Filter: ((a % 2) = 0)
+(13 rows)
 
 UPDATE t1 SET b = b || b WHERE f_leak(b);
 NOTICE:  f_leak => bbb
@@ -1061,15 +1058,14 @@ NOTICE:  f_leak => bcd
 NOTICE:  f_leak => def
 NOTICE:  f_leak => yyy
 EXPLAIN (COSTS OFF) UPDATE only t1 SET b = b || '_updt' WHERE f_leak(b);
-                QUERY PLAN                 
--------------------------------------------
+             QUERY PLAN              
+-------------------------------------
  Update on t1 t1_1
    ->  Subquery Scan on t1
          Filter: f_leak(t1.b)
-         ->  LockRows
-               ->  Seq Scan on t1 t1_2
-                     Filter: ((a % 2) = 0)
-(6 rows)
+         ->  Seq Scan on t1 t1_2
+               Filter: ((a % 2) = 0)
+(5 rows)
 
 UPDATE only t1 SET b = b || '_updt' WHERE f_leak(b);
 NOTICE:  f_leak => bbbbbb
@@ -1135,36 +1131,32 @@ SELECT * FROM t1;
 SET SESSION AUTHORIZATION rls_regress_user1;
 SET row_security TO ON;
 EXPLAIN (COSTS OFF) DELETE FROM only t1 WHERE f_leak(b);
-                QUERY PLAN                 
--------------------------------------------
+             QUERY PLAN              
+-------------------------------------
  Delete on t1 t1_1
    ->  Subquery Scan on t1
          Filter: f_leak(t1.b)
-         ->  LockRows
-               ->  Seq Scan on t1 t1_2
-                     Filter: ((a % 2) = 0)
-(6 rows)
+         ->  Seq Scan on t1 t1_2
+               Filter: ((a % 2) = 0)
+(5 rows)
 
 EXPLAIN (COSTS OFF) DELETE FROM t1 WHERE f_leak(b);
-                QUERY PLAN                 
--------------------------------------------
+             QUERY PLAN              
+-------------------------------------
  Delete on t1 t1_3
    ->  Subquery Scan on t1
          Filter: f_leak(t1.b)
-         ->  LockRows
-               ->  Seq Scan on t1 t1_4
-                     Filter: ((a % 2) = 0)
+         ->  Seq Scan on t1 t1_4
+               Filter: ((a % 2) = 0)
    ->  Subquery Scan on t1_1
          Filter: f_leak(t1_1.b)
-         ->  LockRows
-               ->  Seq Scan on t2
-                     Filter: ((a % 2) = 0)
+         ->  Seq Scan on t2
+               Filter: ((a % 2) = 0)
    ->  Subquery Scan on t1_2
          Filter: f_leak(t1_2.b)
-         ->  LockRows
-               ->  Seq Scan on t3
-                     Filter: ((a % 2) = 0)
-(16 rows)
+         ->  Seq Scan on t3
+               Filter: ((a % 2) = 0)
+(13 rows)
 
 DELETE FROM only t1 WHERE f_leak(b) RETURNING oid, *, t1;
 NOTICE:  f_leak => bbbbbb_updt
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 26c60e4..d50b103 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2507,7 +2507,6 @@ select * from only t1_2;
  19
 (10 rows)
 
-reset constraint_exclusion;
 -- test various flavors of pg_get_viewdef()
 select pg_get_viewdef('shoe'::regclass) as unpretty;
                     unpretty                    
@@ -2679,56 +2678,3 @@ ALTER RULE "_RETURN" ON rule_v1 RENAME TO abc; -- ON SELECT rule cannot be renam
 ERROR:  renaming an ON SELECT rule is not allowed
 DROP VIEW rule_v1;
 DROP TABLE rule_t1;
---
--- check display of VALUES in view definitions
---
-create view rule_v1 as values(1,2);
-\d+ rule_v1
-                 View "public.rule_v1"
- Column  |  Type   | Modifiers | Storage | Description 
----------+---------+-----------+---------+-------------
- column1 | integer |           | plain   | 
- column2 | integer |           | plain   | 
-View definition:
- VALUES (1,2);
-
-drop view rule_v1;
-create view rule_v1(x) as values(1,2);
-\d+ rule_v1
-                 View "public.rule_v1"
- Column  |  Type   | Modifiers | Storage | Description 
----------+---------+-----------+---------+-------------
- x       | integer |           | plain   | 
- column2 | integer |           | plain   | 
-View definition:
- SELECT "*VALUES*".column1 AS x,
-    "*VALUES*".column2
-   FROM (VALUES (1,2)) "*VALUES*";
-
-drop view rule_v1;
-create view rule_v1(x) as select * from (values(1,2)) v;
-\d+ rule_v1
-                 View "public.rule_v1"
- Column  |  Type   | Modifiers | Storage | Description 
----------+---------+-----------+---------+-------------
- x       | integer |           | plain   | 
- column2 | integer |           | plain   | 
-View definition:
- SELECT v.column1 AS x,
-    v.column2
-   FROM ( VALUES (1,2)) v;
-
-drop view rule_v1;
-create view rule_v1(x) as select * from (values(1,2)) v(q,w);
-\d+ rule_v1
-                View "public.rule_v1"
- Column |  Type   | Modifiers | Storage | Description 
---------+---------+-----------+---------+-------------
- x      | integer |           | plain   | 
- w      | integer |           | plain   | 
-View definition:
- SELECT v.q AS x,
-    v.w
-   FROM ( VALUES (1,2)) v(q, w);
-
-drop view rule_v1;
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index f5be70f..ec0ff65 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -28,8 +28,7 @@ SELECT pg_sleep_for('2 seconds');
 CREATE TEMP TABLE prevstats AS
 SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
        (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
-       (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
-       pg_stat_get_snapshot_timestamp() as snap_ts
+       (b.idx_blks_read + b.idx_blks_hit) AS idx_blks
   FROM pg_catalog.pg_stat_user_tables AS t,
        pg_catalog.pg_statio_user_tables AS b
  WHERE t.relname='tenk2' AND b.relname='tenk2';
@@ -62,57 +61,6 @@ begin
     extract(epoch from clock_timestamp() - start_time);
 end
 $$ language plpgsql;
--- test effects of TRUNCATE on n_live_tup/n_dead_tup counters
-CREATE TABLE trunc_stats_test(id serial);
-CREATE TABLE trunc_stats_test1(id serial);
-CREATE TABLE trunc_stats_test2(id serial);
-CREATE TABLE trunc_stats_test3(id serial);
-CREATE TABLE trunc_stats_test4(id serial);
--- check that n_live_tup is reset to 0 after truncate
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-TRUNCATE trunc_stats_test;
--- test involving a truncate in a transaction; 4 ins but only 1 live
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-UPDATE trunc_stats_test1 SET id = id + 10 WHERE id IN (1, 2);
-DELETE FROM trunc_stats_test1 WHERE id = 3;
-BEGIN;
-UPDATE trunc_stats_test1 SET id = id + 100;
-TRUNCATE trunc_stats_test1;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-COMMIT;
--- use a savepoint: 1 insert, 1 live
-BEGIN;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-SAVEPOINT p1;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-TRUNCATE trunc_stats_test2;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-RELEASE SAVEPOINT p1;
-COMMIT;
--- rollback a savepoint: this should count 4 inserts and have 2
--- live tuples after commit (and 2 dead ones due to aborted subxact)
-BEGIN;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-SAVEPOINT p1;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-TRUNCATE trunc_stats_test3;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-ROLLBACK TO SAVEPOINT p1;
-COMMIT;
--- rollback a truncate: this should count 2 inserts and produce 2 dead tuples
-BEGIN;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-TRUNCATE trunc_stats_test4;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-ROLLBACK;
 -- do a seqscan
 SELECT count(*) FROM tenk2;
  count 
@@ -143,18 +91,6 @@ SELECT wait_for_stats();
 (1 row)
 
 -- check effects
-SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
-  FROM pg_stat_user_tables
- WHERE relname like 'trunc_stats_test%' order by relname;
-      relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup 
--------------------+-----------+-----------+-----------+------------+------------
- trunc_stats_test  |         3 |         0 |         0 |          0 |          0
- trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
- trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
- trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
- trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
-(5 rows)
-
 SELECT st.seq_scan >= pr.seq_scan + 1,
        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
        st.idx_scan >= pr.idx_scan + 1,
@@ -175,12 +111,4 @@ SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
  t        | t
 (1 row)
 
-SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
-FROM prevstats AS pr;
- snapshot_newer 
-----------------
- t
-(1 row)
-
-DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
 -- End of Stats Test
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index c49e769..80c5706 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1842,26 +1842,24 @@ EXPLAIN (costs off) SELECT * FROM rw_view1 WHERE snoop(person);
 (4 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET person=person WHERE snoop(person);
-                        QUERY PLAN                         
------------------------------------------------------------
+                     QUERY PLAN                      
+-----------------------------------------------------
  Update on base_tbl base_tbl_1
    ->  Subquery Scan on base_tbl
          Filter: snoop(base_tbl.person)
-         ->  LockRows
-               ->  Seq Scan on base_tbl base_tbl_2
-                     Filter: (visibility = 'public'::text)
-(6 rows)
+         ->  Seq Scan on base_tbl base_tbl_2
+               Filter: (visibility = 'public'::text)
+(5 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view1 WHERE NOT snoop(person);
-                        QUERY PLAN                         
------------------------------------------------------------
+                     QUERY PLAN                      
+-----------------------------------------------------
  Delete on base_tbl base_tbl_1
    ->  Subquery Scan on base_tbl
          Filter: (NOT snoop(base_tbl.person))
-         ->  LockRows
-               ->  Seq Scan on base_tbl base_tbl_2
-                     Filter: (visibility = 'public'::text)
-(6 rows)
+         ->  Seq Scan on base_tbl base_tbl_2
+               Filter: (visibility = 'public'::text)
+(5 rows)
 
 -- security barrier view on top of security barrier view
 CREATE VIEW rw_view2 WITH (security_barrier = true) AS
@@ -1924,30 +1922,28 @@ EXPLAIN (costs off) SELECT * FROM rw_view2 WHERE snoop(person);
 (6 rows)
 
 EXPLAIN (costs off) UPDATE rw_view2 SET person=person WHERE snoop(person);
-                           QUERY PLAN                            
------------------------------------------------------------------
+                        QUERY PLAN                         
+-----------------------------------------------------------
  Update on base_tbl base_tbl_1
    ->  Subquery Scan on base_tbl
          Filter: snoop(base_tbl.person)
          ->  Subquery Scan on base_tbl_2
                Filter: snoop(base_tbl_2.person)
-               ->  LockRows
-                     ->  Seq Scan on base_tbl base_tbl_3
-                           Filter: (visibility = 'public'::text)
-(8 rows)
+               ->  Seq Scan on base_tbl base_tbl_3
+                     Filter: (visibility = 'public'::text)
+(7 rows)
 
 EXPLAIN (costs off) DELETE FROM rw_view2 WHERE NOT snoop(person);
-                           QUERY PLAN                            
------------------------------------------------------------------
+                        QUERY PLAN                         
+-----------------------------------------------------------
  Delete on base_tbl base_tbl_1
    ->  Subquery Scan on base_tbl
          Filter: (NOT snoop(base_tbl.person))
          ->  Subquery Scan on base_tbl_2
                Filter: snoop(base_tbl_2.person)
-               ->  LockRows
-                     ->  Seq Scan on base_tbl base_tbl_3
-                           Filter: (visibility = 'public'::text)
-(8 rows)
+               ->  Seq Scan on base_tbl base_tbl_3
+                     Filter: (visibility = 'public'::text)
+(7 rows)
 
 DROP TABLE base_tbl CASCADE;
 NOTICE:  drop cascades to 2 other objects
@@ -2061,78 +2057,70 @@ SELECT * FROM v1 WHERE a=8;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a = 3;
-                                                             QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------------
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
  Update on public.t1 t1_4
    ->  Subquery Scan on t1
          Output: 100, t1.b, t1.c, t1.ctid
          Filter: snoop(t1.a)
-         ->  LockRows
-               Output: t1_5.ctid, t1_5.a, t1_5.b, t1_5.c, t1_5.ctid, t12.ctid, t12.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t1_5.ctid, t1_5.a, t1_5.b, t1_5.c, t1_5.ctid, t12.ctid, t12.tableoid
-                     ->  Seq Scan on public.t1 t1_5
-                           Output: t1_5.ctid, t1_5.a, t1_5.b, t1_5.c
-                           Filter: ((t1_5.a > 5) AND (t1_5.a = 3) AND leakproof(t1_5.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12
-                                 Output: t12.ctid, t12.tableoid, t12.a
-                                 Filter: (t12.a = 3)
-                           ->  Seq Scan on public.t111
-                                 Output: t111.ctid, t111.tableoid, t111.a
-                                 Filter: (t111.a = 3)
+         ->  Nested Loop Semi Join
+               Output: t1_5.ctid, t1_5.a, t1_5.b, t1_5.c
+               ->  Seq Scan on public.t1 t1_5
+                     Output: t1_5.ctid, t1_5.a, t1_5.b, t1_5.c
+                     Filter: ((t1_5.a > 5) AND (t1_5.a = 3) AND leakproof(t1_5.a))
+               ->  Append
+                     ->  Seq Scan on public.t12
+                           Output: t12.a
+                           Filter: (t12.a = 3)
+                     ->  Seq Scan on public.t111
+                           Output: t111.a
+                           Filter: (t111.a = 3)
    ->  Subquery Scan on t1_1
          Output: 100, t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
          Filter: snoop(t1_1.a)
-         ->  LockRows
-               Output: t11.ctid, t11.a, t11.b, t11.c, t11.d, t11.ctid, t12_1.ctid, t12_1.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t11.ctid, t11.a, t11.b, t11.c, t11.d, t11.ctid, t12_1.ctid, t12_1.tableoid
-                     ->  Seq Scan on public.t11
-                           Output: t11.ctid, t11.a, t11.b, t11.c, t11.d
-                           Filter: ((t11.a > 5) AND (t11.a = 3) AND leakproof(t11.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_1
-                                 Output: t12_1.ctid, t12_1.tableoid, t12_1.a
-                                 Filter: (t12_1.a = 3)
-                           ->  Seq Scan on public.t111 t111_1
-                                 Output: t111_1.ctid, t111_1.tableoid, t111_1.a
-                                 Filter: (t111_1.a = 3)
+         ->  Nested Loop Semi Join
+               Output: t11.ctid, t11.a, t11.b, t11.c, t11.d
+               ->  Seq Scan on public.t11
+                     Output: t11.ctid, t11.a, t11.b, t11.c, t11.d
+                     Filter: ((t11.a > 5) AND (t11.a = 3) AND leakproof(t11.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_1
+                           Output: t12_1.a
+                           Filter: (t12_1.a = 3)
+                     ->  Seq Scan on public.t111 t111_1
+                           Output: t111_1.a
+                           Filter: (t111_1.a = 3)
    ->  Subquery Scan on t1_2
          Output: 100, t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
          Filter: snoop(t1_2.a)
-         ->  LockRows
-               Output: t12_2.ctid, t12_2.a, t12_2.b, t12_2.c, t12_2.e, t12_2.ctid, t12_3.ctid, t12_3.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t12_2.ctid, t12_2.a, t12_2.b, t12_2.c, t12_2.e, t12_2.ctid, t12_3.ctid, t12_3.tableoid
-                     ->  Seq Scan on public.t12 t12_2
-                           Output: t12_2.ctid, t12_2.a, t12_2.b, t12_2.c, t12_2.e
-                           Filter: ((t12_2.a > 5) AND (t12_2.a = 3) AND leakproof(t12_2.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_3
-                                 Output: t12_3.ctid, t12_3.tableoid, t12_3.a
-                                 Filter: (t12_3.a = 3)
-                           ->  Seq Scan on public.t111 t111_2
-                                 Output: t111_2.ctid, t111_2.tableoid, t111_2.a
-                                 Filter: (t111_2.a = 3)
+         ->  Nested Loop Semi Join
+               Output: t12_2.ctid, t12_2.a, t12_2.b, t12_2.c, t12_2.e
+               ->  Seq Scan on public.t12 t12_2
+                     Output: t12_2.ctid, t12_2.a, t12_2.b, t12_2.c, t12_2.e
+                     Filter: ((t12_2.a > 5) AND (t12_2.a = 3) AND leakproof(t12_2.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_3
+                           Output: t12_3.a
+                           Filter: (t12_3.a = 3)
+                     ->  Seq Scan on public.t111 t111_2
+                           Output: t111_2.a
+                           Filter: (t111_2.a = 3)
    ->  Subquery Scan on t1_3
          Output: 100, t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
          Filter: snoop(t1_3.a)
-         ->  LockRows
-               Output: t111_3.ctid, t111_3.a, t111_3.b, t111_3.c, t111_3.d, t111_3.e, t111_3.ctid, t12_4.ctid, t12_4.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t111_3.ctid, t111_3.a, t111_3.b, t111_3.c, t111_3.d, t111_3.e, t111_3.ctid, t12_4.ctid, t12_4.tableoid
-                     ->  Seq Scan on public.t111 t111_3
-                           Output: t111_3.ctid, t111_3.a, t111_3.b, t111_3.c, t111_3.d, t111_3.e
-                           Filter: ((t111_3.a > 5) AND (t111_3.a = 3) AND leakproof(t111_3.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_4
-                                 Output: t12_4.ctid, t12_4.tableoid, t12_4.a
-                                 Filter: (t12_4.a = 3)
-                           ->  Seq Scan on public.t111 t111_4
-                                 Output: t111_4.ctid, t111_4.tableoid, t111_4.a
-                                 Filter: (t111_4.a = 3)
-(69 rows)
+         ->  Nested Loop Semi Join
+               Output: t111_3.ctid, t111_3.a, t111_3.b, t111_3.c, t111_3.d, t111_3.e
+               ->  Seq Scan on public.t111 t111_3
+                     Output: t111_3.ctid, t111_3.a, t111_3.b, t111_3.c, t111_3.d, t111_3.e
+                     Filter: ((t111_3.a > 5) AND (t111_3.a = 3) AND leakproof(t111_3.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_4
+                           Output: t12_4.a
+                           Filter: (t12_4.a = 3)
+                     ->  Seq Scan on public.t111 t111_4
+                           Output: t111_4.a
+                           Filter: (t111_4.a = 3)
+(61 rows)
 
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a = 3;
 SELECT * FROM v1 WHERE a=100; -- Nothing should have been changed to 100
@@ -2147,78 +2135,70 @@ SELECT * FROM t1 WHERE a=100; -- Nothing should have been changed to 100
 
 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
-                                                             QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------------
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
  Update on public.t1 t1_4
    ->  Subquery Scan on t1
          Output: (t1.a + 1), t1.b, t1.c, t1.ctid
          Filter: snoop(t1.a)
-         ->  LockRows
-               Output: t1_5.a, t1_5.ctid, t1_5.b, t1_5.c, t1_5.ctid, t12.ctid, t12.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t1_5.a, t1_5.ctid, t1_5.b, t1_5.c, t1_5.ctid, t12.ctid, t12.tableoid
-                     ->  Seq Scan on public.t1 t1_5
-                           Output: t1_5.a, t1_5.ctid, t1_5.b, t1_5.c
-                           Filter: ((t1_5.a > 5) AND (t1_5.a = 8) AND leakproof(t1_5.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12
-                                 Output: t12.ctid, t12.tableoid, t12.a
-                                 Filter: (t12.a = 8)
-                           ->  Seq Scan on public.t111
-                                 Output: t111.ctid, t111.tableoid, t111.a
-                                 Filter: (t111.a = 8)
+         ->  Nested Loop Semi Join
+               Output: t1_5.a, t1_5.ctid, t1_5.b, t1_5.c
+               ->  Seq Scan on public.t1 t1_5
+                     Output: t1_5.a, t1_5.ctid, t1_5.b, t1_5.c
+                     Filter: ((t1_5.a > 5) AND (t1_5.a = 8) AND leakproof(t1_5.a))
+               ->  Append
+                     ->  Seq Scan on public.t12
+                           Output: t12.a
+                           Filter: (t12.a = 8)
+                     ->  Seq Scan on public.t111
+                           Output: t111.a
+                           Filter: (t111.a = 8)
    ->  Subquery Scan on t1_1
          Output: (t1_1.a + 1), t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
          Filter: snoop(t1_1.a)
-         ->  LockRows
-               Output: t11.a, t11.ctid, t11.b, t11.c, t11.d, t11.ctid, t12_1.ctid, t12_1.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t11.a, t11.ctid, t11.b, t11.c, t11.d, t11.ctid, t12_1.ctid, t12_1.tableoid
-                     ->  Seq Scan on public.t11
-                           Output: t11.a, t11.ctid, t11.b, t11.c, t11.d
-                           Filter: ((t11.a > 5) AND (t11.a = 8) AND leakproof(t11.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_1
-                                 Output: t12_1.ctid, t12_1.tableoid, t12_1.a
-                                 Filter: (t12_1.a = 8)
-                           ->  Seq Scan on public.t111 t111_1
-                                 Output: t111_1.ctid, t111_1.tableoid, t111_1.a
-                                 Filter: (t111_1.a = 8)
+         ->  Nested Loop Semi Join
+               Output: t11.a, t11.ctid, t11.b, t11.c, t11.d
+               ->  Seq Scan on public.t11
+                     Output: t11.a, t11.ctid, t11.b, t11.c, t11.d
+                     Filter: ((t11.a > 5) AND (t11.a = 8) AND leakproof(t11.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_1
+                           Output: t12_1.a
+                           Filter: (t12_1.a = 8)
+                     ->  Seq Scan on public.t111 t111_1
+                           Output: t111_1.a
+                           Filter: (t111_1.a = 8)
    ->  Subquery Scan on t1_2
          Output: (t1_2.a + 1), t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
          Filter: snoop(t1_2.a)
-         ->  LockRows
-               Output: t12_2.a, t12_2.ctid, t12_2.b, t12_2.c, t12_2.e, t12_2.ctid, t12_3.ctid, t12_3.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t12_2.a, t12_2.ctid, t12_2.b, t12_2.c, t12_2.e, t12_2.ctid, t12_3.ctid, t12_3.tableoid
-                     ->  Seq Scan on public.t12 t12_2
-                           Output: t12_2.a, t12_2.ctid, t12_2.b, t12_2.c, t12_2.e
-                           Filter: ((t12_2.a > 5) AND (t12_2.a = 8) AND leakproof(t12_2.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_3
-                                 Output: t12_3.ctid, t12_3.tableoid, t12_3.a
-                                 Filter: (t12_3.a = 8)
-                           ->  Seq Scan on public.t111 t111_2
-                                 Output: t111_2.ctid, t111_2.tableoid, t111_2.a
-                                 Filter: (t111_2.a = 8)
+         ->  Nested Loop Semi Join
+               Output: t12_2.a, t12_2.ctid, t12_2.b, t12_2.c, t12_2.e
+               ->  Seq Scan on public.t12 t12_2
+                     Output: t12_2.a, t12_2.ctid, t12_2.b, t12_2.c, t12_2.e
+                     Filter: ((t12_2.a > 5) AND (t12_2.a = 8) AND leakproof(t12_2.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_3
+                           Output: t12_3.a
+                           Filter: (t12_3.a = 8)
+                     ->  Seq Scan on public.t111 t111_2
+                           Output: t111_2.a
+                           Filter: (t111_2.a = 8)
    ->  Subquery Scan on t1_3
          Output: (t1_3.a + 1), t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
          Filter: snoop(t1_3.a)
-         ->  LockRows
-               Output: t111_3.a, t111_3.ctid, t111_3.b, t111_3.c, t111_3.d, t111_3.e, t111_3.ctid, t12_4.ctid, t12_4.tableoid
-               ->  Nested Loop Semi Join
-                     Output: t111_3.a, t111_3.ctid, t111_3.b, t111_3.c, t111_3.d, t111_3.e, t111_3.ctid, t12_4.ctid, t12_4.tableoid
-                     ->  Seq Scan on public.t111 t111_3
-                           Output: t111_3.a, t111_3.ctid, t111_3.b, t111_3.c, t111_3.d, t111_3.e
-                           Filter: ((t111_3.a > 5) AND (t111_3.a = 8) AND leakproof(t111_3.a))
-                     ->  Append
-                           ->  Seq Scan on public.t12 t12_4
-                                 Output: t12_4.ctid, t12_4.tableoid, t12_4.a
-                                 Filter: (t12_4.a = 8)
-                           ->  Seq Scan on public.t111 t111_4
-                                 Output: t111_4.ctid, t111_4.tableoid, t111_4.a
-                                 Filter: (t111_4.a = 8)
-(69 rows)
+         ->  Nested Loop Semi Join
+               Output: t111_3.a, t111_3.ctid, t111_3.b, t111_3.c, t111_3.d, t111_3.e
+               ->  Seq Scan on public.t111 t111_3
+                     Output: t111_3.a, t111_3.ctid, t111_3.b, t111_3.c, t111_3.d, t111_3.e
+                     Filter: ((t111_3.a > 5) AND (t111_3.a = 8) AND leakproof(t111_3.a))
+               ->  Append
+                     ->  Seq Scan on public.t12 t12_4
+                           Output: t12_4.a
+                           Filter: (t12_4.a = 8)
+                     ->  Seq Scan on public.t111 t111_4
+                           Output: t111_4.a
+                           Filter: (t111_4.a = 8)
+(61 rows)
 
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
 NOTICE:  snooped value: 8
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 524e0ef..06b372b 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2155,18 +2155,3 @@ WITH t AS (
 VALUES(FALSE);
 ERROR:  conditional DO INSTEAD rules are not supported for data-modifying statements in WITH
 DROP RULE y_rule ON y;
--- check that parser lookahead for WITH doesn't cause any odd behavior
-create table foo (with baz);  -- fail, WITH is a reserved word
-ERROR:  syntax error at or near "with"
-LINE 1: create table foo (with baz);
-                          ^
-create table foo (with ordinality);  -- fail, WITH is a reserved word
-ERROR:  syntax error at or near "with"
-LINE 1: create table foo (with ordinality);
-                          ^
-with ordinality as (select 1 as x) select * from ordinality;
- x 
----
- 1
-(1 row)
-
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 3af0e57..0de1af6 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -613,7 +613,7 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c
 static void
 convert_sourcefiles(void)
 {
-	convert_sourcefiles_in("input", outputdir, "sql", "sql");
+	convert_sourcefiles_in("input", inputdir, "sql", "sql");
 	convert_sourcefiles_in("output", outputdir, "expected", "out");
 }
 
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 39a9deb..248055f 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -308,25 +308,6 @@ BEGIN;
 COPY forcetest (d, e) FROM STDIN WITH (FORMAT csv, FORCE_NULL(b));
 ROLLBACK;
 \pset null ''
-
--- test case with whole-row Var in a check constraint
-create table check_con_tbl (f1 int);
-create function check_con_function(check_con_tbl) returns bool as $$
-begin
-  raise notice 'input = %', row_to_json($1);
-  return $1.f1 > 0;
-end $$ language plpgsql immutable;
-alter table check_con_tbl add check (check_con_function(check_con_tbl.*));
-\d+ check_con_tbl
-copy check_con_tbl from stdin;
-1
-\N
-\.
-copy check_con_tbl from stdin;
-0
-\.
-select * from check_con_tbl;
-
 DROP TABLE forcetest;
 DROP TABLE vistest;
 DROP FUNCTION truncate_in_subxact();
diff --git a/src/test/regress/sql/domain.sql b/src/test/regress/sql/domain.sql
index ab1fcd3..5af36af 100644
--- a/src/test/regress/sql/domain.sql
+++ b/src/test/regress/sql/domain.sql
@@ -487,33 +487,6 @@ select array_elem_check(-1);
 
 drop function array_elem_check(int);
 
---
--- Check enforcement of changing constraints in plpgsql
---
-
-create domain di as int;
-
-create function dom_check(int) returns di as $$
-declare d di;
-begin
-  d := $1;
-  return d;
-end
-$$ language plpgsql immutable;
-
-select dom_check(0);
-
-alter domain di add constraint pos check (value > 0);
-
-select dom_check(0); -- fail
-
-alter domain di drop constraint pos;
-
-select dom_check(0);
-
-drop function dom_check(int);
-
-drop domain di;
 
 --
 -- Renaming
diff --git a/src/test/regress/sql/event_trigger.sql b/src/test/regress/sql/event_trigger.sql
index bd672e1..c6e47ed 100644
--- a/src/test/regress/sql/event_trigger.sql
+++ b/src/test/regress/sql/event_trigger.sql
@@ -276,25 +276,6 @@ alter table rewriteme
 -- shouldn't trigger a table_rewrite event
 alter table rewriteme alter column foo type numeric(12,4);
 
--- typed tables are rewritten when their type changes.  Don't emit table
--- name, because firing order is not stable.
-CREATE OR REPLACE FUNCTION test_evtrig_no_rewrite() RETURNS event_trigger
-LANGUAGE plpgsql AS $$
-BEGIN
-  RAISE NOTICE 'Table is being rewritten (reason = %)',
-               pg_event_trigger_table_rewrite_reason();
-END;
-$$;
-
-create type rewritetype as (a int);
-create table rewritemetoo1 of rewritetype;
-create table rewritemetoo2 of rewritetype;
-alter type rewritetype alter attribute a type text cascade;
-
--- but this doesn't work
-create table rewritemetoo3 (a rewritetype);
-alter type rewritetype alter attribute a type varchar cascade;
-
 drop table rewriteme;
 drop event trigger no_rewrite_allowed;
 drop function test_evtrig_no_rewrite();
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6005476..718e1d9 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -764,15 +764,6 @@ select * from
 where thousand = (q1 + q2);
 
 --
--- test ability to generate a suitable plan for a star-schema query
---
-
-explain (costs off)
-select * from
-  tenk1, int8_tbl a, int8_tbl b
-where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
-
---
 -- test extraction of restriction OR clauses from join OR clause
 -- (we used to only do this for indexable clauses)
 --
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index 53832a0..53a37a8 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -111,12 +111,6 @@ SET LOCAL TIME ZONE -8;
 select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
 COMMIT;
 
-select to_json(date '2014-05-28');
-
-select to_json(date 'Infinity');
-select to_json(timestamp 'Infinity');
-select to_json(timestamptz 'Infinity');
-
 --json_agg
 
 SELECT json_agg(q)
diff --git a/src/test/regress/sql/jsonb.sql b/src/test/regress/sql/jsonb.sql
index 676e1a7..53cc239 100644
--- a/src/test/regress/sql/jsonb.sql
+++ b/src/test/regress/sql/jsonb.sql
@@ -74,12 +74,6 @@ SET LOCAL TIME ZONE -8;
 select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
 COMMIT;
 
-select to_jsonb(date '2014-05-28');
-
-select to_jsonb(date 'Infinity');
-select to_jsonb(timestamp 'Infinity');
-select to_jsonb(timestamptz 'Infinity');
-
 --jsonb_agg
 
 CREATE TEMP TABLE rows AS
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index 001138e..a0840c9 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2246,19 +2246,11 @@ begin
 	    raise notice '% %', sqlstate, sqlerrm;
     end;
 end; $$ language plpgsql;
-select excpt_test3();
-
-create function excpt_test4() returns text as $$
-begin
-	begin perform 1/0;
-	exception when others then return sqlerrm; end;
-end; $$ language plpgsql;
-select excpt_test4();
 
+select excpt_test3();
 drop function excpt_test1();
 drop function excpt_test2();
 drop function excpt_test3();
-drop function excpt_test4();
 
 -- parameters of raise stmt can be expressions
 create function raise_exprs() returns void as $$
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index c385e41..1e15f84 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -953,8 +953,6 @@ select * from only t1;
 select * from only t1_1;
 select * from only t1_2;
 
-reset constraint_exclusion;
-
 -- test various flavors of pg_get_viewdef()
 
 select pg_get_viewdef('shoe'::regclass) as unpretty;
@@ -1009,19 +1007,3 @@ ALTER RULE "_RETURN" ON rule_v1 RENAME TO abc; -- ON SELECT rule cannot be renam
 
 DROP VIEW rule_v1;
 DROP TABLE rule_t1;
-
---
--- check display of VALUES in view definitions
---
-create view rule_v1 as values(1,2);
-\d+ rule_v1
-drop view rule_v1;
-create view rule_v1(x) as values(1,2);
-\d+ rule_v1
-drop view rule_v1;
-create view rule_v1(x) as select * from (values(1,2)) v;
-\d+ rule_v1
-drop view rule_v1;
-create view rule_v1(x) as select * from (values(1,2)) v(q,w);
-\d+ rule_v1
-drop view rule_v1;
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index cd2d592..646b9ac 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -22,8 +22,7 @@ SELECT pg_sleep_for('2 seconds');
 CREATE TEMP TABLE prevstats AS
 SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
        (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
-       (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
-       pg_stat_get_snapshot_timestamp() as snap_ts
+       (b.idx_blks_read + b.idx_blks_hit) AS idx_blks
   FROM pg_catalog.pg_stat_user_tables AS t,
        pg_catalog.pg_statio_user_tables AS b
  WHERE t.relname='tenk2' AND b.relname='tenk2';
@@ -58,64 +57,6 @@ begin
 end
 $$ language plpgsql;
 
--- test effects of TRUNCATE on n_live_tup/n_dead_tup counters
-CREATE TABLE trunc_stats_test(id serial);
-CREATE TABLE trunc_stats_test1(id serial);
-CREATE TABLE trunc_stats_test2(id serial);
-CREATE TABLE trunc_stats_test3(id serial);
-CREATE TABLE trunc_stats_test4(id serial);
-
--- check that n_live_tup is reset to 0 after truncate
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-INSERT INTO trunc_stats_test DEFAULT VALUES;
-TRUNCATE trunc_stats_test;
-
--- test involving a truncate in a transaction; 4 ins but only 1 live
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-UPDATE trunc_stats_test1 SET id = id + 10 WHERE id IN (1, 2);
-DELETE FROM trunc_stats_test1 WHERE id = 3;
-
-BEGIN;
-UPDATE trunc_stats_test1 SET id = id + 100;
-TRUNCATE trunc_stats_test1;
-INSERT INTO trunc_stats_test1 DEFAULT VALUES;
-COMMIT;
-
--- use a savepoint: 1 insert, 1 live
-BEGIN;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-SAVEPOINT p1;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-TRUNCATE trunc_stats_test2;
-INSERT INTO trunc_stats_test2 DEFAULT VALUES;
-RELEASE SAVEPOINT p1;
-COMMIT;
-
--- rollback a savepoint: this should count 4 inserts and have 2
--- live tuples after commit (and 2 dead ones due to aborted subxact)
-BEGIN;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-SAVEPOINT p1;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-TRUNCATE trunc_stats_test3;
-INSERT INTO trunc_stats_test3 DEFAULT VALUES;
-ROLLBACK TO SAVEPOINT p1;
-COMMIT;
-
--- rollback a truncate: this should count 2 inserts and produce 2 dead tuples
-BEGIN;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-TRUNCATE trunc_stats_test4;
-INSERT INTO trunc_stats_test4 DEFAULT VALUES;
-ROLLBACK;
-
 -- do a seqscan
 SELECT count(*) FROM tenk2;
 -- do an indexscan
@@ -129,24 +70,15 @@ SELECT pg_sleep(1.0);
 SELECT wait_for_stats();
 
 -- check effects
-SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
-  FROM pg_stat_user_tables
- WHERE relname like 'trunc_stats_test%' order by relname;
-
 SELECT st.seq_scan >= pr.seq_scan + 1,
        st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
        st.idx_scan >= pr.idx_scan + 1,
        st.idx_tup_fetch >= pr.idx_tup_fetch + 1
   FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
-
 SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
        st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
   FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
  WHERE st.relname='tenk2' AND cl.relname='tenk2';
 
-SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
-FROM prevstats AS pr;
-
-DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
 -- End of Stats Test
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 1687c11..c716369 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -956,8 +956,3 @@ WITH t AS (
 )
 VALUES(FALSE);
 DROP RULE y_rule ON y;
-
--- check that parser lookahead for WITH doesn't cause any odd behavior
-create table foo (with baz);  -- fail, WITH is a reserved word
-create table foo (with ordinality);  -- fail, WITH is a reserved word
-with ordinality as (select 1 as x) select * from ordinality;
diff --git a/src/test/ssl/Makefile b/src/test/ssl/Makefile
index d9fd29a..608cd0d 100644
--- a/src/test/ssl/Makefile
+++ b/src/test/ssl/Makefile
@@ -41,7 +41,7 @@ ssl/%.key:
 # Root CA certificate
 ssl/root_ca.crt: ssl/root_ca.key cas.config
 	touch ssl/root_ca-certindex
-	openssl req -new -out ssl/root_ca.crt -x509 -config cas.config -config root_ca.config -key ssl/root_ca.key -days 10000
+	openssl req -new -out ssl/root_ca.crt -x509 -config cas.config -config root_ca.config -key ssl/root_ca.key
 	echo "01" > ssl/root_ca.srl
 
 # Client and server CAs
diff --git a/src/test/ssl/ssl/both-cas-1.crt b/src/test/ssl/ssl/both-cas-1.crt
index abf4612..7229f50 100644
--- a/src/test/ssl/ssl/both-cas-1.crt
+++ b/src/test/ssl/ssl/both-cas-1.crt
@@ -1,39 +1,39 @@
 -----BEGIN CERTIFICATE-----
-MIIB9zCCAWACCQDrgvp38CAy8DANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
+MIIB9zCCAWACCQD13ziQMRDLGTANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
 ZXN0IHJvb3QgQ0EgZm9yIFBvc3RncmVTUUwgU1NMIHJlZ3Jlc3Npb24gdGVzdCBz
-dWl0ZTAeFw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMEAxPjA8BgNVBAMM
+dWl0ZTAeFw0xNDEyMDQxMTUyMDFaFw0xNTAxMDMxMTUyMDFaMEAxPjA8BgNVBAMM
 NVRlc3Qgcm9vdCBDQSBmb3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0
-IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCyTfGMPAjAylLr3G7c
-/QToCA3da5YZzdhd3TiQGugrJjWI4TzVB7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZl
-FLW9dMpa/8SY2TETvMTuXR5MOxyw6FMEKb3buolsIksCCQ1btEIrDZ+gv9SJXcdL
-ylU+VI1lKmn2fLNWWATzWrIUawIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAF2T84iG
-zWKXu+3PysuPOn7RuRpMgYQKouQktErNJ8hM7Yqj3vu879zUkX1rP0HGnx7xQC3d
-nBkoJ7yNDR0MwQpWo1Dj1HLKNEY6ojKJgPd0+m8nG+02yUmmOjo0oMYzJx2DQy0u
-Y4qecEd6aDbqXTo+qOJ7Qm/U+U4kD9MTT6GD
+IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDC7TLars/P/obbNlsz
+cX/wZFnZ97L4dAiJAE+ZusoTqLalRnPbQEtrPfMA/eL/gjq69ehnPcehMIxnYRAV
++xqOnMiUacf+6TQBrjrnfCQZkYkngzYajTqhQogdM7sUHtvBvTs1EkjdVznQUN9B
+BRZi6zEvUMkc8/+KaiEKc0zAKQIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAAhmmj+R
+XP1+AREKWE33P8AkXTTGkXMvULZSgteHWxbBc08TbxJLTsqDvwp0lY/9nH48Ejx5
+XYIdDAED9Bwsm50y9u5p5OsO9YqHJfIsC9+Ui3paDHU543Y8CtZC4Ye5OcFn4/lp
+ew5Ix9E0LHJlY+LCfVEKSV0jDP6aMsYETpIe
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQIwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC8wIYcmeePSXVufP/Hn/6ICEog
-IUXqSNls5QIJR7Sate4iKGGTDEsRTxI4oDgkOYtcQNuEeXMf6k3xo+PRR08IEQNk
-XKy1zUWds6UBFboD72SyuTE2lxJBg+xOAWgl7JSNA+g8e0Y+wfhfxGZgRuqVxVNP
-9sAsfCEzGKna1l46dQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAD20Bmina/uXTLXO
-oPWgpMmKwQu7Q6DPXxItCUdWgK1k1D82brRjH+usrkrmCW5BQNXOC/0zJS22ioC1
-CJbhAujH3iPaV0C3xsVSf+bvTL6OMkwV/9x9OdDN+LI2auEt4S+fP5ZTVsTXt4wA
-A9cQIl2Qy88enZZAFKxrScFFlstp
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIJfmyeyPe6g3+16+WsGB8LRMW
+zfMXKKjxsBd3Zu2ka7jvfKe4ockw87kY01k0G4NHJgWH5zO2OuCNDOa8z+GLqSSO
+LYoMvik9+BLgFR8zBPshy77Rpb3CtpDjJUAU8TWQOT0cC56IwEgj2zswctqKIeFg
+ogkTbfg5KTNKSd4VUwIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAIHyYzFTIvvrUFFD
+yxhU3xyH6nx7HC47fxN+1kQjDa4MjvNsm/dOKETvS4b6GUKOudEKENBHzJW08hhs
+vn8uvmWEmyYcUyhp9r5lH2oaa6fySbnc+PE8YD2WNe+et1OdIMwqVwOegCeI85FN
+UtZk2tJjiRXJxBlheaaBxrzYjOBO
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQEwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDAYtajRx8vM6IB0SLZsAhTD0Y
-VHM+/+t0a4m3JXolJBbo9/B2/WAN0IH1E2zmlalLc3JBmGsH1a8U5ZlRow3p2ODL
-rFra9FbOl0wekmRFvZeaRln/99dpI5itVpL97QPHO8QMMK1IsyurFA5GfuPOBx9P
-i0MvzsT0tYsRvR929QIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJw4ngOYElfyMYkS
-K6bOgMosrBoX8ns6jQgdXEzf7QOIa110bs6nD+XeJeKmzUAZ3wumXBTalPaiqkEz
-bq4nlsEs1phvj0Coy5eehjV3DB8bDLEneOlV5N9y4Z4VO1BrhX61bLiPXBRp1MZR
-I0sCdxhswSrq02/OuFGe6mqrSBBI
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC15tOzDBVKaCRDz9L5LMpPk8DR
+RGHHOe4OuO6WTkUzjbjuKyiQbmtcp00R4dULbSM57ESvI/Ny0gPt+J/QKAOG8S5t
+09wDpKxKcgZSZ6Nd6FaK+D+ZhUVAkP3hB0ba0wo1JZff/0e4B+VJhXTjl7RRHfbr
+AEuDYFxv9T3K/Jq04wIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJys1pnYvO+u8Wca
+6xUToGMpqTnImKa+dX8tMKsp6mXAN/dWrOVMDWnjBhQxShhAZBsaJ4iUeXPJlctw
+KzkUCQo6BsUbPMTSQlPuyHHdZBOTHDIW4SylKaBQvkundkhhBO7aHwFV3QjxZKcH
+XqpGyY2ryrgdj2D4+H55NDXYjj/m
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/both-cas-2.crt b/src/test/ssl/ssl/both-cas-2.crt
index b0bc3f5..d85896c 100644
--- a/src/test/ssl/ssl/both-cas-2.crt
+++ b/src/test/ssl/ssl/both-cas-2.crt
@@ -1,39 +1,39 @@
 -----BEGIN CERTIFICATE-----
-MIIB9zCCAWACCQDrgvp38CAy8DANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
+MIIB9zCCAWACCQD13ziQMRDLGTANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
 ZXN0IHJvb3QgQ0EgZm9yIFBvc3RncmVTUUwgU1NMIHJlZ3Jlc3Npb24gdGVzdCBz
-dWl0ZTAeFw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMEAxPjA8BgNVBAMM
+dWl0ZTAeFw0xNDEyMDQxMTUyMDFaFw0xNTAxMDMxMTUyMDFaMEAxPjA8BgNVBAMM
 NVRlc3Qgcm9vdCBDQSBmb3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0
-IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCyTfGMPAjAylLr3G7c
-/QToCA3da5YZzdhd3TiQGugrJjWI4TzVB7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZl
-FLW9dMpa/8SY2TETvMTuXR5MOxyw6FMEKb3buolsIksCCQ1btEIrDZ+gv9SJXcdL
-ylU+VI1lKmn2fLNWWATzWrIUawIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAF2T84iG
-zWKXu+3PysuPOn7RuRpMgYQKouQktErNJ8hM7Yqj3vu879zUkX1rP0HGnx7xQC3d
-nBkoJ7yNDR0MwQpWo1Dj1HLKNEY6ojKJgPd0+m8nG+02yUmmOjo0oMYzJx2DQy0u
-Y4qecEd6aDbqXTo+qOJ7Qm/U+U4kD9MTT6GD
+IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDC7TLars/P/obbNlsz
+cX/wZFnZ97L4dAiJAE+ZusoTqLalRnPbQEtrPfMA/eL/gjq69ehnPcehMIxnYRAV
++xqOnMiUacf+6TQBrjrnfCQZkYkngzYajTqhQogdM7sUHtvBvTs1EkjdVznQUN9B
+BRZi6zEvUMkc8/+KaiEKc0zAKQIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAAhmmj+R
+XP1+AREKWE33P8AkXTTGkXMvULZSgteHWxbBc08TbxJLTsqDvwp0lY/9nH48Ejx5
+XYIdDAED9Bwsm50y9u5p5OsO9YqHJfIsC9+Ui3paDHU543Y8CtZC4Ye5OcFn4/lp
+ew5Ix9E0LHJlY+LCfVEKSV0jDP6aMsYETpIe
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQEwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDAYtajRx8vM6IB0SLZsAhTD0Y
-VHM+/+t0a4m3JXolJBbo9/B2/WAN0IH1E2zmlalLc3JBmGsH1a8U5ZlRow3p2ODL
-rFra9FbOl0wekmRFvZeaRln/99dpI5itVpL97QPHO8QMMK1IsyurFA5GfuPOBx9P
-i0MvzsT0tYsRvR929QIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJw4ngOYElfyMYkS
-K6bOgMosrBoX8ns6jQgdXEzf7QOIa110bs6nD+XeJeKmzUAZ3wumXBTalPaiqkEz
-bq4nlsEs1phvj0Coy5eehjV3DB8bDLEneOlV5N9y4Z4VO1BrhX61bLiPXBRp1MZR
-I0sCdxhswSrq02/OuFGe6mqrSBBI
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC15tOzDBVKaCRDz9L5LMpPk8DR
+RGHHOe4OuO6WTkUzjbjuKyiQbmtcp00R4dULbSM57ESvI/Ny0gPt+J/QKAOG8S5t
+09wDpKxKcgZSZ6Nd6FaK+D+ZhUVAkP3hB0ba0wo1JZff/0e4B+VJhXTjl7RRHfbr
+AEuDYFxv9T3K/Jq04wIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJys1pnYvO+u8Wca
+6xUToGMpqTnImKa+dX8tMKsp6mXAN/dWrOVMDWnjBhQxShhAZBsaJ4iUeXPJlctw
+KzkUCQo6BsUbPMTSQlPuyHHdZBOTHDIW4SylKaBQvkundkhhBO7aHwFV3QjxZKcH
+XqpGyY2ryrgdj2D4+H55NDXYjj/m
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQIwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC8wIYcmeePSXVufP/Hn/6ICEog
-IUXqSNls5QIJR7Sate4iKGGTDEsRTxI4oDgkOYtcQNuEeXMf6k3xo+PRR08IEQNk
-XKy1zUWds6UBFboD72SyuTE2lxJBg+xOAWgl7JSNA+g8e0Y+wfhfxGZgRuqVxVNP
-9sAsfCEzGKna1l46dQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAD20Bmina/uXTLXO
-oPWgpMmKwQu7Q6DPXxItCUdWgK1k1D82brRjH+usrkrmCW5BQNXOC/0zJS22ioC1
-CJbhAujH3iPaV0C3xsVSf+bvTL6OMkwV/9x9OdDN+LI2auEt4S+fP5ZTVsTXt4wA
-A9cQIl2Qy88enZZAFKxrScFFlstp
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIJfmyeyPe6g3+16+WsGB8LRMW
+zfMXKKjxsBd3Zu2ka7jvfKe4ockw87kY01k0G4NHJgWH5zO2OuCNDOa8z+GLqSSO
+LYoMvik9+BLgFR8zBPshy77Rpb3CtpDjJUAU8TWQOT0cC56IwEgj2zswctqKIeFg
+ogkTbfg5KTNKSd4VUwIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAIHyYzFTIvvrUFFD
+yxhU3xyH6nx7HC47fxN+1kQjDa4MjvNsm/dOKETvS4b6GUKOudEKENBHzJW08hhs
+vn8uvmWEmyYcUyhp9r5lH2oaa6fySbnc+PE8YD2WNe+et1OdIMwqVwOegCeI85FN
+UtZk2tJjiRXJxBlheaaBxrzYjOBO
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/client-revoked.crt b/src/test/ssl/ssl/client-revoked.crt
index c38229f..5503c8f 100644
--- a/src/test/ssl/ssl/client-revoked.crt
+++ b/src/test/ssl/ssl/client-revoked.crt
@@ -1,12 +1,12 @@
 -----BEGIN CERTIFICATE-----
 MIIBxzCCATACAQIwDQYJKoZIhvcNAQEFBQAwQjFAMD4GA1UEAww3VGVzdCBDQSBm
 b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IGNsaWVudCBjZXJ0czAe
-Fw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBYxFDASBgNVBAMMC3NzbHRl
-c3R1c2VyMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDmApiFBFLZi/hgQOMz
-iAHXBbY7A5hNMitQZMSTUB+/fLnzofkUjf/7GiRCLmdTCa4w1wvQp5VbrEhIbSGW
-sFSam6GuE0IBfSRJA0IouBtxdk8bCY4HDpXsh/6eC9XtV4k9YDp4JlkUNxOVu8Pb
-Z86OEQf3Ww/EZP5AfwORXLYgVQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAEarnPO1
-Rc88mDYZWM8H/I18L0omdib21+lJczkm4sgv2hVp2nR4Wfb51DojYruLxNJ0k/A5
-T0nEZghQDtNQQpMko9e8jn8gmEAs83zQIsVsmosfTYg0Zr2pSkT0ILSfR6BupHFJ
-I96I+qcRKc4rotOirgMrcgo/VpUcWnz8VPEo
+Fw0xNDEyMDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBYxFDASBgNVBAMMC3NzbHRl
+c3R1c2VyMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDg8LJSdf9nhkMSYmhn
++F3yqVSu+UXTcmPejKTBZRd4moLL2ti41K3M2xZDiZOn8V7To9AAD/tN3lPkn2y4
+ZqKD+zChVPJ5yUSpenVxKRckyK2pO4aNItgt60YJp119IG7mH/nfobl6nraI3xxk
+WGyT7O2sOOpokW9fF4DJfLe6lQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAMTRz+Nl
+hJeym72oGRIiJ7ODsIS4cVRQ13TwEtV/wPZ+skc/V7RFfHro5hyRwkbfIoZvhCld
+ZkyAXhQyiru0JzoklfEbOtwuC+J+XvXQ7aupIrnGRHF0yyEIYAEhgSRzaUvKWKlB
+gttm9tKwJuVCBYHh+cCGU0LnR3jhxVUqaL9d
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/client-revoked.key b/src/test/ssl/ssl/client-revoked.key
index b272f70..1752e02 100644
--- a/src/test/ssl/ssl/client-revoked.key
+++ b/src/test/ssl/ssl/client-revoked.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXgIBAAKBgQDmApiFBFLZi/hgQOMziAHXBbY7A5hNMitQZMSTUB+/fLnzofkU
-jf/7GiRCLmdTCa4w1wvQp5VbrEhIbSGWsFSam6GuE0IBfSRJA0IouBtxdk8bCY4H
-DpXsh/6eC9XtV4k9YDp4JlkUNxOVu8PbZ86OEQf3Ww/EZP5AfwORXLYgVQIDAQAB
-AoGBAOV1iXqJya1Fuc8sbHyoHk3IYPeWqoW4mwVkwcbElCeP4mJvH/Glh82VUr7D
-VEi+y4vlvN+3j4UY5jN6y5ts5bhDam4RjdHzhLT+ddlztoH4LNcgPDokQtPDtfOd
-UbbMcM6Pim7+ynBLncAj7dTin4/pVL2tYUIrKWvLhCU2zISxAkEA+CyHJZs49vOs
-hx8xEXGStdLq3k9vUk8G4BISicswixDWPQmJ3YN053FAZ+moHjqNpU5dMn7cVIcA
-HEW6LLS7IwJBAO1DbyWtCNQJZBKXzvBKxx9nOBb+5ovQZWs92bpvxkATPn4TUbQx
-nEe7eOPX+R2szP9+/3ApmZA1gV1mpVKsyicCQQCcUmf6jzCtlUXKgyJES5bPAwFA
-cSa84NyCzb9xnlSAdGWOYvC9YC2GD3czPSHRkK5iPt9DjFc6wyKVrHId8OWjAkBh
-8Yp6dRnF3jKPclec3mGg1w1SgNtPMDINuTSeP/IJFWigxvzdc/Vdr0hSVh+iXmkp
-t5VfCe04mL3UfsEUhfvVAkEA5Y05DCgaT+rOZbl6hMXlIqT5eT+xTatmDZzv6FUJ
-eAaYYhja/FrWa5JFXFUpFTamWGMTkfd6zsDS1bI6hwg/5Q==
+MIICXgIBAAKBgQDg8LJSdf9nhkMSYmhn+F3yqVSu+UXTcmPejKTBZRd4moLL2ti4
+1K3M2xZDiZOn8V7To9AAD/tN3lPkn2y4ZqKD+zChVPJ5yUSpenVxKRckyK2pO4aN
+Itgt60YJp119IG7mH/nfobl6nraI3xxkWGyT7O2sOOpokW9fF4DJfLe6lQIDAQAB
+AoGBAKh3PGaL3zPuly8eqqkrl1kVPsopAQXCx083MHFzP+fgeJMqnWOYTW5+qyb7
+061VFbsWFcLmNUV1fIleaTOWEqG0BXkG8VgS0sxEEV6N4sR6ePK2tOA81ZxFhXOR
+bJx8oys2U0kZZVRLvuj5+KjLMSBwWHEIpobE+zz4F9xcTXjlAkEA/6O0yApJ7sBQ
+XS54tK3m7NCYU8yEUD3Yidg9SmaYjiNwhLZ2e9KreQEkcbiHR8R0FHUxzKb/dItt
+2SauaHpCzwJBAOFB6DF0KM0XsfK209LoGvcA6t/aazOtbBlq9I49siKBE74Z7wJu
+0xsH8ndCkBPatoSn2ZuuXv3ozGNU9J+JFVsCQQDAOdk2koYFgZbseoVJV3rNmAzy
+9laH//lTrcZoq70LJJr3MDzn3wIRe0psONWAobinqXhI60or2KxBHVUIOucBAkEA
+qfDSHzU2bvx4aNeb2Vr4tO7BRB8Bj5w/mLGDTSiokrV00o+4LMq1g4gsWeMi1YfE
++TG0z2nvCnoucKYwY4fFTwJAbW0FLKUzRvX8dM3nXxs8vGktH8TH+dqsUfrZt9ms
+2nF1wwAD2OUXf94dnRvlgSMC7RMbTPAeoHnkqCpb1w++lg==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/client.crl b/src/test/ssl/ssl/client.crl
index cb86d82..83f9e67 100644
--- a/src/test/ssl/ssl/client.crl
+++ b/src/test/ssl/ssl/client.crl
@@ -1,9 +1,9 @@
 -----BEGIN X509 CRL-----
 MIIBHTCBhzANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0IENBIGZvciBQ
-b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRzFw0xNTAy
-MTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBQwEgIBAhcNMTUwMjE2MjAwNjIzWjAN
-BgkqhkiG9w0BAQUFAAOBgQAsrnXoVeyU8vmxPOVQrHvoMXkEvF9dOnSHIQD0ZnAW
-pxbj98hCMSIW+DPIXXFebMQ6GIPp4S/w5kVpngY51paT4iztRMlV+YeyuZQuZX9a
-EVgpj4t+i6hhtBHk5p9DeknERoAIsl4m2maQ58lT5UyeN4fdz4eNP6y3mQRfSTUn
-bQ==
+b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRzFw0xNDEy
+MDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBQwEgIBAhcNMTQxMjA0MTE1MjAxWjAN
+BgkqhkiG9w0BAQUFAAOBgQDGZTiMkukrcJheXCzKlNKKTyteOmR/sQYj36nyyV2U
+Iac9gYSYAb8ecjUrtcL/innhDAupGUxGR3QltSPo6q1yn9L8BJWJIz+BqK6aV4fb
+3lqGtTQKr+8qaKC7mi5TBafJmkUiNsbclNZl/ooQPW+Gzm++JpunK4uGzdW+4I6/
+fQ==
 -----END X509 CRL-----
diff --git a/src/test/ssl/ssl/client.crt b/src/test/ssl/ssl/client.crt
index 0c397c0..9c24e0e 100644
--- a/src/test/ssl/ssl/client.crt
+++ b/src/test/ssl/ssl/client.crt
@@ -1,12 +1,12 @@
 -----BEGIN CERTIFICATE-----
 MIIBxzCCATACAQEwDQYJKoZIhvcNAQEFBQAwQjFAMD4GA1UEAww3VGVzdCBDQSBm
 b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IGNsaWVudCBjZXJ0czAe
-Fw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBYxFDASBgNVBAMMC3NzbHRl
-c3R1c2VyMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDay1hT3/0Nj5ndv2TJ
-DxJZWQTOZD+/0ktqW9qyiRY9o0nf7PCQZE9OCh3ylaaPPfpL7iITZi1KASmSIn7M
-E4w1ibmBqFiogDE0Bq0DgJaoeUgLHMERDUtcxBJgwyCGjfI9Om4jy74kwMXb8I5i
-jVwZLUTSWzRSgany3WRqMb6CwwIDAQABMA0GCSqGSIb3DQEBBQUAA4GBALfP/i7o
-ZVYsIZWksIb/uxr/AlghyNQjLPVJTAOjrm9PP9rwKR2alI/zjkDrHVH57n4MfcmD
-Xn247DRv/MJFJ1xWCSh4PCy0vyTCFAerNDcqniSTqp2+Yusdr0mH/gHa+34ASYu/
-MXXB4UBMjTnZ/KhaVTmAv3cPeiMAQODRud65
+Fw0xNDEyMDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBYxFDASBgNVBAMMC3NzbHRl
+c3R1c2VyMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDErK6Z/Mv8oTKe026e
+SXhtrAeHPVAWM69Sb3zeQb8bnoqusfc0jQhaqvQqq6UYyCPRsH2qAp8B8Cdf93/B
+I5WIKGWcj107fB+dxqeuCS8QHyvO9Ygr9KYHWMKMz4DR+AsWYqoBXxgFjzwDAQB9
+SZRMRgUyHR+qQRGEXkgMLgwrbQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBACnd/MjL
+pGmAHIdbMKcIqMPmAhnkzyfoPE+V6V/6Fm3f9iwHYr4ivxTMhdTffFFyVMxvCDEw
+a0Hlx/wPPnbvsJLiWCHYzXJsyISarIU+euxUYQY1w2tTkmgITESM1eDq2SOMnvqK
+iLoSyGNPrq2tAWPTyx7il1Q72ZNl6w3w+uY4
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/client.key b/src/test/ssl/ssl/client.key
index 6cb6655..786e529 100644
--- a/src/test/ssl/ssl/client.key
+++ b/src/test/ssl/ssl/client.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXQIBAAKBgQDay1hT3/0Nj5ndv2TJDxJZWQTOZD+/0ktqW9qyiRY9o0nf7PCQ
-ZE9OCh3ylaaPPfpL7iITZi1KASmSIn7ME4w1ibmBqFiogDE0Bq0DgJaoeUgLHMER
-DUtcxBJgwyCGjfI9Om4jy74kwMXb8I5ijVwZLUTSWzRSgany3WRqMb6CwwIDAQAB
-AoGAJAMiR7Pvb+L5/XC6QwmzCHfJfbssbwNLHHd/+LDtszmEOFJEik+oafzqTvpo
-ztzxrLvGahEAVVT5pa791dNF2V//AKCDj3mOSVfrh6aYeA5naMT91JjnuRVgpdlc
-1b7p1FpbnwmzppqSbAfVQxmTlFxvVevukTqkAzP03uuQZ+kCQQD8XMpgCYXFuAl9
-n59OjS9Fi4oISI2lxFFxUK4KjGW4fOzS9/PdHepc4YBJQXSrDdELkH/un5AZQ7tr
-67R5YkB1AkEA3fKwaV0dPlXg78rVImUEXwNRM9SgxHquE6itzuT7RYg47bEnDHDm
-EGzN5QVs7TrxApk8KCxPUzlv/3vSWszPVwJBAMN+2mN1XQTi4a9IhW+jnZghVce+
-9MQShgjjOEABrRcy538zB94mO5TCN9AH/eo45NUxlnlzcHyx5LHgwUk7HLUCQQCP
-RhT/Ty6LiOCVqvf/Jfq2YuvOa5oEe7VX13Grt0FFV3R4a/1rGI5LWBFpoCD62yut
-o8GjpUbn0JIt+H6IQuItAkAqnLiP2ZJMhDoey8+btgwiAEoUnkDfa3bNA9wncJhO
-M3G4BM/bZhX5cuIaU7kPUHUS5fQeLLUfveWtbVSu1cRn
+MIICXgIBAAKBgQDErK6Z/Mv8oTKe026eSXhtrAeHPVAWM69Sb3zeQb8bnoqusfc0
+jQhaqvQqq6UYyCPRsH2qAp8B8Cdf93/BI5WIKGWcj107fB+dxqeuCS8QHyvO9Ygr
+9KYHWMKMz4DR+AsWYqoBXxgFjzwDAQB9SZRMRgUyHR+qQRGEXkgMLgwrbQIDAQAB
+AoGAbbKLaKRR+sTGgUQY7Py5ySIsyMfwBZIqdeZtVWKCf5s8axgkdBE92aSEr9Ax
+M9Nd9zVjwhHYMrKKo8JeZZG9csrt/XxgHXDbp+6y4lx0SW1XOmOp39K7h9mUmEVj
+XtICn75z4xYvJDG61xjqtrkh0lKaDr87VDJuuIjbcB2RdNkCQQD3A0Jue8hjoKhN
+H/CjtF/zfL/rkY0BO2Ryyp882AsUZu4y6YbAkrUJbySVIEU7oHleZaTJ/tzC7Ifs
+3XNO7iTnAkEAy9SPQNGU0SNkR2/H7x5JdllMOlOZzl+YUMQpDzH08o/u3RnfcFM8
+72rYJenxLorKpuG5YXTYxRFet4GIhMqOiwJBAOHStgoh2lrSxurzl2FihxIoa6Em
+iP2mWbfkbF4IuWBmlcAv5QTrWt0MIiq/vOu9Uxgs3tHY0eTWr5GqB0AS0eMCQQCw
+S80LlzpMGXxmfTxEicGoZ1wTJrPlV7F6Se/pgKAIHI3RFsu3b4dI3PTO9iTwyIK3
+DI02ycWjzX5K4fKeSEQ5AkEAr87kSTl5xM9Z9Cew+FX3ICJRbRNChJEsMPgo+2GW
+PVrzAxEMk/zP0vb3Mjf5yYjpCYPF0BCgVRsbmN86DE5bng==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/client_ca.crt b/src/test/ssl/ssl/client_ca.crt
index 003baed..de21b49 100644
--- a/src/test/ssl/ssl/client_ca.crt
+++ b/src/test/ssl/ssl/client_ca.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQIwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC8wIYcmeePSXVufP/Hn/6ICEog
-IUXqSNls5QIJR7Sate4iKGGTDEsRTxI4oDgkOYtcQNuEeXMf6k3xo+PRR08IEQNk
-XKy1zUWds6UBFboD72SyuTE2lxJBg+xOAWgl7JSNA+g8e0Y+wfhfxGZgRuqVxVNP
-9sAsfCEzGKna1l46dQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAD20Bmina/uXTLXO
-oPWgpMmKwQu7Q6DPXxItCUdWgK1k1D82brRjH+usrkrmCW5BQNXOC/0zJS22ioC1
-CJbhAujH3iPaV0C3xsVSf+bvTL6OMkwV/9x9OdDN+LI2auEt4S+fP5ZTVsTXt4wA
-A9cQIl2Qy88enZZAFKxrScFFlstp
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIJfmyeyPe6g3+16+WsGB8LRMW
+zfMXKKjxsBd3Zu2ka7jvfKe4ockw87kY01k0G4NHJgWH5zO2OuCNDOa8z+GLqSSO
+LYoMvik9+BLgFR8zBPshy77Rpb3CtpDjJUAU8TWQOT0cC56IwEgj2zswctqKIeFg
+ogkTbfg5KTNKSd4VUwIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAIHyYzFTIvvrUFFD
+yxhU3xyH6nx7HC47fxN+1kQjDa4MjvNsm/dOKETvS4b6GUKOudEKENBHzJW08hhs
+vn8uvmWEmyYcUyhp9r5lH2oaa6fySbnc+PE8YD2WNe+et1OdIMwqVwOegCeI85FN
+UtZk2tJjiRXJxBlheaaBxrzYjOBO
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/client_ca.key b/src/test/ssl/ssl/client_ca.key
index 1d636ec..3300188 100644
--- a/src/test/ssl/ssl/client_ca.key
+++ b/src/test/ssl/ssl/client_ca.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXQIBAAKBgQC8wIYcmeePSXVufP/Hn/6ICEogIUXqSNls5QIJR7Sate4iKGGT
-DEsRTxI4oDgkOYtcQNuEeXMf6k3xo+PRR08IEQNkXKy1zUWds6UBFboD72SyuTE2
-lxJBg+xOAWgl7JSNA+g8e0Y+wfhfxGZgRuqVxVNP9sAsfCEzGKna1l46dQIDAQAB
-AoGAMAXDmU9G7NvBtuSypvV76txBD7+nbB4ww1XYmMfXmW0kMyiW+rSr/LFjb2jE
-H+NMI6KUtzW3Jq2UOyB5e+tqnbDjqZlQjnBYFnWKRa8SxsuamSAvGNLPIzur8rxm
-qxOWkxlHpS+I6OXn263sWzG38ptQ3X4zK6ADcTpg7FkkYJkCQQDhCJH630aXQyia
-90QM+BaKp7rhr+DHW+vVU/5pg3FrPIuvRlLo+E8iJItY3Ae+AJylK6kP6/5AJzOz
-s1tXjZezAkEA1rnW4YIlWRlaJE4hXMNvF4VJO5MBtS60F4/z1kR0uVO5+JAgTZT0
-GE7ghZQ3VwdyRiWc59zXr2qkA75qtMFRNwJBAK0x82iqP6Jbxfy/Ilj4+CBvR547
-xzyourHNm5mJ2Nk4GCombdlwgzc7+SPC9RJ/VhCpsczXTTAC+//qovqXt5ECQEtF
-rlwzQWBwkLb1ZKCeKg12vetSZ2DaVGuGHRZZvQlSnnjSHWDU/JSg4fgxswyhIaAR
-g2WMd1eY7JIbaFChDBUCQQC46CikUDq2kfIPOkj/dsa4wLkUETrcgBx+eaZbOCgx
-JU7GqsoSXxTgKcjZPm/5O/rWWtwB9XhtTuvS/NYi3aSs
+MIICXgIBAAKBgQDIJfmyeyPe6g3+16+WsGB8LRMWzfMXKKjxsBd3Zu2ka7jvfKe4
+ockw87kY01k0G4NHJgWH5zO2OuCNDOa8z+GLqSSOLYoMvik9+BLgFR8zBPshy77R
+pb3CtpDjJUAU8TWQOT0cC56IwEgj2zswctqKIeFgogkTbfg5KTNKSd4VUwIDAQAB
+AoGBALNvsFu+IFuiFKgLsGT1fZr2Qi3ot+5kSopbp74pbhZBaUxzwl451YjoiGJk
+YI3huKEZyk2cDvVp9ZUfIuHVsUsRkUtlMYAWJoxypbLWFw0efa9TNDbsoxGSjs8N
+TCZOqK6VKEbckTd2Mg8vanB+A8PswOPW94es32Y9XKwBaFsJAkEA5WVHtYs8aczd
+uJMuteUkv2R0OFL8wgIgkXRyk0BNJjVYwe/DbvW/J6W2DTsvoeFMZ9U+p2tEX9ab
+ak7RlCFNtwJBAN9cWRfVzKY6P62UZmIdsvDYJNaWaamfZguKx69q0FD1jcjl0C8R
+3w6xCVrGQCPbbQibNTLbIKPC/jrUcu6c9UUCQQDNiNGXeAnJQiXnGvjfQVCLrBX1
+4WVW71D/Arcl+JcnhOTh31HcOZPski1r7XvgL12mKwrYNuQser0Fo1lkv/JBAkBx
+VOUrz+KP8Xw/8c1lOVaDF9jRPO6OD3/ymU8qtZLPkViIt/rC91lrle5+LZt71ilj
+tYTvsfnEvfrLFOLgKanVAkEAvyofcM5gr7gTiC+XxhjUyDNn2lYwoog+D67E6YvL
+chheY2FNRrqpCi0Zhi8KlUXnp4wtHA6zBW46l1xSxz4lYg==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/root+client.crl b/src/test/ssl/ssl/root+client.crl
index 017f16b..47b8ab1 100644
--- a/src/test/ssl/ssl/root+client.crl
+++ b/src/test/ssl/ssl/root+client.crl
@@ -1,17 +1,17 @@
 -----BEGIN X509 CRL-----
 MIIBBDBvMA0GCSqGSIb3DQEBBQUAMEAxPjA8BgNVBAMMNVRlc3Qgcm9vdCBDQSBm
-b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNTAyMTYy
-MDA2MjNaFw00MjA3MDQyMDA2MjNaMA0GCSqGSIb3DQEBBQUAA4GBACEwQiR8BKoD
-eGuJKMy73AGLzNu3m7jUPWBPntGpNrUMZXNQXgtfm1t3twXbklQq4pao+9SKgT5X
-guXpfoa/mPLs//gsTEx0EQV/YzsXm2xFBUtaRq46GbJK3XTfRJLw7OOzBFij1o3i
-GaeVMn7IXwQBNkxQT0AAAiCUz5yz/Wvx
+b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNDEyMDQx
+MTUyMDFaFw00MjA0MjExMTUyMDFaMA0GCSqGSIb3DQEBBQUAA4GBAK7EbBLD03t6
+zv2yRS6ByDg7X9CPbPVReUQ21ntI652lsJ4veAJeSWQXITEjC/mt+VkN8pKH8eEg
+hp0vZmS7zIzL+UdPZkJYokAdmBsmP1ymDvOHd52XssjM1e6d7pNKwk6Z40x6Tpvq
+cStL3sC4tomx+vn7zzSUcS3hwdcHvnwZ
 -----END X509 CRL-----
 -----BEGIN X509 CRL-----
 MIIBHTCBhzANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0IENBIGZvciBQ
-b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRzFw0xNTAy
-MTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBQwEgIBAhcNMTUwMjE2MjAwNjIzWjAN
-BgkqhkiG9w0BAQUFAAOBgQAsrnXoVeyU8vmxPOVQrHvoMXkEvF9dOnSHIQD0ZnAW
-pxbj98hCMSIW+DPIXXFebMQ6GIPp4S/w5kVpngY51paT4iztRMlV+YeyuZQuZX9a
-EVgpj4t+i6hhtBHk5p9DeknERoAIsl4m2maQ58lT5UyeN4fdz4eNP6y3mQRfSTUn
-bQ==
+b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRzFw0xNDEy
+MDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBQwEgIBAhcNMTQxMjA0MTE1MjAxWjAN
+BgkqhkiG9w0BAQUFAAOBgQDGZTiMkukrcJheXCzKlNKKTyteOmR/sQYj36nyyV2U
+Iac9gYSYAb8ecjUrtcL/innhDAupGUxGR3QltSPo6q1yn9L8BJWJIz+BqK6aV4fb
+3lqGtTQKr+8qaKC7mi5TBafJmkUiNsbclNZl/ooQPW+Gzm++JpunK4uGzdW+4I6/
+fQ==
 -----END X509 CRL-----
diff --git a/src/test/ssl/ssl/root+client_ca.crt b/src/test/ssl/ssl/root+client_ca.crt
index 227ab72..ebe1520 100644
--- a/src/test/ssl/ssl/root+client_ca.crt
+++ b/src/test/ssl/ssl/root+client_ca.crt
@@ -1,26 +1,26 @@
 -----BEGIN CERTIFICATE-----
-MIIB9zCCAWACCQDrgvp38CAy8DANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
+MIIB9zCCAWACCQD13ziQMRDLGTANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
 ZXN0IHJvb3QgQ0EgZm9yIFBvc3RncmVTUUwgU1NMIHJlZ3Jlc3Npb24gdGVzdCBz
-dWl0ZTAeFw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMEAxPjA8BgNVBAMM
+dWl0ZTAeFw0xNDEyMDQxMTUyMDFaFw0xNTAxMDMxMTUyMDFaMEAxPjA8BgNVBAMM
 NVRlc3Qgcm9vdCBDQSBmb3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0
-IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCyTfGMPAjAylLr3G7c
-/QToCA3da5YZzdhd3TiQGugrJjWI4TzVB7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZl
-FLW9dMpa/8SY2TETvMTuXR5MOxyw6FMEKb3buolsIksCCQ1btEIrDZ+gv9SJXcdL
-ylU+VI1lKmn2fLNWWATzWrIUawIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAF2T84iG
-zWKXu+3PysuPOn7RuRpMgYQKouQktErNJ8hM7Yqj3vu879zUkX1rP0HGnx7xQC3d
-nBkoJ7yNDR0MwQpWo1Dj1HLKNEY6ojKJgPd0+m8nG+02yUmmOjo0oMYzJx2DQy0u
-Y4qecEd6aDbqXTo+qOJ7Qm/U+U4kD9MTT6GD
+IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDC7TLars/P/obbNlsz
+cX/wZFnZ97L4dAiJAE+ZusoTqLalRnPbQEtrPfMA/eL/gjq69ehnPcehMIxnYRAV
++xqOnMiUacf+6TQBrjrnfCQZkYkngzYajTqhQogdM7sUHtvBvTs1EkjdVznQUN9B
+BRZi6zEvUMkc8/+KaiEKc0zAKQIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAAhmmj+R
+XP1+AREKWE33P8AkXTTGkXMvULZSgteHWxbBc08TbxJLTsqDvwp0lY/9nH48Ejx5
+XYIdDAED9Bwsm50y9u5p5OsO9YqHJfIsC9+Ui3paDHU543Y8CtZC4Ye5OcFn4/lp
+ew5Ix9E0LHJlY+LCfVEKSV0jDP6aMsYETpIe
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQIwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3QgY2xpZW50IGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC8wIYcmeePSXVufP/Hn/6ICEog
-IUXqSNls5QIJR7Sate4iKGGTDEsRTxI4oDgkOYtcQNuEeXMf6k3xo+PRR08IEQNk
-XKy1zUWds6UBFboD72SyuTE2lxJBg+xOAWgl7JSNA+g8e0Y+wfhfxGZgRuqVxVNP
-9sAsfCEzGKna1l46dQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAD20Bmina/uXTLXO
-oPWgpMmKwQu7Q6DPXxItCUdWgK1k1D82brRjH+usrkrmCW5BQNXOC/0zJS22ioC1
-CJbhAujH3iPaV0C3xsVSf+bvTL6OMkwV/9x9OdDN+LI2auEt4S+fP5ZTVsTXt4wA
-A9cQIl2Qy88enZZAFKxrScFFlstp
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIJfmyeyPe6g3+16+WsGB8LRMW
+zfMXKKjxsBd3Zu2ka7jvfKe4ockw87kY01k0G4NHJgWH5zO2OuCNDOa8z+GLqSSO
+LYoMvik9+BLgFR8zBPshy77Rpb3CtpDjJUAU8TWQOT0cC56IwEgj2zswctqKIeFg
+ogkTbfg5KTNKSd4VUwIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAIHyYzFTIvvrUFFD
+yxhU3xyH6nx7HC47fxN+1kQjDa4MjvNsm/dOKETvS4b6GUKOudEKENBHzJW08hhs
+vn8uvmWEmyYcUyhp9r5lH2oaa6fySbnc+PE8YD2WNe+et1OdIMwqVwOegCeI85FN
+UtZk2tJjiRXJxBlheaaBxrzYjOBO
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/root+server.crl b/src/test/ssl/ssl/root+server.crl
index ac31888..e904233 100644
--- a/src/test/ssl/ssl/root+server.crl
+++ b/src/test/ssl/ssl/root+server.crl
@@ -1,17 +1,17 @@
 -----BEGIN X509 CRL-----
 MIIBBDBvMA0GCSqGSIb3DQEBBQUAMEAxPjA8BgNVBAMMNVRlc3Qgcm9vdCBDQSBm
-b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNTAyMTYy
-MDA2MjNaFw00MjA3MDQyMDA2MjNaMA0GCSqGSIb3DQEBBQUAA4GBACEwQiR8BKoD
-eGuJKMy73AGLzNu3m7jUPWBPntGpNrUMZXNQXgtfm1t3twXbklQq4pao+9SKgT5X
-guXpfoa/mPLs//gsTEx0EQV/YzsXm2xFBUtaRq46GbJK3XTfRJLw7OOzBFij1o3i
-GaeVMn7IXwQBNkxQT0AAAiCUz5yz/Wvx
+b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNDEyMDQx
+MTUyMDFaFw00MjA0MjExMTUyMDFaMA0GCSqGSIb3DQEBBQUAA4GBAK7EbBLD03t6
+zv2yRS6ByDg7X9CPbPVReUQ21ntI652lsJ4veAJeSWQXITEjC/mt+VkN8pKH8eEg
+hp0vZmS7zIzL+UdPZkJYokAdmBsmP1ymDvOHd52XssjM1e6d7pNKwk6Z40x6Tpvq
+cStL3sC4tomx+vn7zzSUcS3hwdcHvnwZ
 -----END X509 CRL-----
 -----BEGIN X509 CRL-----
 MIIBHTCBhzANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0IENBIGZvciBQ
-b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRzFw0xNTAy
-MTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBQwEgIBBhcNMTUwMjE2MjAwNjIzWjAN
-BgkqhkiG9w0BAQUFAAOBgQB1c54zLMueMtLiSmBT6kfXJe9o3Krd2n774g7kzNlR
-DeLpCHeUvyLF0m8YK09vbLv2W0r6VQnbjyQGr9xyweRLLtOXc0FIDsTO8g/jvMSq
-Q9zITuqWiCHRbNhi2B3HPo2NsrfA+tQEAZvMUgnynlerNvGkLWQZeC2UsxrrSs4t
-9Q==
+b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRzFw0xNDEy
+MDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBQwEgIBBhcNMTQxMjA0MTE1MjAxWjAN
+BgkqhkiG9w0BAQUFAAOBgQCmFnFkEt0+Ialw4E+4nIAJWJO9XDE71FdRfX3QChs8
+ZJtBseaMNeUC1FY1zHOYQhtMy+Uatda6hx/QiyidF2oP5KpWp+R11M554Ifxem3X
+KDQDBQNee+1IIJ7a1kxAUxeSNP+0a3/bmUxI5sbomINnKeIDqDO8d2vmO2VLxJm6
+MA==
 -----END X509 CRL-----
diff --git a/src/test/ssl/ssl/root+server_ca.crt b/src/test/ssl/ssl/root+server_ca.crt
index 4a33f77..f886582 100644
--- a/src/test/ssl/ssl/root+server_ca.crt
+++ b/src/test/ssl/ssl/root+server_ca.crt
@@ -1,26 +1,26 @@
 -----BEGIN CERTIFICATE-----
-MIIB9zCCAWACCQDrgvp38CAy8DANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
+MIIB9zCCAWACCQD13ziQMRDLGTANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
 ZXN0IHJvb3QgQ0EgZm9yIFBvc3RncmVTUUwgU1NMIHJlZ3Jlc3Npb24gdGVzdCBz
-dWl0ZTAeFw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMEAxPjA8BgNVBAMM
+dWl0ZTAeFw0xNDEyMDQxMTUyMDFaFw0xNTAxMDMxMTUyMDFaMEAxPjA8BgNVBAMM
 NVRlc3Qgcm9vdCBDQSBmb3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0
-IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCyTfGMPAjAylLr3G7c
-/QToCA3da5YZzdhd3TiQGugrJjWI4TzVB7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZl
-FLW9dMpa/8SY2TETvMTuXR5MOxyw6FMEKb3buolsIksCCQ1btEIrDZ+gv9SJXcdL
-ylU+VI1lKmn2fLNWWATzWrIUawIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAF2T84iG
-zWKXu+3PysuPOn7RuRpMgYQKouQktErNJ8hM7Yqj3vu879zUkX1rP0HGnx7xQC3d
-nBkoJ7yNDR0MwQpWo1Dj1HLKNEY6ojKJgPd0+m8nG+02yUmmOjo0oMYzJx2DQy0u
-Y4qecEd6aDbqXTo+qOJ7Qm/U+U4kD9MTT6GD
+IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDC7TLars/P/obbNlsz
+cX/wZFnZ97L4dAiJAE+ZusoTqLalRnPbQEtrPfMA/eL/gjq69ehnPcehMIxnYRAV
++xqOnMiUacf+6TQBrjrnfCQZkYkngzYajTqhQogdM7sUHtvBvTs1EkjdVznQUN9B
+BRZi6zEvUMkc8/+KaiEKc0zAKQIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAAhmmj+R
+XP1+AREKWE33P8AkXTTGkXMvULZSgteHWxbBc08TbxJLTsqDvwp0lY/9nH48Ejx5
+XYIdDAED9Bwsm50y9u5p5OsO9YqHJfIsC9+Ui3paDHU543Y8CtZC4Ye5OcFn4/lp
+ew5Ix9E0LHJlY+LCfVEKSV0jDP6aMsYETpIe
 -----END CERTIFICATE-----
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQEwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDAYtajRx8vM6IB0SLZsAhTD0Y
-VHM+/+t0a4m3JXolJBbo9/B2/WAN0IH1E2zmlalLc3JBmGsH1a8U5ZlRow3p2ODL
-rFra9FbOl0wekmRFvZeaRln/99dpI5itVpL97QPHO8QMMK1IsyurFA5GfuPOBx9P
-i0MvzsT0tYsRvR929QIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJw4ngOYElfyMYkS
-K6bOgMosrBoX8ns6jQgdXEzf7QOIa110bs6nD+XeJeKmzUAZ3wumXBTalPaiqkEz
-bq4nlsEs1phvj0Coy5eehjV3DB8bDLEneOlV5N9y4Z4VO1BrhX61bLiPXBRp1MZR
-I0sCdxhswSrq02/OuFGe6mqrSBBI
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC15tOzDBVKaCRDz9L5LMpPk8DR
+RGHHOe4OuO6WTkUzjbjuKyiQbmtcp00R4dULbSM57ESvI/Ny0gPt+J/QKAOG8S5t
+09wDpKxKcgZSZ6Nd6FaK+D+ZhUVAkP3hB0ba0wo1JZff/0e4B+VJhXTjl7RRHfbr
+AEuDYFxv9T3K/Jq04wIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJys1pnYvO+u8Wca
+6xUToGMpqTnImKa+dX8tMKsp6mXAN/dWrOVMDWnjBhQxShhAZBsaJ4iUeXPJlctw
+KzkUCQo6BsUbPMTSQlPuyHHdZBOTHDIW4SylKaBQvkundkhhBO7aHwFV3QjxZKcH
+XqpGyY2ryrgdj2D4+H55NDXYjj/m
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/root.crl b/src/test/ssl/ssl/root.crl
index 65e470c..fa53fd7 100644
--- a/src/test/ssl/ssl/root.crl
+++ b/src/test/ssl/ssl/root.crl
@@ -1,8 +1,8 @@
 -----BEGIN X509 CRL-----
 MIIBBDBvMA0GCSqGSIb3DQEBBQUAMEAxPjA8BgNVBAMMNVRlc3Qgcm9vdCBDQSBm
-b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNTAyMTYy
-MDA2MjNaFw00MjA3MDQyMDA2MjNaMA0GCSqGSIb3DQEBBQUAA4GBACEwQiR8BKoD
-eGuJKMy73AGLzNu3m7jUPWBPntGpNrUMZXNQXgtfm1t3twXbklQq4pao+9SKgT5X
-guXpfoa/mPLs//gsTEx0EQV/YzsXm2xFBUtaRq46GbJK3XTfRJLw7OOzBFij1o3i
-GaeVMn7IXwQBNkxQT0AAAiCUz5yz/Wvx
+b3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0IHN1aXRlFw0xNDEyMDQx
+MTUyMDFaFw00MjA0MjExMTUyMDFaMA0GCSqGSIb3DQEBBQUAA4GBAK7EbBLD03t6
+zv2yRS6ByDg7X9CPbPVReUQ21ntI652lsJ4veAJeSWQXITEjC/mt+VkN8pKH8eEg
+hp0vZmS7zIzL+UdPZkJYokAdmBsmP1ymDvOHd52XssjM1e6d7pNKwk6Z40x6Tpvq
+cStL3sC4tomx+vn7zzSUcS3hwdcHvnwZ
 -----END X509 CRL-----
diff --git a/src/test/ssl/ssl/root_ca.crt b/src/test/ssl/ssl/root_ca.crt
index e491d73..ca5faab 100644
--- a/src/test/ssl/ssl/root_ca.crt
+++ b/src/test/ssl/ssl/root_ca.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
-MIIB9zCCAWACCQDrgvp38CAy8DANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
+MIIB9zCCAWACCQD13ziQMRDLGTANBgkqhkiG9w0BAQsFADBAMT4wPAYDVQQDDDVU
 ZXN0IHJvb3QgQ0EgZm9yIFBvc3RncmVTUUwgU1NMIHJlZ3Jlc3Npb24gdGVzdCBz
-dWl0ZTAeFw0xNTAyMTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMEAxPjA8BgNVBAMM
+dWl0ZTAeFw0xNDEyMDQxMTUyMDFaFw0xNTAxMDMxMTUyMDFaMEAxPjA8BgNVBAMM
 NVRlc3Qgcm9vdCBDQSBmb3IgUG9zdGdyZVNRTCBTU0wgcmVncmVzc2lvbiB0ZXN0
-IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCyTfGMPAjAylLr3G7c
-/QToCA3da5YZzdhd3TiQGugrJjWI4TzVB7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZl
-FLW9dMpa/8SY2TETvMTuXR5MOxyw6FMEKb3buolsIksCCQ1btEIrDZ+gv9SJXcdL
-ylU+VI1lKmn2fLNWWATzWrIUawIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAF2T84iG
-zWKXu+3PysuPOn7RuRpMgYQKouQktErNJ8hM7Yqj3vu879zUkX1rP0HGnx7xQC3d
-nBkoJ7yNDR0MwQpWo1Dj1HLKNEY6ojKJgPd0+m8nG+02yUmmOjo0oMYzJx2DQy0u
-Y4qecEd6aDbqXTo+qOJ7Qm/U+U4kD9MTT6GD
+IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDC7TLars/P/obbNlsz
+cX/wZFnZ97L4dAiJAE+ZusoTqLalRnPbQEtrPfMA/eL/gjq69ehnPcehMIxnYRAV
++xqOnMiUacf+6TQBrjrnfCQZkYkngzYajTqhQogdM7sUHtvBvTs1EkjdVznQUN9B
+BRZi6zEvUMkc8/+KaiEKc0zAKQIDAQABMA0GCSqGSIb3DQEBCwUAA4GBAAhmmj+R
+XP1+AREKWE33P8AkXTTGkXMvULZSgteHWxbBc08TbxJLTsqDvwp0lY/9nH48Ejx5
+XYIdDAED9Bwsm50y9u5p5OsO9YqHJfIsC9+Ui3paDHU543Y8CtZC4Ye5OcFn4/lp
+ew5Ix9E0LHJlY+LCfVEKSV0jDP6aMsYETpIe
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/root_ca.key b/src/test/ssl/ssl/root_ca.key
index e5cddee..c88af49 100644
--- a/src/test/ssl/ssl/root_ca.key
+++ b/src/test/ssl/ssl/root_ca.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXAIBAAKBgQCyTfGMPAjAylLr3G7c/QToCA3da5YZzdhd3TiQGugrJjWI4TzV
-B7pQ8IwDYk/jZf5TzVdEtz0B4TeIeUZlFLW9dMpa/8SY2TETvMTuXR5MOxyw6FME
-Kb3buolsIksCCQ1btEIrDZ+gv9SJXcdLylU+VI1lKmn2fLNWWATzWrIUawIDAQAB
-AoGAQ8TmMuO6e/QqUiUlKe8tBzfQdUDn+wTG4N4tGnBvn77VCCJ7qYhXY14aCUs7
-i/V/FcDtE1wF3woHvmJBxDd731TILBhuqn3UIWJafoiezlhqwR2uvTfnWh62N15w
-xlmGDuPwXMtQCazbcD6I9hgbADBbMmsyym8cuwN+hxU7bKECQQDfzkAN0RNI/m31
-7GVjOvWrd3+brwf19jXtxhzJCRThjyVyCMYfG9ELP/u76aNCMs2otn2K49Vd0s5A
-rG6uN4Z7AkEAy/QXExktz0YxaTMHYHafwSkraoygKVUIoxWm2AHhNXdSY1M8AqkL
-6VqGpNgcwiEE0QJHG0MFjB0tZAe9/kq+0QJAWqc+htozR5PXko+ImeMd87BZvgPt
-45ExUvi2XDAThzHmZwRqy9sGl9n466q9eGj/qOEShRm4KWLkLIor4uGW1QJAbj2h
-u1EA0ei/DH3ontt/vojiTtV0POMZqA0sAdYCRUQZ5FY5ObbmGVw1KyUlZkkysUbp
-6HJxrSqYPllw+OKuAQJBAN54Aep6BvzI+arJrOm2Un5l27jfPbuKmvJWjus1mU+e
-HkaXYUF31/LIN4gNeu0ULbCSKpvk00UaBfjbwvfLmAk=
+MIICWwIBAAKBgQDC7TLars/P/obbNlszcX/wZFnZ97L4dAiJAE+ZusoTqLalRnPb
+QEtrPfMA/eL/gjq69ehnPcehMIxnYRAV+xqOnMiUacf+6TQBrjrnfCQZkYkngzYa
+jTqhQogdM7sUHtvBvTs1EkjdVznQUN9BBRZi6zEvUMkc8/+KaiEKc0zAKQIDAQAB
+AoGAa1z8kqiQe86Edr9UslwEjOKo/r5IzEIU5WjPbywL25Ikr5nDfHLIV5QygUxV
+uEgBkzKYxCyqBOVZoCM9Ge5JrGcWO+N7IVVpirJRLgafu17sWyOsFIdT0QfNBYEl
+sdcz70c0Rsfk+hnsJ3KDOAxEhmPZe1mT9Rl6g1qpzva1/1ECQQDkxzDyhm8/F41B
+1z9m6Gz8X3fIb4cx1WpMZHG3XNyD5rzPiUhuIqATUHTCGIig0nYzYr1AOVNSN6pb
+5whOuW4VAkEA2h7bIZd3yfn1YNZYk24ORwZssE5r5ryQeOwuDGJH8FxPk/LqZE8T
+OX+ptPRyDowd2UZFRz0jxKl3RR4W/VtixQJAehaK2oI/j+3jpkVWQna64puX8tEB
+1uhLR+U6gl3+GC3kiOR8ULoNrwD6rjIlh52JErcYw9NT0cZ/FXhfiJOQWQJAFr3a
+2RDC05M1K0iN6aky4eLgmC1FAMSuR31Qe8gPehcV0PYlzBmWhosx9YT7E1s2jX3P
+IVNVlF6a6eDuQrIxhQJAL8ELX6MNW05PzETZpYSOTvkOeGND7INc5Md28Yv9SkPd
+c/HvFDVQF0OsgFrIcuBP8o7YaQBETJPqFHMao+WB6w==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-cn-and-alt-names.crt b/src/test/ssl/ssl/server-cn-and-alt-names.crt
index 04f9b58..20cc1c8 100644
--- a/src/test/ssl/ssl/server-cn-and-alt-names.crt
+++ b/src/test/ssl/ssl/server-cn-and-alt-names.crt
@@ -1,15 +1,15 @@
 -----BEGIN CERTIFICATE-----
 MIICSTCCAbKgAwIBAgIBATANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owRjEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowRjEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMSQwIgYDVQQDDBtjb21tb24tbmFtZS5wZy1z
-c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAMH7OtRvW0qP
-gYDMInkd0mgKnqhexEUnTf90mGihzd4sw91J0bJBnC/wfLmpP9a1wOwvAma1GSJ2
-1lLFrSC8bXkT+6nIiqXlFK4HqW5w3PktbO1InujFS1PoxXOdlSwdcIzQ+VDk3Kv3
-IVnCq9w8rcchthnSb+3kYx5QjA0Gb1vhAgMBAAGjSzBJMEcGA1UdEQRAMD6CHWRu
+c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAJ0ERoXVz7aK
+ZLL2W8psViLKVorl1pvLz4m0Uw0X8iQkHgN+/gMNs2nHDzQFtbOc3NVbBxnnosbF
+XuGeCrlz+xK3J4Y5g6up9xNPCbtLM+RxBtMx/a/8naO+4yraQD51pZgNUMjSYeIL
+9UeB3VFg928+swacichtJIlwU3KAiEzRAgMBAAGjSzBJMEcGA1UdEQRAMD6CHWRu
 czEuYWx0LW5hbWUucGctc3NsdGVzdC50ZXN0gh1kbnMyLmFsdC1uYW1lLnBnLXNz
-bHRlc3QudGVzdDANBgkqhkiG9w0BAQUFAAOBgQCBBVEMkprc18bqWcZ8P93JGc1r
-lJoSARfIkBuAkJODyQHJ6kp6fq1kuR8seax35VPNXIvBlPqXoS9zvXYVmF/qOJEk
-TtW8YAACZywn02dM5CQRS7T9HCcBJeFUHxbGcBCY+AqzbhM+tGii6UnogjvqdKje
-ApVvu0m4MsSn+WWQlw==
+bHRlc3QudGVzdDANBgkqhkiG9w0BAQUFAAOBgQBwQP7EuwvnURefrRvLKP+Txzwg
+xEZK/ZG/dSExX7CP8ib5JZQUuJpMzYmyGFbTpLJOU5qrE+vI2rxfHrWOYZU4IB1f
+u053N3slzi5ClKGKZt4y7LM4hupQ13xRfgIpasSJdEI3n/BCmeEdFVBqzKMQJxQe
+tHn3NapFkra7DHe2xQ==
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-cn-and-alt-names.key b/src/test/ssl/ssl/server-cn-and-alt-names.key
index 7577e6f..851dad9 100644
--- a/src/test/ssl/ssl/server-cn-and-alt-names.key
+++ b/src/test/ssl/ssl/server-cn-and-alt-names.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXAIBAAKBgQDB+zrUb1tKj4GAzCJ5HdJoCp6oXsRFJ03/dJhooc3eLMPdSdGy
-QZwv8Hy5qT/WtcDsLwJmtRkidtZSxa0gvG15E/upyIql5RSuB6lucNz5LWztSJ7o
-xUtT6MVznZUsHXCM0PlQ5Nyr9yFZwqvcPK3HIbYZ0m/t5GMeUIwNBm9b4QIDAQAB
-AoGAVOp2gWJR81zI0yoJeT2dyt/DPm9lueQP1+EhisQyC61K/IcBHehsx+udneTC
-RmqADqQxh+aFHzoobkmMlUUHInIF8gQe/brw6s27BemUSrT2M47BrZINnOKTvhVa
-6xnqcD46DkdYE3z4dF2DsZ+uzgw/bO4sksw/yus2C+2tLlUCQQD8dy5+Ivw7AUVW
-H5VNR0joFlR8xeJA8FA460+UhNle/oDtqEjq/YDotHdOnd8EePpR24/c3cMVfXj3
-uqTnKyo7AkEAxLJx8D55ZiDQYprL9DWONVuEk5WZJZIgCNRX+hlymf00Hfm67cue
-aD0Y8G1DA5vNywNVpUihdm9wDFPz/PSUkwJAevnG4NRDzq4QyyG5RRpLDhoKb3io
-e/9S5FbivbJ0e4w22wzU7/opt7BoSRgnUPNo40Sy79/precfbHQy7ROejwJASovu
-zsR+sgwhrh1Iywc5HFPRDTYXUrvs1CvWI/1dB6uFAw9QnysaoBr3xrdCPK3h8t0S
-qo+6Ue6uIp32zJnNbQJBALLb34EY6Au69ztILcUpYgzTE8wmXtBTt4RBQDMIw+F1
-ZBw3e3tZjKmOPJySq5v8jyNF5L3s5gd/GRtPRCTkOfo=
+MIICWwIBAAKBgQCdBEaF1c+2imSy9lvKbFYiylaK5daby8+JtFMNF/IkJB4Dfv4D
+DbNpxw80BbWznNzVWwcZ56LGxV7hngq5c/sStyeGOYOrqfcTTwm7SzPkcQbTMf2v
+/J2jvuMq2kA+daWYDVDI0mHiC/VHgd1RYPdvPrMGnInIbSSJcFNygIhM0QIDAQAB
+AoGAFVtJhFaqo/d67uSXY5cMuDqxPr84S4STO/Ws/jDtnIDVHECfqCaq6o5KwRat
+ujpxxwtUke9xsnuSBjoK12KxGYoEFCstNJx2L77TvjkxcC85C6aGHWxLWCELqnn3
+3HmCE4I9i/kltO2YTje12nEWVkntqjvnqpAaFeQQ2vRO9KkCQQDOQo2FT0DX7VI7
+riSNjspfPQS5ESB1xTlcJL7aBS7iubAkVmRPBGM/UZpLLXacgV2WqUp6swcXculc
+SXyx15zDAkEAwuGxNx+SzoLxHzRi2P64+xCo3O4OOP7Fle0/Uyk+DyS9dFljdZbf
+mh28uKqflF0LRe+J4vKKqSHb3dqLdOrm2wJARAFdd96xmn/85QB9vM6fmtcbf4lO
+EoZ8aw0Sf//FfauLj++MEyF3N6FIJhFPUjq1CL+4dswgQnL4zhzMqDZW0QJAchas
+p8e9K1bvEESb5cthweGj6gsXmnhUdgw5eVb4tObeXuIB3xJffxsPo9CHsdSyx9OP
+FqTFVnSzAfNylxT55wJAICUYlyM6/VVMKb9bAoMz7nqg7N/utGEMijT2AqukQues
+jYm2TNtP033yibtWHjwBPDKL5JxsgDfG2x1LYiG6Kw==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-cn-only.crt b/src/test/ssl/ssl/server-cn-only.crt
index edf0bb8..e7bef70 100644
--- a/src/test/ssl/ssl/server-cn-only.crt
+++ b/src/test/ssl/ssl/server-cn-only.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
 MIIB/DCCAWWgAwIBAgIBAjANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owRjEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowRjEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMSQwIgYDVQQDDBtjb21tb24tbmFtZS5wZy1z
-c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAOqxI3Umy7P+
-FqPSenj/4SgwJgKMM73Q0tJvaDNXb1ipfAUHbvnKCNUs693YjRdZwTAVXsYq8btC
-ja/4L24FLCNktLzQfxmVuueMgi7HuYevxVbhOXhxuy8cbTeC6FZj3F6vU7Obg5rM
-L6FNzVljbtx/YA2lM4H/lWafTp0mXnmFAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEA
-DLLwuxmM5ReGFO6L95kxK++vFa7oXNBw8JxagzqfnF85N2leNbpQxxsDS9U/Bavu
-D0okKJR1ezdWlT0AwJcOtnt/X/qoxVFo35rIEjDZv4rWveiPwe3BeYm2tWLRHgKI
-6NrPD+kXXqGFPHobbXBPvE2MrW4p+ojD0DTeO8ZXjj4=
+c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAON0emWfWbLb
+4cSdixPiKiSwwiRbVw371L2t90jVY1ucJc/8YRUrRMhdsKdsP2NuwtFx1Mghspzt
+c/v6Dj/V9raYsHDGKK7OSPDF97GT9xM6yqm3FAY3l0QdP78XhiIZOhTO4fOJkAfQ
+LVhXca2X0krl0jF57/o5in6GHuyhulLPAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEA
+V7NgLA+RVPeWo5/TLsJyEf3tPnvpdq4Dfr/nDNDyWLhfrmny3Nuykfwap8JZHXG7
+oo+owRzgAXaJnr++5PvCo82Jp+gCNf5foZBx3GWdPsJY8d/0oREhFXhpqLCUUoiO
+2295A+mrgwfXoI+tlFypNb0T9x6qHOQlBUX+o1JBdnw=
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-cn-only.key b/src/test/ssl/ssl/server-cn-only.key
index 9037a9b..158a9d4 100644
--- a/src/test/ssl/ssl/server-cn-only.key
+++ b/src/test/ssl/ssl/server-cn-only.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXgIBAAKBgQDqsSN1Jsuz/haj0np4/+EoMCYCjDO90NLSb2gzV29YqXwFB275
-ygjVLOvd2I0XWcEwFV7GKvG7Qo2v+C9uBSwjZLS80H8ZlbrnjIIux7mHr8VW4Tl4
-cbsvHG03guhWY9xer1Ozm4OazC+hTc1ZY27cf2ANpTOB/5Vmn06dJl55hQIDAQAB
-AoGBAN1Vp9oBd5VNqS5g/y4EK+VJ218FuHpoaZsahEv/Rrx4QsU/aHLdDg11qxBy
-/UUrWZ2uWc5Mi+ON9bAiQSDicec0ybYD5+Nn3Yv6v82J4Lr6Nlg6lsMSXxr0tfh7
-1Jh4EZWkIvMilSyo2ft2bP5o/rBCiIKXPzLDOmaoYUurNwPVAkEA+uR8icow3Ig4
-DXatPDIVaCr66cfndBSmbXe9M0eY23ic/8VNqjyuo3CNLOqBupl5teZTv6dTLXY4
-9RD5U3x70wJBAO94OTptH8Mp5aJX5PX6x2eggydTBnSNUyZZp1FquFpE5GRhyd5O
-RO7V4f0fcZCyuJcZI9xNvkqLIC8WzyZ8FkcCQQCwJk2d/HxzyZv5L/KPCebnvQ1v
-p+/EG1+CCgingUQ8CyHHngJaXMKMc9Ba0ccFeQ3v/WedbuBCUffJcAJtcEALAkA7
-fIn60ZDKUmYQ5fSihiFyxJTP9/fqjBDTvgGqX/BbvDFgHkqfRqIpEkiJMH5ti3f/
-UOdvmoBi1Byyld/vl3ORAkEAzruQTKAG5HeD+XPijO1qtaNXAiyxLr49FkJsm/Yx
-sgM/ZMbLmYZwu6JHt3+Tvo1++scUuwrsYCUmTP1+Ca37Uw==
+MIICXgIBAAKBgQDjdHpln1my2+HEnYsT4ioksMIkW1cN+9S9rfdI1WNbnCXP/GEV
+K0TIXbCnbD9jbsLRcdTIIbKc7XP7+g4/1fa2mLBwxiiuzkjwxfexk/cTOsqptxQG
+N5dEHT+/F4YiGToUzuHziZAH0C1YV3Gtl9JK5dIxee/6OYp+hh7sobpSzwIDAQAB
+AoGAHzYfiYxZQarcix9XM05InCpJKbYC9x9EbRbPJQZrEOoXYjfulnoOgTQiBodb
+F2jegOEO4ruFB/Wpgb0pcWcJ6Hgqh+GptulX1yWl7XzivvTDN6DO796pyNa581kN
+CrS9Sy0owktidlX5SJiXw2AOV1bNsvUBNapwyBFKsB3+XgECQQD/4yTJEAdRMZw1
+3czmcmzw+Aq/IBsd3w/GgX6jME02Br3LqnTIelNTPVNqEpy4wb6rNug2Dm79OWwh
+PNr799vPAkEA444gzKEUB7O2N8RnfLD5/n7Gl+P1MGOH0Rk6lxxAJ+Py8Itm1pV/
+3o9xwi1kguXdd+wqpL+B2gJDCpgSmJDZAQJBAIxbn5XaAOl8eN7jJr1RDoiuxdZI
+Whdsf063QStqFzAHSpwoh55f2szR2qtYQjblrxxjJcRg7mhf0vv4UXXcYukCQQDR
+wqZBewp3vxVtesLaklkgW8S9JwlRva3o9hSoTwZkvx+m1RnLHKxugFQg5q8MatAo
+R69XhqEwUX1zOpOJx5wBAkEAkvuuVdjo4baxildMPWSH/CcYlZs8c7ofs/Y6VkdV
+zIYMoGnGS7CfJbKLkXP7amlp9Gn1xgCPpbRJrik3Nafa9Q==
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-multiple-alt-names.crt b/src/test/ssl/ssl/server-multiple-alt-names.crt
index d15c911..473d61d 100644
--- a/src/test/ssl/ssl/server-multiple-alt-names.crt
+++ b/src/test/ssl/ssl/server-multiple-alt-names.crt
@@ -1,15 +1,15 @@
 -----BEGIN CERTIFICATE-----
 MIICPzCCAaigAwIBAgIBBDANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owIDEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowIDEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
-gQC0Bkarg2DSU/0+vFG5dgmxV38SSC006t69zkFmrkUMIEg0iuj4I44qlOf/6EP4
-++RfDwQpiUNRTQmTNwDmjmc1bsXysIvVPzIDKEgpvqI82T1sLpF14PogoNlAzdpu
-CnpnU+QTUS3Ic5dhxK8YHyVtsG5nfF/3u1S15p5UaPGiOwIDAQABo2cwZTBjBgNV
+gQDMaR3uJ4+1P76DaUDWklA4L3Uic0ogorawfGuBO9pFB0w/kV6AAJGwhEy1DTi2
+neaAraa383F+e0Kpmbp1hXZ8k5DwKe6pHCv+R1RwRMqGrC6nQPM/tRsU97z7ROM8
++5QDE64zYtWkJGjQrpXmLC2sHSFIFyrHoi6MIzopKNSCpwIDAQABo2cwZTBjBgNV
 HREEXDBagh1kbnMxLmFsdC1uYW1lLnBnLXNzbHRlc3QudGVzdIIdZG5zMi5hbHQt
 bmFtZS5wZy1zc2x0ZXN0LnRlc3SCGioud2lsZGNhcmQucGctc3NsdGVzdC50ZXN0
-MA0GCSqGSIb3DQEBBQUAA4GBAASEAOEwDFE4qCPXJPpEzma7+vRqVFedWPXFXoW0
-R3HCGlvYJKwnlgxf41ipWHWmJPHLdg+KVJtlfRQ5U2SIIn7yjr3Wk+apcvWMvDpQ
-lkIVTwCmSINnj8GjQqgJsHD6I75edRaMQk3PlurzdBWJp6oz+UWbYvGDRDC4pHWu
-nLhZ
+MA0GCSqGSIb3DQEBBQUAA4GBAG4lVLFuJsXsaeFpZBiudnklH17bAx11X51UsL7r
+oDp1AL2bHZnACqedHyed4n4+4UYezPbLOO5ITFSkIdkXYa5ohTjrwymhVGN9Sxlb
+1fitKKXWenvixOwPVk8g4e1Ev8JDofTPQNIFA7C8IbGunm8J0Pe7jF6KxlknP9A0
+x0Li
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-multiple-alt-names.key b/src/test/ssl/ssl/server-multiple-alt-names.key
index 64266e3..9b905e2 100644
--- a/src/test/ssl/ssl/server-multiple-alt-names.key
+++ b/src/test/ssl/ssl/server-multiple-alt-names.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXQIBAAKBgQC0Bkarg2DSU/0+vFG5dgmxV38SSC006t69zkFmrkUMIEg0iuj4
-I44qlOf/6EP4++RfDwQpiUNRTQmTNwDmjmc1bsXysIvVPzIDKEgpvqI82T1sLpF1
-4PogoNlAzdpuCnpnU+QTUS3Ic5dhxK8YHyVtsG5nfF/3u1S15p5UaPGiOwIDAQAB
-AoGAPa3gzKbIp4drPvFatsZAb+hgey0LgBPwmOtv8PRIZ+0vkAD/7PSRovk9u6oi
-j84N4pvMe0ayL8rLOwsfXd7wcQTxDPxy+RkkMW7RRFzusPjeTMgS753/l4IqehCX
-2SLPBkE9e3/UMRR0vds8T7btgTrv3R8pcgntli7W6RPrmLECQQDgZDjxx9X9O36v
-SR29RhMUQdz0PQpHYwhtmBLECmr1Lpecu5Zr0JOaabWvd5Lzx1cV2hmldZFQP/gO
-fEdzhsfHAkEAzWIjB0y/NH61U4Bj4fML1dGnMEzO0wm0MVEMKjcmPJUbtktvZ6jD
-MedYw5VLcWbjXMAJt70UFjcxxAJPmZXZ7QJBAMKEnwiZX1uCc7OoAmvNj0SEQ/JF
-598ybl/y8HGZRlb86NkplKAp04qMEL/nPDCvoUKEKq9QV4PlsDd+bMItGIkCQFml
-omCHUVZakE84VWDEs7/K2U0t2YEoVSzJkaPDmr8K3qO9XY1Djp/zuTz1p46COG+9
-qwA2WdQwl1pVH+WMESkCQQC1UPYLBYIDj0JaJokSgBPh71Ui8/iBP0J1cvhvKOsS
-LrEO4JUq2HBFVcxb7QahHPC22dWI8HlIJgzlUi9BEJPv
+MIICXAIBAAKBgQDMaR3uJ4+1P76DaUDWklA4L3Uic0ogorawfGuBO9pFB0w/kV6A
+AJGwhEy1DTi2neaAraa383F+e0Kpmbp1hXZ8k5DwKe6pHCv+R1RwRMqGrC6nQPM/
+tRsU97z7ROM8+5QDE64zYtWkJGjQrpXmLC2sHSFIFyrHoi6MIzopKNSCpwIDAQAB
+AoGBAI+m+/LHeLYO0yt1B60D7D5gE7ifPyQKVctX1RFgZ7eFNm+iEMByJfDgOSwv
+24BzHW+nGfhCrKsPorygHarDnY0TfInIX5OHaOgiJ2z6mQnCmH9nwX+ZAezeVlZR
+3QqdmRJzFRcqVD2cU2nk/DG2MJDpyqyfaBQ+FXHZGX03LWbRAkEA5SpdsgQkOW/S
+/5ENpACEixD9WU2EJjNOymddy64ODy6ug9Xc46nw6Xkn4EI+wmcmrmi65oH+hx+G
+g0k+aQmi2QJBAORYri1cNxupOSdQ7tY6DoPuMKMrV4FmuiF3XJWb2qZoNZLG4Lai
+kLn9A8yWJmKyu6V+tvz/DYRHlA870FDM4X8CQENoE8lB+JnAb7Lmqrl7wYDaTXsQ
+FvfZjapxfyBjIRWMKJ70sBVzLj6ueXE4axdpmfIhMiCNSh3awwko6SeiQvkCQHwx
+LypEiURmGUuk3QFuug5PMezM2d7rPDiPbq+AAL+Y1epqeDVc3VIKplJTJ7VueFhe
+PrADGBrlw0U1xurrQ4kCQCui0QT42Ppey48lvxu5S5+dOvkjF8p7Ml59aRqlcjPm
+2q38zM/AJlrekSclGszy/PjaXeZzFM+aCYKAsk8h7YE=
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-no-names.crt b/src/test/ssl/ssl/server-no-names.crt
index 378050c..0bb02f7 100644
--- a/src/test/ssl/ssl/server-no-names.crt
+++ b/src/test/ssl/ssl/server-no-names.crt
@@ -1,12 +1,12 @@
 -----BEGIN CERTIFICATE-----
 MIIB1jCCAT+gAwIBAgIBBTANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owIDEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowIDEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
-gQDE2mzybnsgbq7owCPC0m+igNt5pBk5hDpzFAhpbAZ6hZ85AzHnLbpCDTH5w0Zm
-HeevCRkNcDgjqfoDo4DruXYpI8jH+QwuKvUwTt3GGm6C4lb3SBtfNdzJsk5kRE3o
-ziMG/OxtiApxFu14nbCnqMuDs3meykD1jHheK0CsHCKq2wIDAQABMA0GCSqGSIb3
-DQEBBQUAA4GBAFNfiKDTLJ2V7kgIWDEOcyKQY8T4cAzgz6jcpN9CePgATB2Yrb9P
-x7kkKW68h9SbEk6qtS4YQZjSXWKUqrjjIW22+DJSQAXMZoyADZTnZOASHjNXIzLE
-y6B1RX+c7CjolHHSYkbki3RqKGhTQr1hnwkq3N8Fl9bftT5zFuezwnjD
+gQDXRZ+AOvlbHnupGqPsW4zYOE36wyMgwYnsafRMRkZd9R5GLtqnchIBdA9Lg2VP
+gK0355KTYtsdfdnC1kjflkeY+ZAvpRAMarV9fuE3z5kE3qDh4IsSJ3EyxoH7QwEL
+x01INf7QVEb/7y6s1Cw0GBonXnMel/8kfPBFpJ3+p4rw1wIDAQABMA0GCSqGSIb3
+DQEBBQUAA4GBAIvauevxaS0gnHu3RivMZp9UZe0r9Ja38CqdFnNUT+Z4MS0fOyWq
+9uz3JO7rdLmBtLEUNX2VNR8jIet9/gfxAO5MTYw+nSQ7Ci39kgSQYkCs4gZVS2TJ
+GeOpfcBipOFkI9O0nElAmNVFDB5j5bY5NfCNAMoD1q/FzovxlcskCOdh
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-no-names.key b/src/test/ssl/ssl/server-no-names.key
index 01a0915..7453ba8 100644
--- a/src/test/ssl/ssl/server-no-names.key
+++ b/src/test/ssl/ssl/server-no-names.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICWwIBAAKBgQDE2mzybnsgbq7owCPC0m+igNt5pBk5hDpzFAhpbAZ6hZ85AzHn
-LbpCDTH5w0ZmHeevCRkNcDgjqfoDo4DruXYpI8jH+QwuKvUwTt3GGm6C4lb3SBtf
-NdzJsk5kRE3oziMG/OxtiApxFu14nbCnqMuDs3meykD1jHheK0CsHCKq2wIDAQAB
-AoGATKkLWHXx+TVhZD6/LnWpB83KqtpfAGkgIgShKfzpoPk8goVd/7ttF5/v4GZN
-miL3QND4MqWLF0hwls4rvKDjBH7q4zw+AR55pnfwoQMsfqMvAn7wZi5HKTah1xbj
-yf2J1N62pNW4ZdFnlcXmAPLVDxKyCYaZqdeqgr4VkLvgIVECQQD05OYFasP/5be1
-wSj7zxd5vPK2/EJ6CFN+gwXXYOZWWR7m90g3CXxMWeH7RPIlrfcPC8o8r6xna2BS
-E+BKzTYXAkEAzcfLpwZUHcCPrCipMsoC35FQhNCpecuZjVYx0oGsfiE6gu87ddLX
-H3YL7+EEmtPdps4fF/9WK87MSpj1IRFv3QJAJIEOTJZqmvV6GeyuGEL5Y9snbuFR
-Y3FkSMJtF3rJOuvT8GfB6vpN/e+UAOl5EubIogSG497n2w6lb/aog13thwJADtel
-WcO8F3VHJ5y7L32gnW2GyD2gq7dCuQ4Jg+x0e5h79uu4dzQg7hT+oWuygFRdvWVK
-mtmA5qIA3DSSIbN3RQJAd97xYxEPMF2NU+vdsLBxrkdH9tCHrqOlzEVTdhBCJrx/
-L/lJQvtxkpWEiFtQdd5OhAurNZ6iWoIdA7fhNHPCqg==
+MIICXwIBAAKBgQDXRZ+AOvlbHnupGqPsW4zYOE36wyMgwYnsafRMRkZd9R5GLtqn
+chIBdA9Lg2VPgK0355KTYtsdfdnC1kjflkeY+ZAvpRAMarV9fuE3z5kE3qDh4IsS
+J3EyxoH7QwELx01INf7QVEb/7y6s1Cw0GBonXnMel/8kfPBFpJ3+p4rw1wIDAQAB
+AoGBAKMdFCBbjzmlvVmC4BZlwDDNaPjLB0D4pQNHvV5WGVd0Nb5EHlWmL1J+mGBF
+bWxyOc4UX5Hh49lS1L+3EnyoKBKzsuPafLLXpVM2ujkkJt8iYenWUDqw1+g6zM97
+bHaQAa/U6+Mqn+dfcAn4FpYknZ0V4cvKqKw6CzjypkmHeLwxAkEA7F0tiV2nhkzN
+huOifLaxQHkOOBIgaFLGAMLwYHLlwVjPxk6O34+XPMehFQbetL431ZweUdGSY1fX
+jURXR72APQJBAOkn4AuhVWoS3lMWQc58kMCzY4+Xwd6ILKXMhFTZA2iNn7IUVEUe
+F2wjq292lu+tIfH+CdRjgCAC7B4OVaI2EqMCQQCZzHeY7ovXY5pIr05HgEkN/rc1
+3PWhbFrSnAX1fE3r5XItQ2jMJ47tSaiTGglH6o5CPHeuHYP3iG0FyvZQBAqxAkEA
+izskr81IFG/wE+3WnlgEmQ6HBdi6DQmEn/3hiEmPn3/zPYSmTiAKHKmwVn+a4sWg
+38G0XQCOIo+cMNaejJ99wQJBAINI+vfNR8A3wDsvz9hVpa+yZ4aDOeFETJprGsin
+D0v/xzviq1LPrOCuarioyEpYS47bzIKGSeAtC38VB4tQyEo=
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-revoked.crt b/src/test/ssl/ssl/server-revoked.crt
index 0197116..70150d2 100644
--- a/src/test/ssl/ssl/server-revoked.crt
+++ b/src/test/ssl/ssl/server-revoked.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
 MIIB/DCCAWWgAwIBAgIBBjANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owRjEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowRjEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMSQwIgYDVQQDDBtjb21tb24tbmFtZS5wZy1z
-c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAMGFtZgJN+Lt
-w1Bu6MmAB6h9IUrSFEVxUrrxwz5RG3UDiBkr8StZCM2hXLdSB9tSjBLIWILmuPCR
-ydyf70XFTTO8L0Mc6F38I+4GVthNp8h1VJIrl1wRQIfVqFbbKYKiyCQYITzezVuC
-UjHjo6xklmMewdInRrcNbWxNVkWH91zLAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEA
-m9bRiYypdOrU/1hCzo6bj3Ly39/zUZp+T5xBkLJQpgVLTU8GSEdP35kc3CWzEu77
-39610RY3X0A5fNTLs74t7w2dCViYPvNu/suu87AVtlioHMkwL3QEOUnWM/l23XUR
-mj33SwQfmLOV94cNLVTd8IZ9PIT0ARn/YrS1Prx1zeg=
+c2x0ZXN0LnRlc3QwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAK09F4gQyZP/
+Z5fbOrLbQySBonwt9Wbb5iAEm5618oDk/YLkl1AQo2eoTabevY7+DwHPMMwR9MSA
+yUlvJ8Gc3MNAOIag0o63NNOZxYIzpqqAElOMPNE/FRlpVJyauGZ7lV/Y34vjtHxu
+4Pmi0jOLNMzUNjlN0rQrz0xaTGQ1rX1bAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEA
+ghgb9HNsSfyX1JMLYlCudOTQ/LuoXeXFsqFxRDAOXCCaSrH9T4lUZayBGNOd8kgZ
+FFHJo4WhZx7sE/foXuax/QGLi/mGrVw2xfJdD9SIQndzdnExoQndb+gvGSH/23s4
+Oif6jcSMPCLpoTaqVQdyPcw7DI9h26YzZ71IybBxPNE=
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-revoked.key b/src/test/ssl/ssl/server-revoked.key
index b12071a..550c1d4 100644
--- a/src/test/ssl/ssl/server-revoked.key
+++ b/src/test/ssl/ssl/server-revoked.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICWwIBAAKBgQDBhbWYCTfi7cNQbujJgAeofSFK0hRFcVK68cM+URt1A4gZK/Er
-WQjNoVy3UgfbUowSyFiC5rjwkcncn+9FxU0zvC9DHOhd/CPuBlbYTafIdVSSK5dc
-EUCH1ahW2ymCosgkGCE83s1bglIx46OsZJZjHsHSJ0a3DW1sTVZFh/dcywIDAQAB
-AoGADWTrxLIepB5AvnhutEOgRBElFMCllojZaQcCtHV4qQititB3yMBI07KvcCDF
-WnDEMSict7KwajYs+pA3R2T4itVLCoz5iXYJ2CIf6XOJK+NYcf7XulSm5IqukbqT
-3KlofUY2GY/5DN9tgUUnAsZ7wh6iMaq/H+BPBcblZg2kyYECQQDpYRAjwepzpr0P
-gfohKYlfKJwQ9WWTRkMasn6q4DY6txeXNk5nMC9C3FHeiTgpfRr8GZBvk61lb6pV
-pFWADR2TAkEA1EepQ95Mums8BxU6/PAOhXKLlyYvldaIXcajv/+/PclVuEL8ko5z
-jspEGk7U/jqonwcN98R/h4ui7nxhoxIG6QJAFydgGIwWnJ7ARxeYH04lqOE4ip4u
-E6x23+Exm/ZeqvibSI9EvAwVxEZjgPaQMd2NndFTeR5np5aqiZCiQvAKLQJAfRs+
-xqDc14Ebf5Ejkq5n4H4BhrMamFQ3Sg0ntKAlNWTTACV6dWU+9Yh/WoHbRXmMpyyh
-LsS/5EKHY8YqRND7AQJAd+qIgqFUI0RAwvbmLxW/iR5JIKM5kZ4xJ13/O4x55XEI
-4H+8YS/nYPnjMpaEWrFppNfv2UEXD2L1OkJVuYx1Sg==
+MIICXQIBAAKBgQCtPReIEMmT/2eX2zqy20MkgaJ8LfVm2+YgBJuetfKA5P2C5JdQ
+EKNnqE2m3r2O/g8BzzDMEfTEgMlJbyfBnNzDQDiGoNKOtzTTmcWCM6aqgBJTjDzR
+PxUZaVScmrhme5Vf2N+L47R8buD5otIzizTM1DY5TdK0K89MWkxkNa19WwIDAQAB
+AoGBAJNJ5b/hxgD2nXUXB4kZsrRPI37A9GxHehiu0kDWISBFkOTAxYVlIAj5p0vB
+BRmWF9xJ9AsNGTYY6QpuXzbVzzsqxpzqfrmcbpnEwJPIN74cWSBU3As6SVtkD414
+TjV3TxJlER87D4Jtk2vWvwjWt2tj7fAe/9B44l211jStT0/BAkEA5oRVIBUrLGhb
+ZCHMFEHfF5BtGYNUa54QDK1cJCmstv3CEfR/g1cHLKgnbXRzbIZ4u54sjY4PhviB
+nCVdr6umbwJBAMBjxj8BOFceswCgse3LdaO5O2YPh/h014iiwrll+XB5ufxq5mOa
+9gaKmE8eUBmuWwcE05xbzSUaBhCTjVNNrdUCQQCvylcIYlxMP0ECuWtiP2GcHL22
+aRql/yIKKOJNiaJ24klvW98qD+IewhVfOSEUr+++VD9xq9ZXfYeJxk0NvH7tAkAR
+hQR4mFPZGyKR3BBX5z8/OY7/LErlhT5bYvb4iyC77VnScqmoSGQ/FD/qdIg2znnb
+mcTraDC2QDhtKgKko15BAkAxtNExYOsPlW3kuhMRnDV3mB3h1TLghl9rYuHfSM5F
+9D7tpJ8FZa4P7BE5bfI1CPoRJsVVw0rUf1ihaMkUyxfI
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-single-alt-name.crt b/src/test/ssl/ssl/server-single-alt-name.crt
index 349792f..e0cdd82 100644
--- a/src/test/ssl/ssl/server-single-alt-name.crt
+++ b/src/test/ssl/ssl/server-single-alt-name.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
 MIICBjCCAW+gAwIBAgIBAzANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNl
-cnRzMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1owIDEeMBwGA1UECwwV
+cnRzMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVowIDEeMBwGA1UECwwV
 UG9zdGdyZVNRTCB0ZXN0IHN1aXRlMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
-gQDnDMJShFXba5o4o1ubyRmV9AyJLVM+8nZaC1iJzqeIPObXYpbcp3bhXtowAwvk
-d3IGI/fBm/2/NKvYnyagTS9DUNRTnykHxlCKsMitx38+sU1EerkDltK0OId+obvZ
-eVD+h3j7pVfA0NPKXkpcP3xoihQU9I5kOPKQEIQPNAUfdwIDAQABoy4wLDAqBgNV
+gQCqGqSv5i36Nk7FDQhRmBXi9M81Ox6YNAm1ha6Wj/MCfOlsCwWFfTFoTf5k1Why
+dB8yDlqGvZ2daf6mfgaui4R9uwVgbrPhhDUArtT1u5PBIqDdS10Z8E8iGStga/HQ
+J4CPlrXltX0ll4EC0EMj6SaF1dsYTMld4O7ipCZ62U7EowIDAQABoy4wLDAqBgNV
 HREEIzAhgh9zaW5nbGUuYWx0LW5hbWUucGctc3NsdGVzdC50ZXN0MA0GCSqGSIb3
-DQEBBQUAA4GBAHDkKwNT8p+/sv0Q2dIo0jdFxtBdS/fnsHzjr5ZJNa9SXEer5m3v
-lMllzYW8lfTdR9PgBZ3y2lFZixD63ZF3nkIEYcD5njLp4ui7g2OVqxRfmt+rHefh
-HiKm5v5GLs72lhR4GQT13AsjGVS1WWZtYhO4LwTjN+nbjnRIpXIhrSC/
+DQEBBQUAA4GBAC/otQQ0Ie+YqgH53nDvMpZ3Ol8xodOUWneRPI8Wf601Qi8Q4yY5
+9Gv+iU6ZEidn92y0hYXiHzItl/1K6LvIQM/yuro0j+48M/rdzl1qImSnxy0eGHbA
+3fOvUNt8ymYKsgsAfeqHZmWUidEZ/jF9Y72oER/6ImnmsbjEFtHVFDRL
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-single-alt-name.key b/src/test/ssl/ssl/server-single-alt-name.key
index 71fd85c..c486c26 100644
--- a/src/test/ssl/ssl/server-single-alt-name.key
+++ b/src/test/ssl/ssl/server-single-alt-name.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXgIBAAKBgQDnDMJShFXba5o4o1ubyRmV9AyJLVM+8nZaC1iJzqeIPObXYpbc
-p3bhXtowAwvkd3IGI/fBm/2/NKvYnyagTS9DUNRTnykHxlCKsMitx38+sU1EerkD
-ltK0OId+obvZeVD+h3j7pVfA0NPKXkpcP3xoihQU9I5kOPKQEIQPNAUfdwIDAQAB
-AoGBAOP42uPAX1aY3Rp1VLZpvi0PGC9h4XmCkvRVrY6LsRHjxYFPbbtaIRpOFMq6
-tsk+cetNIfCOkdhPiB+9KMeSYMYShiyCrHfFxuS0FIP4rQhBB89wzcjffw2CYLGD
-Umx65+XVv6RBW85p6v4s1+LQMVUtf41yxm9JXT0TVDjEcgRBAkEA+/FKxv9DuZNZ
-Abjak3MeaULpnPl+Fxp+jg1M4wK12MFYCm2eBUx0X+cqVORErwLJ3gdXQBT7fJQz
-bNwxjUKuTQJBAOrFVKF2dtuPAeFBlKG4sy5azGfgzS6cAJQ4LPp4uGX7ve9C8OzI
-oZU21LT4cm3nuFSeMjcCKHmur85gFQrETtMCQQDKWu1yk8gzn1OX/H8iew3sAaBd
-Qk6yA8euFKSymJSyOeiax5xqKRQ3ixYHBSjdYGH/AOplP/UWBHqhbuIl0W7pAkAr
-f9qZfCizr8CqawtOF7njeeFr0eRSoYcd73auBhYsl0NvBJk9VkNSMXGiAnK5WHj3
-/MPTG2xCd5KNi5H6h7sPAkEApf8JUvEA5ZPkFAA6x+OXLmEL+nXOnJnhKjSUIVJx
-Pgp7FTy6eKg+/iUEyhRHw5So7QjwHqH61+CIBNS41vGPuA==
+MIICXQIBAAKBgQCqGqSv5i36Nk7FDQhRmBXi9M81Ox6YNAm1ha6Wj/MCfOlsCwWF
+fTFoTf5k1WhydB8yDlqGvZ2daf6mfgaui4R9uwVgbrPhhDUArtT1u5PBIqDdS10Z
+8E8iGStga/HQJ4CPlrXltX0ll4EC0EMj6SaF1dsYTMld4O7ipCZ62U7EowIDAQAB
+AoGANPD34psEIkS2vVNyDFsGLM2+k7WjrwE7KFjD3q5MlrCjwXGotUQilXD4xQ86
+Y6zKbLzU5eyr2ms7yzub/sUDZdBdJkU39NzEAf8dKN/UvhSGSZH1zxRCm1SmDrxZ
+BM3TEEGZUVrcamJF2EldPdbmBo8EiFyuPT7UMObvYt0li6kCQQDUol5X57+6ZGvF
+QrmE/3dz+zxa5fZikiG82mrpS9RiXlSssHbE4z04UARgLCGloqlFIQTsKCHnsK7i
+hTX41FL/AkEAzMvGWthn3QG1cVT8tLhinhAL2C/v9Mjq1SnlWi/9qs28mAtYa926
+wNv+DwcEOEvamCStj+n3q8LRZ/zxbpliXQJAVq7eoR1z9uuLV75s3QA8VUbdgvzu
+pZ6HLHMqVHM6YOOtxzylHny472UHc6FqEhkuwmTEmfV+ZPKNSQEfUJJWRwJBAK1Z
+p7LqDzCh66Xc3HNUyBUnW/9IxIKdNznsVrk6eiwELik9IUFc1GG/VZP+ynGks4mp
+MkjpML3xEDRHhU2rA/kCQQCzhjCPuiZtPpe7fM/gl78q+TSXu/Rr8e/LYYmHeRbq
+G9ojPe6Kx+toOPrOaaDLbRd6tkmwoLl9eSW/EW/aHwWY
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server-ss.crt b/src/test/ssl/ssl/server-ss.crt
index d0c9b83..e11d841 100644
--- a/src/test/ssl/ssl/server-ss.crt
+++ b/src/test/ssl/ssl/server-ss.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
-MIICCDCCAXGgAwIBAgIJAJyw4sQKTY2UMA0GCSqGSIb3DQEBCwUAMEYxJDAiBgNV
+MIICCDCCAXGgAwIBAgIJAJKfiuFnjnPZMA0GCSqGSIb3DQEBCwUAMEYxJDAiBgNV
 BAMMG2NvbW1vbi1uYW1lLnBnLXNzbHRlc3QudGVzdDEeMBwGA1UECwwVUG9zdGdy
-ZVNRTCB0ZXN0IHN1aXRlMB4XDTE1MDIxNjIwMDYyM1oXDTQyMDcwNDIwMDYyM1ow
+ZVNRTCB0ZXN0IHN1aXRlMB4XDTE0MTIwNDExNTIwMVoXDTQyMDQyMTExNTIwMVow
 RjEkMCIGA1UEAwwbY29tbW9uLW5hbWUucGctc3NsdGVzdC50ZXN0MR4wHAYDVQQL
 DBVQb3N0Z3JlU1FMIHRlc3Qgc3VpdGUwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJ
-AoGBAOqxI3Umy7P+FqPSenj/4SgwJgKMM73Q0tJvaDNXb1ipfAUHbvnKCNUs693Y
-jRdZwTAVXsYq8btCja/4L24FLCNktLzQfxmVuueMgi7HuYevxVbhOXhxuy8cbTeC
-6FZj3F6vU7Obg5rML6FNzVljbtx/YA2lM4H/lWafTp0mXnmFAgMBAAEwDQYJKoZI
-hvcNAQELBQADgYEAGweDmEYzoEWb3WNn7Mc58ToPnl5DbRZdVmRjsyC6J5oZRu2E
-e/GZZ/1MSNPgccoyhdcPmSqTzUzbQnvYsqcHfuncA/oNJR3wvMV/wSy0QepklX1b
-ixjZg9c+mhQ/JTSjYnRK5iSTPNX4F3zkpvP79POuQYl/7Oihqxl0Mmkezuc=
+AoGBAON0emWfWbLb4cSdixPiKiSwwiRbVw371L2t90jVY1ucJc/8YRUrRMhdsKds
+P2NuwtFx1Mghspztc/v6Dj/V9raYsHDGKK7OSPDF97GT9xM6yqm3FAY3l0QdP78X
+hiIZOhTO4fOJkAfQLVhXca2X0krl0jF57/o5in6GHuyhulLPAgMBAAEwDQYJKoZI
+hvcNAQELBQADgYEAoMiPEDM4EmgyOc6kVRSVa+Q+6Wc+O7WX7LmEZprXyJErQ51H
+X0KWcznjass1YzVeT+hCOyEQWSbEs8W1+b0FZleD6Cng9ZfD10Oz/4nCBy8al7sn
+GlQk0KHYlMOcQDfmFr5CNuoIo77rtapDlVxIhMAxBJKTlYvimbNh3XM9g4U=
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server-ss.key b/src/test/ssl/ssl/server-ss.key
index 39cf3e3..a32fe81 100644
--- a/src/test/ssl/ssl/server-ss.key
+++ b/src/test/ssl/ssl/server-ss.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXAIBAAKBgQC6FTHDuNKbYQChNtxAFLKzJESIKOZh8WpgCN91HFtnXX4hp3rS
-bkEDIABlQdqfcXLk7PmlR/rboOIqwuIAaIa12BxEJ5KW2vtcSAFd17anhG/a9n8w
-cQnoEUvLLAp7V2xGJ8Cu8mtyv9Qsmd5bS+SFchmbIcMZUb4znZfIr7AWWQIDAQAB
-AoGAJvVzAtA6P8+ySw5qVHxA4aKxOnSdr1nU9KBG8ITsWhrH4pHm9BGjSN01V/3O
-oN0mueknZ0RHsB3h3CQLHxzDPwmsah7apU8W/1AVyZ9LDEMuoZQef3+JfegmuNMj
-YYtBR8xozTviOH0UH6t3VOW8Y2TLtZo5kMz3XwjWBS+cCYECQQDlPEfH1x9QGXNW
-Eo37QK4UkL6/2czIXWitvb5+79KiG70XYIxrQR9NhpZHSGjBlS+TqJ4tnQa/fv95
-v4I7Q5NpAkEAz88ax91FeHr8y41s01MmJ6Gs6EOrFEpoHGboDdbwJ50pME5XnVJu
-xjHPklHgwiWFf4dQURjv6hCPUMVpe1w9cQJAZocPk9Ijry+y5kxmNHo5YflbV3OS
-pAsjRpIXIa8iBl9hs5L7Ov1lgscvb7JzKCIRpXlFRiF1YzDqEwoUtW0EAQJAH+/c
-VcsT2ihMoZvilbe5rW2TfT6pFD07MuI916Ko1e25Xssre+onTB5roDklKbFKiwbo
-uQ30ESzqWad9RpAugQJBANmRD25BmlHbdBDg+Zfd+4jDPAjXN8OesslEs5dMvs8C
-vqGrozvmtpLRcLiIitTiT4TzuUPowgZQtCjC0X6jSGY=
+MIICXAIBAAKBgQDHz31rLL10XaTbD+K4UQcrtV+2Jq9C9U65lDk6LtsGVkW3os0P
+b0MtXZEijnPvVDa7Kwq4mvKzTJ/SZwtA6zxSC8gs9gpql3ZSINsrzHC9XNxNbUgZ
+gPGiAq9j8C2kRnTKae3tDsEa54XWZXTFniu0Lbk1lXEeompCZZv4wqaSfwIDAQAB
+AoGADrqV1TOsF4rbnyZRoSKf87HgB05ctwPcNMPfYBGaJaJwazP+B7g87HgsPa7g
+jvDXQ/7NQIRzhZINafYcl0F/5a7tbO6DyCXJzoYYZ4NOb8ng1HBpaBMhcigOeeeZ
+i+KBYDjPzEeVfUOxIADWuh8HuVQWgB2WOdWg2GSuC4MZrXkCQQD3AuKxW7W4PJzP
+ZMY4RpnvFlfUyNWKM/0vMeDv88QUK+LH1MWMSIsYPSBS5sTB8b7lR0kb3IhDkuNP
+MeoiZNIlAkEAzxTj5ITO92RL7JI85Z1WBMPgOjvw5ffQF95GiLUN7My1SB5KJUWX
+pI108sY6oigHNDYjlL8rfsJHoz3MpPaG0wJBAMNOwcXwuMebDXYivWSD1nUoGnyB
+6+5h2yA09SFlgjVc2eydfTHFrk2VD3jdRNgA+Kq7acAg6JFdlGPrGLDnPQ0CQGrs
+X2tMA82LVQSW0ajBn3ugY/PNpWoolaLtW0AVNFZzsJrHQQOTtmP5wkvkfLvjrSyR
+U7fnKZ8u02x/aV44CI8CQCQSiBF1mTqiNUZswX0z8m8KRiYplS9+UdtHqbwo91dw
+BD/KWzZKJjGEbc3RN1MIQUq02cp1ZU3pNU7afALBF4s=
 -----END RSA PRIVATE KEY-----
diff --git a/src/test/ssl/ssl/server.crl b/src/test/ssl/ssl/server.crl
index d36ce7f..059f8c5 100644
--- a/src/test/ssl/ssl/server.crl
+++ b/src/test/ssl/ssl/server.crl
@@ -1,9 +1,9 @@
 -----BEGIN X509 CRL-----
 MIIBHTCBhzANBgkqhkiG9w0BAQUFADBCMUAwPgYDVQQDDDdUZXN0IENBIGZvciBQ
-b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRzFw0xNTAy
-MTYyMDA2MjNaFw00MjA3MDQyMDA2MjNaMBQwEgIBBhcNMTUwMjE2MjAwNjIzWjAN
-BgkqhkiG9w0BAQUFAAOBgQB1c54zLMueMtLiSmBT6kfXJe9o3Krd2n774g7kzNlR
-DeLpCHeUvyLF0m8YK09vbLv2W0r6VQnbjyQGr9xyweRLLtOXc0FIDsTO8g/jvMSq
-Q9zITuqWiCHRbNhi2B3HPo2NsrfA+tQEAZvMUgnynlerNvGkLWQZeC2UsxrrSs4t
-9Q==
+b3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRzFw0xNDEy
+MDQxMTUyMDFaFw00MjA0MjExMTUyMDFaMBQwEgIBBhcNMTQxMjA0MTE1MjAxWjAN
+BgkqhkiG9w0BAQUFAAOBgQCmFnFkEt0+Ialw4E+4nIAJWJO9XDE71FdRfX3QChs8
+ZJtBseaMNeUC1FY1zHOYQhtMy+Uatda6hx/QiyidF2oP5KpWp+R11M554Ifxem3X
+KDQDBQNee+1IIJ7a1kxAUxeSNP+0a3/bmUxI5sbomINnKeIDqDO8d2vmO2VLxJm6
+MA==
 -----END X509 CRL-----
diff --git a/src/test/ssl/ssl/server_ca.crt b/src/test/ssl/ssl/server_ca.crt
index 517a30a..b444bdc 100644
--- a/src/test/ssl/ssl/server_ca.crt
+++ b/src/test/ssl/ssl/server_ca.crt
@@ -1,13 +1,13 @@
 -----BEGIN CERTIFICATE-----
 MIIB8TCCAVoCAQEwDQYJKoZIhvcNAQEFBQAwQDE+MDwGA1UEAww1VGVzdCByb290
 IENBIGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc3VpdGUwHhcN
-MTUwMjE2MjAwNjIzWhcNNDIwNzA0MjAwNjIzWjBCMUAwPgYDVQQDDDdUZXN0IENB
+MTQxMjA0MTE1MjAxWhcNNDIwNDIxMTE1MjAxWjBCMUAwPgYDVQQDDDdUZXN0IENB
 IGZvciBQb3N0Z3JlU1FMIFNTTCByZWdyZXNzaW9uIHRlc3Qgc2VydmVyIGNlcnRz
-MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDAYtajRx8vM6IB0SLZsAhTD0Y
-VHM+/+t0a4m3JXolJBbo9/B2/WAN0IH1E2zmlalLc3JBmGsH1a8U5ZlRow3p2ODL
-rFra9FbOl0wekmRFvZeaRln/99dpI5itVpL97QPHO8QMMK1IsyurFA5GfuPOBx9P
-i0MvzsT0tYsRvR929QIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJw4ngOYElfyMYkS
-K6bOgMosrBoX8ns6jQgdXEzf7QOIa110bs6nD+XeJeKmzUAZ3wumXBTalPaiqkEz
-bq4nlsEs1phvj0Coy5eehjV3DB8bDLEneOlV5N9y4Z4VO1BrhX61bLiPXBRp1MZR
-I0sCdxhswSrq02/OuFGe6mqrSBBI
+MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC15tOzDBVKaCRDz9L5LMpPk8DR
+RGHHOe4OuO6WTkUzjbjuKyiQbmtcp00R4dULbSM57ESvI/Ny0gPt+J/QKAOG8S5t
+09wDpKxKcgZSZ6Nd6FaK+D+ZhUVAkP3hB0ba0wo1JZff/0e4B+VJhXTjl7RRHfbr
+AEuDYFxv9T3K/Jq04wIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAJys1pnYvO+u8Wca
+6xUToGMpqTnImKa+dX8tMKsp6mXAN/dWrOVMDWnjBhQxShhAZBsaJ4iUeXPJlctw
+KzkUCQo6BsUbPMTSQlPuyHHdZBOTHDIW4SylKaBQvkundkhhBO7aHwFV3QjxZKcH
+XqpGyY2ryrgdj2D4+H55NDXYjj/m
 -----END CERTIFICATE-----
diff --git a/src/test/ssl/ssl/server_ca.key b/src/test/ssl/ssl/server_ca.key
index ac4e76f..a7c3214 100644
--- a/src/test/ssl/ssl/server_ca.key
+++ b/src/test/ssl/ssl/server_ca.key
@@ -1,15 +1,15 @@
 -----BEGIN RSA PRIVATE KEY-----
-MIICXgIBAAKBgQDDAYtajRx8vM6IB0SLZsAhTD0YVHM+/+t0a4m3JXolJBbo9/B2
-/WAN0IH1E2zmlalLc3JBmGsH1a8U5ZlRow3p2ODLrFra9FbOl0wekmRFvZeaRln/
-99dpI5itVpL97QPHO8QMMK1IsyurFA5GfuPOBx9Pi0MvzsT0tYsRvR929QIDAQAB
-AoGAcq9i1INvAJFN6cRUdKOeVTbwK3HnQWLjh9mC6bpZxqQd8S94NZK4Pgelloux
-HT9hjGU+CgPo1ne+e0y4ycFaeWf6SFyMJ3KmGFKCliE6A5zd/g+rIp8oja0Y7eLZ
-PUdx984qynfvFMxgB+VJk22cLui9az65WCY+akdWbnwfR4ECQQD4GH6S71bZya9G
-/DDS2YYi3Cvke6wsGSXTMyfDaW42M3mtJOrmoczrx1sAzTmO4rwhuzFFQRs662IS
-/c9nmXOhAkEAyTgK9BNbkb5n2KN0Ebpx+x9cCh7fJ6qY54DOk+svp2jOhBcV9Aqd
-fYPHzPI0v358buPjozXgALNl7FGrO6sC1QJBAPKrwuMmiOVuiav9ciRL8RCYG7bZ
-4Ycg8garuvFBZzRNFW9u9PWyvibCURlvpCVHUo4L9B2xmVkAdGXvLbhAOQECQQDD
-9zKjtl6NuFRGphmaUmxDV605pgtLBFhZzhZh9MC6V9YYyqr0u4nZ/YeOz6wTe0oQ
-bRz7jLKVvCHdX0RWnhvpAkEAhY+plw7q6fyXSBBOVUcHUO2Wtmdm8clvKbs64Wdl
-bjryhvBhq3gPii7jnLGwS2v5jwqCcKpK1tszO/8+gj2T+A==
+MIICXQIBAAKBgQC15tOzDBVKaCRDz9L5LMpPk8DRRGHHOe4OuO6WTkUzjbjuKyiQ
+bmtcp00R4dULbSM57ESvI/Ny0gPt+J/QKAOG8S5t09wDpKxKcgZSZ6Nd6FaK+D+Z
+hUVAkP3hB0ba0wo1JZff/0e4B+VJhXTjl7RRHfbrAEuDYFxv9T3K/Jq04wIDAQAB
+AoGBAKxkghg7iGYHQu9dpCXw9B/s+R2bgEuPNHWRgNTEg0MzuqNGFeCkNW4PRLSA
+4ic9HNiFeia+nLgiIAVFzzg44/VCvzD8P0EdJo9bRqV+mmm15YBcpV+F3I5RLOuq
+IWuRHyDbt+wsZyzdzPN0ElV3AUmj/0vkfX0xoRwXeGqimBFpAkEA5/PpK0qjvK3z
+hv4lC0nX3bmaKfFFEDRiufK+/WUGMHx8YS55CqjpcbR+xMPAFswWEcBEQnj6zbDm
+a4hEjlwXBwJBAMjCi9UcDe3Sp/mmxFklxmMusIHqldA5YsOyCjtSLxpgHJLdwcMx
+KWH3Q9nUrn4WxhlHhY8W6smNgQDzVk1TgEUCQCd2ef8hjcX2Gm6nIopPH+jbQP1N
+zSA6qWlVgWT/IRRyuX6XN4S2xDDSMpcrbqzyP/b5LSPaDWGdbTZyUqedx1UCQDjA
+/sTVNH7aAZCK+5D0I9xgE5f2mDmQL4KBL3FLr3M2Xn2KYT9sA3Xlb/IBtP6CM6hr
+1q733JH0Bdcd83TSuT0CQQCb4dzfNLuYscHBnQYsMCZvMSKmQZ2LKUANGra/mX+i
+7JZ7wngI548ypMK2lJWnb2Ce+0cR8GAPVHWOTx2srtH4
 -----END RSA PRIVATE KEY-----
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 714585f..39e41f6 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -71,9 +71,17 @@ sub DeterminePlatform
 	my $self = shift;
 
 	# Examine CL help output to determine if we are in 32 or 64-bit mode.
-	my $output = `cl /? 2>&1`;
-	$? >> 8 == 0 or die "cl command not found";
-	$self->{platform} = ($output =~ /^\/favor:<.+AMD64/m) ? 'x64' : 'Win32';
+	$self->{platform} = 'Win32';
+	open(P, "cl /? 2>&1 |") || die "cl command not found";
+	while (<P>)
+	{
+		if (/^\/favor:<.+AMD64/)
+		{
+			$self->{platform} = 'x64';
+			last;
+		}
+	}
+	close(P);
 	print "Detected hardware platform: $self->{platform}\n";
 }
 
diff --git a/src/tools/msvc/VSObjectFactory.pm b/src/tools/msvc/VSObjectFactory.pm
index b83af40..d255bec 100644
--- a/src/tools/msvc/VSObjectFactory.pm
+++ b/src/tools/msvc/VSObjectFactory.pm
@@ -92,16 +92,30 @@ sub CreateProject
 
 sub DetermineVisualStudioVersion
 {
-	# To determine version of Visual Studio we use nmake as it has
-	# existed for a long time and still exists in current Visual
-	# Studio versions.
-	my $output = `nmake /? 2>&1`;
-	$? >> 8 == 0 or croak "Unable to determine Visual Studio version: The nmake command wasn't found.";
-	if ($output =~ /(\d+)\.(\d+)\.\d+(\.\d+)?$/m)
+	my $nmakeVersion = shift;
+
+	if (!defined($nmakeVersion))
+	{
+
+# Determine version of nmake command, to set proper version of visual studio
+# we use nmake as it has existed for a long time and still exists in current visual studio versions
+		open(P, "nmake /? 2>&1 |")
+		  || croak
+"Unable to determine Visual Studio version: The nmake command wasn't found.";
+		while (<P>)
+		{
+			chomp;
+			if (/(\d+)\.(\d+)\.\d+(\.\d+)?$/)
+			{
+				return _GetVisualStudioVersion($1, $2);
+			}
+		}
+		close(P);
+	}
+	elsif ($nmakeVersion =~ /(\d+)\.(\d+)\.\d+(\.\d+)?$/)
 	{
 		return _GetVisualStudioVersion($1, $2);
 	}
-
 	croak
 "Unable to determine Visual Studio version: The nmake version could not be determined.";
 }
#27Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#26)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Sorry, I made a mistake while creating the patch.
The attached one is the correct version.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Tuesday, March 03, 2015 6:31 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached version of the custom/foreign-join interface patch
fixes the problem reported on the join-pushdown support
thread.

The previous version referenced *_ps_tlist in setrefs.c to
check whether a Custom/ForeignScan node is associated with
a particular base relation or not.
That logic assumed the node performs a base-relation scan
whenever *_ps_tlist is valid; however, this was incorrect when
the underlying pseudo-scan relation has an empty target list.
Instead, the logic is revised to check scanrelid itself. If it
is zero, the Custom/ForeignScan node is not associated with a
particular base relation, so its scan slot descriptor is
constructed based on *_ps_tlist.
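
In code form, the revised decision looks roughly like this (a sketch only,
not the patch's actual code; the helper name is made up, and the
ExecCleanTypeFromTL()/junk-entry handling is explained further below):

#include "postgres.h"
#include "executor/executor.h"      /* ExecCleanTypeFromTL() */
#include "nodes/plannodes.h"
#include "utils/rel.h"

/*
 * Sketch only: pick the scan slot descriptor for a Foreign/CustomScan.
 * scanrelid == 0 means the node scans a pseudo relation, so the descriptor
 * comes from the pseudo-scan target list (junk entries excluded) rather
 * than from the underlying table's catalog definition.
 */
static TupleDesc
example_scan_tupdesc(CustomScan *cscan, Relation scan_rel)
{
    if (cscan->scan.scanrelid == 0)
        return ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);

    return RelationGetDescr(scan_rel);
}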

Also, I noticed a potential problem if a CSP/FDW driver wants to
display expression nodes using deparse_expression() but a Var node
within the expression does not appear in the *_ps_tlist.
For example, the remote query below returns rows with two
columns:

SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, the ForeignScan behaves like a scan on a relation with
two columns, and the FDW driver sets two TargetEntry nodes on the
fdw_ps_tlist. If the FDW is designed to keep the join condition
(aid = bid) in expression-node form, it is expected to be
stored in the custom/fdw_expr variable, and setrefs.c then rewrites
the Var nodes according to *_ps_tlist.
That means we also have to add entries for both "aid" and "bid" to
*_ps_tlist to avoid a failure on variable lookup. However, these
additional entries would change the definition of the slot descriptor.
So I adjusted ExecInitForeignScan and ExecInitCustomScan to
use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when constructing
the slot descriptor based on the *_ps_tlist.
A CSP/FDW driver is expected to add target entries with resjunk=true
if it wants additional entries for variable lookups in the
EXPLAIN output.

Fortunately or unfortunately, postgres_fdw keeps its remote query
in cstring form, so it does not need to add junk entries to the
fdw_ps_tlist.
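
For illustration, a driver that does keep the join clause in expression
form might build the pseudo-scan target list for the query above roughly
as follows (a sketch only, not taken from the patch; the range-table
indexes, column numbers and type OIDs are assumptions):

#include "postgres.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

/*
 * Sketch only: fdw_ps_tlist for
 *   SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;
 * assuming tbl_a(aid int, atext text) and tbl_b(bid int, btext text),
 * with rti_a/rti_b being their range-table indexes.
 */
static List *
example_build_ps_tlist(Index rti_a, Index rti_b)
{
    List   *ps_tlist = NIL;

    /* columns actually returned by the remote query */
    ps_tlist = lappend(ps_tlist,
        makeTargetEntry((Expr *) makeVar(rti_a, 2, TEXTOID, -1,
                                         DEFAULT_COLLATION_OID, 0),
                        1, pstrdup("atext"), false));
    ps_tlist = lappend(ps_tlist,
        makeTargetEntry((Expr *) makeVar(rti_b, 2, TEXTOID, -1,
                                         DEFAULT_COLLATION_OID, 0),
                        2, pstrdup("btext"), false));

    /*
     * resjunk entries for "aid" and "bid", added only so that Var nodes in
     * the kept join clause can be looked up by setrefs.c and EXPLAIN.  They
     * do not affect the scan slot, because the slot descriptor is built with
     * ExecCleanTypeFromTL(), which skips junk entries.
     */
    ps_tlist = lappend(ps_tlist,
        makeTargetEntry((Expr *) makeVar(rti_a, 1, INT4OID, -1,
                                         InvalidOid, 0),
                        3, pstrdup("aid"), true));
    ps_tlist = lappend(ps_tlist,
        makeTargetEntry((Expr *) makeVar(rti_b, 1, INT4OID, -1,
                                         InvalidOid, 0),
                        4, pstrdup("bid"), true));

    return ps_tlist;
}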

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Sunday, February 15, 2015 11:01 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached patch is a rebased version of join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation has been added (per Michael's comment).

This infrastructure allows a foreign-data-wrapper or custom-scan-
provider to add alternative scan paths for a relation join.
From the viewpoint of the executor, it looks like a scan on a pseudo-
relation that is materialized from multiple relations, even though
the FDW/CSP internally processes the join with its own logic.

Its basic idea is that (1) scanrelid==0 indicates this foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist supply the definition of that pseudo relation;
because it is not associated with a tangible relation, unlike the
simple scan case, the planner cannot know the expected record
type to be returned without this additional information.
These two enhancements enable extensions to process a relation
join internally, and to behave like an existing scan node from the
viewpoint of the core backend.
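
For instance, a custom-scan provider could offer an alternative join path
roughly like this (a sketch only; the hook signature is the one this patch
proposes, while the function names and cost figures are made up for
illustration):

#include "postgres.h"
#include "fmgr.h"
#include "nodes/relation.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static CustomPathMethods my_join_path_methods;  /* PlanCustomPath etc. */

/* Sketch only: add a CustomPath as an alternative to the built-in join. */
static void
my_set_join_pathlist(PlannerInfo *root,
                     RelOptInfo *joinrel,
                     RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     List *restrictlist,
                     JoinType jointype,
                     SpecialJoinInfo *sjinfo,
                     SemiAntiJoinFactors *semifactors,
                     Relids param_source_rels,
                     Relids extra_lateral_rels)
{
    CustomPath *cpath = makeNode(CustomPath);

    cpath->path.pathtype = T_CustomScan;
    cpath->path.parent = joinrel;       /* a path for the whole join relation */
    cpath->path.rows = joinrel->rows;
    cpath->path.startup_cost = 10.0;    /* provider's own cost estimation */
    cpath->path.total_cost = 1000.0;
    cpath->methods = &my_join_path_methods;

    add_path(joinrel, &cpath->path);
}

void
_PG_init(void)
{
    set_join_pathlist_hook = my_set_join_pathlist;
}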

Also, as an aside: I had an off-list discussion with Hanada-san about
this interface. He suggested keeping create_plan_recurse() static and
instead chaining the underlying Path nodes through a special list field
in the CustomPath structure. If the core backend translates those Path
nodes into Plan nodes whenever a valid list is given, the extension does
not need to call create_plan_recurse() by itself.
I have no preference about this. Does anybody have an opinion?
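
For concreteness, that alternative might look roughly like the following
(purely illustrative; the custom_paths field and exact layout are
assumptions, not something either proposal defines verbatim):

/*
 * Sketch only: CustomPath carrying its child Path nodes in a list.  The core
 * backend would walk custom_paths in createplan.c, convert each Path into a
 * Plan, and hand the results to PlanCustomPath, so the provider never calls
 * create_plan_recurse() itself.
 */
typedef struct CustomPath
{
    Path        path;
    uint32      flags;
    List       *custom_paths;       /* child Path nodes, converted by core */
    List       *custom_private;
    const struct CustomPathMethods *methods;
} CustomPath;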

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Thursday, January 15, 2015 8:03 AM
To: Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, although the full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
It takes a GpuHashJoinPath, inherited from CustomPath, that has multiple
underlying scan/join paths.
Once it is called back from the backend, it also calls create_plan_recurse()
to build the inner/outer plan nodes according to those paths.
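
In outline, that works roughly like this (a simplified sketch rather than
the actual GpuHashJoin code; the struct layout and names are assumptions):

#include "postgres.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/planmain.h"     /* create_plan_recurse(), per this patch */

/* Sketch only: a CustomPath that remembers its outer/inner child paths. */
typedef struct ExampleJoinPath
{
    CustomPath  cpath;
    Path       *outer_path;
    Path       *inner_path;
} ExampleJoinPath;

/* Sketch only: PlanCustomPath callback turning child paths into child plans. */
static Plan *
example_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
                         CustomPath *best_path, List *tlist, List *clauses)
{
    ExampleJoinPath *jpath = (ExampleJoinPath *) best_path;
    CustomScan *cscan = makeNode(CustomScan);

    cscan->scan.scanrelid = 0;          /* not bound to one base relation */
    cscan->scan.plan.targetlist = tlist;
    /* the child Path nodes become the child Plan nodes */
    outerPlan(cscan) = create_plan_recurse(root, jpath->outer_path);
    innerPlan(cscan) = create_plan_recurse(root, jpath->inner_path);
    /* custom_ps_tlist, custom_exprs, methods, costs, etc. also set here */

    return &cscan->scan.plan;
}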

As a result, we can see the following query execution plan, in which the
CustomScan takes the underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
   Hash clause 1: (aid = aid)
   Hash clause 2: (bid = bid)
   Bulkload: On
   ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: aid
         nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: bid
         nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, January 15, 2015 2:07 AM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
[v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

pgsql-v9.5-custom-join.v6.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 278 ++++++++++++++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml            |  54 +++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |   5 +-
 src/backend/executor/execScan.c         |   4 +
 src/backend/executor/nodeCustom.c       |  38 +++--
 src/backend/executor/nodeForeignscan.c  |  34 ++--
 src/backend/foreign/foreign.c           |  32 +++-
 src/backend/nodes/copyfuncs.c           |   3 +
 src/backend/nodes/outfuncs.c            |   3 +
 src/backend/optimizer/path/joinpath.c   |  34 ++++
 src/backend/optimizer/plan/createplan.c |  76 +++++++--
 src/backend/optimizer/plan/setrefs.c    |  56 +++++++
 src/backend/optimizer/util/plancat.c    |   7 +-
 src/backend/optimizer/util/relnode.c    |  13 ++
 src/backend/utils/adt/ruleutils.c       |   4 +
 src/include/foreign/fdwapi.h            |  15 ++
 src/include/nodes/plannodes.h           |  20 ++-
 src/include/nodes/relation.h            |   2 +
 src/include/optimizer/paths.h           |  13 ++
 src/include/optimizer/planmain.h        |   1 +
 22 files changed, 647 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  Prior to query execution, the PostgreSQL planner constructs a plan tree
+  that usually consists of built-in plan nodes (e.g. SeqScan, HashJoin).
+  The custom-scan interface allows extensions to provide a custom-scan plan
+  that implements its own logic, in addition to the built-in nodes, to scan
+  a relation or join relations. Once a custom-scan node is chosen by the
+  planner, the callback functions associated with this custom-scan node are
+  invoked during query execution. The custom-scan provider is responsible
+  for returning a result set equivalent to the one the built-in logic would
+  produce, but it is free to scan or join the target relations according to
+  its own logic.
+  This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+  The first thing a custom-scan provider does is add alternative paths
+  to scan a relation (via the <literal>set_rel_pathlist_hook</>) or
+  to join relations (via the <literal>set_join_pathlist_hook</>).
+  A <literal>CustomPath</> node is expected to be added with an estimated
+  execution cost and a set of callbacks defined in
+  <literal>CustomPathMethods</>.
+  Both hooks also give extensions enough information to construct the
+  <literal>CustomPath</> node, such as the <literal>RelOptInfo</> of the
+  relations to be scanned, joined, or read as the source of the join.
+  The custom-scan provider is responsible for computing a reasonable cost
+  estimate that is comparable to the built-in logic.
+ </para>
+
+ <para>
+  Once a custom path is chosen by the planner, the custom-scan provider has
+  to populate a plan node according to the <literal>CustomPath</> node.
+  At the moment, <literal>CustomScan</> is the only node type available to
+  implement custom logic for a <literal>CustomPath</> node.
+  The <literal>CustomScan</> structure has two special fields to keep
+  private information: <literal>custom_exprs</> and <literal>custom_private</>.
+  <literal>custom_exprs</> is intended to hold expression trees
+  that are updated by <filename>setrefs.c</> and <filename>subselect.c</>.
+  <literal>custom_private</>, on the other hand, is expected to hold truly
+  private information that nobody touches except the custom-scan provider
+  itself. A plan tree that contains a custom-scan node can be duplicated
+  using <literal>copyObject()</>, so any data structure stored within
+  these two fields must be safe for <literal>copyObject()</>.
+ </para>
+
+ <para>
+  When an extension implements its own logic to join relations, the result
+  looks, from the standpoint of the core executor, like a simple relation
+  scan on a pseudo relation materialized from multiple source relations.
+  The custom-scan provider is expected to process the relation join
+  internally with its own logic, then return a set of records matching the
+  tuple descriptor of the scan node.
+  A <literal>CustomScan</> node that replaces a relation join is not
+  associated with a particular tangible relation, unlike the simple scan
+  case, so the extension needs to tell the core planner which record type
+  is expected to be fetched from this node.
+  What we do here is set <literal>scanrelid</> to zero and supply
+  a valid list of <literal>TargetEntry</> nodes in the
+  <literal>custom_ps_tlist</> instead. This configuration informs the core
+  planner that this custom-scan node is not associated with a particular
+  physical table, and which record type it is expected to return.
+ </para>
+
+ <para>
+  Once a plan tree is handed over to the executor, it constructs plan-state
+  objects according to the supplied plan nodes.
+  Custom scans are no exception. The executor invokes a callback to populate
+  a <literal>CustomScanState</> node whenever a <literal>CustomScan</> node
+  is found in the supplied plan tree.
+  Unlike the <literal>CustomScan</> node, it has no fields to save private
+  information, because the custom-scan provider can allocate an object
+  larger than the bare <literal>CustomScanState</> to store whatever
+  private execution state it needs.
+  This mirrors the relationship of the <literal>ScanState</> structure to
+  <literal>PlanState</>, which extends the generic plan state with
+  scan-specific fields. Likewise, the custom-scan provider can extend the
+  structure on demand.
+  Once a CustomScanState has been constructed, BeginCustomScan is invoked
+  during executor initialization; ExecCustomScan is called repeatedly during
+  execution (returning a TupleTableSlot with each fetched record); and
+  EndCustomScan is invoked on cleanup of the executor.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+  <title>Custom Scan Hooks and Callbacks</title>
+
+  <sect2 id="custom-scan-hooks">
+   <title>Custom Scan Hooks</title>
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    scan a particular relation. An extension can add alternative paths if it
+    can provide its own logic for the given scan and qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+   </para>
+
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    join relations. An extension can add alternative paths that replace the
+    relation join with its own logic.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-callbacks">
+   <title>Custom Path Callbacks</title>
+   <para>
+    A <literal>CustomPathMethods</> table contains a set of callbacks related
+    to the <literal>CustomPath</> node. The core backend invokes these
+    callbacks during query planning.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScan</> node according to the supplied
+    <literal>CustomPath</> node.
+    The custom-scan provider is responsible for allocating a
+    <literal>CustomScan</> node and initializing each of its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    creates a text representation of a <literal>CustomPath</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that expression nodes linked to
+    <literal>custom_private</> are transformed to text representation
+    by the core code, so the extension has nothing to do for them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+    A <literal>CustomScanMethods</> table contains a set of callbacks related
+    to the <literal>CustomScan</> node; the core backend invokes these
+    callbacks during query planning and executor initialization.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScanState</> node according to the supplied
+    <literal>CustomScan</> node. The custom-scan provider is responsible for
+    allocating a <literal>CustomScanState</> (or its own data type enhanced
+    from it), but it does not need to initialize the fields here, because
+    <literal>ExecInitCustomScan</> initializes the fields in
+    <literal>CustomScanState</>, and <literal>BeginCustomScan</> is then
+    invoked at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    creates a text representation of a <literal>CustomScan</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that it is not allowed to extend the data
+    structure of the <literal>CustomScan</> node, so this callback usually
+    does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-exec-callbacks">
+   <title>Custom Exec Callbacks</title>
+   <para>
+    A <literal>CustomExecMethods</> table contains a set of callbacks related
+    to the <literal>CustomScanState</> node; the core backend invokes these
+    callbacks during query execution.
+   </para>
+   <para>
+    This callback allows a custom-scan provider to perform the final
+    initialization of the <literal>CustomScanState</> node.
+    The supplied <literal>CustomScanState</> node is partially initialized
+    according to either <literal>scanrelid</> or <literal>custom_ps_tlist</>
+    of the <literal>CustomScan</> node. If the custom-scan provider wants to
+    apply additional initialization to its private fields, it can be done
+    in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to produce the next tuple
+    of the relation scan. If there is one, it should be stored in
+    <literal>ps_ResultTupleSlot</> and the tuple slot returned. Otherwise,
+    <literal>NULL</> or an empty slot is returned to signal the end of the
+    relation scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback allows a custom-scan provider to clean up the
+    <literal>CustomScanState</> node. If it holds any private resources
+    (not released automatically) on the supplied node, it can release
+    those resources prior to the cleanup of the common portion.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to rewind the current scan
+    position to the head of the relation. The custom-scan provider is
+    expected to reset its internal state so the relation scan can restart.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to save the current
+    scan position in its internal state, such that the position can later be
+    restored by the <literal>RestrPosCustomScan</> callback. It is never
+    called unless the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to restore the
+    previous scan position that was saved by the <literal>MarkPosCustomScan</>
+    callback. It is never called unless the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback allows a custom-scan provider to output additional
+    information in <command>EXPLAIN</> for a custom-scan node.
+    Note that the common items (target list, qualifiers, relation to be
+    scanned) are already shown, so this callback is useful when the
+    custom-scan provider wants to show something in addition to those items.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
+
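As an illustration of the callback table documented above (not part of the patch itself), a minimal provider skeleton might look like the sketch below. It assumes the CustomExecMethods fields are named after the documented callbacks, and reduces the scan logic to stubs that return no rows.

#include "postgres.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"

/* Sketch only: a provider whose scan immediately reports end-of-scan. */
static void
myscan_begin(CustomScanState *node, EState *estate, int eflags)
{
    /* provider-specific final initialization would go here */
}

static TupleTableSlot *
myscan_exec(CustomScanState *node)
{
    /* returning an empty slot signals the end of the relation scan */
    return ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
}

static void
myscan_end(CustomScanState *node)
{
    /* release provider-private resources that are not freed automatically */
}

static void
myscan_rescan(CustomScanState *node)
{
    /* reset internal state so the scan restarts from the beginning */
}

/* Field names assumed to mirror the callbacks documented above. */
static const CustomExecMethods myscan_exec_methods = {
    .BeginCustomScan  = myscan_begin,
    .ExecCustomScan   = myscan_exec,
    .EndCustomScan    = myscan_end,
    .ReScanCustomScan = myscan_rescan,
    /* MarkPos/RestrPos omitted: CUSTOMPATH_SUPPORT_MARK_RESTORE not set */
};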
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..d25d5c9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,60 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for remote join</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPath(PlannerInfo *root,
+                   RelOptInfo *joinrel,
+                   RelOptInfo *outerrel,
+                   RelOptInfo *innerrel,
+                   JoinType jointype,
+                   SpecialJoinInfo *sjinfo,
+                   SemiAntiJoinFactors *semifactors,
+                   List *restrictlist,
+                   Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables, or of
+     joined relations that are themselves such joins; both sides must be
+     managed by the same FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path for the supplied <literal>joinrel</>. From the standpoint of the
+     query planner, it looks as if a scan node were added for the join
+     relation. This means that a <literal>ForeignScan</> path added in place
+     of the built-in local join logic has to generate tuples as if it were
+     scanning a materialized join of the underlying relations.
+    </para>
+    <para>
+     Usually, the FDW driver is expected to issue a remote query that
+     performs the join on the remote side and then fetch the joined result
+     locally.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so its definition cannot be looked
+     up in the system catalogs.
+     The FDW driver is therefore responsible for telling the query planner
+     the expected form of the joined relations. When a
+     <literal>ForeignScan</> replaces a join, <literal>scanrelid</> of the
+     generated plan node must be zero, to mark that this
+     <literal>ForeignScan</> node is not associated with a particular
+     foreign table.
+     The driver also needs to construct a pseudo scan target list
+     (<literal>fdw_ps_tlist</>) that describes the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> is zero, the executor initializes the scan
+     slot according to <literal>fdw_ps_tlist</>, excluding junk entries.
+     This list is also used to resolve the names of the original relations
+     and columns, so the FDW can attach expression nodes that are not
+     actually evaluated locally, such as a join clause to be executed on
+     the remote side; the target entries for such expressions must have
+     <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
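To make the fdw_ps_tlist requirement above concrete, here is a sketch (not part of the patch) of how an FDW might build the pseudo-scan target list for a join it pushes down. It assumes joinrel->reltargetlist carries the Vars the upper plan nodes expect, and the optional join_key argument stands in for a column referenced only by a remote join clause; the function name is illustrative.

#include "postgres.h"
#include "nodes/makefuncs.h"
#include "nodes/relation.h"

static List *
build_fdw_ps_tlist(RelOptInfo *joinrel, Var *join_key)
{
    List       *ps_tlist = NIL;
    AttrNumber  resno = 1;
    ListCell   *lc;

    /* one valid entry per column the upper plan expects from the "scan" */
    foreach(lc, joinrel->reltargetlist)
    {
        Var *var = (Var *) lfirst(lc);

        ps_tlist = lappend(ps_tlist,
                           makeTargetEntry((Expr *) copyObject(var),
                                           resno++, NULL, false));
    }

    /*
     * A column used only by expressions pushed to the remote side (for
     * example a join clause kept in fdw_exprs) can be appended as a
     * resjunk entry, so that setrefs.c and EXPLAIN can still resolve it.
     * All junk entries must follow the valid ones.
     */
    if (join_key != NULL)
        ps_tlist = lappend(ps_tlist,
                           makeTargetEntry((Expr *) copyObject(join_key),
                                           resno++, NULL, true));

    return ps_tlist;
}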
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..9281874 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1072,9 +1072,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..2344129 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation, acquire an appropriate lock on it, and
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on an actual relation.
+	 *
+	 * On the other hand, a custom scan may scan a pseudo relation;
+	 * usually the result set of a relation join performed by an
+	 * external computing resource. In that case it has to get the
+	 * scan type from the pseudo-scan target list that should be
+	 * assigned by the custom-scan provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..542d176 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire an appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on an actual foreign table.
+	 *
+	 * On the other hand, a foreign scan may scan a pseudo relation;
+	 * usually the result set of a join of remote relations. In that
+	 * case it has to get the scan type from the pseudo-scan target
+	 * list that should be assigned by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..9300b70 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,7 +592,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -617,6 +619,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..f3676ec 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,7 +558,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -572,6 +574,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..d68164c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,37 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In the case of FDW, we may be able to omit this invocation
+	 * depending on joinrel's fdw_handler (set only if both relations are
+	 * managed by the same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 6. Consider paths added by FDWs when both outer and inner relations are
+	 * managed by same foreign-data wrapper.  Matching of foreign server and/or
+	 * checkAsUser should be checked in GetForeignJoinPath by the FDW.
+	 */
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+	{
+		joinrel->fdwroutine->GetForeignJoinPath(root,
+												joinrel,
+												outerrel,
+												innerrel,
+												jointype,
+												sjinfo,
+												&semifactors,
+												restrictlist,
+												extra_lateral_rels);
+	}
 }
 
 /*
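As a usage sketch for the hook added above (not part of the patch): an extension would typically install set_join_pathlist_hook from its _PG_init(), chaining to any previously installed hook. The parameter order follows the set_join_pathlist_hook_type declaration this patch adds to optimizer/paths.h; function and variable names below are made up, and the body is left empty where a provider would inspect the join and add its CustomPath.

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

static void
my_join_pathlist(PlannerInfo *root,
                 RelOptInfo *joinrel,
                 RelOptInfo *outerrel,
                 RelOptInfo *innerrel,
                 List *restrictlist,
                 JoinType jointype,
                 SpecialJoinInfo *sjinfo,
                 SemiAntiJoinFactors *semifactors,
                 Relids param_source_rels,
                 Relids extra_lateral_rels)
{
    /* let any previously installed provider add its paths first */
    if (prev_join_pathlist_hook)
        prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                restrictlist, jointype, sjinfo, semifactors,
                                param_source_rels, extra_lateral_rels);

    /* ... inspect the join and add_path() a CustomPath here if profitable ... */
}

void
_PG_init(void)
{
    prev_join_pathlist_hook = set_join_pathlist_hook;
    set_join_pathlist_hook = my_join_pathlist;
}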
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76ba1bf..7a37824 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation OID, if this foreign-scan node actually scans
+	 * a particular real relation. Otherwise, InvalidOid is passed to
+	 * the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,35 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed from
+	 * fdw_ps_tlist, excluding resjunk entries, so we need to ensure that
+	 * all valid TLEs are located before any junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	/* Track the OID of the FDW handler; no need to make FDW do this */
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2050,12 +2081,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2075,6 +2101,26 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed from
+	 * custom_ps_tlist, excluding resjunk entries, so we need to ensure
+	 * that all valid TLEs are located before any junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..a41c4f0 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +610,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..1c570c8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..ca71093 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -427,6 +428,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2fa30be..87f84a7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..d4ab71a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPath_function ) (PlannerInfo *root,
+											  RelOptInfo *joinrel,
+											  RelOptInfo *outerrel,
+											  RelOptInfo *innerrel,
+											  JoinType jointype,
+											  SpecialJoinInfo *sjinfo,
+											  SemiAntiJoinFactors *semifactors,
+											  List *restrictlist,
+											  Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPath_function GetForeignJoinPath;
+
 } FdwRoutine;
 
 
@@ -157,6 +171,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..213034b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map references to attributes of the
+ * underlying relation(s) onto pairs of INDEX_VAR and an alternative
+ * varattno.  The node then looks like a scan of a pseudo relation, usually
+ * the result of a relation join on the remote data source, and the FDW
+ * driver is responsible for setting the expected target list for it.  If
+ * the FDW returns records matching the foreign-table definition, just put
+ * NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,7 +486,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -488,10 +496,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,6 +521,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9ef0b56 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#28Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Kouhei Kaigai (#27)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Kaigai-san,

The v6 patch applied cleanly on the master branch. I'll rebase my
patch onto it, but before that I have a comment about the name of the
new FDW API handler GetForeignJoinPath.

Obviously FDW can add multiple paths at a time, like GetForeignPaths,
so IMO it should be renamed to GetForeignJoinPaths, with plural form.

In addition to that, new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see attached a patch for these changes.

I'll review the v6 patch afterwards.

2015-03-03 20:20 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Sorry, I misoperated on patch creation.
Attached one is the correct version.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Tuesday, March 03, 2015 6:31 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached version of custom/foreign-join interface patch
fixes up the problem reported on the join-pushdown support
thread.

The previous version referenced *_ps_tlist on setrefs.c, to
check whether the Custom/ForeignScan node is associated with
a particular base relation, or not.
This logic considered above nodes performs base relation scan,
if *_ps_tlist is valid. However, it was incorrect in case when
underlying pseudo-scan relation has empty targetlist.
Instead of the previous logic, it shall be revised to check
scanrelid itself. If zero, it means Custom/ForeignScan node is
not associated with a particular base relation, thus, its slot
descriptor for scan shall be constructed based on *_ps_tlist.

Also, I noticed a potential problem if the CSP/FDW driver wants to
display expression nodes using deparse_expression() but a varnode
within the expression does not appear in the *_ps_tlist.
For example, a remote query below shall return rows with two
columns.

SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, ForeignScan will perform like as a scan on relation with
two columns, and FDW driver will set two TargetEntry on the
fdw_ps_tlist. If FDW is designed to keep the join condition
(aid = bid) using expression node form, it is expected to be
saved on custom/fdw_expr variable, then setrefs.c rewrites the
varnode according to *_ps_tlist.
It means we also have to add both "aid" and "bid" to the *_ps_tlist
to avoid failure on variable lookup. However, these additional
entries change the definition of the slot descriptor.
So, I adjusted ExecInitForeignScan and ExecInitCustomScan to
use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct
the slot descriptor based on the *_ps_tlist.
It expects CSP/FDW drivers to add target-entries with resjunk=true,
if it wants to have additional entries for variable lookups on
EXPLAIN command.

Fortunately or unfortunately, postgres_fdw keeps its remote query
in cstring form, so it does not need to add junk entries on the
fdw_ps_tlist.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Sunday, February 15, 2015 11:01 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached patch is a rebased version of join replacement with
foreign-/custom-scan. Here is no feature updates at this moment
but SGML documentation is added (according to Michael's comment).

This infrastructure allows foreign-data-wrapper and custom-scan-
provider to add alternative scan paths towards relations join.
From viewpoint of the executor, it looks like a scan on a pseudo-
relation that is materialized from multiple relations, even though
FDW/CSP internally processes relations join with their own logic.

Its basic idea is, (1) scanrelid==0 indicates this foreign/custom
scan node runs on a pseudo relation and (2) fdw_ps_tlist and
custom_ps_tlist introduce the definition of the pseudo relation,
because it is not associated with a tangible relation unlike
simple scan case, thus planner cannot know the expected record
type to be returned without these additional information.
These two enhancement enables extensions to process relations
join internally, and to perform as like existing scan node from
viewpoint of the core backend.

Also, as an aside. I had a discussion with Hanada-san about this
interface off-list. He had an idea to keep create_plan_recurse()
static, using a special list field in CustomPath structure to
chain underlying Path node. If core backend translate the Path
node to Plan node if valid list given, extension does not need to
call create_plan_recurse() by itself.
I have no preference about this. Does anybody have opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Thursday, January 15, 2015 8:03 AM
To: Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, even though full code set is too large for patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
It takes GpuHashJoinPath inherited from CustomPath that has multiple
underlying scan/join paths.
Once it is called back from the backend, it also calls create_plan_recurse()
to make inner/outer plan nodes according to the paths.

In the result, we can see the following query execution plan that CustomScan
takes underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
QUERY PLAN
----------------------------------------------------------------------
------------
Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922
width=143)
Hash clause 1: (aid = aid)
Hash clause 2: (bid = bid)
Bulkload: On
-> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009
width=77)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000
width=37)
hash keys: aid
nBatches: 1 Buckets: 46000 Memory Usage: 99.99%
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37)
-> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000
width=37)
hash keys: bid
nBatches: 1 Buckets: 46000 Memory Usage: 49.99%
-> Seq Scan on t2 (cost=0.00..734.00 rows=40000
width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
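To illustrate the pattern being described here (this is a sketch, not taken from the GpuHashJoin sources), a PlanCustomPath-style callback for a custom join could look roughly as follows. The callback signature and the way the child paths are obtained from the CustomPath are assumptions; the point is only that create_plan_recurse(), exported by the patch, turns the underlying Path nodes into child Plans of the CustomScan.

#include "postgres.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"

static Plan *
plan_my_join_path(PlannerInfo *root, CustomPath *best_path,
                  Path *outer_path, Path *inner_path,
                  List *tlist, List *restrict_clauses)
{
    CustomScan *cscan = makeNode(CustomScan);

    cscan->scan.scanrelid = 0;      /* replaces a join, not a base-rel scan */
    cscan->scan.plan.targetlist = tlist;
    cscan->scan.plan.qual = NIL;    /* join quals are handled internally */
    cscan->custom_ps_tlist = tlist; /* pseudo-scan tuple definition */
    cscan->flags = best_path->flags;

    /*
     * Keep the join clauses in custom_exprs so setrefs.c rewrites their
     * Vars against custom_ps_tlist; Vars referenced only here must appear
     * in the ps_tlist, possibly as resjunk entries.
     */
    cscan->custom_exprs = extract_actual_clauses(restrict_clauses, false);

    /* translate the underlying Paths into child Plans of the CustomScan */
    cscan->scan.plan.lefttree = create_plan_recurse(root, outer_path);
    cscan->scan.plan.righttree = create_plan_recurse(root, inner_path);

    /* cscan->methods must also be set to the provider's CustomScanMethods */
    return &cscan->scan.plan;
}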

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, January 15, 2015 2:07 AM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
[v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Shigeru HANADA

Attachments:

mod_cjv6.patch (application/octet-stream)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index d25d5c9..77477c8 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -603,15 +603,15 @@ IsForeignRelUpdatable (Relation rel);
     <para>
 <programlisting>
 void
-GetForeignJoinPath(PlannerInfo *root,
-                   RelOptInfo *joinrel,
-                   RelOptInfo *outerrel,
-                   RelOptInfo *innerrel,
-                   JoinType jointype,
-                   SpecialJoinInfo *sjinfo,
-                   SemiAntiJoinFactors *semifactors,
-                   List *restrictlist,
-                   Relids extra_lateral_rels);
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    List *restrictlist,
+                    Relids extra_lateral_rels);
 </programlisting>
      Create possible access paths for a join of two foreign tables, or of
      joined relations that are themselves such joins; both sides must be
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d68164c..03d5781 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -280,19 +280,19 @@ add_paths_to_joinrel(PlannerInfo *root,
 	/*
 	 * 6. Consider paths added by FDWs when both outer and inner relations are
 	 * managed by same foreign-data wrapper.  Matching of foreign server and/or
-	 * checkAsUser should be checked in GetForeignJoinPath by the FDW.
+	 * checkAsUser should be checked in GetForeignJoinPaths by the FDW.
 	 */
-	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
 	{
-		joinrel->fdwroutine->GetForeignJoinPath(root,
-												joinrel,
-												outerrel,
-												innerrel,
-												jointype,
-												sjinfo,
-												&semifactors,
-												restrictlist,
-												extra_lateral_rels);
+		joinrel->fdwroutine->GetForeignJoinPaths(root,
+												 joinrel,
+												 outerrel,
+												 innerrel,
+												 jointype,
+												 sjinfo,
+												 &semifactors,
+												 restrictlist,
+												 extra_lateral_rels);
 	}
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ca71093..5623566 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -123,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index d4ab71a..5a8bd39 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,15 +82,15 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
-typedef void (*GetForeignJoinPath_function ) (PlannerInfo *root,
-											  RelOptInfo *joinrel,
-											  RelOptInfo *outerrel,
-											  RelOptInfo *innerrel,
-											  JoinType jointype,
-											  SpecialJoinInfo *sjinfo,
-											  SemiAntiJoinFactors *semifactors,
-											  List *restrictlist,
-											  Relids extra_lateral_rels);
+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
+											   RelOptInfo *joinrel,
+											   RelOptInfo *outerrel,
+											   RelOptInfo *innerrel,
+											   JoinType jointype,
+											   SpecialJoinInfo *sjinfo,
+											   SemiAntiJoinFactors *semifactors,
+											   List *restrictlist,
+											   Relids extra_lateral_rels);
 
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
@@ -162,7 +162,7 @@ typedef struct FdwRoutine
 	ImportForeignSchema_function ImportForeignSchema;
 
 	/* Support functions for join push-down */
-	GetForeignJoinPath_function GetForeignJoinPath;
+	GetForeignJoinPaths_function GetForeignJoinPaths;
 
 } FdwRoutine;
 
#29Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#28)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Obviously FDW can add multiple paths at a time, like GetForeignPaths,
so IMO it should be renamed to GetForeignJoinPaths, with plural form.

In addition to that, new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see attached a patch for these changes.

Thanks for your checks. Yep, the name of FDW handler should be ...Paths(),
instead of Path().

The attached one integrates Hanada-san's updates.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachments:

pgsql-v9.5-custom-join.v7.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 278 ++++++++++++++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml            |  54 +++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |   5 +-
 src/backend/executor/execScan.c         |   4 +
 src/backend/executor/nodeCustom.c       |  38 +++--
 src/backend/executor/nodeForeignscan.c  |  34 ++--
 src/backend/foreign/foreign.c           |  32 +++-
 src/backend/nodes/copyfuncs.c           |   3 +
 src/backend/nodes/outfuncs.c            |   3 +
 src/backend/optimizer/path/joinpath.c   |  34 ++++
 src/backend/optimizer/plan/createplan.c |  76 +++++++--
 src/backend/optimizer/plan/setrefs.c    |  56 +++++++
 src/backend/optimizer/util/plancat.c    |   7 +-
 src/backend/optimizer/util/relnode.c    |  14 ++
 src/backend/utils/adt/ruleutils.c       |   4 +
 src/include/foreign/fdwapi.h            |  15 ++
 src/include/nodes/plannodes.h           |  20 ++-
 src/include/nodes/relation.h            |   2 +
 src/include/optimizer/paths.h           |  13 ++
 src/include/optimizer/planmain.h        |   1 +
 22 files changed, 648 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  Prior to query execution, the PostgreSQL planner constructs a plan tree
+  that usually consists of built-in plan nodes (e.g. SeqScan, HashJoin).
+  The custom-scan interface allows extensions to provide a custom-scan plan
+  that implements its own logic, in addition to the built-in nodes, to scan
+  a relation or join relations. Once a custom-scan node is chosen by the
+  planner, callback functions associated with this custom-scan node are
+  invoked during query execution. The custom-scan provider is responsible
+  for returning the same result set as the built-in logic would, but it is
+  free to scan or join the target relations according to its own logic.
+  This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+  The first thing a custom-scan provider has to do is add alternative paths
+  to scan a relation (via the <literal>set_rel_pathlist_hook</>) or
+  to join relations (via the <literal>set_join_pathlist_hook</>).
+  A <literal>CustomPath</> node is expected to be added with an estimated
+  execution cost and a set of callbacks defined in
+  <literal>CustomPathMethods</>. Both hooks give extensions enough
+  information to construct the <literal>CustomPath</> node, such as the
+  <literal>RelOptInfo</> of the relations to be scanned or joined.
+  The custom-scan provider is responsible for computing a reasonable cost
+  estimate that is comparable to the built-in logic.
+ </para>
+
+ <para>
+  Once a custom path is chosen by the planner, the custom-scan provider has
+  to populate a plan node according to the <literal>CustomPath</> node.
+  At present, <literal>CustomScan</> is the only node type available for
+  implementing custom logic for a <literal>CustomPath</> node.
+  The <literal>CustomScan</> structure has two special fields to keep
+  private information: <literal>custom_exprs</> and <literal>custom_private</>.
+  <literal>custom_exprs</> is intended to hold expression trees
+  that need to be post-processed by <filename>setrefs.c</> and
+  <filename>subselect.c</>. <literal>custom_private</>, on the other hand,
+  is expected to hold private information that nothing other than the
+  custom-scan provider itself will touch. A plan tree that contains
+  a custom-scan node can be duplicated using <literal>copyObject()</>, so
+  all data structures stored within these two fields must be safe for
+  <literal>copyObject()</>.
+ </para>
+
+ <para>
+  When an extension implements its own logic to join relations, it looks,
+  from the standpoint of the core executor, like a simple scan of a pseudo
+  relation materialized from multiple source relations.
+  The custom-scan provider is expected to process the relation join with its
+  own logic internally, then return a set of records according to the tuple
+  descriptor of the scan node.
+  A <literal>CustomScan</> node that replaces a join of relations is not
+  associated with a particular tangible relation, unlike the simple scan
+  case, so the extension needs to inform the core planner of the record type
+  expected to be fetched from this node.
+  What should be done here is to set <literal>scanrelid</> to zero and put
+  a valid list of <literal>TargetEntry</> in <literal>custom_ps_tlist</>
+  instead. This configuration tells the core planner that this custom-scan
+  node is not associated with a particular physical table, and what record
+  type it is expected to return.
+ </para>
+
+ <para>
+  Once a plan tree is handed to the executor, plan-state objects are
+  constructed according to the supplied plan nodes, and
+  custom-scan is no exception. The executor invokes a callback to populate a
+  <literal>CustomScanState</> node whenever a <literal>CustomScan</> node is
+  found in the supplied plan tree.
+  Unlike the <literal>CustomScan</> node, it does not have fields to hold
+  private information, because the custom-scan provider can allocate an
+  object larger than the bare <literal>CustomScanState</> to store its
+  private execution state.
+  This is analogous to the relationship of the <literal>ScanState</>
+  structure to <literal>PlanState</>, which extends the generic plan state
+  with scan-specific fields; the custom-scan provider can add fields on demand.
+  Once a CustomScanState is constructed, BeginCustomScan is invoked during
+  executor initialization; ExecCustomScan is repeatedly called during
+  execution (returning a TupleTableSlot with each fetched record), and
+  EndCustomScan is invoked on cleanup of the executor.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+  <title>Custom Scan Hooks and Callbacks</title>
+
+  <sect2 id="custom-scan-hooks">
+   <title>Custom Scan Hooks</title>
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    scan a particular relation. An extension can add alternative paths if it
+    can provide its own logic for the given relation and qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+   </para>
+
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    join relations. An extension can add alternative paths that replace the
+    built-in join logic with its own.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-callbacks">
+   <title>Custom Path Callbacks</title>
+   <para>
+    A <literal>CustomPathMethods</> table contains a set of callbacks related
+    to <literal>CustomPath</> node. The core backend invokes these callbacks
+    during query planning.
+   </para>
+   <para>
+    This callback is invoked when the core backend populates a
+    <literal>CustomScan</> node according to the supplied
+    <literal>CustomPath</> node.
+    The custom-scan provider is responsible for allocating a
+    <literal>CustomScan</> node and initializing each of its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    creates a text representation of a <literal>CustomPath</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that expression nodes linked to
+    <literal>custom_private</> are transformed to text representation
+    by the core, so the extension has nothing to do for them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+    A <literal>CustomScanMethods</> table contains a set of callbacks related
+    to the <literal>CustomScan</> node; the core backend invokes these
+    callbacks during query planning and executor initialization.
+   </para>
+   <para>
+    This callback is invoked when the core backend populates a
+    <literal>CustomScanState</> node according to the supplied
+    <literal>CustomScan</> node. The custom-scan provider is responsible for
+    allocating a <literal>CustomScanState</> (or its own data type that
+    extends it), but it does not need to initialize the fields here,
+    because <literal>ExecInitCustomScan</> initializes the fields of
+    <literal>CustomScanState</>, and <literal>BeginCustomScan</> is
+    invoked at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    produces the text representation of a <literal>CustomScan</> node.
+    A custom-scan provider can use this callback if it wants to output
+    something additional. Note that the data structure of the
+    <literal>CustomScan</> node cannot be extended, so this callback
+    usually does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-exec-callbacks">
+   <title>Custom Exec Callbacks</title>
+   <para>
+    A <literal>CustomExecMethods</> table contains a set of callbacks related
+    to the <literal>CustomScanState</> node; the core backend invokes these
+    callbacks during query execution.
+   </para>
+   <para>
+    This callback performs the final initialization of the
+    <literal>CustomScanState</> node.
+    The supplied <literal>CustomScanState</> node has already been partially
+    initialized according to either <literal>scanrelid</> or
+    <literal>custom_ps_tlist</> of the <literal>CustomScan</> node.
+    Any additional initialization of the provider's private fields
+    should be done in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to produce the next tuple
+    of the scan. If a tuple is available, it should be stored in
+    <literal>ps_ResultTupleSlot</> and the tuple slot returned. Otherwise,
+    <literal>NULL</> or an empty slot is returned to signal the end of the
+    relation scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback allows a custom-scan provider to clean up the
+    <literal>CustomScanState</> node. If it holds any private resources that
+    are not released automatically, they can be released here, prior to
+    the cleanup of the common portion of the node.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback asks the custom-scan provider to rewind the current scan
+    position to the beginning of the relation. The provider is expected to
+    reset its internal state so that the relation scan can be restarted.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to save the current
+    scan position in its internal state, so that it can later be restored by
+    the <literal>RestrPosCustomScan</> callback. It is never called unless
+    the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback asks the custom-scan provider to restore the
+    scan position previously saved by the <literal>MarkPosCustomScan</>
+    callback. It is never called unless the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback allows a custom-scan provider to output additional
+    information in the <command>EXPLAIN</> output of a custom-scan node.
+    Note that the common items (target list, qualifiers, and the relation
+    to be scanned) are shown anyway, so this callback is for anything the
+    custom-scan provider wants to show in addition to those items.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
+
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..77477c8 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,60 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for remote join</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    List *restrictlist,
+                    Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     already-joined relations, where both sides have to be managed by the
+     same FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path to the supplied <literal>joinrel</>. From the standpoint of the
+     query planner, it looks as if a scan node were added for the join
+     relation. This means that a <literal>ForeignScan</> path added in place
+     of the built-in local join logic has to generate tuples as if it were
+     scanning a joined and materialized relation.
+    </para>
+    <para>
+     Usually, an FDW driver is expected to issue a remote query that joins
+     the tables on the remote side, then fetch the joined result on the
+     local side.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so its definition cannot be taken
+     from the system catalog.
+     Therefore, the FDW driver is responsible for telling the query planner
+     the expected form of the joined relations. When a <literal>ForeignScan</>
+     replaces a join of relations, <literal>scanrelid</> of the generated plan
+     node shall be zero, to mark that this <literal>ForeignScan</> node is not
+     associated with a particular foreign table.
+     The driver also needs to construct a pseudo scan target list
+     (<literal>fdw_ps_tlist</>) to indicate the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> is zero, the executor initializes the scan
+     slot according to <literal>fdw_ps_tlist</>, but excludes junk
+     entries. This list is also used to resolve the names of the original
+     relations and columns, so the FDW can chain expression nodes that are
+     not actually executed on the local side, such as a join clause run on
+     the remote side; the target entries for such expressions, however,
+     should have <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..9281874 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1072,9 +1072,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..2344129 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * If this custom scan is on an actual relation, open the base relation
+	 * and acquire an appropriate lock on it, then get the scan type from
+	 * the relation descriptor.
+	 *
+	 * On the other hand, a custom scan may run on a pseudo relation;
+	 * typically the result set of a join of relations processed by an
+	 * external computing resource. In that case the scan type has to come
+	 * from the pseudo-scan target list assigned by the custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..542d176 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * If this foreign scan is on an actual foreign table, open the base
+	 * relation and acquire an appropriate lock on it, then get the scan
+	 * type from the relation descriptor.
+	 *
+	 * On the other hand, a foreign scan may run on a pseudo relation;
+	 * typically the result set of a join of remote relations. In that case
+	 * the scan type has to come from the pseudo-scan target list assigned
+	 * by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..9300b70 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,7 +592,9 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
@@ -617,6 +619,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
 
 	/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..f3676ec 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,7 +558,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
@@ -572,6 +574,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..03d5781 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,37 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In the FDW case, we may be able to omit this invocation when
+	 * joinrel's fdw_handler is set (both relations managed by the same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 6. Consider paths added by FDWs when both outer and inner relations are
+	 * managed by same foreign-data wrapper.  Matching of foreign server and/or
+	 * checkAsUser should be checked in GetForeignJoinPaths by the FDW.
+	 */
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+	{
+		joinrel->fdwroutine->GetForeignJoinPaths(root,
+												 joinrel,
+												 outerrel,
+												 innerrel,
+												 jointype,
+												 sjinfo,
+												 &semifactors,
+												 restrictlist,
+												 extra_lateral_rels);
+	}
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76ba1bf..7a37824 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation OID, if this foreign-scan node actually scans
+	 * a particular real relation. Otherwise, InvalidOid is passed to
+	 * the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,35 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed from
+	 * fdw_ps_tlist, excluding resjunk entries, so we need to ensure that
+	 * all valid TLEs are located prior to the junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2050,12 +2081,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2075,6 +2101,26 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed from
+	 * custom_ps_tlist, excluding resjunk entries, so we need to ensure
+	 * that all valid TLEs are located prior to the junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..a41c4f0 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +610,34 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..1c570c8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..5623566 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -122,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -427,6 +429,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2fa30be..87f84a7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..5a8bd39 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
+											   RelOptInfo *joinrel,
+											   RelOptInfo *outerrel,
+											   RelOptInfo *innerrel,
+											   JoinType jointype,
+											   SpecialJoinInfo *sjinfo,
+											   SemiAntiJoinFactors *semifactors,
+											   List *restrictlist,
+											   Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPaths_function GetForeignJoinPaths;
+
 } FdwRoutine;
 
 
@@ -157,6 +171,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..213034b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * The optional fdw_ps_tlist maps references to attributes of the
+ * underlying relation(s) onto pairs of INDEX_VAR and alternative varattno.
+ * The node then looks like a scan of a pseudo relation, typically the
+ * result of a join on the remote data source, and the FDW driver is
+ * responsible for setting the expected target list for it. If the FDW
+ * returns records matching the foreign-table definition, just put NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,7 +486,9 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
@@ -488,10 +496,11 @@ typedef struct ForeignScan
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,6 +521,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
 	const CustomScanMethods *methods;
 } CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9ef0b56 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#30Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#29)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

The attached patch integrates a suggestion from Ashutosh Bapat.
It allows tracking the set of relations that are involved in a join
but replaced by a foreign-/custom-scan. This makes it possible to
produce correct EXPLAIN output when the FDW/CSP driver generates
human-readable symbols using deparse_expression() or similar.
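
(As a concrete illustration, not part of the patch: a provider that deparses
its join clauses at plan time and stores the resulting text in custom_private
as a String node, which is a hypothetical convention, could surface it in
EXPLAIN through the ExplainCustomScan callback. The my_ name and the label
below are placeholders.)

#include "postgres.h"
#include "commands/explain.h"
#include "nodes/execnodes.h"
#include "nodes/plannodes.h"
#include "nodes/value.h"

static void
my_explain_custom_scan(CustomScanState *node, List *ancestors, ExplainState *es)
{
    CustomScan *cscan = (CustomScan *) node->ss.ps.plan;

    /* assume the provider stored a deparsed clause text as the first item */
    char       *join_desc = strVal(linitial(cscan->custom_private));

    ExplainPropertyText("Remote Join Filter", join_desc, es);
}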

The differences from v7 are identical to what I posted on the
join push-down support thread.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Wednesday, March 04, 2015 11:42 AM
To: Shigeru Hanada
Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Obviously FDW can add multiple paths at a time, like GetForeignPaths,
so IMO it should be renamed to GetForeignJoinPaths, with plural form.

In addition to that, new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see attached a patch for these changes.

Thanks for your checks. Yes, the name of the FDW handler should be ...Paths(),
not ...Path().
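
(For reference, an FDW implementation of the plural-form callback could be
shaped like the minimal sketch below. The myfdw_ name, the constant costs and
the inner-join-only restriction are placeholders, and it assumes the
create_foreignscan_path() signature of this development cycle.)

#include "postgres.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

static void
myfdw_GetForeignJoinPaths(PlannerInfo *root,
                          RelOptInfo *joinrel,
                          RelOptInfo *outerrel,
                          RelOptInfo *innerrel,
                          JoinType jointype,
                          SpecialJoinInfo *sjinfo,
                          SemiAntiJoinFactors *semifactors,
                          List *restrictlist,
                          Relids extra_lateral_rels)
{
    ForeignPath *path;
    double      rows = joinrel->rows;
    Cost        startup_cost = 10.0;            /* placeholder costing */
    Cost        total_cost = startup_cost + rows;

    /* a real driver would also verify both sides use the same server/user */
    if (jointype != JOIN_INNER)
        return;

    path = create_foreignscan_path(root, joinrel,
                                   rows, startup_cost, total_cost,
                                   NIL,     /* no pathkeys */
                                   NULL,    /* no required outer rels */
                                   NIL);    /* fdw_private filled later */
    add_path(joinrel, (Path *) path);
}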

The attached one integrates Hanada-san's updates.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
Sent: Tuesday, March 03, 2015 9:26 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom
Plan API)

Kaigai-san,

The v6 patch applied cleanly on the master branch. I'll rebase my
patch onto it, but before that I have a comment about the name of the new
FDW API handler, GetForeignJoinPath.

Obviously FDW can add multiple paths at a time, like GetForeignPaths,
so IMO it should be renamed to GetForeignJoinPaths, with plural form.

In addition to that, new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see attached a patch for these changes.

I'll review the v6 path afterwards.

2015-03-03 20:20 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Sorry, I made a mistake while creating the patch.
The attached one is the correct version.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Tuesday, March 03, 2015 6:31 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

The attached version of the custom/foreign-join interface patch
fixes the problem reported on the join-pushdown support
thread.

The previous version referenced *_ps_tlist in setrefs.c to
check whether a Custom/ForeignScan node is associated with
a particular base relation or not.
That logic assumed such a node performs a base-relation scan
whenever *_ps_tlist is valid. However, it was incorrect when the
underlying pseudo-scan relation has an empty target list.
The revised logic checks scanrelid itself instead: if it is zero,
the Custom/ForeignScan node is not associated with a particular
base relation, so its scan slot descriptor is constructed
from *_ps_tlist.

Also, I noticed a potential problem if a CSP/FDW driver wants to
display expression nodes using deparse_expression() but a
varnode within such an expression does not appear in the *_ps_tlist.
For example, the remote query below returns rows with two
columns.

SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, ForeignScan will behave like a scan on a relation with
two columns, and the FDW driver will set two TargetEntry nodes in
fdw_ps_tlist. If the FDW is designed to keep the join condition
(aid = bid) in expression-node form, it is expected to be saved
in the custom/fdw_exprs variable; setrefs.c then rewrites the
varnodes according to *_ps_tlist.
This means we also have to add entries for both "aid" and "bid"
to *_ps_tlist to avoid a failure on variable lookup. However, these
additional entries would change the definition of the slot descriptor.
So, I adjusted ExecInitForeignScan and ExecInitCustomScan to
use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct
the slot descriptor based on *_ps_tlist.
CSP/FDW drivers are expected to add target entries with resjunk=true
if they want additional entries for variable lookups by the
EXPLAIN command.

Fortunately or unfortunately, postgres_fdw keeps its remote query
in C-string form, so it does not need to add junk entries to
fdw_ps_tlist.
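
(For illustration, a fdw_ps_tlist for that two-column result, with the join
keys appended as resjunk entries, could be built along the lines of the sketch
below. The varnos, attribute numbers and type OIDs are made-up values for the
example query, not something the core code prescribes.)

#include "postgres.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

/* hypothetical helper for "SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid" */
static List *
build_pseudo_scan_tlist(void)
{
    List       *ps_tlist = NIL;
    Var        *var;

    /* atext: rangetable index 1 (tbl_a), attno 2 */
    var = makeVar(1, 2, TEXTOID, -1, DEFAULT_COLLATION_OID, 0);
    ps_tlist = lappend(ps_tlist,
                       makeTargetEntry((Expr *) var, 1, pstrdup("atext"), false));

    /* btext: rangetable index 2 (tbl_b), attno 2 */
    var = makeVar(2, 2, TEXTOID, -1, DEFAULT_COLLATION_OID, 0);
    ps_tlist = lappend(ps_tlist,
                       makeTargetEntry((Expr *) var, 2, pstrdup("btext"), false));

    /* join keys referenced only by fdw_exprs: appended as resjunk entries */
    var = makeVar(1, 1, INT4OID, -1, InvalidOid, 0);        /* aid */
    ps_tlist = lappend(ps_tlist,
                       makeTargetEntry((Expr *) var, 3, pstrdup("aid"), true));

    var = makeVar(2, 1, INT4OID, -1, InvalidOid, 0);        /* bid */
    ps_tlist = lappend(ps_tlist,
                       makeTargetEntry((Expr *) var, 4, pstrdup("bid"), true));

    return ps_tlist;
}

ExecCleanTypeFromTL() would then build the scan slot from the first two
entries only, while setrefs.c can still resolve the aid/bid Vars when
fdw_exprs is processed.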

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Sunday, February 15, 2015 11:01 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan

API)

The attached patch is a rebased version of the join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation has been added (per Michael's comment).

This infrastructure allows a foreign-data wrapper or custom-scan
provider to add alternative scan paths for a join of relations.
From the viewpoint of the executor, it looks like a scan on a pseudo
relation that is materialized from multiple relations, even though
the FDW/CSP internally processes the join with its own logic.

The basic idea is that (1) scanrelid == 0 indicates this foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist carry the definition of that pseudo relation;
because it is not associated with a tangible relation, unlike the
simple scan case, the planner cannot know the expected record
type to be returned without this additional information.
These two enhancements enable extensions to process a join of
relations internally, while behaving like an existing scan node from
the viewpoint of the core backend.
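
(To show how an extension would plug into (1) and (2), here is a minimal
sketch of a custom-scan provider's join hook. The my_ names and constant costs
are placeholders; a real provider would estimate costs from the input paths
and the restrictlist.)

#include "postgres.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

static CustomPathMethods my_join_path_methods;  /* PlanCustomPath etc. set up in _PG_init */

static void
my_join_pathlist_hook(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      List *restrictlist,
                      JoinType jointype,
                      SpecialJoinInfo *sjinfo,
                      SemiAntiJoinFactors *semifactors,
                      Relids param_source_rels,
                      Relids extra_lateral_rels)
{
    CustomPath *cpath;

    if (jointype != JOIN_INNER)
        return;                 /* this toy provider handles inner joins only */

    cpath = makeNode(CustomPath);
    cpath->path.pathtype = T_CustomScan;
    cpath->path.parent = joinrel;
    cpath->path.rows = joinrel->rows;
    cpath->path.startup_cost = 100.0;           /* placeholder costing */
    cpath->path.total_cost = 100.0 + joinrel->rows;
    cpath->flags = 0;
    cpath->custom_private = NIL;
    cpath->methods = &my_join_path_methods;

    add_path(joinrel, &cpath->path);
}

The hook itself would be registered from _PG_init() with
set_join_pathlist_hook = my_join_pathlist_hook; the provider's PlanCustomPath
callback is then responsible for producing a CustomScan with scanrelid = 0 and
a suitable custom_ps_tlist.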

Also, as an aside: I had an off-list discussion with Hanada-san about
this interface. He had an idea to keep create_plan_recurse()
static, using a special list field in the CustomPath structure to
chain the underlying Path nodes. If the core backend translated those
Path nodes to Plan nodes whenever a valid list is given, the extension
would not need to call create_plan_recurse() by itself.
I have no preference here. Does anybody have an opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Thursday, January 15, 2015 8:03 AM
To: Robert Haas
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>

wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

Yes, even though the full code set is too large for a patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of
GpuHashJoin.
It takes a GpuHashJoinPath, inherited from CustomPath, that has multiple
underlying scan/join paths.
Once it is called back from the backend, it also calls
create_plan_recurse() to make inner/outer plan nodes according to
the paths.

As a result, we can see the following query execution plan, in which CustomScan
takes underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
   Hash clause 1: (aid = aid)
   Hash clause 2: (bid = bid)
   Bulkload: On
   ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: aid
         nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: bid
         nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
(13 rows)
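
(Reduced to its skeleton, a PlanCustomPath callback of that style might look
like the following sketch. MyJoinPath and the my_ names are hypothetical, and
reusing the supplied tlist as custom_ps_tlist is a simplification that only
works when the join output is exactly that list of Vars.)

#include "postgres.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/planmain.h"

/* hypothetical path type extending CustomPath, as GpuHashJoinPath does */
typedef struct MyJoinPath
{
    CustomPath  cpath;
    Path       *outer_path;
    Path       *inner_path;
} MyJoinPath;

static CustomScanMethods my_scan_methods;      /* CreateCustomScanState etc. */

static Plan *
my_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
                    CustomPath *best_path, List *tlist, List *clauses)
{
    MyJoinPath *jpath = (MyJoinPath *) best_path;
    CustomScan *cscan = makeNode(CustomScan);

    cscan->scan.plan.targetlist = tlist;
    cscan->scan.plan.qual = NIL;                /* clauses pushed down internally */
    cscan->scan.scanrelid = 0;                  /* not a scan of one base relation */
    cscan->flags = best_path->flags;
    cscan->custom_ps_tlist = copyObject(tlist); /* simplistic pseudo-scan definition */
    cscan->methods = &my_scan_methods;

    /* turn the underlying paths into plan trees for the outer/inner slots */
    cscan->scan.plan.lefttree = create_plan_recurse(root, jpath->outer_path);
    cscan->scan.plan.righttree = create_plan_recurse(root, jpath->inner_path);

    return &cscan->scan.plan;
}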

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, January 15, 2015 2:07 AM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
[v9.5] Custom Plan API)

On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

When custom-scan node replaced a join-plan, it shall have at least
two child plan-nodes. The callback handler of PlanCustomPath needs
to be able to call create_plan_recurse() to transform the underlying
path-nodes to plan-nodes, because this custom-scan node may take
other built-in scan or sub-join nodes as its inner/outer input.
In case of FDW, it shall kick any underlying scan relations to
remote side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Shigeru HANADA

Attachments:

pgsql-v9.5-custom-join.v8.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 278 ++++++++++++++++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml            |  54 +++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  15 +-
 src/backend/executor/execScan.c         |   4 +
 src/backend/executor/nodeCustom.c       |  38 +++--
 src/backend/executor/nodeForeignscan.c  |  34 ++--
 src/backend/foreign/foreign.c           |  32 +++-
 src/backend/nodes/bitmapset.c           |  57 +++++++
 src/backend/nodes/copyfuncs.c           |   5 +
 src/backend/nodes/outfuncs.c            |   5 +
 src/backend/optimizer/path/joinpath.c   |  34 ++++
 src/backend/optimizer/plan/createplan.c |  80 +++++++--
 src/backend/optimizer/plan/setrefs.c    |  64 ++++++++
 src/backend/optimizer/util/plancat.c    |   7 +-
 src/backend/optimizer/util/relnode.c    |  14 ++
 src/backend/utils/adt/ruleutils.c       |   4 +
 src/include/foreign/fdwapi.h            |  15 ++
 src/include/nodes/bitmapset.h           |   1 +
 src/include/nodes/plannodes.h           |  24 ++-
 src/include/nodes/relation.h            |   2 +
 src/include/optimizer/paths.h           |  13 ++
 src/include/optimizer/planmain.h        |   1 +
 24 files changed, 734 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  Prior to query execution, the PostgreSQL planner constructs a plan tree
+  that usually consists of built-in plan nodes (e.g. SeqScan, HashJoin).
+  The custom-scan interface allows extensions to provide a custom-scan plan
+  that implements its own logic, in addition to the built-in nodes, to scan
+  a relation or join relations. Once a custom-scan node is chosen by the
+  planner, the callback functions associated with it are invoked during
+  query execution. A custom-scan provider is responsible for returning the
+  same result set as the built-in logic would, but it is free to scan or
+  join the target relations according to its own logic.
+  This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+  The first thing a custom-scan provider has to do is add alternative paths
+  to scan a relation (via the <literal>set_rel_pathlist_hook</>) or
+  to join relations (via the <literal>set_join_pathlist_hook</>).
+  The provider adds a <literal>CustomPath</> node with an estimated execution
+  cost and a set of callbacks defined in <literal>CustomPathMethods</>.
+  Both hooks also give extensions enough information to construct the
+  <literal>CustomPath</> node, such as the <literal>RelOptInfo</> of the
+  relations to be scanned, joined, or read as join input. The custom-scan
+  provider is responsible for computing a reasonable cost estimate that is
+  comparable to the built-in logic.
+ </para>
+
+ <para>
+  Once a custom path is chosen by the planner, the custom-scan provider has
+  to populate a plan node according to the <literal>CustomPath</> node.
+  At present, <literal>CustomScan</> is the only node type that can
+  implement custom logic for a <literal>CustomPath</> node.
+  The <literal>CustomScan</> structure has two special fields that keep
+  private information; <literal>custom_exprs</> and <literal>custom_private</>.
+  <literal>custom_exprs</> is intended to hold expression trees
+  that are updated by <filename>setrefs.c</> and <filename>subselect.c</>.
+  On the other hand, <literal>custom_private</> is expected to hold truly
+  private information that nobody touches except the custom-scan provider
+  itself. A plan tree that contains a custom-scan node can be duplicated
+  using <literal>copyObject()</>, so all the data structures stored within
+  these two fields must be safe for <literal>copyObject()</>.
+ </para>
+
+ <para>
+  When an extension implements its own logic to join relations, the result
+  looks, from the standpoint of the core executor, like a simple scan of a
+  pseudo relation materialized from multiple source relations.
+  The custom-scan provider is expected to process the relation join with
+  its own logic internally, then return a set of records according to the
+  tuple descriptor of the scan node.
+  A <literal>CustomScan</> node that replaces a relation join is not
+  associated with a particular tangible relation, unlike the simple scan
+  case, so the extension needs to tell the core planner what record type
+  will be fetched from this node.
+  What it should do here is set <literal>scanrelid</> to zero and put
+  a valid list of <literal>TargetEntry</> nodes into
+  <literal>custom_ps_tlist</> instead. This configuration informs the core
+  planner that this custom-scan node is not associated with a particular
+  physical table and describes the record type to be returned.
+ </para>
+
+ <para>
+  Once a plan-tree is handed to the executor, it has to construct plan-state
+  objects according to the supplied plan nodes.
+  Custom-scan is no exception: the executor invokes a callback to populate a
+  <literal>CustomScanState</> node whenever a <literal>CustomScan</> node is
+  found in the supplied plan-tree.
+  Unlike the <literal>CustomScan</> node, it has no fields to hold private
+  information, because the custom-scan provider can allocate an object
+  larger than the bare <literal>CustomScanState</> to store whatever
+  private execution state it needs.
+  This mirrors the relationship of the <literal>ScanState</> structure to
+  <literal>PlanState</>: it extends the generic plan-state with scan-specific
+  fields, and the custom-scan provider can add further fields on demand.
+  Once a CustomScanState is constructed, BeginCustomScan is invoked during
+  executor initialization; ExecCustomScan is called repeatedly during
+  execution (returning a TupleTableSlot with each fetched record), and
+  EndCustomScan is invoked on cleanup of the executor.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+  <title>Custom Scan Hooks and Callbacks</title>
+
+  <sect2 id="custom-scan-hooks">
+   <title>Custom Scan Hooks</title>
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    scan a particular relation. An extension can add alternative paths if it
+    can provide its own logic to scan the given relation and qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+   </para>
+
+   <para>
+    This hook is invoked when the planner investigates the optimal way to
+    join a combination of relations. An extension can add alternative paths
+    that replace the relation join with its own logic.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-callbacks">
+   <title>Custom Path Callbacks</title>
+   <para>
+    A <literal>CustomPathMethods</> table contains a set of callbacks related
+    to the <literal>CustomPath</> node. The core backend invokes these
+    callbacks during query planning.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScan</> node according to the supplied
+    <literal>CustomPath</> node.
+    The custom-scan provider is responsible for allocating a
+    <literal>CustomScan</> node and initializing each of its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+   </para>
+   <para>
+    This optional callback will be invoked when <literal>nodeToString()</>
+    tries to create a text representation of the <literal>CustomPath</> node.
+    A custom-scan provider can utilize this callback if it wants to output
+    something additional. Note that expression nodes linked to
+    <literal>custom_private</> are transformed to text representation
+    by the core, so the extension has nothing to do for them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+    A <literal>CustomScanMethods</> table contains a set of callbacks related
+    to the <literal>CustomScan</> node; the core backend invokes these
+    callbacks during query planning and executor initialization.
+   </para>
+   <para>
+    This callback is invoked when the core backend tries to populate a
+    <literal>CustomScanState</> node according to the supplied
+    <literal>CustomScan</> node. The custom-scan provider is responsible for
+    allocating a <literal>CustomScanState</> (or its own data type enlarged
+    from it), but it does not need to initialize the fields here, because
+    <literal>ExecInitCustomScan</> initializes the fields in
+    <literal>CustomScanState</>, and <literal>BeginCustomScan</> is
+    invoked at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+   </para>
+   <para>
+    This optional callback is invoked when <literal>nodeToString()</>
+    tries to make a text representation of the <literal>CustomScan</> node.
+    A custom-scan provider can utilize this callback if it wants to output
+    something additional. Note that it is not allowed to enlarge the data
+    structure of the <literal>CustomScan</> node, so this callback usually
+    does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-exec-callbacks">
+   <title>Custom Exec Callbacks</title>
+   <para>
+    A <literal>CustomExecMethods</> table contains a set of callbacks related
+    to the <literal>CustomScanState</> node; the core backend invokes these
+    callbacks during query execution.
+   </para>
+   <para>
+    This callback allows a custom-scan provider to perform the final
+    initialization of the <literal>CustomScanState</> node.
+    The supplied <literal>CustomScanState</> node is partially initialized
+    according to either <literal>scanrelid</> or <literal>custom_ps_tlist</>
+    of the <literal>CustomScan</> node. If the custom-scan provider wants to
+    apply additional initialization to its private fields, it can be done
+    in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+   </para>
+   <para>
+    This callback requires the custom-scan provider to produce the next tuple
+    of the relation scan. If a tuple is available, it should be stored in
+    <literal>ps_ResultTupleSlot</> and the tuple slot returned. Otherwise,
+    <literal>NULL</> or an empty slot should be returned to signal the end
+    of the relation scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback allows a custom-scan provider to clean up the
+    <literal>CustomScanState</> node. If it holds any private resources on
+    the supplied node that are not released automatically, it can release
+    them prior to the cleanup of the common portion.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This callback requires the custom-scan provider to rewind the current
+    scan position to the head of the relation. The custom-scan provider is
+    expected to reset its internal state so the relation scan can restart.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback requires the custom-scan provider to save the
+    current scan position in its internal state, so that the position can be
+    restored by the <literal>RestrPosCustomScan</> callback. It is never
+    called unless the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback requires the custom-scan provider to restore the
+    previous scan position that was saved by the <literal>MarkPosCustomScan</>
+    callback. It is never called unless the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+   </para>
+   <para>
+    This optional callback allows a custom-scan provider to output additional
+    information in <command>EXPLAIN</> output that involves a custom-scan node.
+    Note that common items such as the target-list, qualifiers and the
+    relation to be scanned are shown anyway, so this callback is useful when
+    the custom-scan provider wants to show something in addition to them.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
+
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..77477c8 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,60 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for remote join</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    List *restrictlist,
+                    Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     joined relations, both of which need to be managed by the same
+     FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path for the supplied <literal>joinrel</>. From the standpoint of the
+     query planner, it looks as if a scan node were added for the join
+     relation. It means a <literal>ForeignScan</> path added instead of the
+     built-in local join logic has to generate tuples as if it were scanning
+     a joined and materialized relation.
+    </para>
+    <para>
+     Usually, we expect the FDW driver to issue a remote query that joins
+     the tables on the remote side, then fetch the joined result on the
+     local side.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so its definition cannot be taken
+     from the system catalog.
+     So, the FDW driver is responsible for telling the query planner the
+     expected form of the joined relations. When a <literal>ForeignScan</>
+     replaces a relation join, <literal>scanrelid</> of the generated plan
+     node shall be zero, to mark that this <literal>ForeignScan</> node is
+     not associated with a particular foreign table.
+     Also, it needs to construct a pseudo scan tlist
+     (<literal>fdw_ps_tlist</>) to indicate the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> equals zero, the executor initializes the
+     scan slot according to <literal>fdw_ps_tlist</>, excluding junk
+     entries. This list is also used to resolve the names of the original
+     relation and columns, so the FDW can chain expression nodes that are
+     not actually run on the local side, like a join clause to be executed
+     on the remote side; however, their target-entries will have
+     <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..8892dca 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -730,11 +730,17 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
-		case T_ForeignScan:
-		case T_CustomScan:
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_ForeignScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((ForeignScan *) plan)->fdw_relids);
+			break;
+		case T_CustomScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((CustomScan *) plan)->custom_relids);
+			break;
 		case T_ModifyTable:
 			*rels_used = bms_add_member(*rels_used,
 									((ModifyTable *) plan)->nominalRelation);
@@ -1072,9 +1078,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..2344129 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hand, a custom-scan may scan a pseudo relation,
+	 * usually the result set of a relation join performed by an external
+	 * computing resource. In that case it has to get the scan type from
+	 * the pseudo-scan target-list that should be assigned by the custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..542d176 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hand, a foreign-scan may scan a pseudo relation,
+	 * usually the result set of a remote relation join. It has
+	 * to get the scan type from the pseudo-scan target-list that should
+	 * be assigned by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index a9c3b4b..4dc3286 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -301,6 +301,63 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 }
 
 /*
+ * bms_shift_members - move all the bits by shift
+ */
+Bitmapset *
+bms_shift_members(const Bitmapset *a, int shift)
+{
+	Bitmapset  *b;
+	bitmapword	h_word;
+	bitmapword	l_word;
+	int			nwords;
+	int			w_shift;
+	int			b_shift;
+	int			i, j;
+
+	/* fast path if result shall be NULL obviously */
+	if (a == NULL || a->nwords * BITS_PER_BITMAPWORD + shift <= 0)
+		return NULL;
+	/* actually, not shift members */
+	if (shift == 0)
+		return bms_copy(a);
+
+	nwords = (a->nwords * BITS_PER_BITMAPWORD + shift +
+			  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+	b = palloc(BITMAPSET_SIZE(nwords));
+	b->nwords = nwords;
+
+	if (shift > 0)
+	{
+		/* Left shift */
+		w_shift = WORDNUM(shift);
+		b_shift = BITNUM(shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j >= 0   && j   < a->nwords ? a->words[j] : 0);
+			l_word = (j-1 >= 0 && j-1 < a->nwords ? a->words[j-1] : 0);
+			b->words[i] = ((h_word << b_shift) |
+						   (l_word >> (BITS_PER_BITMAPWORD - b_shift)));
+		}
+	}
+	else
+	{
+		/* Right shift */
+		w_shift = WORDNUM(-shift);
+		b_shift = BITNUM(-shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j+1 >= 0 && j+1 < a->nwords ? a->words[j+1] : 0);
+			l_word = (j >= 0 && j < a->nwords ? a->words[j] : 0);
+			b->words[i] = ((h_word >> (BITS_PER_BITMAPWORD - b_shift)) |
+						   (l_word << b_shift));
+		}
+	}
+	return b;
+}
+
+/*
  * bms_is_subset - is A a subset of B?
  */
 bool
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..7c85943 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,8 +592,11 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
+	COPY_BITMAPSET_FIELD(fdw_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
 	return newnode;
@@ -617,7 +620,9 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
+	COPY_BITMAPSET_FIELD(custom_relids);
 
 	/*
 	 * NOTE: The method field of CustomScan is required to be a pointer to a
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..edeee7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,8 +558,11 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
+	WRITE_BITMAPSET_FIELD(fdw_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
@@ -572,7 +575,9 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
+	WRITE_BITMAPSET_FIELD(custom_relids);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
 	if (node->methods->TextOutCustomScan)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..03d5781 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,37 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW drivers or custom-scan providers, in
+	 * addition to built-in paths.
+	 *
+	 * XXX - In case of FDW, we may be able to omit this invocation if
+	 * joinrel's fdw_handler is set (it is set only if both relations are
+	 * managed by the same FDW server).
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 6. Consider paths added by FDWs when both outer and inner relations are
+	 * managed by same foreign-data wrapper.  Matching of foreign server and/or
+	 * checkAsUser should be checked in GetForeignJoinPaths by the FDW.
+	 */
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+	{
+		joinrel->fdwroutine->GetForeignJoinPaths(root,
+												 joinrel,
+												 outerrel,
+												 innerrel,
+												 jointype,
+												 sjinfo,
+												 &semifactors,
+												 restrictlist,
+												 extra_lateral_rels);
+	}
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76ba1bf..514fcd9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1958,16 +1957,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation-id, if this foreign-scan node actually scans
+	 * a particular real relation. Otherwise, InvalidOid is passed to
+	 * the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1982,13 +1991,37 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed
+	 * from fdw_ps_tlist, excluding resjunk entries, so we need to
+	 * ensure that all valid TLEs appear before any junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this foreign scan for Explain */
+	scan_plan->fdw_relids = best_path->path.parent->relids;
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2050,12 +2083,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2075,6 +2103,28 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed
+	 * from custom_ps_tlist, excluding resjunk entries, so we need
+	 * to ensure that all valid TLEs appear before any junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this custom scan for Explain */
+	cplan->custom_relids = best_path->path.parent->relids;
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..2961f44 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (rtoffset > 0)
+					splan->fdw_relids =
+						bms_shift_members(splan->fdw_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +614,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (rtoffset > 0)
+					splan->custom_relids =
+						bms_shift_members(splan->custom_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..1c570c8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..5623566 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -122,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -427,6 +429,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2fa30be..87f84a7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..5a8bd39 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
+											   RelOptInfo *joinrel,
+											   RelOptInfo *outerrel,
+											   RelOptInfo *innerrel,
+											   JoinType jointype,
+											   SpecialJoinInfo *sjinfo,
+											   SemiAntiJoinFactors *semifactors,
+											   List *restrictlist,
+											   Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPaths_function GetForeignJoinPaths;
+
 } FdwRoutine;
 
 
@@ -157,6 +171,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 3a556ee..3ca9791 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -66,6 +66,7 @@ extern void bms_free(Bitmapset *a);
 extern Bitmapset *bms_union(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_intersect(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_shift_members(const Bitmapset *a, int shift);
 extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
 extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_member(int x, const Bitmapset *a);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f6683f0..0f1e94c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * the underlying relation(s) onto a pair of INDEX_VAR and an alternative
+ * varattno.  The node then looks like a scan of a pseudo relation, usually
+ * the result of a relation join on the remote data source, and the FDW
+ * driver is responsible for setting the expected target list.  If the FDW
+ * returns records matching the foreign-table definition, just put NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,18 +486,23 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
+	Bitmapset  *fdw_relids;		/* set of relid (index of range-tables)
+								 * represented by this node */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,7 +523,10 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
+	Bitmapset  *custom_relids;	/* set of relid (index of range-tables)
+								 * represented by this node */
 	const CustomScanMethods *methods;
 } CustomScan;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9ef0b56 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
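
For readers who want to experiment with the proposed interface, below is a
minimal, untested sketch of a loadable module that registers the new
set_join_pathlist_hook and adds a CustomPath for plain inner joins. It
assumes the definitions introduced by the patch above; all names prefixed
with "my_" are hypothetical, the cost estimate is deliberately naive, and
PlanCustomPath is stubbed out.

/*
 * my_join_provider.c -- illustrative only; assumes the patch above is
 * applied.  All "my_" names are hypothetical and the costing is naive.
 */
#include "postgres.h"
#include "fmgr.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static set_join_pathlist_hook_type prev_set_join_pathlist_hook = NULL;

static Plan *
my_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
					CustomPath *best_path, List *tlist, List *clauses)
{
	/* a real provider would build and return a CustomScan node here */
	elog(ERROR, "my_join_provider: PlanCustomPath not implemented in this sketch");
	return NULL;				/* keep compiler quiet */
}

static const CustomPathMethods my_path_methods = {
	"my_join_provider",			/* CustomName */
	my_plan_custom_path,		/* PlanCustomPath */
	NULL						/* TextOutCustomPath (optional) */
};

static void
my_set_join_pathlist(PlannerInfo *root,
					 RelOptInfo *joinrel,
					 RelOptInfo *outerrel,
					 RelOptInfo *innerrel,
					 List *restrictlist,
					 JoinType jointype,
					 SpecialJoinInfo *sjinfo,
					 SemiAntiJoinFactors *semifactors,
					 Relids param_source_rels,
					 Relids extra_lateral_rels)
{
	CustomPath *cpath;

	/* give any previously installed hook a chance as well */
	if (prev_set_join_pathlist_hook)
		prev_set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
									restrictlist, jointype, sjinfo,
									semifactors, param_source_rels,
									extra_lateral_rels);

	/* this sketch only considers plain inner joins */
	if (jointype != JOIN_INNER)
		return;

	cpath = makeNode(CustomPath);
	cpath->path.pathtype = T_CustomScan;
	cpath->path.parent = joinrel;
	cpath->path.param_info = NULL;
	cpath->path.rows = joinrel->rows;
	cpath->path.startup_cost = 0.0;
	/* deliberately naive cost; a real provider computes something sensible */
	cpath->path.total_cost = cpu_tuple_cost * joinrel->rows;
	cpath->path.pathkeys = NIL;
	cpath->flags = 0;
	cpath->custom_private = NIL;
	cpath->methods = &my_path_methods;

	add_path(joinrel, &cpath->path);
}

void
_PG_init(void)
{
	prev_set_join_pathlist_hook = set_join_pathlist_hook;
	set_join_pathlist_hook = my_set_join_pathlist;
}

A real provider would of course compute a meaningful cost and return a fully
populated CustomScan node from PlanCustomPath.
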
#31Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#30)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Mon, Mar 9, 2015 at 11:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch integrates a suggestion from Ashutosh Bapat.
It allows tracking the set of relations that are involved in a join
but replaced by a foreign-/custom-scan. That enables correct
EXPLAIN output, if the FDW/CSP driver builds human-readable symbols
using deparse_expression() or similar.

The differences from v7 are identical to what I posted on the
join push-down support thread.

I took a look at this patch today and noticed that it incorporates not
only documentation for the new functionality it adds, but also for the
custom-scan functionality whose documentation I previously excluded
from commit on the grounds that it needed more work, especially to
improve the English. That decision was not popular at the time, and I
think we need to remedy it before going further with this. I had
hoped that someone else would care about this work enough to help with
the documentation, but it seems not, so today I went through the
documentation in this patch, excluded all of the stuff specific to
custom joins, and heavily edited the rest. The result is attached.

If there are no objections, I'll commit this; then, someone can rebase
this patch over these changes and we can proceed from there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

custom-scan-docs.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..fa6f457
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,284 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan provider</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</> supports a set of experimental facilities which
+  are intended to allow extension modules to add new scan types to the system.
+  Unlike a <link linkend="fdwhandler">foreign data wrapper</>, which is only
+  responsible for knowing how to scan its own foreign tables, a custom scan
+  provider can provide an alternative method of scanning any relation in the
+  system.  Typically, the motivation for writing a custom scan provider will
+  be to allow the use of some optimization not supported by the core
+  system, such as caching or some form of hardware acceleration.  This chapter
+  outlines how to write a new custom scan provider.
+ </para>
+
+ <para>
+  Implementing a new type of custom scan is a three-step process.  First,
+  during planning, it is necessary to generate access paths representing a
+  scan using the proposed strategy.  Second, if one of those access paths
+  is selected by the planner as the optimal strategy for scanning a
+  particular relation, the access path must be converted to a plan.
+  Finally, it must be possible to execute the plan and generate the same
+  results that would have been generated for any other access path targeting
+  the same relation.
+ </para>
+
+ <sect1 id="custom-scan-path">
+  <title>Implementing Custom Paths</title>
+
+  <para>
+    A custom scan provider will typically add paths by setting the following
+    hook, which is called after the core code has generated what it believes
+    to be the complete and correct set of access paths for the relation.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+                                            RelOptInfo *rel,
+                                            Index rti,
+                                            RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+  </para>
+
+  <para>
+    Although this hook function can be used to examine, modify, or remove
+    paths generated by the core system, a custom scan provider will typically
+    confine itself to generating <structname>CustomPath</> objects and adding
+    them to <literal>rel</> using <function>add_path</>.  The custom scan
+    provider is responsible for initializing the <structname>CustomPath</>
+    object, which is declared like this:
+<programlisting>
+typedef struct CustomPath
+{
+    Path      path;
+    uint32    flags;
+    List     *custom_private;
+    const CustomPathMethods *methods;
+} CustomPath;
+</programlisting>
+  </para>
+
+  <para>
+    <structfield>path</> must be initialized as for any other path, including
+    the row-count estimate, start and total cost, and sort ordering provided
+    by this path.  <structfield>flags</> is a bitmask, which should include
+    <literal>CUSTOMPATH_SUPPORT_BACKWARD_SCAN</> if the custom path can support
+    a backward scan and <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> if it
+    can support mark and restore.  Both capabilities are optional.
+    <structfield>custom_private</> can be used to store the custom path's
+    private data.  Private data should be stored in a form that can be handled
+    by <literal>nodeToString</>, so that debugging routines which attempt to
+    print the custom path will work as designed.  <structfield>methods</> must
+    point to a (usually statically allocated) object implementing the required
+    custom path methods, of which there are currently only two, as further
+    detailed below.
+  </para>
+
+  <sect2 id="custom-scan-path-callbacks">
+  <title>Custom Path Callbacks</title>
+
+  <para>
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+                         RelOptInfo *rel,
+                         CustomPath *best_path,
+                         List *tlist,
+                         List *clauses);
+</programlisting>
+    Convert a custom path to a finished plan.  The return value will generally
+    be a <literal>CustomScan</> object, which the callback must allocate and
+    initialize.  See <xref linkend="custom-scan-plan"> for more details.
+   </para>
+
+   <para>
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+                           const CustomPath *node);
+</programlisting>
+    Generate additional output when <function>nodeToString</> is invoked on
+    this custom path.  This callback is optional. Since
+    <function>nodeToString</> will automatically dump all fields in the
+    structure that it can see, including <structfield>custom_private</>, this
+    is only useful if the <structname>CustomPath</> is actually embedded in a
+    larger struct containing additional fields.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-plan">
+  <title>Implementing Custom Plans</title>
+
+  <para>
+    A custom scan is represented in a finished plan tree using the following
+    structure:
+<programlisting>
+typedef struct CustomScan
+{
+    Scan      scan;
+    uint32    flags;
+    List     *custom_exprs;
+    List     *custom_private;
+    const CustomScanMethods *methods;
+} CustomScan;
+</programlisting>
+  </para>
+
+  <para>
+    <structfield>scan</> must be initialized as for any other scan, including
+    estimated costs, target lists, qualifications, and so on.
+    <structfield>flags</> is a bitmask with the same meaning as in
+    <structname>CustomPath</>.  <structfield>custom_exprs</> should be used to
+    store expression trees that will need to be fixed up by
+    <filename>setrefs.c</> and <filename>subselect.c</>, while
+    <literal>custom_private</> should be used to store other private data that
+    is only used by the custom scan provider itself.  Plan trees must be able
+    to be duplicated using <function>copyObject</>, so all the data stored
+    within these two fields must consist of nodes that function can handle.
+    <structfield>methods</> must point to a (usually statically allocated)
+    object implementing the required custom scan methods, which are further
+    detailed below.
+  </para>
+
+  <sect2 id="custom-scan-plan-callbacks">
+   <title>Custom Scan Callbacks</title>
+   <para>
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+    Allocate a <structname>CustomScanState</> for this
+    <structname>CustomScan</>.  The actual allocation will often be larger than
+    required for an ordinary <structname>CustomScanState</>, because many
+    scan types will wish to embed that as the first field of a large structure.
+    The value returned must have the node tag and <structfield>methods</>
+    set appropriately, but the other fields need not be initialized at this
+    stage; after <function>ExecInitCustomScan</> performs basic initialization,
+    the <function>BeginCustomScan</> callback will be invoked to give the
+    custom scan state a chance to do whatever else is needed.
+   </para>
+
+   <para>
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+                           const CustomScan *node);
+</programlisting>
+    Generate additional output when <function>nodeToString</> is invoked on
+    this custom plan.  This callback is optional.  Since a
+    <structname>CustomScan</> must be copyable by <function>copyObject</>,
+    custom scan providers cannot substitute a larger structure that embeds a
+    <structname>CustomScan</> for the structure itself, as would be possible
+    for a <structname>CustomPath</> or <structname>CustomScanState</>.
+    Therefore, providing this callback is unlikely to be useful.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1 id="custom-scan-scan">
+  <title>Implementing Custom Scans</title>
+
+  <para>
+   When a <structfield>CustomScan</> is executed, its execution state is
+   represented by a <structfield>CustomScanState</>, which is declared as
+   follows.
+<programlisting>
+typedef struct CustomScanState
+{
+    ScanState ss;
+    uint32    flags;
+    const CustomExecMethods *methods;
+} CustomScanState;
+</programlisting>
+  </para>
+
+  <para>
+   <structfield>ss</> must be initialized as for any other scanstate;
+   <structfield>flags</> is a bitmask with the same meaning as in
+   <structname>CustomPath</> and <structname>CustomScan</>.
+   <structfield>methods</> must point to a (usually statically allocated)
+   object implementing the required custom scan state methods, which are
+   further detailed below.  Typically, a <structname>CustomScanState</>, which
+   need not support <function>copyObject</>, will actually be a larger
+   structure embedding the above as its first member.
+  </para>
+
+  <sect2 id="custom-scan-scan-callbacks">
+   <title>Custom Execution-Time Callbacks</title>
+
+   <para>
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+                         EState *estate,
+                         int eflags);
+</programlisting>
+    Complete initialization of the supplied <structname>CustomScanState</>.
+    Some initialization is performed by <function>ExecInitCustomScan</>, but
+    any private fields should be initialized here.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+    Fetch the next scan tuple.  If any tuples remain, it should set
+    <literal>ps_ResultTupleSlot</> and then returns the tuple slot.  If not,
+    <literal>NULL</> or an empty slot should be returned.
+   </para>
+
+   <para>
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+    Clean up any private data associated with the <literal>CustomScanState</>.
+    This method is required, but may not need to do anything if the associated
+    data does not exist or will be cleaned up automatically.
+   </para>
+
+   <para>
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+    Rewind the current scan to the beginning and prepare to rescan the
+    relation.
+   </para>
+
+   <para>
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+    Save the current scan position so that it can subsequently be restored
+    by the <function>RestrPosCustomScan</> callback.  This callback is optional,
+    and need only be supplied if the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+   </para>
+
+   <para>
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+    Restore the previous scan position as saved by the
+    <function>MarkPosCustomScan</> callback.  This callback is optional,
+    and need only be supplied if the
+    <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+   </para>
+
+   <para>
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+                           List *ancestors,
+                           ExplainState *es);
+</programlisting>
+    Output additional information for <command>EXPLAIN</> of a custom-scan
+    plan node.  This callback is optional.  Common data stored in the
+    <structname>ScanState</>, such as the target list and scan relation, will
+    be shown even without this callback, but the callback allows the display
+    of additional, private state.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
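
For readers mapping the callbacks documented above onto code, a minimal provider-side
sketch might look as follows. This is illustrative only and not part of any posted
patch: the provider name, the MyScanState structure, and the header locations are
assumptions based on the 9.5 development tree, and the methods-table members are
assumed to carry the same names as the callbacks described above.

#include "postgres.h"

#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/plannodes.h"

/* Provider-private execution state, embedding CustomScanState as its first field */
typedef struct MyScanState
{
    CustomScanState css;
    long            rows_emitted;   /* example of a private field */
} MyScanState;

static void
my_begin_scan(CustomScanState *node, EState *estate, int eflags)
{
    /* finish initializing private fields; basic setup was done by ExecInitCustomScan */
    ((MyScanState *) node)->rows_emitted = 0;
}

static TupleTableSlot *
my_exec_scan(CustomScanState *node)
{
    /*
     * A real provider would fill ps_ResultTupleSlot with the next tuple;
     * returning an empty slot signals that the scan is finished.
     */
    return ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
}

static void
my_end_scan(CustomScanState *node)
{
    /* release provider-private resources, if any */
}

static void
my_rescan(CustomScanState *node)
{
    ((MyScanState *) node)->rows_emitted = 0;
}

static const CustomExecMethods my_exec_methods = {
    .CustomName = "my_scan",
    .BeginCustomScan = my_begin_scan,
    .ExecCustomScan = my_exec_scan,
    .EndCustomScan = my_end_scan,
    .ReScanCustomScan = my_rescan,
    /* MarkPos/RestrPos/Explain callbacks are optional and left unset */
};

static Node *
my_create_scan_state(CustomScan *cscan)
{
    MyScanState *mss = (MyScanState *) palloc0(sizeof(MyScanState));

    /* only the node tag and methods need to be set at this point */
    NodeSetTag(mss, T_CustomScanState);
    mss->css.flags = cscan->flags;
    mss->css.methods = &my_exec_methods;

    return (Node *) mss;
}

static const CustomScanMethods my_scan_methods = {
    .CustomName = "my_scan",
    .CreateCustomScanState = my_create_scan_state,
    /* TextOutCustomScan is optional and left unset */
};

The my_scan_methods table would be attached to the CustomScan node when the provider
builds its plan, and the executor then drives the callbacks in the order described above.
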
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

I took a look at this patch today and noticed that it incorporates not
only documentation for the new functionality it adds, but also for the
custom-scan functionality whose documentation I previously excluded
from commit on the grounds that it needed more work, especially to
improve the English. That decision was not popular at the time, and I
think we need to remedy it before going further with this. I had
hoped that someone else would care about this work enough to help with
the documentation, but it seems not, so today I went through the
documentation in this patch, excluded all of the stuff specific to
custom joins, and heavily edited the rest. The result is attached.

Looks good; I noticed one trivial typo --- please s/returns/return/ under
ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot"
that should say "fill ps_ResultTupleSlot with the next tuple in the
current scan direction".

regards, tom lane


#33Thom Brown
thom@linux.com
In reply to: Tom Lane (#32)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 12 March 2015 at 21:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I took a look at this patch today and noticed that it incorporates not
only documentation for the new functionality it adds, but also for the
custom-scan functionality whose documentation I previously excluded
from commit on the grounds that it needed more work, especially to
improve the English. That decision was not popular at the time, and I
think we need to remedy it before going further with this. I had
hoped that someone else would care about this work enough to help with
the documentation, but it seems not, so today I went through the
documentation in this patch, excluded all of the stuff specific to
custom joins, and heavily edited the rest. The result is attached.

Looks good; I noticed one trivial typo --- please s/returns/return/ under
ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot"
that should say "fill ps_ResultTupleSlot with the next tuple in the
current scan direction".

Also:

s/initalization/initialization/

--
Thom


#34Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Thom Brown (#33)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Mon, Mar 9, 2015 at 11:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch integrates a suggestion from Ashutosh Bapat.
It allows tracking the set of relations involved in a join that has
been replaced by a foreign-/custom-scan. This enables correct
EXPLAIN output, if the FDW/CSP driver builds human-readable symbols
using deparse_expression() or similar.

The differences from v7 are identical to what I posted on the
join push-down support thread.

I took a look at this patch today and noticed that it incorporates not
only documentation for the new functionality it adds, but also for the
custom-scan functionality whose documentation I previously excluded
from commit on the grounds that it needed more work, especially to
improve the English. That decision was not popular at the time, and I
think we need to remedy it before going further with this. I had
hoped that someone else would care about this work enough to help with
the documentation, but it seems not, so today I went through the
documentation in this patch, excluded all of the stuff specific to
custom joins, and heavily edited the rest. The result is attached.

If there are no objections, I'll commit this; then, someone can rebase
this patch over these changes and we can proceed from there.

Thanks for your help. I checked the documentation from the
implementation standpoint, and I have no objections.

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#35Robert Haas
robertmhaas@gmail.com
In reply to: Thom Brown (#33)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Thu, Mar 12, 2015 at 8:09 PM, Thom Brown <thom@linux.com> wrote:

On 12 March 2015 at 21:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I took a look at this patch today and noticed that it incorporates not
only documentation for the new functionality it adds, but also for the
custom-scan functionality whose documentation I previously excluded
from commit on the grounds that it needed more work, especially to
improve the English. That decision was not popular at the time, and I
think we need to remedy it before going further with this. I had
hoped that someone else would care about this work enough to help with
the documentation, but it seems not, so today I went through the
documentation in this patch, excluded all of the stuff specific to
custom joins, and heavily edited the rest. The result is attached.

Looks good; I noticed one trivial typo --- please s/returns/return/ under
ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot"
that should say "fill ps_ResultTupleSlot with the next tuple in the
current scan direction".

Also:

s/initalization/initialization/

Thanks to both of you for the review. I have committed it with those
improvements. Please let me know if you spot anything else.

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

Kaigai, note that your patch puts this hook and the call to
GetForeignJoinPaths() in the wrong order. As in the baserel case, the
hook should get the last word, after any FDW-specific handlers have
been called, so that it can delete or modify paths as well as add
them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

join-pathlist-hook.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..2872430 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +262,17 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * Allow a plugin to editorialize on the set of Paths for this join
+	 * relation.  It could add new paths by calling add_path(), or delete
+	 * or modify paths added by the core code.
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
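
As a usage sketch (not part of the patch itself), an extension would install the hook
from _PG_init() and chain to any previously installed hook. The module and function
names below are hypothetical; the hook type and parameter list are the ones declared
in the patch above, so this assumes that patch is applied.

#include "postgres.h"

#include "fmgr.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

static void
my_join_pathlist_hook(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      RelOptInfo *outerrel,
                      RelOptInfo *innerrel,
                      List *restrictlist,
                      JoinType jointype,
                      SpecialJoinInfo *sjinfo,
                      SemiAntiJoinFactors *semifactors,
                      Relids param_source_rels,
                      Relids extra_lateral_rels)
{
    /* let any previously installed hook run first */
    if (prev_join_pathlist_hook)
        prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                restrictlist, jointype,
                                sjinfo, semifactors,
                                param_source_rels, extra_lateral_rels);

    /*
     * Here an extension could examine joinrel->pathlist, delete paths it
     * does not want, or build a CustomPath and hand it to add_path().
     * This sketch deliberately does nothing.
     */
}

void
_PG_init(void)
{
    prev_join_pathlist_hook = set_join_pathlist_hook;
    set_join_pathlist_hook = my_join_pathlist_hook;
}
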
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

I think the right placement is just before the set_cheapest call for
each joinrel, just as we did with set_rel_pathlist_hook. It looks
like those calls are at:

allpaths.c:1649 (in standard_join_search)
geqo_eval.c:270 (in merge_clump)

There are a couple of other set_cheapest calls that probably don't need
hooked, since they are for dummy (proven empty) rels, and it's not clear
how a hook could improve on an empty plan.

Also, this would leave you with a much shorter parameter list ;-) ...
really no reason to pass more than root and rel.

regards, tom lane

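For comparison, a sketch of the narrower, once-per-joinrel shape being suggested here.
The type and variable names are hypothetical; nothing like this exists in the posted
patch, and the call-site comment only paraphrases the placement described above.

#include "postgres.h"

#include "nodes/relation.h"

/* Hypothetical hook invoked once per join relation, with a minimal argument list. */
typedef void (*joinrel_pathlist_hook_type) (PlannerInfo *root,
                                            RelOptInfo *joinrel);

joinrel_pathlist_hook_type joinrel_pathlist_hook = NULL;

/*
 * The call sites would sit in standard_join_search() and in GEQO's
 * merge_clump(), immediately before set_cheapest() runs for each
 * non-dummy join relation:
 *
 *     if (joinrel_pathlist_hook)
 *         joinrel_pathlist_hook(root, rel);
 *     set_cheapest(rel);
 */
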

#37Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#36)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at
/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#37)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list.

Hmm. You're right, it's certainly possible that some users would like to
operate on each possible pair of input relations, rather than considering
the joinrel "as a whole". Maybe we need two hooks, one like your patch
and one like I suggested.

... But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

I don't particularly buy that line of argument. If a path has been
deleted because it was dominated by another, and you are unhappy about
that decision, then a hook of this sort is not the appropriate solution;
you need to be going and fixing the cost estimates that you think are
wrong. (This gets back to the point I keep making that I don't actually
believe you can do anything very useful with these hooks; anything
interesting is probably going to involve more fundamental changes to the
planner.)

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at
/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

But this is in fact exactly the sort of case where you should not
rediscover all that over again for each pair of input rels. "Do all
these baserels belong to the same foreign server" is a question that need
only be considered once per joinrel. Not that that matters for this
hook, because as you say we're not doing foreign-server support through
this hook, but I think it's a fine example of why you'd want a single
call per joinrel.

regards, tom lane

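To keep the two approaches side by side, here is a skeleton of the FDW callback under
discussion, using the signature given to GetForeignJoinPaths by the pending patch
(posted below in #40). The function name is hypothetical, and the body is deliberately
empty apart from comments describing the steps a real driver would perform.

#include "postgres.h"

#include "nodes/relation.h"

static void
my_GetForeignJoinPaths(PlannerInfo *root,
                       RelOptInfo *joinrel,
                       RelOptInfo *outerrel,
                       RelOptInfo *innerrel,
                       JoinType jointype,
                       SpecialJoinInfo *sjinfo,
                       SemiAntiJoinFactors *semifactors,
                       List *restrictlist,
                       Relids extra_lateral_rels)
{
    /* this sketch only considers plain inner joins */
    if (jointype != JOIN_INNER)
        return;

    /*
     * 1. Confirm outerrel and innerrel are handled by this driver and map
     *    to the same foreign server (and the same checkAsUser), which the
     *    patch expects the FDW itself to verify.
     * 2. Confirm every clause in restrictlist can be evaluated remotely.
     * 3. Estimate the cost of the pushed-down join and add a ForeignPath
     *    for the whole joinrel via add_path().
     */
}
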

#39Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#37)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

From the standpoint of extension development, I'm uncertain whether a
hook at standard_join_search() could easily reproduce the information
needed to compute alternative paths, the way a hook at
add_paths_to_joinrel() can.

(Please correct me if I misunderstood.)
For example, it is not obvious which paths are the inner/outer sides of
the joinrel for which the custom-scan provider tries to add an
alternative scan path; the extension would probably need to find the
paths of the source relations from the join_rel_level[] array.
Also, how do we pull out the SpecialJoinInfo? It contains the information
needed to identify the required join type (like JOIN_LEFT), but the
extension would need to search join_info_list by relids again if the
hook were located at standard_join_search().
Even if the number of hook invocations is larger when it is located in
add_paths_to_joinrel(), it allows extensions to be designed more simply,
I think.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.

In the GPU case, the extension will add alternative paths based on
hash-join and nested-loop algorithms, each with its own cost estimate,
as long as the device can execute the join condition. It expects the
planner (set_cheapest) to choose the best path among the built-in and
additional ones.
So it seems more reasonable to me if the extension can use the same
infrastructure that the built-in logic (hash-join/merge-join/nested-loop)
uses to compute its cost estimates.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Thoughts?

Do we need to pay attention to the relids of the joinrel, rather than
to the inner and outer paths? We might assume that a path with the join
pushed down costs less than the combination of two foreign scans plus a
local join; however, a foreign scan with the join pushed down may be
partially expensive.
In that case, either hook location may be reasonable, because the FDW
driver can check whether all the relids are covered by foreign-scan
paths managed by the same foreign server, regardless of the inner and
outer paths.
Of course, it is a significant factor for extensions (including FDW
drivers) whether the hook lets them use the common infrastructure (like
SpecialJoinInfo or the join restrictlist, ...).

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#40Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#39)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

The attached patch changes the invocation order of GetForeignJoinPaths and
set_join_pathlist_hook, and adjusts the documentation in custom-scan.sgml.

Other portions are unchanged from the previous version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Sunday, March 15, 2015 11:38 AM
To: Robert Haas; Tom Lane
Cc: Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

From the standpoint of extension development, I'm uncertain whether a
hook at standard_join_search() could easily reproduce the information
needed to compute alternative paths, the way a hook at
add_paths_to_joinrel() can.

(Please correct me if I misunderstood.)
For example, it is not obvious which paths are the inner/outer sides of
the joinrel for which the custom-scan provider tries to add an
alternative scan path; the extension would probably need to find the
paths of the source relations from the join_rel_level[] array.
Also, how do we pull out the SpecialJoinInfo? It contains the information
needed to identify the required join type (like JOIN_LEFT), but the
extension would need to search join_info_list by relids again if the
hook were located at standard_join_search().
Even if the number of hook invocations is larger when it is located in
add_paths_to_joinrel(), it allows extensions to be designed more simply,
I think.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.

In the GPU case, the extension will add alternative paths based on
hash-join and nested-loop algorithms, each with its own cost estimate,
as long as the device can execute the join condition. It expects the
planner (set_cheapest) to choose the best path among the built-in and
additional ones.
So it seems more reasonable to me if the extension can use the same
infrastructure that the built-in logic (hash-join/merge-join/nested-loop)
uses to compute its cost estimates.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Thoughts?

Do we need to pay attention to the relids of the joinrel, rather than
to the inner and outer paths? We might assume that a path with the join
pushed down costs less than the combination of two foreign scans plus a
local join; however, a foreign scan with the join pushed down may be
partially expensive.
In that case, either hook location may be reasonable, because the FDW
driver can check whether all the relids are covered by foreign-scan
paths managed by the same foreign server, regardless of the inner and
outer paths.
Of course, it is a significant factor for extensions (including FDW
drivers) whether the hook lets them use the common infrastructure (like
SpecialJoinInfo or the join restrictlist, ...).

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachments:

pgsql-v9.5-custom-join.v9.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 43 ++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml            | 54 ++++++++++++++++++++++
 src/backend/commands/explain.c          | 15 +++++--
 src/backend/executor/execScan.c         |  4 ++
 src/backend/executor/nodeCustom.c       | 38 ++++++++++++----
 src/backend/executor/nodeForeignscan.c  | 34 +++++++++-----
 src/backend/foreign/foreign.c           | 32 ++++++++++---
 src/backend/nodes/bitmapset.c           | 57 +++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  5 +++
 src/backend/nodes/outfuncs.c            |  5 +++
 src/backend/optimizer/path/joinpath.c   | 31 +++++++++++++
 src/backend/optimizer/plan/createplan.c | 80 ++++++++++++++++++++++++++-------
 src/backend/optimizer/plan/setrefs.c    | 64 ++++++++++++++++++++++++++
 src/backend/optimizer/util/plancat.c    |  7 ++-
 src/backend/optimizer/util/relnode.c    | 14 ++++++
 src/backend/utils/adt/ruleutils.c       |  4 ++
 src/include/foreign/fdwapi.h            | 15 +++++++
 src/include/nodes/bitmapset.h           |  1 +
 src/include/nodes/plannodes.h           | 24 +++++++---
 src/include/nodes/relation.h            |  2 +
 src/include/optimizer/paths.h           | 13 ++++++
 src/include/optimizer/planmain.h        |  1 +
 22 files changed, 494 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 8a4a3df..b1400ae 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -48,6 +48,27 @@ extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
   </para>
 
   <para>
+   A custom scan provider can also add paths by setting the following
+   hook, which replaces built-in join paths with a custom scan that acts
+   as a scan of the already-joined relations.  It is called after the
+   core code has generated what it believes to be the complete and
+   correct set of access paths for the join.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+  </para>
+
+  <para>
     Although this hook function can be used to examine, modify, or remove
     paths generated by the core system, a custom scan provider will typically
     confine itself to generating <structname>CustomPath</> objects and adding
@@ -124,7 +145,9 @@ typedef struct CustomScan
     Scan      scan;
     uint32    flags;
     List     *custom_exprs;
+    List     *custom_ps_tlist;
     List     *custom_private;
+    List     *custom_relids;
     const CustomScanMethods *methods;
 } CustomScan;
 </programlisting>
@@ -141,10 +164,30 @@ typedef struct CustomScan
     is only used by the custom scan provider itself.  Plan trees must be able
     to be duplicated using <function>copyObject</>, so all the data stored
     within these two fields must consist of nodes that function can handle.
+    <literal>custom_relids</> is set by the backend, so the custom scan
+    provider does not need to touch it; it tracks the underlying relations
+    represented by this custom-scan node.
     <structfield>methods</> must point to a (usually statically allocated)
     object implementing the required custom scan methods, which are further
     detailed below.
   </para>
+  <para>
+   When a <structname>CustomScan</> replaces built-in join paths, the
+   custom scan provider must make two characteristic settings.
+   The first is that <structfield>scan.scanrelid</>, which is normally a
+   range-table index, must be zero.  This informs the backend that the
+   <structname>CustomScan</> node is not associated with a particular
+   table.  The second is a valid list of <structname>TargetEntry</> nodes
+   in <structfield>custom_ps_tlist</>.  To the backend, a
+   <structname>CustomScan</> node looks like a scan, but of a relation
+   that is the result of a join, so its tuple descriptor cannot be
+   constructed from a table definition; the custom scan provider must
+   describe the expected record type of the tuples instead.
+   The tuple descriptor of the scan slot is constructed from
+   <structfield>custom_ps_tlist</> and assigned during executor
+   initialization.  It is also referenced by <command>EXPLAIN</> to
+   resolve the names of the underlying columns and relations.
+  </para>
 
   <sect2 id="custom-scan-plan-callbacks">
    <title>Custom Scan Callbacks</title>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..77477c8 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,60 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for Remote Joins</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    List *restrictlist,
+                    Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     join relations, both of which must be managed by the same
+     FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path for the supplied <literal>joinrel</>.  From the standpoint of the
+     query planner, it looks as if a scan node were added for the join
+     relation.  This means that a <literal>ForeignScan</> path added in
+     place of the built-in local join logic has to generate tuples as if
+     it were scanning a joined and materialized relation.
+    </para>
+    <para>
+     Usually, we expect the FDW driver to issue a remote query that joins
+     the tables on the remote side and then fetch the joined result on the
+     local side.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so its definition cannot be taken
+     from the system catalogs.
+     The FDW driver is therefore responsible for telling the query planner
+     the expected form of the joined relations.  When a
+     <literal>ForeignScan</> replaces a join, <literal>scanrelid</> of the
+     generated plan node must be zero, to mark that this
+     <literal>ForeignScan</> node is not associated with a particular
+     foreign table.
+     The driver also needs to construct a pseudo scan target list
+     (<literal>fdw_ps_tlist</>) to indicate the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> is zero, the executor initializes the scan
+     slot according to <literal>fdw_ps_tlist</>, excluding junk entries.
+     This list is also used to resolve the names of the original relations
+     and columns, so the FDW can chain in expression nodes that are not
+     actually evaluated on the local side, such as a join clause to be
+     executed on the remote side; the target entries for those will have
+     <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..8892dca 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -730,11 +730,17 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
-		case T_ForeignScan:
-		case T_CustomScan:
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_ForeignScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((ForeignScan *) plan)->fdw_relids);
+			break;
+		case T_CustomScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((CustomScan *) plan)->custom_relids);
+			break;
 		case T_ModifyTable:
 			*rels_used = bms_add_member(*rels_used,
 									((ModifyTable *) plan)->nominalRelation);
@@ -1072,9 +1078,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0 &&
+			 (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+		varno = INDEX_VAR;
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..2344129 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
+	Index				scan_relid = cscan->scan.scanrelid;
 	Relation			scan_rel;
 
 	/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hand, custom-scan may scan on a pseudo relation;
+	 * that is usually a result-set of relations join by external
+	 * computing resource, or others. It has to get the scan type from
+	 * the pseudo-scan target-list that should be assigned by custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..542d176 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
+	Index		scanrelid = node->scan.scanrelid;
 	Relation	currentRelation;
 	FdwRoutine *fdwroutine;
 
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hand, foreign-scan may scan on a pseudo relation;
+	 * that is usually a result-set of remote relations join. It has
+	 * to get the scan type from the pseudo-scan target-list that should
+	 * be assigned by FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..df69a95 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -302,13 +302,12 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +349,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,6 +408,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+	return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
 
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index a9c3b4b..4dc3286 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -301,6 +301,63 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 }
 
 /*
+ * bms_shift_members - move all the bits by shift
+ */
+Bitmapset *
+bms_shift_members(const Bitmapset *a, int shift)
+{
+	Bitmapset  *b;
+	bitmapword	h_word;
+	bitmapword	l_word;
+	int			nwords;
+	int			w_shift;
+	int			b_shift;
+	int			i, j;
+
+	/* fast path if result shall be NULL obviously */
+	if (a == NULL || a->nwords * BITS_PER_BITMAPWORD + shift <= 0)
+		return NULL;
+	/* actually, not shift members */
+	if (shift == 0)
+		return bms_copy(a);
+
+	nwords = (a->nwords * BITS_PER_BITMAPWORD + shift +
+			  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+	b = palloc(BITMAPSET_SIZE(nwords));
+	b->nwords = nwords;
+
+	if (shift > 0)
+	{
+		/* Left shift */
+		w_shift = WORDNUM(shift);
+		b_shift = BITNUM(shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j >= 0   && j   < a->nwords ? a->words[j] : 0);
+			l_word = (j-1 >= 0 && j-1 < a->nwords ? a->words[j-1] : 0);
+			b->words[i] = ((h_word << b_shift) |
+						   (l_word >> (BITS_PER_BITMAPWORD - b_shift)));
+		}
+	}
+	else
+	{
+		/* Right shift */
+		w_shift = WORDNUM(-shift);
+		b_shift = BITNUM(-shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j+1 >= 0 && j+1 < a->nwords ? a->words[j+1] : 0);
+			l_word = (j >= 0 && j < a->nwords ? a->words[j] : 0);
+			b->words[i] = ((h_word >> (BITS_PER_BITMAPWORD - b_shift)) |
+						   (l_word << b_shift));
+		}
+	}
+	return b;
+}
+
+/*
  * bms_is_subset - is A a subset of B?
  */
 bool
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3c6a964..5be5a7d 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,8 +592,11 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
+	COPY_BITMAPSET_FIELD(fdw_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
 	return newnode;
@@ -617,7 +620,9 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
+	COPY_BITMAPSET_FIELD(custom_relids);
 
 	/*
 	 * NOTE: The method field of CustomScan is required to be a pointer to a
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..a178132 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,8 +558,11 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
+	WRITE_BITMAPSET_FIELD(fdw_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
@@ -572,7 +575,9 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
+	WRITE_BITMAPSET_FIELD(custom_relids);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
 	if (node->methods->TextOutCustomScan)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..ffcd857 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,34 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDWs when both outer and inner relations are
+	 * managed by same foreign-data wrapper.  Matching of foreign server and/or
+	 * checkAsUser should be checked in GetForeignJoinPaths by the FDW.
+	 */
+	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+	{
+		joinrel->fdwroutine->GetForeignJoinPaths(root,
+												 joinrel,
+												 outerrel,
+												 innerrel,
+												 jointype,
+												 sjinfo,
+												 &semifactors,
+												 restrictlist,
+												 extra_lateral_rels);
+	}
+
+	/*
+	 * 6. Consider paths added by custom-scan providers, or other extensions
+	 * in addition to the built-in paths.
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..7f86fcb 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1961,16 +1960,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation OID, if this foreign-scan node actually scans
+	 * a particular real relation.  Otherwise, InvalidOid is passed to
+	 * the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1985,13 +1994,37 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check.  The pseudo-scan tuple descriptor is constructed
+	 * from fdw_ps_tlist, excluding resjunk entries, so all valid TLEs
+	 * must appear before any junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE must not appear before valid ones");
+		}
+	}
+	/* Set the relids that are represented by this foreign scan for Explain */
+	scan_plan->fdw_relids = best_path->path.parent->relids;
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2053,12 +2086,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2078,6 +2106,28 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check.  The pseudo-scan tuple descriptor is constructed
+	 * from custom_ps_tlist, excluding resjunk entries, so all valid
+	 * TLEs must appear before any junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE must not appear before valid ones");
+		}
+	}
+	/* Set the relids that are represented by this custom scan for Explain */
+	cplan->custom_relids = best_path->path.parent->relids;
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..2961f44 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (rtoffset > 0)
+					splan->fdw_relids =
+						bms_shift_members(splan->fdw_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +614,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (rtoffset > 0)
+					splan->custom_relids =
+						bms_shift_members(splan->custom_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..1c570c8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -378,10 +378,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerForRelation(relation);
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..5623566 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -122,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -427,6 +429,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set the FDW handler and routine if both the outer and inner
+	 * relations are managed by the same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 28e1acf..90e1107 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..5a8bd39 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
+											   RelOptInfo *joinrel,
+											   RelOptInfo *outerrel,
+											   RelOptInfo *innerrel,
+											   JoinType jointype,
+											   SpecialJoinInfo *sjinfo,
+											   SemiAntiJoinFactors *semifactors,
+											   List *restrictlist,
+											   Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPaths_function GetForeignJoinPaths;
+
 } FdwRoutine;
 
 
@@ -157,6 +171,7 @@ typedef struct FdwRoutine
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid	GetFdwHandlerForRelation(Relation relation);
 extern bool IsImportableForeignTable(const char *tablename,
 						 ImportForeignSchemaStmt *stmt);
 
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 3a556ee..3ca9791 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -66,6 +66,7 @@ extern void bms_free(Bitmapset *a);
 extern Bitmapset *bms_union(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_intersect(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_shift_members(const Bitmapset *a, int shift);
 extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
 extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_member(int x, const Bitmapset *a);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..b25330e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * The optional fdw_ps_tlist maps references to attributes of the underlying
+ * relation(s) onto pairs of INDEX_VAR and an alternative varattno.  When it
+ * is set, the node behaves like a scan of a pseudo relation, typically the
+ * result of a join performed on the remote data source, and the FDW driver
+ * is responsible for providing the expected target list.  If the FDW returns
+ * records matching the foreign-table definition, just put NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,18 +486,23 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
+	Bitmapset  *fdw_relids;		/* set of relid (index of range-tables)
+								 * represented by this node */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,7 +523,10 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
+	Bitmapset  *custom_relids;	/* set of relid (index of range-tables)
+								 * represented by this node */
 	const CustomScanMethods *methods;
 } CustomScan;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 334cf51..4eb89c6 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
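
For illustration, here is a minimal sketch of how a custom-scan provider might
attach to the proposed set_join_pathlist_hook. Only the hook type and the
planner data structures come from the patch and the existing backend; the
function and variable names below are hypothetical and not part of the patch.

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/pathnode.h"
    #include "optimizer/paths.h"

    PG_MODULE_MAGIC;

    static set_join_pathlist_hook_type prev_set_join_pathlist_hook = NULL;

    static void
    my_join_pathlist_hook(PlannerInfo *root,
                          RelOptInfo *joinrel,
                          RelOptInfo *outerrel,
                          RelOptInfo *innerrel,
                          List *restrictlist,
                          JoinType jointype,
                          SpecialJoinInfo *sjinfo,
                          SemiAntiJoinFactors *semifactors,
                          Relids param_source_rels,
                          Relids extra_lateral_rels)
    {
        /* Let any previously installed hook add its paths first. */
        if (prev_set_join_pathlist_hook)
            prev_set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                        restrictlist, jointype, sjinfo,
                                        semifactors, param_source_rels,
                                        extra_lateral_rels);

        /*
         * A real provider would construct a CustomPath covering this joinrel
         * (an FDW would instead do so in GetForeignJoinPaths), cost it, and
         * register it with add_path(joinrel, ...).
         */
    }

    void
    _PG_init(void)
    {
        prev_set_join_pathlist_hook = set_join_pathlist_hook;
        set_join_pathlist_hook = my_join_pathlist_hook;
    }
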
#41Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#39)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sat, Mar 14, 2015 at 10:37 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

From the standpoint of extension development, I'm uncertain whether we
can easily reproduce the information needed to compute alternative paths
in a hook at standard_join_search(), as we can in a hook at
add_paths_to_joinrel().

(Please correct me if I misunderstood.)
For example, it is not obvious which paths form the inner and outer sides
of the joinrel for which a custom-scan provider tries to add an
alternative scan path.

That's a problem for the GPU-join use case, where you are essentially
trying to add new join types to the system. But it's NOT a problem if
what you're actually trying to do is substitute a *scan* for a
*join*. If you're going to produce the join output by scanning a
materialized view, or by scanning the results of a query pushed down
to a foreign server, you don't need to divide the rels into inner rels
and outer rels; indeed, any such division would be artificial. You
just need to generate a query that produces the right answer *for the
entire joinrel* and push it down.

I'd really like to hear what the folks who care about FDW join
pushdown think about this hook placement issue.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#42Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Robert Haas (#37)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at
/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

From the viewpoint of postgres_fdw, the incremental approach seemed the
natural way, although postgres_fdw should consider the paths in the
pathlist in addition to the cheapest one, as you mentioned in another
thread. This approach allows the FDW to use the SQL statements generated
for the underlying scans as parts of the FROM clause, as postgres_fdw
does in the join push-down patch.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Interesting, I overlooked that pattern. As you pointed out, a join
between big foreign tables might be dominated, perhaps by a MergeJoin
path. Leaving the dominated ForeignPath in the pathlist for further
optimization later (at a higher join level) is an idea, but it would
make planning time longer (and use more cycles and memory).

Tom's idea sounds good for saving path b), but I worry about whether
the FDW can get enough information at that timing, just before
set_cheapest. It would not be a good interface if each FDW needed to
copy a lot of code from joinrel.c...

--
Shigeru HANADA


#43Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#42)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

-----Original Message-----
From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
Sent: Monday, March 16, 2015 9:59 PM
To: Robert Haas
Cc: Tom Lane; Thom Brown; Kaigai Kouhei(海外 浩平); pgsql-hackers@postgreSQL.org
Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom
Plan API)

2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at

/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9
ztFZG7zN64Xw@mail.gmail.com

uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

From the viewpoint of postgres_fdw, the incremental approach seemed the
natural way, although postgres_fdw should consider the paths in the
pathlist in addition to the cheapest one, as you mentioned in another
thread. This approach allows the FDW to use the SQL statements generated
for the underlying scans as parts of the FROM clause, as postgres_fdw
does in the join push-down patch.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Interesting, I overlooked that pattern. As you pointed out, a join
between big foreign tables might be dominated, perhaps by a MergeJoin
path. Leaving the dominated ForeignPath in the pathlist for further
optimization later (at a higher join level) is an idea, but it would
make planning time longer (and use more cycles and memory).

Tom's idea sounds good for saving path b), but I worry about whether
the FDW can get enough information at that timing, just before
set_cheapest. It would not be a good interface if each FDW needed to
copy a lot of code from joinrel.c...

I had a call with Hanada-san to discuss this topic. Even though he
expects the FDW driver will need to check and extract the relations
involved in a particular join, this is less of a problem as long as the
core backend handles that common portion for all FDW/CSP drivers.
Thus, we need to care about two hook locations. The first is
add_paths_to_joinrel(), as the current patch does, for a custom scan
that adds alternative join logic and takes the underlying child nodes
as input. The other is standard_join_search(), as Tom pointed out, for
a foreign scan of a remote join, or for a custom scan that replaces an
entire join subtree.

One positive aspect of this approach, according to Hanada-san, is that
postgres_fdw can handle whole-row references much more simply than with
the bottom-up approach.

The remaining issue is how to implement the core portion that extracts
the relations in a particular join and identifies the join type to be
applied to them.
One rough idea: we pull the relids bitmap from the target joinrel, then
look up the SpecialJoinInfo whose union of lefthand and righthand
bitmaps is identical to it. That tells the FDW driver which relations
shall be joined with which other relations at this level.
For example, if relids=0x007 and relids=0x0018 are left-joined,
PlannerInfo shall have a SpecialJoinInfo that fits that requirement.
Also, if either the left or right side is not a singleton, the FDW
driver will recurse to construct the remote join query on relids=0x007
and relids=0x0018. If all of them are inner joins, we don't need to
take care of this; all the FDW driver needs to do is put the involved
relation names in the FROM clause.
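
To make that lookup a bit more concrete, here is a rough sketch of the idea;
the helper name is hypothetical and a real implementation would need to treat
outer-join ordering more carefully than this:

    #include "postgres.h"
    #include "nodes/bitmapset.h"
    #include "nodes/relation.h"

    /*
     * Hypothetical helper (not part of the patch): find a SpecialJoinInfo
     * whose combined min_lefthand/min_righthand covers exactly the relids
     * of the given joinrel.  If none matches, the joinrel can be treated
     * as a plain inner join of its component relations.
     */
    static SpecialJoinInfo *
    lookup_sjinfo_for_joinrel(PlannerInfo *root, RelOptInfo *joinrel)
    {
        ListCell   *lc;

        foreach(lc, root->join_info_list)
        {
            SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
            Relids          both = bms_union(sjinfo->min_lefthand,
                                             sjinfo->min_righthand);
            bool            match = bms_equal(both, joinrel->relids);

            bms_free(both);
            if (match)
                return sjinfo;  /* e.g. relids=0x007 LEFT JOIN relids=0x0018 */
        }
        return NULL;            /* all inner joins: just list rels in FROM */
    }
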

This is only a rough idea, so there may be a better way to extract the
relations involved in a particular join at a certain level.
Please tell me if you have other ideas.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#44Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#37)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 2015/03/14 7:18, Robert Haas wrote:

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from?

I haven't had enough time to review the patch in detail yet, so I don't
know where we should call the method, but I'd vote for the idea of
substituting a scan for a join, because I think that idea would probably
allow update pushdown, which I'm proposing in the current CF, to scale
up to handling a pushed-down update on a join.

Best regards,
Etsuro Fujita


#45Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Robert Haas (#37)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sat, Mar 14, 2015 at 3:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether a some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at

/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

The real problem here is that with an FDW in the picture, the "optimal
substructure" property required by dynamic programming is broken. If A
foreign join B foreign join C is the optimal solution for the problem A join
B join C, A foreign join B is not necessarily the optimal solution for the
subproblem A join B. While for local relations PostgreSQL has to compute
each two-way join itself, and thus chooses the cheapest path for each
two-way join, an FDW (especially one working with a real foreign server)
does not compute the joins in a two-way fashion and doesn't need to choose
the cheapest path for each two-way join.

A way to work around this is to leave the ForeignPaths (there can possibly
be only one foreign path per join relation) in the joinrel without removing
them. The FDW should work on joining two relations if they have foreign paths
in their path lists, irrespective of whether the cheapest path is the foreign
join path or not. For the topmost joinrel, if the foreign path happens to
be the cheapest one, the whole join tree will be pushed down.

On the other thread implementing foreign join for postgres_fdw,
postgresGetForeignJoinPaths() is just looking at the cheapest path, which
would cause the problem you have described above.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#46Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Shigeru Hanada (#42)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Tue, Mar 17, 2015 at 10:28 AM, Shigeru Hanada <shigeru.hanada@gmail.com>
wrote:

2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at

/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com

uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

From the viewpoint of postgres_fdw, the incremental approach seemed the
natural way, although postgres_fdw should consider the paths in the
pathlist in addition to the cheapest one, as you mentioned in another
thread. This approach allows the FDW to use the SQL statements generated
for the underlying scans as parts of the FROM clause, as postgres_fdw
does in the join push-down patch.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

Interesting, I overlooked that pattern. As you pointed out, a join
between big foreign tables might be dominated, perhaps by a MergeJoin
path. Leaving the dominated ForeignPath in the pathlist for further
optimization later (at a higher join level) is an idea, but it would
make planning time longer (and use more cycles and memory).

Tom's idea sounds good for saving path b), but I worry about whether
the FDW can get enough information at that timing, just before
set_cheapest. It would not be a good interface if each FDW needed to
copy a lot of code from joinrel.c...

I have the same concern. A simple joinrel doesn't contain much
information about the individual two-way joins involved in it, so the
FDW may not be able to construct a query (or execution plan), and hence
judge whether a join is pushable or not, just by looking at the joinrel.
There will be a lot of code duplication within the FDW code to
reconstruct that information.

--
Shigeru HANADA


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#47Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Ashutosh Bapat (#45)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sat, Mar 14, 2015 at 3:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Another bit of this that I think we could commit without fretting
about it too much is the code adding set_join_pathlist_hook. This is
- I think - analogous to set_rel_pathlist_hook, and like that hook,
could be used for other purposes than custom plan generation - e.g. to
delete paths we do not want to use. I've extracted this portion of
the patch and adjusted the comments; if there are no objections, I
will commit this bit also.

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether a some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.

I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at
/messages/by-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.

But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)

The real problem here, is that with FDW in picture, the "optimal substructure"
property required by dynamic programming is broken. If A foreign join B foreign
join C is optimal solution for problem A join B join C, A foreign join B is not
necessarily optimal solution for subproblem A join B. While for local relations,
PostgreSQL has to compute each two way join itself, and thus chooses the cheapest
path for each two way join, FDW (esp. those working with real foreign servers)
do not compute the joins in two-way fashion and don't need to choose the cheapest
path for each two way join.

I cannot agree 100%, because we cannot know whether A foreign join B foreign
join C is better than A join B join C. For example, if (A x B) is estimated
to generate O(N) rows but (A x B) x C is estimated to generate O(N x M) rows,
a local join may be the best way to process the final stage.
Even if an N-way remote join is possible, we need to estimate the cost of the
remote join at each level, and decide whether it shall be pushed down to the
remote server based on that estimated cost.
The hook location Tom suggested requires the FDW to compute a foreign-scan
path for each joinrel while the join combinations are being built up, but not
multiple times for each joinrel.

A way to work around this is to leave the ForeignPaths (there can possibly be
only one foreign path per join relation) in the joinrel without removing them.
FDW should work on joining two relations if they have foreign paths in the list
of paths, irrespective of whether the cheapest path is foreign join path or not.
For the topmost joinrel, if the foreign path happens to be the cheapest one, the
whole join tree will be pushed down.

On the other thread implementing foreign join for postgres_fdw,
postgresGetForeignJoinPaths(), is just looking at the cheapest path, which would
cause the problem you have described above.

It might be an idea: if the foreign-scan path is not wiped out regardless of
its estimated cost, we will be able to construct an entirely-remote join path
even if an intermediate path is more expensive than a local join.
A problem is how to distinguish these special paths from the usual paths,
which are eliminated at the previous stage once they become more expensive.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#48Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#47)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

A way to work around this is to leave the ForeignPaths (there can possibly be
only one foreign path per join relation) in the joinrel without removing them.
FDW should work on joining two relations if they have foreign paths in the list
of paths, irrespective of whether the cheapest path is foreign join path or not.
For the topmost joinrel, if the foreign path happens to be the cheapest one, the
whole join tree will be pushed down.

On the other thread implementing foreign join for postgres_fdw,
postgresGetForeignJoinPaths(), is just looking at the cheapest path, which would
cause the problem you have described above.

It might be an idea if foreign-scan path is not wiped out regardless of the
estimated cost, we will be able to construct an entirely remote-join path
even if intermediation path is expensive than local join.
A problem is, how to distinct these special paths from usual paths that are
eliminated on the previous stage once its path is more expensive.

Any solution that is based on not eliminating paths that would
otherwise be discarded based on cost seems to me to be unlikely to be
feasible. We can't complicate the core path-cost-comparison stuff for
the convenience of FDW or custom-scan pushdown.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#49Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Robert Haas (#48)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Tue, Mar 17, 2015 at 8:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
wrote:

A way to work around this is to leave the ForeignPaths (there can possibly be
only one foreign path per join relation) in the joinrel without removing them.
FDW should work on joining two relations if they have foreign paths in the
list of paths, irrespective of whether the cheapest path is foreign join path
or not. For the topmost joinrel, if the foreign path happens to be the
cheapest one, the whole join tree will be pushed down.

On the other thread implementing foreign join for postgres_fdw,
postgresGetForeignJoinPaths(), is just looking at the cheapest path, which
would cause the problem you have described above.

It might be an idea if foreign-scan path is not wiped out regardless of the
estimated cost, we will be able to construct an entirely remote-join path
even if intermediation path is expensive than local join.
A problem is, how to distinct these special paths from usual paths that are
eliminated on the previous stage once its path is more expensive.

Any solution that is based on not eliminating paths that would
otherwise be discarded based on cost seems to me to be unlikely to be
feasible. We can't complicate the core path-cost-comparison stuff for
the convenience of FDW or custom-scan pushdown.

We already have a precedent here. We cache several different cheapest paths
in RelOptInfo, e.g.:

    struct Path *cheapest_startup_path;
    struct Path *cheapest_total_path;
    struct Path *cheapest_unique_path;
    List *cheapest_parameterized_paths;

All we have to do is add yet another one there, "cheapest_foreign_path",
which can be NULL like cheapest_unique_path.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#48)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It might be an idea if foreign-scan path is not wiped out regardless of the
estimated cost, we will be able to construct an entirely remote-join path
even if intermediation path is expensive than local join.
A problem is, how to distinct these special paths from usual paths that are
eliminated on the previous stage once its path is more expensive.

Any solution that is based on not eliminating paths that would
otherwise be discarded based on cost seems to me to be unlikely to be
feasible. We can't complicate the core path-cost-comparison stuff for
the convenience of FDW or custom-scan pushdown.

I concur. I'm not even so worried about the cost of add_path as such;
the real problem with not discarding paths as aggressively as possible
is that it will result in a combinatorial explosion in the number of
path combinations that have to be examined at higher join levels.

regards, tom lane


#51Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#50)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It might be an idea if foreign-scan path is not wiped out regardless of the
estimated cost, we will be able to construct an entirely remote-join path
even if intermediation path is expensive than local join.
A problem is, how to distinct these special paths from usual paths that are
eliminated on the previous stage once its path is more expensive.

Any solution that is based on not eliminating paths that would
otherwise be discarded based on cost seems to me to be unlikely to be
feasible. We can't complicate the core path-cost-comparison stuff for
the convenience of FDW or custom-scan pushdown.

I concur. I'm not even so worried about the cost of add_path as such;
the real problem with not discarding paths as aggressively as possible
is that it will result in a combinatorial explosion in the number of
path combinations that have to be examined at higher join levels.

I'm inclined to agree. That was also the conclusion of my discussion with
Hanada-san yesterday, because of the number of paths to be considered and the
combinatorial problems, as you mentioned above.

So, the overall consensus for the FDW hook location is just before set_cheapest()
at standard_join_search() and merge_clump(), isn't it?
Let me make a design of the FDW hook to reduce code duplication for each FDW driver,
especially to identify the baserels/joinrels involved in this join.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#52Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#51)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, the overall consensus for the FDW hook location is just before set_cheapest()
at standard_join_search() and merge_clump(), isn't it?

Yes, I think so.

Let me make a design of the FDW hook to reduce code duplication for each FDW driver,
especially to identify the baserels/joinrels involved in this join.

Great, thanks!

One issue, which I think Ashutosh alluded to upthread, is that we need
to make sure it's not unreasonably difficult for foreign data wrappers
to construct the FROM clause of an SQL query to be pushed down to the
remote side. It should be simple when there are only inner joins
involved, but when there are all outer joins it might be a bit
complex. It would be very good if someone could try to write that
code, based on the new hook locations, and see how it turns out, so
that we can figure out how to address any issues that may crop up
there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#53Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#52)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, the overall consensus for the FDW hook location is just before set_cheapest()
at standard_join_search() and merge_clump(), isn't it?

Yes, I think so.

Let me make a design of the FDW hook to reduce code duplication for each FDW driver,
especially to identify the baserels/joinrels involved in this join.

Great, thanks!

One issue, which I think Ashutosh alluded to upthread, is that we need
to make sure it's not unreasonably difficult for foreign data wrappers
to construct the FROM clause of an SQL query to be pushed down to the
remote side. It should be simple when there are only inner joins
involved, but when there are outer joins it might be a bit
complex. It would be very good if someone could try to write that
code, based on the new hook locations, and see how it turns out, so
that we can figure out how to address any issues that may crop up
there.

Here is an idea: provide a common utility function that breaks down the
supplied RelOptInfo of a joinrel into a pair of a join type and a list of
the baserels/joinrels involved in that join. It is intended to be called
by the FDW driver to list the underlying relations.
IIUC, root->join_info_list provides the information on how relations are
combined into the upper joined relations, so I expect this is not an
unreasonably complicated way to solve it.
Once the RelOptInfo of the target joinrel is broken down into multiple
sub-relations (N >= 2 if all inner joins, otherwise N = 2), the FDW driver
can reference the RestrictInfo to be used in the join.
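
For illustration only, the core of such a helper might start out like this
(the function name is hypothetical and the handling deliberately naive; real
code would need to cope with multiple special joins and the pure inner-join
case):

  /* Find a SpecialJoinInfo whose two sides are both covered by this joinrel. */
  static SpecialJoinInfo *
  find_join_breakdown(PlannerInfo *root, RelOptInfo *joinrel)
  {
      ListCell   *lc;

      foreach(lc, root->join_info_list)
      {
          SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);

          if (bms_is_subset(sjinfo->min_lefthand, joinrel->relids) &&
              bms_is_subset(sjinfo->min_righthand, joinrel->relids))
              return sjinfo;    /* jointype and both sides are available here */
      }
      return NULL;              /* no special join: treat it as all inner joins */
  }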

Anyway, I'll try to investigate the existing code in more detail today,
to clarify whether the above approach is feasible.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#54Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#53)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, the overall consensus for the FDW hook location is just before the set_cheapest()
at standard_join_search() and merge_clump(), isn't it?

Yes, I think so.

Let me make a design of the FDW hook to reduce code duplication for each FDW driver,
especially to identify the baserels/joinrels involved in this join.

Great, thanks!

One issue, which I think Ashutosh alluded to upthread, is that we need
to make sure it's not unreasonably difficult for foreign data wrappers
to construct the FROM clause of an SQL query to be pushed down to the
remote side. It should be simple when there are only inner joins
involved, but when there are outer joins it might be a bit
complex. It would be very good if someone could try to write that
code, based on the new hook locations, and see how it turns out, so
that we can figure out how to address any issues that may crop up
there.

Here is an idea: provide a common utility function that breaks down the
supplied RelOptInfo of a joinrel into a pair of a join type and a list of
the baserels/joinrels involved in that join. It is intended to be called
by the FDW driver to list the underlying relations.
IIUC, root->join_info_list provides the information on how relations are
combined into the upper joined relations, so I expect this is not an
unreasonably complicated way to solve it.
Once the RelOptInfo of the target joinrel is broken down into multiple
sub-relations (N >= 2 if all inner joins, otherwise N = 2), the FDW driver
can reference the RestrictInfo to be used in the join.

Anyway, I'll try to investigate the existing code in more detail today,
to clarify whether the above approach is feasible.

Sounds good. Keep in mind that, while the parse tree will obviously
reflect the way that the user actually specified the join
syntactically, it's not the job of the join_info_list to make it
simple to reconstruct that information. To the contrary,
join_info_list is supposed to be structured in a way that makes it
easy to determine whether *a particular join order is one of the legal
join orders* not *whether it is the specific join order selected by
the user*. See join_is_legal().

For FDW pushdown, I think it's sufficient to be able to identify *any
one* legal join order, not necessarily the same order the user
originally entered. For example, if the user entered A LEFT JOIN B ON
A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that
instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I
suspect that's just fine. Particular FDWs might wish to try to be
smart about what they emit based on knowledge of what the remote
side's optimizer is likely to do, and that's fine. If the remote side
is PostgreSQL, it shouldn't matter much.
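
Spelled out as the remote queries a wrapper might emit, those two orderings
would be something like:

  SELECT * FROM A LEFT JOIN B ON A.x = B.x
                  LEFT JOIN C ON A.y = C.y;

  SELECT * FROM A LEFT JOIN C ON A.y = C.y
                  LEFT JOIN B ON A.x = B.x;

Both left joins hang off A and neither B nor C references the other, so the
reordering is legal here; that's exactly the kind of case where emitting any
one legal order is enough.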

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#38)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, Mar 13, 2015 at 8:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list.

Hmm. You're right, it's certainly possible that some users would like to
operate on each possible pair of input relations, rather than considering
the joinrel "as a whole". Maybe we need two hooks, one like your patch
and one like I suggested.

Let me attempt to summarize subsequent discussion on this thread by
saying the hook location that you proposed (just before set_cheapest)
has not elicited any enthusiasm from anyone else. In a nutshell, the
problem is that a single callback for a large join problem is just
fine if there are no special joins involved, but in any other
scenario, nobody knows how to use a hook at that location for anything
useful. To push down a join to the remote server, you've got to
figure out how to emit an SQL query for it. To execute it with a
custom join strategy, you've got to know which of those joins should
have inner join semantics vs. left join semantics. A hook/callback in
make_join_rel() or in add_paths_to_joinrel() makes that relatively
straightforward. Otherwise, it's not clear what to do, short of
copy-and-pasting join_search_one_level(). If you have a suggestion,
I'd like to hear it.

If not, I'm going to press forward with the idea of putting the
relevant logic in either add_paths_to_joinrel(), as previously
proposed, or perhaps up one level in make_one_rel(). Either way, if
you don't need to be called multiple times per joinrel, you can stash
a flag inside whatever you hang off of the joinrel's fdw_private and
return immediately on every call after the first. I think that's
cheap enough that we shouldn't get too stressed about it: for FDWs, we
only call the hook at all if everything in the joinrel uses the same
FDW, so it won't get called at all except for joinrels where it's
likely to win big; for custom joins, multiple calls are quite likely
to be useful and necessary, and if the hook burns too much CPU time
for the query performance you get out of it, that's the custom-join
provider's fault, not ours. The current patch takes this approach one
step further and attempts FDW pushdown only once per joinrel. It does
that because, while postgres_fdw DOES need the jointype and a valid
innerrel/outerrel breakdown to figure out what query to generate, it
does NOT need every possible breakdown; rather, the first one is as good as
any other. But this might not be true for a non-PostgreSQL remote
database. So I think it's better to call the hook every time and let
the hook return without doing anything if it wants.
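
As a minimal sketch of that pattern (a simplified, illustrative callback
signature and struct, not the final API):

  typedef struct MyJoinPlanInfo
  {
      Cost    remote_join_cost;   /* whatever the driver wants to remember */
  } MyJoinPlanInfo;

  static void
  myGetForeignJoinPaths(PlannerInfo *root, RelOptInfo *joinrel,
                        RelOptInfo *outerrel, RelOptInfo *innerrel,
                        List *restrictlist, JoinType jointype,
                        SpecialJoinInfo *sjinfo)
  {
      /* fdw_private doubles as the "already handled this joinrel" marker */
      if (joinrel->fdw_private != NULL)
          return;
      joinrel->fdw_private = palloc0(sizeof(MyJoinPlanInfo));

      /* cost the pushed-down join and add a ForeignPath for it here */
  }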

I'm still not totally sure whether make_one_rel() is better than
add_paths_to_joinrel(). The current patch attempts to split the
difference by doing FDW pushdown from make_one_rel() and custom joins
from add_paths_to_joinrel(). I dunno why; if possible, those two
things should happen in the same place. Doing it in make_one_rel()
makes for fewer arguments and fewer repetitive calls, but that's not
much good if you would have had a use for the extra arguments that
aren't computed until we get down to add_paths_to_joinrel(). I'm not
sure whether that's the case or not. The latest version of the
postgres_fdw patch doesn't seem to mind not having extra_lateral_rels,
but I'm wondering if that's busted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#55)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Hi Robert,

Thanks for your comments.

A few random cosmetic problems:

- The hunk in allpaths.c is useless.
- The first hunk in fdwapi.h contains an extra space before the
closing parenthesis.

OK, it's my oversight.

And then:

+       else if (scan->scanrelid == 0 &&
+                        (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+               varno = INDEX_VAR;

Suppose scan->scanrelid == 0 but the scan type is something else? Is
that legal? Is varno == 0 the correct outcome in that case?

Right now, no other scan type has the capability to return tuples whose
type/attributes are more flexible than its static definition.
I think it is a valid restriction that only foreign-/custom-scan
can have scanrelid == 0.

I checked the overall code again. One doubtful point was ExecScanFetch().
If estate->es_epqTuple is not NULL, it tries to return the tuple saved for
a particular scanrelid (larger than zero).
IIUC, es_epqTuple is used only when a fetched tuple has been updated and
the visibility checks have to be applied again for the write operation.
So, it should work for a CSP with an underlying actual scan node on base
relations; however, I need to investigate the code for the case where the
FDW/CSP replaced an entire join subtree with an alternative relation scan
(like a materialized view).

[ new patch ]

A little more nitpicking:

ExecInitForeignScan() and ExecInitCustomScan() could declare
currentRelation inside the if (scanrelid > 0) block instead of in the
outer scope.

OK,

I'm not too excited about the addition of GetFdwHandlerForRelation,
which is a one-line function used in one place. It seems like we
don't really need that.

OK,

On Fri, Mar 13, 2015 at 8:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't object to the concept, but I think that is a pretty bad place
to put the hook call: add_paths_to_joinrel is typically called multiple
(perhaps *many*) times per joinrel and thus this placement would force
any user of the hook to do a lot of repetitive work.

Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list.

Hmm. You're right, it's certainly possible that some users would like to
operate on each possible pair of input relations, rather than considering
the joinrel "as a whole". Maybe we need two hooks, one like your patch
and one like I suggested.

Let me attempt to summarize subsequent discussion on this thread by
saying the hook location that you proposed (just before set_cheapest)
has not elicited any enthusiasm from anyone else. In a nutshell, the
problem is that a single callback for a large join problem is just
fine if there are no special joins involved, but in any other
scenario, nobody knows how to use a hook at that location for anything
useful. To push down a join to the remote server, you've got to
figure out how to emit an SQL query for it. To execute it with a
custom join strategy, you've got to know which of those joins should
have inner join semantics vs. left join semantics. A hook/callback in
make_join_rel() or in add_paths_to_joinrel() makes that relatively
straightforward. Otherwise, it's not clear what to do, short of
copy-and-pasting join_search_one_level(). If you have a suggestion,
I'd like to hear it.

Nothing that I have. When I once tried to put a hook just after set_cheapest(),
the largest problem was that we cannot extract the left and right relations
back out of a set of already-joined relations, like extracting apple and
orange back out of mixed juice.

If not, I'm going to press forward with the idea of putting the
relevant logic in either add_paths_to_joinrel(), as previously
proposed, or perhaps up one level in make_one_rel(). Either way, if
you don't need to be called multiple times per joinrel, you can stash
a flag inside whatever you hang off of the joinrel's fdw_private and
return immediately on every call after the first. I think that's
cheap enough that we shouldn't get too stressed about it: for FDWs, we
only call the hook at all if everything in the joinrel uses the same
FDW, so it won't get called at all except for joinrels where it's
likely to win big; for custom joins, multiple calls are quite likely
to be useful and necessary, and if the hook burns too much CPU time
for the query performance you get out of it, that's the custom-join
provider's fault, not ours. The current patch takes this approach one
step further and attempts FDW pushdown only once per joinrel. It does
that because, while postgres_fdw DOES need the jointype and a valid
innerrel/outerrel breakdown to figure out what query to generate, it
does NOT need every possible breakdown; rather, the first one is as good as
any other. But this might not be true for a non-PostgreSQL remote
database. So I think it's better to call the hook every time and let
the hook return without doing anything if it wants.

Indeed. Although Hanada-san and I have been discussing this under the
assumption of a remote PostgreSQL and join push-down, there may be remote
RDBMSs that build their query execution plan according to the order in
which relations appear in the query.
If an FDW driver doesn't want GetForeignJoinPaths() to be called multiple
times, fdw_private of RelOptInfo is a good marker to determine whether it
is the first call or not.
In case multiple CSPs add paths on a join, we may need a facility to allow
multiple extensions to save their own private data.
If we could identify an individual CSP by name, it may be an idea to have
a hash table that tracks per-CSP private data. But I don't think it is a
mandatory feature in the first version.
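
A rough sketch of that idea, purely hypothetical (none of these names exist
in any patch, and a simple list stands in for the hash table, which would do
for a handful of providers):

  typedef struct CustomJoinPrivate
  {
      const char *provider_name;   /* matches CustomScanMethods->CustomName */
      void       *data;            /* whatever this provider wants to stash */
  } CustomJoinPrivate;

  /* Look up one provider's slot in a list hung off the joinrel. */
  static void *
  csp_lookup_private(List *csp_private_list, const char *provider_name)
  {
      ListCell   *lc;

      foreach(lc, csp_private_list)
      {
          CustomJoinPrivate *ent = (CustomJoinPrivate *) lfirst(lc);

          if (strcmp(ent->provider_name, provider_name) == 0)
              return ent->data;
      }
      return NULL;
  }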

I'm still not totally sure whether make_one_rel() is better than
add_paths_to_joinrel(). The current patch attempts to split the
difference by doing FDW pushdown from make_one_rel() and custom joins
from add_paths_to_joinrel(). I dunno why; if possible, those two
things should happen in the same place. Doing it in make_one_rel()
makes for fewer arguments and fewer repetitive calls, but that's not
much good if you would have had a use for the extra arguments that
aren't computed until we get down to add_paths_to_joinrel(). I'm not
sure whether that's the case or not. The latest version of the
postgres_fdw patch doesn't seem to mind not having extra_lateral_rels,
but I'm wondering if that's busted.

As in my initial proposition, my preference is add_paths_to_joinrel(),
because of the values calculated during this routine (even though it also
increases the number of arguments). Even if make_one_rel() called the
FDW/CSP, I expect extensions would have to re-generate these values again
by themselves. It is not impossible to implement, but not a graceful
manner at least.

As long as postgres_fdw checks fdw_private of RelOptInfo, the amount of
code adjustment is not so large.
Hanada-san, what is your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#56)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Wed, Apr 22, 2015 at 10:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

+       else if (scan->scanrelid == 0 &&
+                        (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+               varno = INDEX_VAR;

Suppose scan->scanrelid == 0 but the scan type is something else? Is
that legal? Is varno == 0 the correct outcome in that case?

Right now, no other scan type has the capability to return tuples whose
type/attributes are more flexible than its static definition.
I think it is a valid restriction that only foreign-/custom-scan
can have scanrelid == 0.

But the code as you've written it doesn't enforce any such
restriction. It just spends CPU cycles testing for a condition which,
to the best of your knowledge, will never happen.

If it's really a can't happen condition, how about checking it via an Assert()?

else if (scan->scanrelid == 0)
{
Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
varno = INDEX_VAR;
}

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#57)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Wed, Apr 22, 2015 at 10:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

+       else if (scan->scanrelid == 0 &&
+                        (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+               varno = INDEX_VAR;

Suppose scan->scanrelid == 0 but the scan type is something else? Is
that legal? Is varno == 0 the correct outcome in that case?

Right now, no other scan type has the capability to return tuples whose
type/attributes are more flexible than its static definition.
I think it is a valid restriction that only foreign-/custom-scan
can have scanrelid == 0.

But the code as you've written it doesn't enforce any such
restriction. It just spends CPU cycles testing for a condition which,
to the best of your knowledge, will never happen.

If it's really a can't happen condition, how about checking it via an Assert()?

else if (scan->scanrelid == 0)
{
Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
varno = INDEX_VAR;
}

Thanks for your suggestion. I'd like to use this idea on the next patch.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#58)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

The attached patch v13 is the revised one, according to the suggestions
from Robert.

- eliminated useless change in allpaths.c
- eliminated an extra space in FdwRoutine definition
- prohibited scanrelid==0 for anything other than ForeignScan
or CustomScan, using an Assert()
- the definition of currentRelation in ExecInitForeignScan() and
ExecInitCustomScan() was moved inside the if-block on
scanrelid > 0
- GetForeignJoinPaths() was redefined and moved to
add_paths_to_joinrel(), like set_join_pathlist_hook.

As suggested, an FDW driver can skip adding further paths if equivalent
paths have already been added to a certain joinrel, by checking
fdw_private. So, we can still achieve the purpose we had when we once
moved the entrypoint to make_join_rel(): not populating redundant paths
for each potential join combination, even though the remote RDBMS would
handle it correctly. It also makes sense if the remote RDBMS handles the
table join according to the order in which relations appear.

Its definition is below:
void GetForeignJoinPaths(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
RelOptInfo *innerrel,
List *restrictlist,
JoinType jointype,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors,
Relids param_source_rels,
Relids extra_lateral_rels);

In addition to the arguments in the previous version, we added
some parameters computed during add_paths_to_joinrel().
Right now, I'm not certain whether we should include mergeclause_list
here, because it depends on enable_mergejoin even though extra join
logic based on merge-join may not want to be controlled by this GUC.
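
For reference, hooking this into a driver's handler function would look
roughly like the following (the my* function names are illustrative; only
the new GetForeignJoinPaths field comes from this patch):

  Datum
  my_fdw_handler(PG_FUNCTION_ARGS)
  {
      FdwRoutine *routine = makeNode(FdwRoutine);

      routine->GetForeignRelSize  = myGetForeignRelSize;
      routine->GetForeignPaths    = myGetForeignPaths;
      routine->GetForeignPlan     = myGetForeignPlan;
      routine->BeginForeignScan   = myBeginForeignScan;
      routine->IterateForeignScan = myIterateForeignScan;
      routine->EndForeignScan     = myEndForeignScan;

      /* new in this patch */
      routine->GetForeignJoinPaths = myGetForeignJoinPaths;

      PG_RETURN_POINTER(routine);
  }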

Hanada-san, could you adjust your postgres_fdw patch according to
the above new (previous?) definition?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: Kaigai Kouhei(海外 浩平)
Sent: Friday, April 24, 2015 11:23 PM
To: 'Robert Haas'
Cc: Tom Lane; Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

On Wed, Apr 22, 2015 at 10:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

+       else if (scan->scanrelid == 0 &&
+                        (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+               varno = INDEX_VAR;

Suppose scan->scanrelid == 0 but the scan type is something else? Is
that legal? Is varno == 0 the correct outcome in that case?

Right now, no other scan type has the capability to return tuples whose
type/attributes are more flexible than its static definition.
I think it is a valid restriction that only foreign-/custom-scan
can have scanrelid == 0.

But the code as you've written it doesn't enforce any such
restriction. It just spends CPU cycles testing for a condition which,
to the best of your knowledge, will never happen.

If it's really a can't happen condition, how about checking it via an Assert()?

else if (scan->scanrelid == 0)
{
Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
varno = INDEX_VAR;
}

Thanks for your suggestion. I'd like to use this idea on the next patch.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v13.patchapplication/octet-stream; name=pgsql-v9.5-custom-join.v13.patchDownload
 doc/src/sgml/custom-scan.sgml           | 43 ++++++++++++++++++
 doc/src/sgml/fdwhandler.sgml            | 55 +++++++++++++++++++++++
 src/backend/commands/explain.c          | 15 +++++--
 src/backend/executor/execScan.c         |  6 +++
 src/backend/executor/nodeCustom.c       | 41 ++++++++++++-----
 src/backend/executor/nodeForeignscan.c  | 37 ++++++++++-----
 src/backend/foreign/foreign.c           | 22 ++++++---
 src/backend/nodes/bitmapset.c           | 57 +++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  5 +++
 src/backend/nodes/outfuncs.c            |  5 +++
 src/backend/optimizer/path/joinpath.c   | 25 +++++++++++
 src/backend/optimizer/plan/createplan.c | 80 ++++++++++++++++++++++++++-------
 src/backend/optimizer/plan/setrefs.c    | 64 ++++++++++++++++++++++++++
 src/backend/optimizer/util/plancat.c    |  7 ++-
 src/backend/optimizer/util/relnode.c    | 16 +++++++
 src/backend/utils/adt/ruleutils.c       |  4 ++
 src/include/foreign/fdwapi.h            | 15 +++++++
 src/include/nodes/bitmapset.h           |  1 +
 src/include/nodes/plannodes.h           | 24 +++++++---
 src/include/nodes/relation.h            |  2 +
 src/include/optimizer/paths.h           | 13 ++++++
 src/include/optimizer/planmain.h        |  1 +
 22 files changed, 487 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 8a4a3df..b1400ae 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -48,6 +48,27 @@ extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
   </para>
 
   <para>
+   A custom scan provider can also add paths by setting the
+   following hook, to replace built-in join paths with a custom scan that
+   behaves as if it scans an already-joined relation.  This hook is called
+   after the core code has generated what it believes to be the complete
+   and correct set of access paths for the join.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+  </para>
+
+  <para>
     Although this hook function can be used to examine, modify, or remove
     paths generated by the core system, a custom scan provider will typically
     confine itself to generating <structname>CustomPath</> objects and adding
@@ -124,7 +145,9 @@ typedef struct CustomScan
     Scan      scan;
     uint32    flags;
     List     *custom_exprs;
+    List     *custom_ps_tlist;
     List     *custom_private;
+    List     *custom_relids;
     const CustomScanMethods *methods;
 } CustomScan;
 </programlisting>
@@ -141,10 +164,30 @@ typedef struct CustomScan
     is only used by the custom scan provider itself.  Plan trees must be able
     to be duplicated using <function>copyObject</>, so all the data stored
     within these two fields must consist of nodes that function can handle.
+    <literal>custom_relids</> is set by the backend, so the custom-scan
+    provider does not need to touch it; it tracks the underlying relations
+    represented by this custom-scan node.
     <structfield>methods</> must point to a (usually statically allocated)
     object implementing the required custom scan methods, which are further
     detailed below.
   </para>
+  <para>
+   When a <structname>CustomScan</> replaces built-in join paths, the
+   custom-scan provider must make two characteristic settings.
+   The first is zero in <structfield>scan.scanrelid</>, which would
+   usually be a range-table index.  It informs the backend that this
+   <structname>CustomScan</> node is not associated with a particular
+   table.  The second is a valid list of <structname>TargetEntry</> in
+   <structfield>custom_ps_tlist</>.  A <structname>CustomScan</> node
+   looks to the backend like a literal scan, but on a relation that is
+   the result of a join.  This means we cannot construct a tuple
+   descriptor from a table definition, so the custom-scan provider must
+   describe the expected record type of the tuples.
+   The tuple descriptor of the scan slot is constructed from the
+   <structfield>custom_ps_tlist</> and assigned at executor initialization.
+   It is also referenced by <command>EXPLAIN</> to resolve the names of the
+   underlying columns and relations.
+  </para>
 
   <sect2 id="custom-scan-plan-callbacks">
    <title>Custom Scan Callbacks</title>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 5af4131..dc9374d 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,61 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for remote join</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    List *restrictlist,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    Relids param_source_rels,
+                    Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     joined relations, both of which must be managed by the same
+     FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add a <literal>ForeignScan</>
+     path for the supplied <literal>joinrel</>. From the standpoint of the
+     query planner, it looks like a scan node added for the join relation.
+     This means a <literal>ForeignScan</> path added instead of the built-in
+     local join logic has to generate tuples as if it scanned an already
+     joined and materialized relation.
+    </para>
+    <para>
+     Usually, we expect the FDW driver to issue a remote query that joins
+     the tables on the remote side, then fetch the joined result on the
+     local side.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so we cannot know its definition
+     from the system catalog.
+     The FDW driver is therefore responsible for describing to the query
+     planner the expected form of the joined relations. When a
+     <literal>ForeignScan</> replaces a join, <literal>scanrelid</> of the
+     generated plan node shall be zero, to mark that this
+     <literal>ForeignScan</> node is not associated with a particular
+     foreign table.  It also needs to construct a pseudo scan tlist
+     (<literal>fdw_ps_tlist</>) to indicate the expected tuple definition.
+    </para>
+    <para>
+     Once <literal>scanrelid</> equals zero, the executor initializes the
+     scan slot according to <literal>fdw_ps_tlist</>, excluding junk
+     entries. This list is also used to resolve the names of the original
+     relations and columns, so the FDW can chain expression nodes that are
+     not actually executed on the local side, such as a join clause to be
+     executed on the remote side; the target entries for those will have
+     <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 315a528..f4cc901 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -730,11 +730,17 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
-		case T_ForeignScan:
-		case T_CustomScan:
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_ForeignScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((ForeignScan *) plan)->fdw_relids);
+			break;
+		case T_CustomScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((CustomScan *) plan)->custom_relids);
+			break;
 		case T_ModifyTable:
 			*rels_used = bms_add_member(*rels_used,
 									((ModifyTable *) plan)->nominalRelation);
@@ -1072,9 +1078,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..85ce932 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,12 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0)
+	{
+		Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
+		varno = INDEX_VAR;
+	}
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..80851de 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,7 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
-	Relation			scan_rel;
+	Index				scan_relid = cscan->scan.scanrelid;
 
 	/* populate a CustomScanState according to the CustomScan */
 	css = (CustomScanState *) cscan->methods->CreateCustomScanState(cscan);
@@ -48,12 +48,33 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hand, a custom scan may scan a pseudo relation;
+	 * that is usually the result set of a relation join performed by an
+	 * external computing resource. It then has to get the scan type from
+	 * the pseudo-scan target-list that should be assigned by custom-scan
+	 * provider.
+	 */
+	if (scan_relid > 0)
+	{
+		Relation		scan_rel;
+
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +110,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..8f69cd4 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,7 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
-	Relation	currentRelation;
+	Index		scanrelid = node->scan.scanrelid;
 	FdwRoutine *fdwroutine;
 
 	/* check for unsupported flags */
@@ -141,16 +141,30 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hand, a foreign scan may scan a pseudo relation;
+	 * that is usually the result set of a remote relation join. It has
+	 * to get the scan type from the pseudo-scan target-list that should
+	 * be assigned by FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		Relation	currentRelation;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +175,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +207,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..78c977f 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -304,11 +304,11 @@ GetFdwRoutine(Oid fdwhandler)
 
 
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +350,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
@@ -398,7 +409,6 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
 	return relation->rd_fdwroutine;
 }
 
-
 /*
  * IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
  *
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index a9c3b4b..4dc3286 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -301,6 +301,63 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 }
 
 /*
+ * bms_shift_members - move all the bits by shift
+ */
+Bitmapset *
+bms_shift_members(const Bitmapset *a, int shift)
+{
+	Bitmapset  *b;
+	bitmapword	h_word;
+	bitmapword	l_word;
+	int			nwords;
+	int			w_shift;
+	int			b_shift;
+	int			i, j;
+
+	/* fast path if result shall be NULL obviously */
+	if (a == NULL || a->nwords * BITS_PER_BITMAPWORD + shift <= 0)
+		return NULL;
+	/* actually, not shift members */
+	if (shift == 0)
+		return bms_copy(a);
+
+	nwords = (a->nwords * BITS_PER_BITMAPWORD + shift +
+			  BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+	b = palloc(BITMAPSET_SIZE(nwords));
+	b->nwords = nwords;
+
+	if (shift > 0)
+	{
+		/* Left shift */
+		w_shift = WORDNUM(shift);
+		b_shift = BITNUM(shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j >= 0   && j   < a->nwords ? a->words[j] : 0);
+			l_word = (j-1 >= 0 && j-1 < a->nwords ? a->words[j-1] : 0);
+			b->words[i] = ((h_word << b_shift) |
+						   (l_word >> (BITS_PER_BITMAPWORD - b_shift)));
+		}
+	}
+	else
+	{
+		/* Right shift */
+		w_shift = WORDNUM(-shift);
+		b_shift = BITNUM(-shift);
+
+		for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+		{
+			h_word = (j+1 >= 0 && j+1 < a->nwords ? a->words[j+1] : 0);
+			l_word = (j >= 0 && j < a->nwords ? a->words[j] : 0);
+			b->words[i] = ((h_word >> (BITS_PER_BITMAPWORD - b_shift)) |
+						   (l_word << b_shift));
+		}
+	}
+	return b;
+}
+
+/*
  * bms_is_subset - is A a subset of B?
  */
 bool
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1685efe..805045d 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,8 +592,11 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
+	COPY_BITMAPSET_FIELD(fdw_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
 	return newnode;
@@ -617,7 +620,9 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
+	COPY_BITMAPSET_FIELD(custom_relids);
 
 	/*
 	 * NOTE: The method field of CustomScan is required to be a pointer to a
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e0dca56..f9f948e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,8 +558,11 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
+	WRITE_BITMAPSET_FIELD(fdw_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
@@ -572,7 +575,9 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
+	WRITE_BITMAPSET_FIELD(custom_relids);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
 	if (node->methods->TextOutCustomScan)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..f61e725 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,28 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Consider paths added by FDW, in case when both of outer and
+	 * inner relations are managed by the same driver.
+	 */
+	if (joinrel->fdwroutine &&
+		joinrel->fdwroutine->GetForeignJoinPaths)
+		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
+												 outerrel, innerrel,
+												 restrictlist, jointype, sjinfo,
+												 &semifactors,
+												 param_source_rels,
+												 extra_lateral_rels);
+	/*
+	 * 6. At the last, consider paths added by extension, in addition to the
+	 * built-in paths.
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..7f86fcb 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1961,16 +1960,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * Fetch the relation OID, if this foreign-scan node actually scans
+	 * a particular real relation. Otherwise, InvalidOid is passed to
+	 * the FDW driver.
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1985,13 +1994,37 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed
+	 * from fdw_ps_tlist, excluding resjunk entries, so we need to
+	 * ensure all valid TLEs are located before the junk ones.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this foreign scan for Explain */
+	scan_plan->fdw_relids = best_path->path.parent->relids;
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2053,12 +2086,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2078,6 +2106,28 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check. The pseudo-scan tuple descriptor is constructed
+	 * from custom_ps_tlist, excluding resjunk entries, so we need
+	 * to ensure all valid TLEs are located before the junk ones.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this custom scan for Explain */
+	cplan->custom_relids = best_path->path.parent->relids;
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 94b12ab..60fbb08 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				ForeignScan *splan = (ForeignScan *) plan;
 
+				if (rtoffset > 0)
+					splan->fdw_relids =
+						bms_shift_members(splan->fdw_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->fdw_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->fdw_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->fdw_ps_tlist =
+						fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -582,6 +614,38 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			{
 				CustomScan *splan = (CustomScan *) plan;
 
+				if (rtoffset > 0)
+					splan->custom_relids =
+						bms_shift_members(splan->custom_relids, rtoffset);
+
+				if (splan->scan.scanrelid == 0)
+				{
+					indexed_tlist *pscan_itlist =
+						build_tlist_index(splan->custom_ps_tlist);
+
+					splan->scan.plan.targetlist = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.targetlist,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->scan.plan.qual = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->scan.plan.qual,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_exprs = (List *)
+						fix_upper_expr(root,
+									   (Node *) splan->custom_exprs,
+									   pscan_itlist,
+									   INDEX_VAR,
+									   rtoffset);
+					splan->custom_ps_tlist =
+						fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+					pfree(pscan_itlist);
+					break;
+				}
 				splan->scan.scanrelid += rtoffset;
 				splan->scan.plan.targetlist =
 					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8abed2a..068ab39 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -379,10 +379,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerByRelId(RelationGetRelid(relation));
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..94687b4 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -122,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -316,6 +318,8 @@ find_join_rel(PlannerInfo *root, Relids relids)
  * 'restrictlist_ptr': result variable.  If not NULL, *restrictlist_ptr
  *		receives the list of RestrictInfo nodes that apply to this
  *		particular pair of joinable relations.
+ * 'found' : indicates whether RelOptInfo is actually constructed.
+ *		true, if it was already built and on the cache.
  *
  * restrictlist_ptr makes the routine's API a little grotty, but it saves
  * duplicated calculation of the restrictlist...
@@ -427,6 +431,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relation
+	 * are managed by same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ffb712..29d1210 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3862,6 +3862,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..5d77623 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,17 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
+											   RelOptInfo *joinrel,
+											   RelOptInfo *outerrel,
+											   RelOptInfo *innerrel,
+											   List *restrictlist,
+											   JoinType jointype,
+											   SpecialJoinInfo *sjinfo,
+											   SemiAntiJoinFactors *semifactors,
+											   Relids param_source_rels,
+											   Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,10 +161,14 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPaths_function GetForeignJoinPaths;
 } FdwRoutine;
 
 
 /* Functions in foreign/foreign.c */
+extern Oid GetFdwHandlerByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 3a556ee..3ca9791 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -66,6 +66,7 @@ extern void bms_free(Bitmapset *a);
 extern Bitmapset *bms_union(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_intersect(const Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_shift_members(const Bitmapset *a, int shift);
 extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
 extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_member(int x, const Bitmapset *a);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..b25330e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,13 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * the underlying relation(s) onto a pair of INDEX_VAR and an alternative
+ * varattno.  The node then looks like a scan on a pseudo relation that is
+ * usually the result of a join on the remote data source, and the FDW
+ * driver is responsible for setting the expected target list.  If the FDW
+ * returns records per the foreign-table definition, just put NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,18 +486,23 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
+	Bitmapset  *fdw_relids;		/* set of relids (range-table indexes)
+								 * represented by this node */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,7 +523,10 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
+	Bitmapset  *custom_relids;	/* set of relids (range-table indexes)
+								 * represented by this node */
 	const CustomScanMethods *methods;
 } CustomScan;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 401a686..1713d29 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#60Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Kouhei Kaigai (#59)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Kaigai-san,

2015-04-27 11:00 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hanada-san, could you adjust your postgres_fdw patch according to
the above new (previous?) definition.

The attached v14 patch is the revised version of your v13 patch. It also contains changes for Ashutosh's comments.

--
Shigeru HANADA

Attachments:

foreign_join_v14.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 94fab18..98b93f5 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,11 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
+#include "optimizer/prep.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -89,6 +92,8 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *outertlist;		/* outer child's target list */
+	List	   *innertlist;		/* inner child's target list */
 } deparse_expr_cxt;
 
 /*
@@ -136,6 +141,13 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 				 deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 					   deparse_expr_cxt *context);
+static const char *get_jointype_name(JoinType jointype);
+
+/*
+ * Convert an attnum into a non-negative offset relative to
+ * FirstLowInvalidHeapAttributeNumber.  This is handy for handling attnums
+ * in attrs_used and for column aliases.
+ */
+#define GET_RELATIVE_ATTNO(x)	((x) - FirstLowInvalidHeapAttributeNumber)
 
 
 /*
@@ -143,6 +155,7 @@ static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  * which are returned as two lists:
  *	- remote_conds contains expressions that can be evaluated remotely
  *	- local_conds contains expressions that can't be evaluated remotely
+ * Note that each element is an Expr that has been stripped from its RestrictInfo.
  */
 void
 classifyConditions(PlannerInfo *root,
@@ -250,7 +263,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -675,18 +688,83 @@ is_builtin(Oid oid)
  *
  * We also create an integer List of the columns being retrieved, which is
  * returned to *retrieved_attrs.
+ *
+ * relations is a string buffer for the "Relations" portion of EXPLAIN output,
+ * or NULL if the caller doesn't need it.  Note that it must have been
+ * initialized by the caller.
  */
 void
 deparseSelectSql(StringInfo buf,
 				 PlannerInfo *root,
 				 RelOptInfo *baserel,
 				 Bitmapset *attrs_used,
-				 List **retrieved_attrs)
+				 List *remote_conds,
+				 List **params_list,
+				 List **fdw_ps_tlist,
+				 List **retrieved_attrs,
+				 StringInfo relations)
 {
+	PgFdwRelationInfo  *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
 	Relation	rel;
 
 	/*
+	 * If the given relation is a join relation, recursively construct the
+	 * statement by putting the outer and inner relations into the FROM
+	 * clause as aliased subqueries.
+	 */
+	if (baserel->reloptkind == RELOPT_JOINREL)
+	{
+		RelOptInfo		   *rel_o = fpinfo->outerrel;
+		RelOptInfo		   *rel_i = fpinfo->innerrel;
+		PgFdwRelationInfo  *fpinfo_o = (PgFdwRelationInfo *) rel_o->fdw_private;
+		PgFdwRelationInfo  *fpinfo_i = (PgFdwRelationInfo *) rel_i->fdw_private;
+		StringInfoData		sql_o;
+		StringInfoData		sql_i;
+		List			   *ret_attrs_tmp;	/* not used */
+		StringInfoData		relations_o;
+		StringInfoData		relations_i;
+		const char		   *jointype_str;
+
+		/*
+		 * Deparse the queries for the outer and inner relations, and combine
+		 * them into a single query.
+		 *
+		 * Here we don't pass fdw_ps_tlist because the targets of the
+		 * underlying relations are already in joinrel->reltargetlist, and
+		 * deparseJoinSql() takes care of them.
+		 */
+		initStringInfo(&sql_o);
+		initStringInfo(&relations_o);
+		deparseSelectSql(&sql_o, root, rel_o, fpinfo_o->attrs_used,
+						 fpinfo_o->remote_conds, params_list,
+						 NULL, &ret_attrs_tmp, &relations_o);
+		initStringInfo(&sql_i);
+		initStringInfo(&relations_i);
+		deparseSelectSql(&sql_i, root, rel_i, fpinfo_i->attrs_used,
+						 fpinfo_i->remote_conds, params_list,
+						 NULL, &ret_attrs_tmp, &relations_i);
+
+		/* For EXPLAIN output */
+		jointype_str = get_jointype_name(fpinfo->jointype);
+		if (relations)
+			appendStringInfo(relations, "(%s) %s JOIN (%s)",
+							 relations_o.data, jointype_str, relations_i.data);
+
+		deparseJoinSql(buf, root, baserel,
+					   fpinfo->outerrel,
+					   fpinfo->innerrel,
+					   sql_o.data,
+					   sql_i.data,
+					   fpinfo->jointype,
+					   fpinfo->joinclauses,
+					   fpinfo->otherclauses,
+					   fdw_ps_tlist,
+					   retrieved_attrs);
+		return;
+	}
+
+	/*
 	 * Core code already has some lock on each rel being planned, so we can
 	 * use NoLock here.
 	 */
@@ -705,6 +783,87 @@ deparseSelectSql(StringInfo buf,
 	appendStringInfoString(buf, " FROM ");
 	deparseRelation(buf, rel);
 
+	/*
+	 * Return the local relation name for EXPLAIN output.
+	 * We can't know whether the VERBOSE option is specified, so always add
+	 * the schema name.
+	 */
+	if (relations)
+	{
+		const char	   *namespace;
+		const char	   *relname;
+		const char	   *refname;
+
+		namespace = get_namespace_name(get_rel_namespace(rte->relid));
+		relname = get_rel_name(rte->relid);
+		refname = rte->eref->aliasname;
+		appendStringInfo(relations, "%s.%s",
+						 quote_identifier(namespace),
+						 quote_identifier(relname));
+		if (*refname && strcmp(refname, relname) != 0)
+			appendStringInfo(relations, " %s",
+							 quote_identifier(rte->eref->aliasname));
+	}
+
+	/*
+	 * Construct WHERE clause
+	 */
+	if (remote_conds)
+		appendConditions(buf, root, baserel, NULL, NULL, remote_conds,
+						 " WHERE ", params_list);
+
+	/*
+	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
+	 * initial row fetch, rather than later on as is done for local tables.
+	 * The extra roundtrips involved in trying to duplicate the local
+	 * semantics exactly don't seem worthwhile (see also comments for
+	 * RowMarkType).
+	 *
+	 * Note: because we actually run the query as a cursor, this assumes
+	 * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+	 * before 8.3.
+	 */
+	if (baserel->relid == root->parse->resultRelation &&
+		(root->parse->commandType == CMD_UPDATE ||
+		 root->parse->commandType == CMD_DELETE))
+	{
+		/* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+		appendStringInfoString(buf, " FOR UPDATE");
+	}
+	else
+	{
+		PlanRowMark *rc = get_plan_rowmark(root->rowMarks, baserel->relid);
+
+		if (rc)
+		{
+			/*
+			 * Relation is specified as a FOR UPDATE/SHARE target, so handle
+			 * that.  (But we could also see LCS_NONE, meaning this isn't a
+			 * target relation after all.)
+			 *
+			 * For now, just ignore any [NO] KEY specification, since (a)
+			 * it's not clear what that means for a remote table that we
+			 * don't have complete information about, and (b) it wouldn't
+			 * work anyway on older remote servers.  Likewise, we don't
+			 * worry about NOWAIT.
+			 */
+			switch (rc->strength)
+			{
+				case LCS_NONE:
+					/* No locking needed */
+					break;
+				case LCS_FORKEYSHARE:
+				case LCS_FORSHARE:
+					appendStringInfoString(buf, " FOR SHARE");
+					break;
+				case LCS_FORNOKEYUPDATE:
+				case LCS_FORUPDATE:
+					appendStringInfoString(buf, " FOR UPDATE");
+					break;
+			}
+		}
+	}
+
 	heap_close(rel, NoLock);
 }
 
@@ -731,8 +890,7 @@ deparseTargetList(StringInfo buf,
 	*retrieved_attrs = NIL;
 
 	/* If there's a whole-row reference, we'll need all the columns. */
-	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
-								  attrs_used);
+	have_wholerow = bms_is_member(GET_RELATIVE_ATTNO(0), attrs_used);
 
 	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
@@ -743,15 +901,14 @@ deparseTargetList(StringInfo buf,
 		if (attr->attisdropped)
 			continue;
 
-		if (have_wholerow ||
-			bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
-						  attrs_used))
+		if (have_wholerow || bms_is_member(GET_RELATIVE_ATTNO(i), attrs_used))
 		{
 			if (!first)
 				appendStringInfoString(buf, ", ");
 			first = false;
 
 			deparseColumnRef(buf, rtindex, i, root);
+			appendStringInfo(buf, " a%d", GET_RELATIVE_ATTNO(i));
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -761,17 +918,17 @@ deparseTargetList(StringInfo buf,
 	 * Add ctid if needed.  We currently don't support retrieving any other
 	 * system columns.
 	 */
-	if (bms_is_member(SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber,
-					  attrs_used))
+	if (bms_is_member(GET_RELATIVE_ATTNO(SelfItemPointerAttributeNumber), attrs_used))
 	{
 		if (!first)
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		appendStringInfoString(buf, "ctid");
+		appendStringInfo(buf, "ctid a%d",
+						 GET_RELATIVE_ATTNO(SelfItemPointerAttributeNumber));
 
 		*retrieved_attrs = lappend_int(*retrieved_attrs,
-									   SelfItemPointerAttributeNumber);
+										   SelfItemPointerAttributeNumber);
 	}
 
 	/* Don't generate bad syntax if no undropped columns */
@@ -780,11 +937,13 @@ deparseTargetList(StringInfo buf,
 }
 
 /*
- * Deparse WHERE clauses in given list of RestrictInfos and append them to buf.
+ * Deparse conditions, such as a WHERE clause or the ON clause of a JOIN, from
+ * the given list, whose elements are RestrictInfos or Exprs, and append their
+ * string representation to buf.
  *
  * baserel is the foreign table we're planning for.
  *
- * If no WHERE clause already exists in the buffer, is_first should be true.
+ * prefix is placed before the conditions, if any.
  *
  * If params is not NULL, it receives a list of Params and other-relation Vars
  * used in the clauses; these values must be transmitted to the remote server
@@ -794,12 +953,14 @@ deparseTargetList(StringInfo buf,
  * so Params and other-relation Vars should be replaced by dummy values.
  */
 void
-appendWhereClause(StringInfo buf,
-				  PlannerInfo *root,
-				  RelOptInfo *baserel,
-				  List *exprs,
-				  bool is_first,
-				  List **params)
+appendConditions(StringInfo buf,
+				 PlannerInfo *root,
+				 RelOptInfo *baserel,
+				 List *outertlist,
+				 List *innertlist,
+				 List *exprs,
+				 const char *prefix,
+				 List **params)
 {
 	deparse_expr_cxt context;
 	int			nestlevel;
@@ -813,31 +974,329 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.outertlist = outertlist;
+	context.innertlist = innertlist;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
 
 	foreach(lc, exprs)
 	{
-		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+		Node	   *node = (Node *) lfirst(lc);
+		Expr	   *expr;
 
-		/* Connect expressions with "AND" and parenthesize each condition. */
-		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+		if (IsA(node, RestrictInfo))
+		{
+			RestrictInfo *ri = (RestrictInfo *) node;
+			expr = ri->clause;
+		}
 		else
-			appendStringInfoString(buf, " AND ");
+			expr = (Expr *) node;
+
+		/* Connect expressions with "AND" and parenthesize each condition. */
+		if (prefix)
+			appendStringInfo(buf, "%s", prefix);
 
 		appendStringInfoChar(buf, '(');
-		deparseExpr(ri->clause, &context);
+		deparseExpr(expr, &context);
 		appendStringInfoChar(buf, ')');
 
-		is_first = false;
+		prefix = " AND ";
 	}
 
 	reset_transmission_modes(nestlevel);
 }
 
 /*
+ * Returns the position (starting at 1) of the given Var in the given target
+ * list, or 0 when not found.
+ */
+static int
+find_var_pos(Var *node, List *tlist)
+{
+	int		pos = 1;
+	ListCell *lc;
+
+	foreach(lc, tlist)
+	{
+		Var *var = (Var *) lfirst(lc);
+
+		if (equal(var, node))
+		{
+			return pos;
+		}
+		pos++;
+	}
+
+	return 0;
+}
+
+/*
+ * Deparse given Var into buf.
+ */
+static void
+deparseJoinVar(Var *node, deparse_expr_cxt *context)
+{
+	char		side;
+	int			pos;
+
+	pos = find_var_pos(node, context->outertlist);
+	if (pos > 0)
+		side = 'l';
+	else
+	{
+		side = 'r';
+		pos = find_var_pos(node, context->innertlist);
+	}
+
+	/*
+	 * We treat a whole-row reference the same as ordinary attribute
+	 * references, because such transformation should be done at a lower level.
+	 */
+	appendStringInfo(context->buf, "%c.a%d", side, pos);
+}
+
+/*
+ * Deparse column alias list for a subquery in FROM clause.
+ */
+static void
+deparseColumnAliases(StringInfo buf, List *tlist)
+{
+	int			pos;
+	ListCell   *lc;
+
+	pos = 1;
+	foreach(lc, tlist)
+	{
+		/* Deparse column alias for the subquery */
+		if (pos > 1)
+			appendStringInfoString(buf, ", ");
+		appendStringInfo(buf, "a%d", pos);
+		pos++;
+	}
+}
+
+/*
+ * Deparse "wrapper" SQL for a query which projects target lists in proper
+ * order and contents.  Note that this treatment is necessary only for queries
+ * used in FROM clause of a join query.
+ *
+ * Even if the SQL is enough simple (no ctid, no whole-row reference), the order
+ * of output column might different from underlying scan, so we always need to
+ * wrap the queries for join sources.
+ *
+ */
+static const char *
+deparseProjectionSql(PlannerInfo *root,
+					 RelOptInfo *baserel,
+					 const char *sql,
+					 char side)
+{
+	StringInfoData wholerow;
+	StringInfoData buf;
+	ListCell   *lc;
+	bool		first;
+	bool		have_wholerow = false;
+
+	/*
+	 * We have nothing to do if the targetlist contains no special reference,
+	 * such as whole-row and ctid.
+	 */
+	foreach(lc, baserel->reltargetlist)
+	{
+		Var		   *var = (Var *) lfirst(lc);
+		if (var->varattno == 0)
+		{
+			have_wholerow = true;
+			break;
+		}
+	}
+
+	/*
+	 * Construct whole-row reference with ROW() syntax
+	 */
+	if (have_wholerow)
+	{
+		RangeTblEntry *rte;
+		Relation		rel;
+		TupleDesc		tupdesc;
+		int				i;
+
+		/* Obtain TupleDesc for deparsing all valid columns */
+		rte = planner_rt_fetch(baserel->relid, root);
+		rel = heap_open(rte->relid, NoLock);
+		tupdesc = rel->rd_att;
+
+		/* Print all valid columns in ROW() to generate whole-row value */
+		initStringInfo(&wholerow);
+		appendStringInfoString(&wholerow, "ROW(");
+		first = true;
+		for (i = 1; i <= tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = tupdesc->attrs[i - 1];
+
+			/* Ignore dropped columns. */
+			if (attr->attisdropped)
+				continue;
+
+			if (!first)
+				appendStringInfoString(&wholerow, ", ");
+			first = false;
+
+			appendStringInfo(&wholerow, "%c.a%d", side, GET_RELATIVE_ATTNO(i));
+		}
+		appendStringInfoString(&wholerow, ")");
+
+		heap_close(rel, NoLock);
+	}
+
+	/*
+	 * Construct a SELECT statement that has the original query in its FROM
+	 * clause and the target list entries in its SELECT clause.  The numbers
+	 * used in column aliases are attnum - FirstLowInvalidHeapAttributeNumber,
+	 * which makes all numbers positive even for system columns, whose attnums
+	 * are negative.
+	 */
+	initStringInfo(&buf);
+	appendStringInfoString(&buf, "SELECT ");
+	first = true;
+	foreach(lc, baserel->reltargetlist)
+	{
+		Var *var = (Var *) lfirst(lc);
+
+		if (!first)
+			appendStringInfoString(&buf, ", ");
+	
+		if (var->varattno == 0)
+			appendStringInfo(&buf, "%s", wholerow.data);
+		else
+			appendStringInfo(&buf, "%c.a%d", side, GET_RELATIVE_ATTNO(var->varattno));
+
+		first = false;
+	}
+	appendStringInfo(&buf, " FROM (%s) %c", sql, side);
+
+	return buf.data;
+}
+
+static const char *
+get_jointype_name(JoinType jointype)
+{
+	if (jointype == JOIN_INNER)
+		return "INNER";
+	else if (jointype == JOIN_LEFT)
+		return "LEFT";
+	else if (jointype == JOIN_RIGHT)
+		return "RIGHT";
+	else if (jointype == JOIN_FULL)
+		return "FULL";
+
+	/* not reached */
+	elog(ERROR, "unsupported join type %d", jointype);
+}
+
+/*
+ * Construct a SELECT statement that contains a join clause.
+ *
+ * We also create a TargetEntry list of the columns being retrieved, which is
+ * returned to *fdw_ps_tlist.
+ *
+ * outerrel and sql_o are respectively the RelOptInfo and remote query
+ * statement of the outer child relation; the suffix _i denotes the same for
+ * the inner child relation.  jointype and joinclauses describe the join
+ * method.  fdw_ps_tlist is an output parameter used to pass the target list
+ * of the pseudo scan back to the caller.
+ */
+void
+deparseJoinSql(StringInfo buf,
+			   PlannerInfo *root,
+			   RelOptInfo *baserel,
+			   RelOptInfo *outerrel,
+			   RelOptInfo *innerrel,
+			   const char *sql_o,
+			   const char *sql_i,
+			   JoinType jointype,
+			   List *joinclauses,
+			   List *otherclauses,
+			   List **fdw_ps_tlist,
+			   List **retrieved_attrs)
+{
+	StringInfoData selbuf;		/* buffer for SELECT clause */
+	StringInfoData abuf_o;		/* buffer for column alias list of outer */
+	StringInfoData abuf_i;		/* buffer for column alias list of inner */
+	int			i;
+	ListCell   *lc;
+	const char *jointype_str;
+	deparse_expr_cxt context;
+
+	context.root = root;
+	context.foreignrel = baserel;
+	context.buf = &selbuf;
+	context.params_list = NULL;
+	context.outertlist = outerrel->reltargetlist;
+	context.innertlist = innerrel->reltargetlist;
+
+	jointype_str = get_jointype_name(jointype);
+	*retrieved_attrs = NIL;
+
+	/* print SELECT clause of the join scan */
+	initStringInfo(&selbuf);
+	i = 0;
+	foreach(lc, baserel->reltargetlist)
+	{
+		Var		   *var = (Var *) lfirst(lc);
+		TargetEntry *tle;
+
+		if (i > 0)
+			appendStringInfoString(&selbuf, ", ");
+		deparseJoinVar(var, &context);
+
+		tle = makeTargetEntry((Expr *) var, i + 1, NULL, false);
+		if (fdw_ps_tlist)
+			*fdw_ps_tlist = lappend(*fdw_ps_tlist, tle);
+
+		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+
+		i++;
+	}
+	if (i == 0)
+		appendStringInfoString(&selbuf, "NULL");
+
+	/*
+	 * Do a pseudo-projection for an underlying scan on a foreign table, if
+	 * a) the relation is a base relation, and b) its targetlist contains a
+	 * whole-row reference.
+	 */
+	if (outerrel->reloptkind == RELOPT_BASEREL)
+		sql_o = deparseProjectionSql(root, outerrel, sql_o, 'l');
+	if (innerrel->reloptkind == RELOPT_BASEREL)
+		sql_i = deparseProjectionSql(root, innerrel, sql_i, 'r');
+
+	/* Deparse column alias portion of subquery in FROM clause. */
+	initStringInfo(&abuf_o);
+	deparseColumnAliases(&abuf_o, outerrel->reltargetlist);
+	initStringInfo(&abuf_i);
+	deparseColumnAliases(&abuf_i, innerrel->reltargetlist);
+
+	/* Construct SELECT statement */
+	appendStringInfo(buf, "SELECT %s FROM", selbuf.data);
+	appendStringInfo(buf, " (%s) l (%s) %s JOIN (%s) r (%s)",
+					 sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+	/* Append ON clause */
+	if (joinclauses)
+		appendConditions(buf, root, baserel,
+						 outerrel->reltargetlist, innerrel->reltargetlist,
+						 joinclauses,
+						 " ON ", NULL);
+	/* Append WHERE clause */
+	if (otherclauses)
+		appendConditions(buf, root, baserel,
+						 outerrel->reltargetlist, innerrel->reltargetlist,
+						 otherclauses,
+						 " WHERE ", NULL);
+}
+
+/*
  * deparse remote INSERT statement
  *
  * The statement text is appended to buf, and we also create an integer List
@@ -976,8 +1435,7 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 	if (trig_after_row)
 	{
 		/* whole-row reference acquires all non-system columns */
-		attrs_used =
-			bms_make_singleton(0 - FirstLowInvalidHeapAttributeNumber);
+		attrs_used = bms_make_singleton(GET_RELATIVE_ATTNO(0));
 	}
 
 	if (returningList != NIL)
@@ -1261,6 +1719,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
 /*
  * Deparse given Var node into context->buf.
  *
+ * If the foreign relation being deparsed is a join relation, the Var is
+ * deparsed by deparseJoinVar().
+ *
  * If the Var belongs to the foreign relation, just print its remote name.
  * Otherwise, it's effectively a Param (and will in fact be a Param at
  * run time).  Handle it the same way we handle plain Params --- see
@@ -1271,39 +1731,46 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
-		node->varlevelsup == 0)
+	if (context->foreignrel->reloptkind == RELOPT_JOINREL)
 	{
-		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseJoinVar(node, context);
 	}
 	else
 	{
-		/* Treat like a Param */
-		if (context->params_list)
+		if (node->varno == context->foreignrel->relid &&
+			node->varlevelsup == 0)
 		{
-			int			pindex = 0;
-			ListCell   *lc;
-
-			/* find its index in params_list */
-			foreach(lc, *context->params_list)
+			/* Var belongs to foreign table */
+			deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		}
+		else
+		{
+			/* Treat like a Param */
+			if (context->params_list)
 			{
-				pindex++;
-				if (equal(node, (Node *) lfirst(lc)))
-					break;
+				int			pindex = 0;
+				ListCell   *lc;
+
+				/* find its index in params_list */
+				foreach(lc, *context->params_list)
+				{
+					pindex++;
+					if (equal(node, (Node *) lfirst(lc)))
+						break;
+				}
+				if (lc == NULL)
+				{
+					/* not in list, so add it */
+					pindex++;
+					*context->params_list = lappend(*context->params_list, node);
+				}
+
+				printRemoteParam(pindex, node->vartype, node->vartypmod, context);
 			}
-			if (lc == NULL)
+			else
 			{
-				/* not in list, so add it */
-				pindex++;
-				*context->params_list = lappend(*context->params_list, node);
+				printRemotePlaceholder(node->vartype, node->vartypmod, context);
 			}
-
-			printRemoteParam(pindex, node->vartype, node->vartypmod, context);
-		}
-		else
-		{
-			printRemotePlaceholder(node->vartype, node->vartypmod, context);
 		}
 	}
 }
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 93e9836..05801dc 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9,11 +9,16 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
 -- ===================================================================
 -- create objects used through FDW loopback server
 -- ===================================================================
@@ -35,6 +40,18 @@ CREATE TABLE "S 1"."T 2" (
 	c2 text,
 	CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 3" (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text,
+	CONSTRAINT t3_pkey PRIMARY KEY (c1)
+);
+CREATE TABLE "S 1"."T 4" (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c4 text,
+	CONSTRAINT t4_pkey PRIMARY KEY (c1)
+);
 INSERT INTO "S 1"."T 1"
 	SELECT id,
 	       id % 10,
@@ -49,8 +66,22 @@ INSERT INTO "S 1"."T 2"
 	SELECT id,
 	       'AAA' || to_char(id, 'FM000')
 	FROM generate_series(1, 100) id;
+INSERT INTO "S 1"."T 3"
+	SELECT id,
+	       id + 1,
+	       'AAA' || to_char(id, 'FM000')
+	FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 3" WHERE c1 % 2 != 0;	-- delete for outer join tests
+INSERT INTO "S 1"."T 4"
+	SELECT id,
+	       id + 1,
+	       'AAA' || to_char(id, 'FM000')
+	FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 4" WHERE c1 % 3 != 0;	-- delete for outer join tests
 ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
+ANALYZE "S 1"."T 3";
+ANALYZE "S 1"."T 4";
 -- ===================================================================
 -- create foreign tables
 -- ===================================================================
@@ -78,6 +109,26 @@ CREATE FOREIGN TABLE ft2 (
 	c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
+CREATE FOREIGN TABLE ft4 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
+CREATE FOREIGN TABLE ft5 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft6 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE USER view_owner;
+GRANT ALL ON ft5 TO view_owner;
+CREATE VIEW v_ft5 AS SELECT * FROM ft5;
+ALTER VIEW v_ft5 OWNER TO view_owner;
+CREATE USER MAPPING FOR view_owner SERVER loopback;
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -119,12 +170,15 @@ ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                             List of foreign tables
- Schema | Table |  Server  |              FDW Options              | Description 
---------+-------+----------+---------------------------------------+-------------
- public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') | 
-(2 rows)
+                              List of foreign tables
+ Schema | Table |  Server   |              FDW Options              | Description 
+--------+-------+-----------+---------------------------------------+-------------
+ public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
+ public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
+ public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
+ public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
+(5 rows)
 
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
@@ -160,8 +214,8 @@ SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
 (10 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-                                     QUERY PLAN                                      
--------------------------------------------------------------------------------------
+                                                     QUERY PLAN                                                      
+---------------------------------------------------------------------------------------------------------------------
  Limit
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    ->  Sort
@@ -169,7 +223,7 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET
          Sort Key: t1.c3, t1.c1
          ->  Foreign Scan on public.ft1 t1
                Output: c1, c2, c3, c4, c5, c6, c7, c8
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+               Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (8 rows)
 
 SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
@@ -189,8 +243,8 @@ SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 
 -- whole-row reference
 EXPLAIN (VERBOSE, COSTS false) SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-                                     QUERY PLAN                                      
--------------------------------------------------------------------------------------
+                                                     QUERY PLAN                                                      
+---------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.*, c3, c1
    ->  Sort
@@ -198,7 +252,7 @@ EXPLAIN (VERBOSE, COSTS false) SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSE
          Sort Key: t1.c3, t1.c1
          ->  Foreign Scan on public.ft1 t1
                Output: t1.*, c3, c1
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+               Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (8 rows)
 
 SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
@@ -224,11 +278,11 @@ SELECT * FROM ft1 WHERE false;
 
 -- with WHERE clause
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
-                                                                   QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                   QUERY PLAN                                                                                   
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c7 >= '1'::bpchar)) AND (("C 1" = 101)) AND ((c6 = '1'::text))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((c7 >= '1'::bpchar)) AND (("C 1" = 101)) AND ((c6 = '1'::text))
 (3 rows)
 
 SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
@@ -239,13 +293,13 @@ SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
 
 -- with FOR UPDATE/SHARE
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE;
-                                                   QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------------
+                                                                   QUERY PLAN                                                                   
+------------------------------------------------------------------------------------------------------------------------------------------------
  LockRows
    Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
    ->  Foreign Scan on public.ft1 t1
          Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE
+         Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 101)) FOR UPDATE
 (5 rows)
 
 SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE;
@@ -255,13 +309,13 @@ SELECT * FROM ft1 t1 WHERE c1 = 101 FOR UPDATE;
 (1 row)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
-                                                  QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
+                                                                  QUERY PLAN                                                                   
+-----------------------------------------------------------------------------------------------------------------------------------------------
  LockRows
    Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
    ->  Foreign Scan on public.ft1 t1
          Output: c1, c2, c3, c4, c5, c6, c7, c8, t1.*
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE
+         Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 102)) FOR SHARE
 (5 rows)
 
 SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
@@ -277,22 +331,6 @@ SELECT COUNT(*) FROM ft1 t1;
   1000
 (1 row)
 
--- join two tables
-SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- c1  
------
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
-(10 rows)
-
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -353,153 +391,149 @@ CREATE OPERATOR === (
     NEGATOR = !==
 );
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c1 = postgres_fdw_abs(t1.c2))
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c1 === t1.c2)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
-                                            QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
+                                                            QUERY PLAN                                                             
+-----------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = abs(c2)))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = abs(c2)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
-                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
+                                                          QUERY PLAN                                                          
+------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = c2))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = c2))
 (3 rows)
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
-                                                  QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------------
+                                                                  QUERY PLAN                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 100)) AND ((c2 = 0))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 100)) AND ((c2 = 0))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
-                                           QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                                             QUERY PLAN                                                              
+-------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
-                                                     QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
+                                                                     QUERY PLAN                                                                      
+-----------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((round(abs("C 1"), 0) = 1::numeric))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((round(abs("C 1"), 0) = 1::numeric))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                                             QUERY PLAN                                                              
+-------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = (- "C 1")))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!;           -- OpExpr(r)
-                                                QUERY PLAN                                                
-----------------------------------------------------------------------------------------------------------
+                                                                QUERY PLAN                                                                
+------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((1::numeric = ("C 1" !)))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((1::numeric = ("C 1" !)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
-                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                 QUERY PLAN                                                                                 
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL)))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
-                                                        QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
+                                                                        QUERY PLAN                                                                         
+-----------------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = ANY (ARRAY[c2, 1, ("C 1" + 0)])))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = ANY (ARRAY[c2, 1, ("C 1" + 0)])))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                                      QUERY PLAN                                                                      
+------------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = ((ARRAY["C 1", c2, 3])[1])))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = ((ARRAY["C 1", c2, 3])[1])))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c6 = E'foo''s\\bar';  -- check special chars
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
+                                                                 QUERY PLAN                                                                  
+---------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c6 = E'foo''s\\bar'::text))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((c6 = E'foo''s\\bar'::text))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't be sent to remote
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (4 rows)
 
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
+                                                                                                                                                                                                                                                                                     QUERY PLAN                                                                                                                                                                                                                                                                                      
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
    Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+   Relations: (public.ft2 a) INNER JOIN (public.ft2 b)
+   Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8 FROM (SELECT l.a9, l.a10, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 47))) l) l (a1, a2, a3, a4, a5, a6, a7, a8) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8) ON ((l.a2 = r.a1))
+(4 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -511,18 +545,18 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b
   WHERE a.c2 = 6 AND b.c1 = a.c1 AND a.c8 = 'foo' AND b.c7 = upper(a.c7);
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
+                                                                 QUERY PLAN                                                                 
+--------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop
    Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
    ->  Foreign Scan on public.ft2 a
          Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
          Filter: (a.c8 = 'foo'::user_enum)
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c2 = 6))
+         Remote SQL: SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((c2 = 6))
    ->  Foreign Scan on public.ft2 b
          Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
          Filter: (upper((a.c7)::text) = (b.c7)::text)
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
+         Remote SQL: SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
 (10 rows)
 
 SELECT * FROM ft2 a, ft2 b
@@ -651,21 +685,685 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
 (4 rows)
 
 -- ===================================================================
+-- JOIN queries
+-- ===================================================================
+-- join two tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                                                                                               QUERY PLAN                                                                                                                
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1, t1.c3
+   ->  Sort
+         Output: t1.c1, t2.c1, t1.c3
+         Sort Key: t1.c3, t1.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1, t1.c3
+               Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+               Remote SQL: SELECT l.a1, l.a2, r.a1 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1  | c1  
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- join three tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                                               
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c2, t3.c3, t1.c3
+   ->  Sort
+         Output: t1.c1, t2.c2, t3.c3, t1.c3
+         Sort Key: t1.c3, t1.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c2, t3.c3, t1.c3
+               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+               Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1 FROM (SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a10, r.a9 FROM (SELECT "C 1" a9, c2 a10 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a1 = r.a2))) l (a1, a2, a3, a4) INNER JOIN (SELECT r.a11, r.a9 FROM (SELECT c1 a9, c3 a11 FROM "S 1"."T 3") r) r (a1, a2) ON ((l.a1 = r.a2))
+(9 rows)
+
+SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+ c1 | c2 |   c3   
+----+----+--------
+ 22 |  2 | AAA022
+ 24 |  4 | AAA024
+ 26 |  6 | AAA026
+ 28 |  8 | AAA028
+ 30 |  0 | AAA030
+ 32 |  2 | AAA032
+ 34 |  4 | AAA034
+ 36 |  6 | AAA036
+ 38 |  8 | AAA038
+ 40 |  0 | AAA040
+(10 rows)
+
+-- left outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                              QUERY PLAN                                                                                               
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1, t2.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1
+               Relations: (public.ft4 t1) LEFT JOIN (public.ft5 t2)
+               Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 
+----+----
+ 22 |   
+ 24 | 24
+ 26 |   
+ 28 |   
+ 30 | 30
+ 32 |   
+ 34 |   
+ 36 | 36
+ 38 |   
+ 40 |   
+(10 rows)
+
+-- right outer join
+SET enable_mergejoin = off; -- planner chooses MergeJoin even though it has higher costs, so disable it for testing.
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                              QUERY PLAN                                                                                               
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t2.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1
+               Relations: (public.ft5 t2) LEFT JOIN (public.ft4 t1)
+               Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") r) r (a1) ON ((r.a1 = l.a1))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 
+----+----
+    | 33
+ 36 | 36
+    | 39
+ 42 | 42
+    | 45
+ 48 | 48
+    | 51
+ 54 | 54
+    | 57
+ 60 | 60
+(10 rows)
+
+SET enable_mergejoin = on;
+-- full outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+                                                                                              QUERY PLAN                                                                                               
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1, t2.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1
+               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+               Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) FULL JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+ c1  | c1 
+-----+----
+  92 |   
+  94 |   
+  96 | 96
+  98 |   
+ 100 |   
+     |  3
+     |  9
+     | 15
+     | 21
+     | 27
+(10 rows)
+
+-- full outer join + WHERE clause, only matched rows
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                                                   QUERY PLAN                                                                                                                    
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1, t2.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1
+               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+               Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) FULL JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1)) WHERE (((l.a1 = r.a1) OR (l.a1 IS NULL)))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 
+----+----
+ 66 | 66
+ 72 | 72
+ 78 | 78
+ 84 | 84
+ 90 | 90
+ 96 | 96
+    |  3
+    |  9
+    | 15
+    | 21
+(10 rows)
+
+-- join in WHERE clause
+SET enable_mergejoin = off; -- planner chooses MergeJoin even though it has higher costs, so disable it for testing.
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                               QUERY PLAN                                                                                               
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1
+         ->  Foreign Scan
+               Output: t1.c1, t2.c1
+               Relations: (public.ft4 t1) INNER JOIN (public.ft5 t2)
+               Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1))
+(9 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 
+----+----
+ 66 | 66
+ 72 | 72
+ 78 | 78
+ 84 | 84
+ 90 | 90
+ 96 | 96
+(6 rows)
+
+SET enable_mergejoin = on;
+-- join in CTE
+EXPLAIN (COSTS false, VERBOSE)
+WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t.c1_1, t.c2_1, t.c1_3
+   CTE t
+     ->  Foreign Scan
+           Output: t1.c1, t1.c3, t2.c1
+           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+           Remote SQL: SELECT l.a1, l.a2, r.a1 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+   ->  Sort
+         Output: t.c1_1, t.c2_1, t.c1_3
+         Sort Key: t.c1_3, t.c1_1
+         ->  CTE Scan on t
+               Output: t.c1_1, t.c2_1, t.c1_3
+(12 rows)
+
+WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+ c1_1 | c2_1 
+------+------
+  101 |  101
+  102 |  102
+  103 |  103
+  104 |  104
+  105 |  105
+  106 |  106
+  107 |  107
+  108 |  108
+  109 |  109
+  110 |  110
+(10 rows)
+
+-- ctid with whole-row reference
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                                                                                                                                                                                                                                   QUERY PLAN                                                                                                                                                                                                                                                    
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+   ->  Sort
+         Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+         Sort Key: t1.c3, t1.c1
+         ->  Foreign Scan
+               Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+               Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+               Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a10, l.a12 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17, ctid a7 FROM "S 1"."T 1") l) l (a1, a2, a3, a4) INNER JOIN (SELECT ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a9 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a3 = r.a2))
+(9 rows)
+
+SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+  ctid  |                                             t1                                             |                                             t2                                             | c1  
+--------+--------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+-----
+ (1,4)  | (101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1         ",foo) | (101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1         ",foo) | 101
+ (1,5)  | (102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2         ",foo) | (102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2         ",foo) | 102
+ (1,6)  | (103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3         ",foo) | (103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3         ",foo) | 103
+ (1,7)  | (104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4         ",foo) | (104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4         ",foo) | 104
+ (1,8)  | (105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5         ",foo) | (105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5         ",foo) | 105
+ (1,9)  | (106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6         ",foo) | (106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6         ",foo) | 106
+ (1,10) | (107,7,00107,"Thu Jan 08 00:00:00 1970 PST","Thu Jan 08 00:00:00 1970",7,"7         ",foo) | (107,7,00107,"Thu Jan 08 00:00:00 1970 PST","Thu Jan 08 00:00:00 1970",7,"7         ",foo) | 107
+ (1,11) | (108,8,00108,"Fri Jan 09 00:00:00 1970 PST","Fri Jan 09 00:00:00 1970",8,"8         ",foo) | (108,8,00108,"Fri Jan 09 00:00:00 1970 PST","Fri Jan 09 00:00:00 1970",8,"8         ",foo) | 108
+ (1,12) | (109,9,00109,"Sat Jan 10 00:00:00 1970 PST","Sat Jan 10 00:00:00 1970",9,"9         ",foo) | (109,9,00109,"Sat Jan 10 00:00:00 1970 PST","Sat Jan 10 00:00:00 1970",9,"9         ",foo) | 109
+ (1,13) | (110,0,00110,"Sun Jan 11 00:00:00 1970 PST","Sun Jan 11 00:00:00 1970",0,"0         ",foo) | (110,0,00110,"Sun Jan 11 00:00:00 1970 PST","Sun Jan 11 00:00:00 1970",0,"0         ",foo) | 110
+(10 rows)
+
+-- partially unsafe to push down, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                               QUERY PLAN                                                                                                                
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1
+   ->  Sort
+         Output: t1.c1
+         Sort Key: t1.c1
+         ->  Nested Loop
+               Output: t1.c1
+               ->  Foreign Scan on public.ft1 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT "C 1" a10 FROM "S 1"."T 1"
+               ->  Materialize
+                     ->  Foreign Scan
+                           Relations: (public.ft2 t2) INNER JOIN (public.ft4 t3)
+                           Remote SQL: SELECT NULL FROM (SELECT l.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1" WHERE (("C 1" = "C 1"))) l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") r) r (a1) ON ((l.a1 = r.a1))
+(14 rows)
+
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ c1 
+----
+  1
+  1
+  1
+  1
+  1
+  1
+  1
+  1
+  1
+  1
+(10 rows)
+
+-- SEMI JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1
+   ->  Sort
+         Output: t1.c1
+         Sort Key: t1.c1
+         ->  Hash Join
+               Output: t1.c1
+               Hash Cond: (t1.c1 = t2.c1)
+               ->  Foreign Scan on public.ft1 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT "C 1" a10 FROM "S 1"."T 1"
+               ->  Hash
+                     Output: t2.c1
+                     ->  HashAggregate
+                           Output: t2.c1
+                           Group Key: t2.c1
+                           ->  Foreign Scan on public.ft2 t2
+                                 Output: t2.c1
+                                 Remote SQL: SELECT "C 1" a9 FROM "S 1"."T 1"
+(19 rows)
+
+SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- ANTI JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Limit
+   Output: t1.c1
+   ->  Sort
+         Output: t1.c1
+         Sort Key: t1.c1
+         ->  Hash Anti Join
+               Output: t1.c1
+               Hash Cond: (t1.c1 = t2.c2)
+               ->  Foreign Scan on public.ft1 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT "C 1" a10 FROM "S 1"."T 1"
+               ->  Hash
+                     Output: t2.c2
+                     ->  Foreign Scan on public.ft2 t2
+                           Output: t2.c2
+                           Remote SQL: SELECT c2 a10 FROM "S 1"."T 1"
+(16 rows)
+
+SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  
+-----
+ 110
+ 111
+ 112
+ 113
+ 114
+ 115
+ 116
+ 117
+ 118
+ 119
+(10 rows)
+
+-- CROSS JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1, t2.c1
+         ->  Nested Loop
+               Output: t1.c1, t2.c1
+               ->  Foreign Scan on public.ft1 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT "C 1" a10 FROM "S 1"."T 1"
+               ->  Materialize
+                     Output: t2.c1
+                     ->  Foreign Scan on public.ft2 t2
+                           Output: t2.c1
+                           Remote SQL: SELECT "C 1" a9 FROM "S 1"."T 1"
+(15 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ c1 | c1  
+----+-----
+  1 | 101
+  1 | 102
+  1 | 103
+  1 | 104
+  1 | 105
+  1 | 106
+  1 | 107
+  1 | 108
+  1 | 109
+  1 | 110
+(10 rows)
+
+-- different server
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Merge Join
+         Output: t1.c1, t2.c1
+         Merge Cond: (t1.c1 = t2.c1)
+         ->  Sort
+               Output: t1.c1
+               Sort Key: t1.c1
+               ->  Foreign Scan on public.ft5 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT c1 a9 FROM "S 1"."T 4"
+         ->  Sort
+               Output: t2.c1
+               Sort Key: t2.c1
+               ->  Foreign Scan on public.ft6 t2
+                     Output: t2.c1
+                     Remote SQL: SELECT c1 a9 FROM "S 1"."T 4"
+(17 rows)
+
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ c1 | c1 
+----+----
+(0 rows)
+
+-- different effective user for permission check
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ Limit
+   Output: t1.c1, ft5.c1
+   ->  Merge Join
+         Output: t1.c1, ft5.c1
+         Merge Cond: (t1.c1 = ft5.c1)
+         ->  Sort
+               Output: t1.c1
+               Sort Key: t1.c1
+               ->  Foreign Scan on public.ft5 t1
+                     Output: t1.c1
+                     Remote SQL: SELECT c1 a9 FROM "S 1"."T 4"
+         ->  Sort
+               Output: ft5.c1
+               Sort Key: ft5.c1
+               ->  Foreign Scan on public.ft5
+                     Output: ft5.c1
+                     Remote SQL: SELECT c1 a9 FROM "S 1"."T 4"
+(17 rows)
+
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ c1 | c1 
+----+----
+(0 rows)
+
+-- unsafe join conditions
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1, t1.c3
+   ->  Sort
+         Output: t1.c1, t2.c1, t1.c3
+         Sort Key: t1.c3, t1.c1
+         ->  Merge Join
+               Output: t1.c1, t2.c1, t1.c3
+               Merge Cond: (t1.c8 = t2.c8)
+               ->  Sort
+                     Output: t1.c1, t1.c3, t1.c8
+                     Sort Key: t1.c8
+                     ->  Foreign Scan on public.ft1 t1
+                           Output: t1.c1, t1.c3, t1.c8
+                           Remote SQL: SELECT "C 1" a10, c3 a12, c8 a17 FROM "S 1"."T 1"
+               ->  Sort
+                     Output: t2.c1, t2.c8
+                     Sort Key: t2.c8
+                     ->  Foreign Scan on public.ft2 t2
+                           Output: t2.c1, t2.c8
+                           Remote SQL: SELECT "C 1" a9, c8 a17 FROM "S 1"."T 1"
+(20 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c1  
+----+-----
+  1 | 102
+  1 | 103
+  1 | 104
+  1 | 105
+  1 | 106
+  1 | 107
+  1 | 108
+  1 | 109
+  1 | 110
+  1 |   1
+(10 rows)
+
+-- local filter (unsafe conditions on one side)
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1, t1.c3
+   ->  Sort
+         Output: t1.c1, t2.c1, t1.c3
+         Sort Key: t1.c3, t1.c1
+         ->  Hash Join
+               Output: t1.c1, t2.c1, t1.c3
+               Hash Cond: (t2.c1 = t1.c1)
+               ->  Foreign Scan on public.ft2 t2
+                     Output: t2.c1
+                     Remote SQL: SELECT "C 1" a9 FROM "S 1"."T 1"
+               ->  Hash
+                     Output: t1.c1, t1.c3
+                     ->  Foreign Scan on public.ft1 t1
+                           Output: t1.c1, t1.c3
+                           Filter: (t1.c8 = 'foo'::user_enum)
+                           Remote SQL: SELECT "C 1" a10, c3 a12, c8 a17 FROM "S 1"."T 1"
+(17 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1  | c1  
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- Aggregate after UNION, for testing setrefs
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+                                                                                                            QUERY PLAN                                                                                                            
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, (avg((t1.c1 + t2.c1)))
+   ->  Sort
+         Output: t1.c1, (avg((t1.c1 + t2.c1)))
+         Sort Key: t1.c1
+         ->  HashAggregate
+               Output: t1.c1, avg((t1.c1 + t2.c1))
+               Group Key: t1.c1
+               ->  HashAggregate
+                     Output: t1.c1, t2.c1
+                     Group Key: t1.c1, t2.c1
+                     ->  Append
+                           ->  Foreign Scan
+                                 Output: t1.c1, t2.c1
+                                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                                 Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+                           ->  Foreign Scan
+                                 Output: t1_1.c1, t2_1.c1
+                                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                                 Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+(20 rows)
+
+SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+ t1c1 |         avg          
+------+----------------------
+  101 | 202.0000000000000000
+  102 | 204.0000000000000000
+  103 | 206.0000000000000000
+  104 | 208.0000000000000000
+  105 | 210.0000000000000000
+  106 | 212.0000000000000000
+  107 | 214.0000000000000000
+  108 | 216.0000000000000000
+  109 | 218.0000000000000000
+  110 | 220.0000000000000000
+(10 rows)
+
+-- join two foreign tables and two local tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                     QUERY PLAN                                                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2.c1
+   ->  Sort
+         Output: t1.c1, t2.c1
+         Sort Key: t1.c1
+         ->  Hash Join
+               Output: t1.c1, t2.c1
+               Hash Cond: (t1.c1 = t3."C 1")
+               ->  Foreign Scan
+                     Output: t1.c1, t2.c1
+                     Relations: (public.ft1 t1) LEFT JOIN (public.ft2 t2)
+                     Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+               ->  Hash
+                     Output: t3."C 1", t4.c1
+                     ->  Merge Join
+                           Output: t3."C 1", t4.c1
+                           Merge Cond: (t3."C 1" = t4.c1)
+                           ->  Index Only Scan using t1_pkey on "S 1"."T 1" t3
+                                 Output: t3."C 1"
+                           ->  Sort
+                                 Output: t4.c1
+                                 Sort Key: t4.c1
+                                 ->  Seq Scan on "S 1"."T 2" t4
+                                       Output: t4.c1
+(24 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 
+----+----
+ 11 | 11
+ 12 | 12
+ 13 | 13
+ 14 | 14
+ 15 | 15
+ 16 | 16
+ 17 | 17
+ 18 | 18
+ 19 | 19
+ 20 | 20
+(10 rows)
+
+-- ===================================================================
 -- parameterized queries
 -- ===================================================================
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
+                               QUERY PLAN                               
+------------------------------------------------------------------------
  Nested Loop
    Output: t1.c3, t2.c3
    ->  Foreign Scan on public.ft1 t1
          Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+         Remote SQL: SELECT c3 a12 FROM "S 1"."T 1" WHERE (("C 1" = 1))
    ->  Foreign Scan on public.ft2 t2
          Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+         Remote SQL: SELECT c3 a12 FROM "S 1"."T 1" WHERE (("C 1" = 2))
 (8 rows)
 
 EXECUTE st1(1, 1);
@@ -683,8 +1381,8 @@ EXECUTE st1(101, 101);
 -- subquery using stable function (can't be sent to remote)
 PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-                                                QUERY PLAN                                                
-----------------------------------------------------------------------------------------------------------
+                                                                QUERY PLAN                                                                
+------------------------------------------------------------------------------------------------------------------------------------------
  Sort
    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
    Sort Key: t1.c1
@@ -693,13 +1391,13 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
          Join Filter: (t1.c3 = t2.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
+               Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" < 20))
          ->  Materialize
                Output: t2.c3
                ->  Foreign Scan on public.ft2 t2
                      Output: t2.c3
                      Filter: (date(t2.c4) = '01-17-1970'::date)
-                     Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+                     Remote SQL: SELECT c3 a12, c4 a13 FROM "S 1"."T 1" WHERE (("C 1" > 10))
 (15 rows)
 
 EXECUTE st2(10, 20);
@@ -717,8 +1415,8 @@ EXECUTE st2(101, 121);
 -- subquery using immutable function (can be sent to remote)
 PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
-                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
+                                                                QUERY PLAN                                                                
+------------------------------------------------------------------------------------------------------------------------------------------
  Sort
    Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
    Sort Key: t1.c1
@@ -727,12 +1425,12 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
          Join Filter: (t1.c3 = t2.c3)
          ->  Foreign Scan on public.ft1 t1
                Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
+               Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" < 20))
          ->  Materialize
                Output: t2.c3
                ->  Foreign Scan on public.ft2 t2
                      Output: t2.c3
-                     Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+                     Remote SQL: SELECT c3 a12 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
 (14 rows)
 
 EXECUTE st3(10, 20);
@@ -749,108 +1447,108 @@ EXECUTE st3(20, 30);
 -- custom plan should be chosen initially
 PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
 
 -- once we try it enough times, should switch to generic plan
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
-                                              QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
+                                                              QUERY PLAN                                                               
+---------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
 (3 rows)
 
 -- value of $1 should not be sent to remote
 PREPARE st5(user_enum,int) AS SELECT * FROM ft1 t1 WHERE c8 = $1 and c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = 'foo'::user_enum)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (4 rows)
 
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st5('foo', 1);
-                                              QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
+                                                              QUERY PLAN                                                               
+---------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Filter: (t1.c8 = $1)
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer))
 (4 rows)
 
 EXECUTE st5('foo', 1);
@@ -868,14 +1566,14 @@ DEALLOCATE st5;
 -- System columns, except ctid, should not be sent to remote
 EXPLAIN (VERBOSE, COSTS false)
 SELECT * FROM ft1 t1 WHERE t1.tableoid = 'pg_class'::regclass LIMIT 1;
-                                  QUERY PLAN                                   
--------------------------------------------------------------------------------
+                                                  QUERY PLAN                                                   
+---------------------------------------------------------------------------------------------------------------
  Limit
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    ->  Foreign Scan on public.ft1 t1
          Output: c1, c2, c3, c4, c5, c6, c7, c8
          Filter: (t1.tableoid = '1259'::oid)
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+         Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (6 rows)
 
 SELECT * FROM ft1 t1 WHERE t1.tableoid = 'ft1'::regclass LIMIT 1;
@@ -886,13 +1584,13 @@ SELECT * FROM ft1 t1 WHERE t1.tableoid = 'ft1'::regclass LIMIT 1;
 
 EXPLAIN (VERBOSE, COSTS false)
 SELECT tableoid::regclass, * FROM ft1 t1 LIMIT 1;
-                                  QUERY PLAN                                   
--------------------------------------------------------------------------------
+                                                  QUERY PLAN                                                   
+---------------------------------------------------------------------------------------------------------------
  Limit
    Output: ((tableoid)::regclass), c1, c2, c3, c4, c5, c6, c7, c8
    ->  Foreign Scan on public.ft1 t1
          Output: (tableoid)::regclass, c1, c2, c3, c4, c5, c6, c7, c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+         Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1"
 (5 rows)
 
 SELECT tableoid::regclass, * FROM ft1 t1 LIMIT 1;
@@ -903,11 +1601,11 @@ SELECT tableoid::regclass, * FROM ft1 t1 LIMIT 1;
 
 EXPLAIN (VERBOSE, COSTS false)
 SELECT * FROM ft1 t1 WHERE t1.ctid = '(0,2)';
-                                              QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
+                                                              QUERY PLAN                                                               
+---------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((ctid = '(0,2)'::tid))
+   Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((ctid = '(0,2)'::tid))
 (3 rows)
 
 SELECT * FROM ft1 t1 WHERE t1.ctid = '(0,2)';
@@ -918,13 +1616,13 @@ SELECT * FROM ft1 t1 WHERE t1.ctid = '(0,2)';
 
 EXPLAIN (VERBOSE, COSTS false)
 SELECT ctid, * FROM ft1 t1 LIMIT 1;
-                                     QUERY PLAN                                      
--------------------------------------------------------------------------------------
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: ctid, c1, c2, c3, c4, c5, c6, c7, c8
    ->  Foreign Scan on public.ft1 t1
          Output: ctid, c1, c2, c3, c4, c5, c6, c7, c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1"
+         Remote SQL: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17, ctid a7 FROM "S 1"."T 1"
 (5 rows)
 
 SELECT ctid, * FROM ft1 t1 LIMIT 1;
@@ -987,7 +1685,7 @@ FETCH c;
 SAVEPOINT s;
 SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0;  -- ERROR
 ERROR:  division by zero
-CONTEXT:  Remote SQL command: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((1 / ("C 1" - 1)) > 0))
+CONTEXT:  Remote SQL command: SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (((1 / ("C 1" - 1)) > 0))
 ROLLBACK TO s;
 FETCH c;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -1010,64 +1708,64 @@ create foreign table ft3 (f1 text collate "C", f2 text)
   server loopback options (table_name 'loct3');
 -- can be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 = 'foo';
-                                QUERY PLAN                                
---------------------------------------------------------------------------
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
-   Remote SQL: SELECT f1, f2 FROM public.loct3 WHERE ((f1 = 'foo'::text))
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)
 
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "C" = 'foo';
-                                QUERY PLAN                                
---------------------------------------------------------------------------
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
-   Remote SQL: SELECT f1, f2 FROM public.loct3 WHERE ((f1 = 'foo'::text))
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3 WHERE ((f1 = 'foo'::text))
 (3 rows)
 
 explain (verbose, costs off) select * from ft3 where f2 = 'foo';
-                                QUERY PLAN                                
---------------------------------------------------------------------------
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
-   Remote SQL: SELECT f1, f2 FROM public.loct3 WHERE ((f2 = 'foo'::text))
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3 WHERE ((f2 = 'foo'::text))
 (3 rows)
 
 -- can't be sent to remote
 explain (verbose, costs off) select * from ft3 where f1 COLLATE "POSIX" = 'foo';
-                  QUERY PLAN                   
------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
    Filter: ((ft3.f1)::text = 'foo'::text)
-   Remote SQL: SELECT f1, f2 FROM public.loct3
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3
 (4 rows)
 
 explain (verbose, costs off) select * from ft3 where f1 = 'foo' COLLATE "C";
-                  QUERY PLAN                   
------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
    Filter: (ft3.f1 = 'foo'::text COLLATE "C")
-   Remote SQL: SELECT f1, f2 FROM public.loct3
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3
 (4 rows)
 
 explain (verbose, costs off) select * from ft3 where f2 COLLATE "C" = 'foo';
-                  QUERY PLAN                   
------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
    Filter: ((ft3.f2)::text = 'foo'::text)
-   Remote SQL: SELECT f1, f2 FROM public.loct3
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3
 (4 rows)
 
 explain (verbose, costs off) select * from ft3 where f2 = 'foo' COLLATE "C";
-                  QUERY PLAN                   
------------------------------------------------
+                      QUERY PLAN                      
+------------------------------------------------------
  Foreign Scan on public.ft3
    Output: f1, f2
    Filter: (ft3.f2 = 'foo'::text COLLATE "C")
-   Remote SQL: SELECT f1, f2 FROM public.loct3
+   Remote SQL: SELECT f1 a9, f2 a10 FROM public.loct3
 (4 rows)
 
 -- ===================================================================
@@ -1085,7 +1783,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
                Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
                ->  Foreign Scan on public.ft2 ft2_1
                      Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
-                     Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+                     Remote SQL: SELECT "C 1" a9, c2 a10, c3 a12 FROM "S 1"."T 1"
 (9 rows)
 
 INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1210,35 +1908,28 @@ UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
 EXPLAIN (verbose, costs off)
 UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
-                                                                            QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                                                                                                                                                                       QUERY PLAN                                                                                                                                                                                                                                                                       
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Update on public.ft2
    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c3 = $3, c7 = $4 WHERE ctid = $1
-   ->  Hash Join
+   ->  Foreign Scan
          Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
-         Hash Cond: (ft2.c2 = ft1.c1)
-         ->  Foreign Scan on public.ft2
-               Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
-               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-         ->  Hash
-               Output: ft1.*, ft1.c1
-               ->  Foreign Scan on public.ft1
-                     Output: ft1.*, ft1.c1
-                     Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
-(13 rows)
+         Relations: (public.ft2) INNER JOIN (public.ft1)
+         Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, r.a1 FROM (SELECT l.a9, l.a10, l.a12, l.a13, l.a14, l.a15, l.a17, l.a7 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c8 a17, ctid a7 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8) INNER JOIN (SELECT ROW(r.a10, r.a11, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))) r) r (a1, a2) ON ((l.a2 = r.a2))
+(6 rows)
 
 UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
   FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
 EXPLAIN (verbose, costs off)
   DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
-                                       QUERY PLAN                                       
-----------------------------------------------------------------------------------------
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
  Delete on public.ft2
    Output: c1, c4
-   Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+   Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1" a9, c4 a13
    ->  Foreign Scan on public.ft2
          Output: ctid
-         Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+         Remote SQL: SELECT ctid a7 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
 (6 rows)
 
 DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1351,22 +2042,15 @@ DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
 
 EXPLAIN (verbose, costs off)
 DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                         
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Delete on public.ft2
    Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1
-   ->  Hash Join
+   ->  Foreign Scan
          Output: ft2.ctid, ft1.*
-         Hash Cond: (ft2.c2 = ft1.c1)
-         ->  Foreign Scan on public.ft2
-               Output: ft2.ctid, ft2.c2
-               Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
-         ->  Hash
-               Output: ft1.*, ft1.c1
-               ->  Foreign Scan on public.ft1
-                     Output: ft1.*, ft1.c1
-                     Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
-(13 rows)
+         Relations: (public.ft2) INNER JOIN (public.ft1)
+         Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a7, l.a10 FROM (SELECT c2 a10, ctid a7 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a10, r.a11, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))) r) r (a1, a2) ON ((l.a2 = r.a2))
+(6 rows)
 
 DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
 SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
@@ -3190,8 +3874,8 @@ insert into bar2 values(4,44,44);
 insert into bar2 values(7,77,77);
 explain (verbose, costs off)
 select * from bar where f1 in (select f1 from foo) for update;
-                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
+                                             QUERY PLAN                                              
+-----------------------------------------------------------------------------------------------------
  LockRows
    Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid, foo.ctid, foo.*, foo.tableoid
    ->  Hash Join
@@ -3202,7 +3886,7 @@ select * from bar where f1 in (select f1 from foo) for update;
                      Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid
                ->  Foreign Scan on public.bar2
                      Output: bar2.f1, bar2.f2, bar2.ctid, bar2.*, bar2.tableoid
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+                     Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                ->  HashAggregate
@@ -3213,7 +3897,7 @@ select * from bar where f1 in (select f1 from foo) for update;
                                  Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                            ->  Foreign Scan on public.foo2
                                  Output: foo2.ctid, foo2.*, foo2.tableoid, foo2.f1
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
+                                 Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct1
 (22 rows)
 
 select * from bar where f1 in (select f1 from foo) for update;
@@ -3227,8 +3911,8 @@ select * from bar where f1 in (select f1 from foo) for update;
 
 explain (verbose, costs off)
 select * from bar where f1 in (select f1 from foo) for share;
-                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
+                                             QUERY PLAN                                              
+-----------------------------------------------------------------------------------------------------
  LockRows
    Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid, foo.ctid, foo.*, foo.tableoid
    ->  Hash Join
@@ -3239,7 +3923,7 @@ select * from bar where f1 in (select f1 from foo) for share;
                      Output: bar.f1, bar.f2, bar.ctid, bar.*, bar.tableoid
                ->  Foreign Scan on public.bar2
                      Output: bar2.f1, bar2.f2, bar2.ctid, bar2.*, bar2.tableoid
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR SHARE
+                     Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct2 FOR SHARE
          ->  Hash
                Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                ->  HashAggregate
@@ -3250,7 +3934,7 @@ select * from bar where f1 in (select f1 from foo) for share;
                                  Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                            ->  Foreign Scan on public.foo2
                                  Output: foo2.ctid, foo2.*, foo2.tableoid, foo2.f1
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
+                                 Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct1
 (22 rows)
 
 select * from bar where f1 in (select f1 from foo) for share;
@@ -3265,8 +3949,8 @@ select * from bar where f1 in (select f1 from foo) for share;
 -- Check UPDATE with inherited target and an inherited source table
 explain (verbose, costs off)
 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
+                                             QUERY PLAN                                              
+-----------------------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2
@@ -3286,13 +3970,13 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                                  Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                            ->  Foreign Scan on public.foo2
                                  Output: foo2.ctid, foo2.*, foo2.tableoid, foo2.f1
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
+                                 Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct1
    ->  Hash Join
          Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, foo.ctid, foo.*, foo.tableoid
          Hash Cond: (bar2.f1 = foo.f1)
          ->  Foreign Scan on public.bar2
                Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+               Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                ->  HashAggregate
@@ -3303,7 +3987,7 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                                  Output: foo.ctid, foo.*, foo.tableoid, foo.f1
                            ->  Foreign Scan on public.foo2
                                  Output: foo2.ctid, foo2.*, foo2.tableoid, foo2.f1
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
+                                 Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct1
 (37 rows)
 
 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
@@ -3324,8 +4008,8 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                             QUERY PLAN                                             
+----------------------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2
@@ -3338,12 +4022,12 @@ where bar.f1 = ss.f1;
                      Output: ROW(foo.f1), foo.f1
                ->  Foreign Scan on public.foo2
                      Output: ROW(foo2.f1), foo2.f1
-                     Remote SQL: SELECT f1 FROM public.loct1
+                     Remote SQL: SELECT f1 a9 FROM public.loct1
                ->  Seq Scan on public.foo foo_1
                      Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
                ->  Foreign Scan on public.foo2 foo2_1
                      Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
-                     Remote SQL: SELECT f1 FROM public.loct1
+                     Remote SQL: SELECT f1 a9 FROM public.loct1
          ->  Hash
                Output: bar.f1, bar.f2, bar.ctid
                ->  Seq Scan on public.bar
@@ -3356,7 +4040,7 @@ where bar.f1 = ss.f1;
                Sort Key: bar2.f1
                ->  Foreign Scan on public.bar2
                      Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+                     Remote SQL: SELECT f1 a9, f2 a10, f3 a11, ctid a7 FROM public.loct2 FOR UPDATE
          ->  Sort
                Output: (ROW(foo.f1)), foo.f1
                Sort Key: foo.f1
@@ -3365,12 +4049,12 @@ where bar.f1 = ss.f1;
                            Output: ROW(foo.f1), foo.f1
                      ->  Foreign Scan on public.foo2
                            Output: ROW(foo2.f1), foo2.f1
-                           Remote SQL: SELECT f1 FROM public.loct1
+                           Remote SQL: SELECT f1 a9 FROM public.loct1
                      ->  Seq Scan on public.foo foo_1
                            Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
                      ->  Foreign Scan on public.foo2 foo2_1
                            Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
-                           Remote SQL: SELECT f1 FROM public.loct1
+                           Remote SQL: SELECT f1 a9 FROM public.loct1
 (45 rows)
 
 update bar set f2 = f2 + 100
@@ -3636,3 +4320,6 @@ QUERY:  CREATE FOREIGN TABLE t5 (
 OPTIONS (schema_name 'import_source', table_name 't5');
 CONTEXT:  importing foreign table "t5"
 ROLLBACK;
+-- Cleanup
+DROP OWNED BY view_owner;
+DROP USER view_owner;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 478e124..2cda436 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -28,7 +28,6 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
-#include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
@@ -47,41 +46,8 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in the fdw_private list of a
+ * ForeignScan node that scans a foreign relation for a SELECT statement.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
  * planner to executor.  Currently we store:
@@ -98,7 +64,13 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* OID of the foreign server to use for the scan (stored as an Integer node) */
+	FdwScanPrivateServerOid,
+	/* OID of the effective user for the scan (stored as an Integer node) */
+	FdwScanPrivateUserOid,
+	/* Names of the relations scanned, added when the scan is a join */
+	FdwScanPrivateRelations
 };
 
 /*
@@ -128,7 +100,8 @@ enum FdwModifyPrivateIndex
  */
 typedef struct PgFdwScanState
 {
-	Relation	rel;			/* relcache entry for the foreign table */
+	const char *relname;		/* name of relation being scanned */
+	TupleDesc	tupdesc;		/* tuple descriptor of the scan */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -194,6 +167,8 @@ typedef struct PgFdwAnalyzeState
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 	List	   *retrieved_attrs;	/* attr numbers retrieved by query */
 
+	char	   *query;			/* text of SELECT command */
+
 	/* collected sample rows */
 	HeapTuple  *rows;			/* array of size targrows */
 	int			targrows;		/* target # of sample rows */
@@ -214,7 +189,10 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of relation being processed, or NULL for
+								   a foreign join */
+	const char *query;			/* query being processed */
+	TupleDesc	tupdesc;		/* tuple descriptor for attribute names */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -288,6 +266,16 @@ static bool postgresAnalyzeForeignTable(Relation relation,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static void postgresGetForeignJoinPaths(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										List *restrictlist,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
 
 /*
  * Helper functions
@@ -323,12 +311,40 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   const char *query,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+/*
+ * Describe Bitmapset as comma-separated integer list.
+ * For debug purpose.
+ * XXX Can this become a member of bitmapset.c?
+ */
+static char *
+bms_to_str(Bitmapset *bmp)
+{
+	StringInfoData buf;
+	bool		first = true;
+	int			x;
+
+	initStringInfo(&buf);
+
+	x = -1;
+	while ((x = bms_next_member(bmp, x)) >= 0)
+	{
+		if (!first)
+			appendStringInfoString(&buf, ", ");
+		appendStringInfo(&buf, "%d", x);
+
+		first = false;
+	}
+
+	return buf.data;
+}
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -368,6 +384,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	routine->ImportForeignSchema = postgresImportForeignSchema;
 
+	/* Support functions for join push-down */
+	routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+
 	PG_RETURN_POINTER(routine);
 }
 
@@ -383,7 +402,9 @@ postgresGetForeignRelSize(PlannerInfo *root,
 						  RelOptInfo *baserel,
 						  Oid foreigntableid)
 {
+	RangeTblEntry *rte;
 	PgFdwRelationInfo *fpinfo;
+	ForeignTable *table;
 	ListCell   *lc;
 
 	/*
@@ -394,8 +415,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	baserel->fdw_private = (void *) fpinfo;
 
 	/* Look up foreign-table catalog info. */
-	fpinfo->table = GetForeignTable(foreigntableid);
-	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+	table = GetForeignTable(foreigntableid);
+	fpinfo->server = GetForeignServer(table->serverid);
 
 	/*
 	 * Extract user-settable option values.  Note that per-table setting of
@@ -416,7 +437,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
 			fpinfo->fdw_tuple_cost = strtod(defGetString(def), NULL);
 	}
-	foreach(lc, fpinfo->table->options)
+	foreach(lc, table->options)
 	{
 		DefElem    *def = (DefElem *) lfirst(lc);
 
@@ -428,20 +449,12 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	}
 
 	/*
-	 * If the table or the server is configured to use remote estimates,
-	 * identify which user to do remote access as during planning.  This
+	 * Identify which user to do remote access as during planning.  This
 	 * should match what ExecCheckRTEPerms() does.  If we fail due to lack of
 	 * permissions, the query would have failed at runtime anyway.
 	 */
-	if (fpinfo->use_remote_estimate)
-	{
-		RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
-		Oid			userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-		fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
-	}
-	else
-		fpinfo->user = NULL;
+	rte = planner_rt_fetch(baserel->relid, root);
+	fpinfo->userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
 
 	/*
 	 * Identify which baserestrictinfo clauses can be sent to the remote
@@ -465,8 +478,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
-		pull_varattnos((Node *) rinfo->clause, baserel->relid,
-					   &fpinfo->attrs_used);
+		pull_varattnos((Node *) rinfo->clause, baserel->relid, &fpinfo->attrs_used);
 	}
 
 	/*
@@ -752,6 +764,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	List	   *retrieved_attrs;
 	StringInfoData sql;
 	ListCell   *lc;
+	List	   *fdw_ps_tlist = NIL;
+	ForeignScan *scan;
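+	/* textual description of the joined relations; used only for a join scan */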
+	StringInfoData relations;
 
 	/*
 	 * Separate the scan_clauses into those that can be executed remotely and
@@ -768,9 +783,6 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 *
 	 * This code must match "extract_actual_clauses(scan_clauses, false)"
 	 * except for the additional decision about remote versus local execution.
-	 * Note however that we only strip the RestrictInfo nodes from the
-	 * local_exprs list, since appendWhereClause expects a list of
-	 * RestrictInfos.
 	 */
 	foreach(lc, scan_clauses)
 	{
@@ -783,82 +795,37 @@ postgresGetForeignPlan(PlannerInfo *root,
 			continue;
 
 		if (list_member_ptr(fpinfo->remote_conds, rinfo))
-			remote_conds = lappend(remote_conds, rinfo);
+			remote_conds = lappend(remote_conds, rinfo->clause);
 		else if (list_member_ptr(fpinfo->local_conds, rinfo))
 			local_exprs = lappend(local_exprs, rinfo->clause);
 		else if (is_foreign_expr(root, baserel, rinfo->clause))
-			remote_conds = lappend(remote_conds, rinfo);
+			remote_conds = lappend(remote_conds, rinfo->clause);
 		else
 			local_exprs = lappend(local_exprs, rinfo->clause);
 	}
 
 	/*
 	 * Build the query string to be sent for execution, and identify
-	 * expressions to be sent as parameters.
+	 * expressions to be sent as parameters.  If the relation to scan is a
+	 * join relation, also receive the constructed relations string from
+	 * deparseSelectSql.
 	 */
 	initStringInfo(&sql);
-	deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
-					 &retrieved_attrs);
-	if (remote_conds)
-		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
-
-	/*
-	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
-	 * initial row fetch, rather than later on as is done for local tables.
-	 * The extra roundtrips involved in trying to duplicate the local
-	 * semantics exactly don't seem worthwhile (see also comments for
-	 * RowMarkType).
-	 *
-	 * Note: because we actually run the query as a cursor, this assumes that
-	 * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
-	 */
-	if (baserel->relid == root->parse->resultRelation &&
-		(root->parse->commandType == CMD_UPDATE ||
-		 root->parse->commandType == CMD_DELETE))
-	{
-		/* Relation is UPDATE/DELETE target, so use FOR UPDATE */
-		appendStringInfoString(&sql, " FOR UPDATE");
-	}
-	else
-	{
-		PlanRowMark *rc = get_plan_rowmark(root->rowMarks, baserel->relid);
-
-		if (rc)
-		{
-			/*
-			 * Relation is specified as a FOR UPDATE/SHARE target, so handle
-			 * that.  (But we could also see LCS_NONE, meaning this isn't a
-			 * target relation after all.)
-			 *
-			 * For now, just ignore any [NO] KEY specification, since (a) it's
-			 * not clear what that means for a remote table that we don't have
-			 * complete information about, and (b) it wouldn't work anyway on
-			 * older remote servers.  Likewise, we don't worry about NOWAIT.
-			 */
-			switch (rc->strength)
-			{
-				case LCS_NONE:
-					/* No locking needed */
-					break;
-				case LCS_FORKEYSHARE:
-				case LCS_FORSHARE:
-					appendStringInfoString(&sql, " FOR SHARE");
-					break;
-				case LCS_FORNOKEYUPDATE:
-				case LCS_FORUPDATE:
-					appendStringInfoString(&sql, " FOR UPDATE");
-					break;
-			}
-		}
-	}
+	if (baserel->reloptkind == RELOPT_JOINREL)
+		initStringInfo(&relations);
+	deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
+					 &params_list, &fdw_ps_tlist, &retrieved_attrs,
+					 baserel->reloptkind == RELOPT_JOINREL ? &relations : NULL);
 
 	/*
-	 * Build the fdw_private list that will be available to the executor.
+	 * Build the fdw_private list that will be available in the executor.
 	 * Items in the list must match enum FdwScanPrivateIndex, above.
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
+	fdw_private = list_make4(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->server->serverid),
+							 makeInteger(fpinfo->userid));
+	if (baserel->reloptkind == RELOPT_JOINREL)
+		fdw_private = lappend(fdw_private, makeString(relations.data));
 
 	/*
 	 * Create the ForeignScan node from target list, local filtering
@@ -868,11 +835,18 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 * field of the finished plan node; we can't keep them in private state
 	 * because then they wouldn't be subject to later planner processing.
 	 */
-	return make_foreignscan(tlist,
+	scan = make_foreignscan(tlist,
 							local_exprs,
 							scan_relid,
 							params_list,
 							fdw_private);
+
+	/*
+	 * Set fdw_ps_tlist so the executor can handle tuples generated by this
+	 * scan.
+	 */
+	scan->fdw_ps_tlist = fdw_ps_tlist;
+
+	return scan;
 }
 
 /*
@@ -885,9 +859,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
+	Oid			serverid;
 	Oid			userid;
-	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;
 	int			numParams;
@@ -907,22 +880,13 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	node->fdw_state = (void *) fsstate;
 
 	/*
-	 * Identify which user to do the remote access as.  This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
-
-	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
 	 */
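+	/*
+	 * The scan may not correspond to a single base relation (scanrelid is 0
+	 * for a pushed-down join), so the planner saved the server OID and the
+	 * effective user OID in fdw_private; fetch them from there.
+	 */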
+	serverid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateServerOid));
+	userid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateUserOid));
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, server->serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
@@ -932,8 +896,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	/* Get private info created by planner functions. */
 	fsstate->query = strVal(list_nth(fsplan->fdw_private,
 									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
+	fsstate->retrieved_attrs = list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -947,8 +911,18 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
-	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	/* Get info we'll need for input data conversion and error report. */
+	if (fsplan->scan.scanrelid > 0)
+	{
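+		/* simple scan of a single foreign table; use its relcache info */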
+		fsstate->relname = RelationGetRelationName(node->ss.ss_currentRelation);
+		fsstate->tupdesc = RelationGetDescr(node->ss.ss_currentRelation);
+	}
+	else
+	{
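+		/* pushed-down join (scanrelid == 0); use the scan slot's descriptor */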
+		fsstate->relname = NULL;
+		fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	}
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
 	numParams = list_length(fsplan->fdw_exprs);
@@ -1664,10 +1638,25 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 {
 	List	   *fdw_private;
 	char	   *sql;
+	char	   *relations;
+
+	fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
 
+	/*
+	 * Add the names of the relations handled by the foreign scan when the
+	 * scan is a join.
+	 */
+	if (list_length(fdw_private) > FdwScanPrivateRelations)
+	{
+		relations = strVal(list_nth(fdw_private, FdwScanPrivateRelations));
+		ExplainPropertyText("Relations", relations, es);
+	}
+
+	/*
+	 * Add the remote query when the VERBOSE option is specified.
+	 */
 	if (es->verbose)
 	{
-		fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
@@ -1726,10 +1715,12 @@ estimate_path_cost_size(PlannerInfo *root,
 	 */
 	if (fpinfo->use_remote_estimate)
 	{
+		List	   *remote_conds;
 		List	   *remote_join_conds;
 		List	   *local_join_conds;
-		StringInfoData sql;
 		List	   *retrieved_attrs;
+		StringInfoData sql;
+		UserMapping *user;
 		PGconn	   *conn;
 		Selectivity local_sel;
 		QualCost	local_cost;
@@ -1741,24 +1732,24 @@ estimate_path_cost_size(PlannerInfo *root,
 		classifyConditions(root, baserel, join_conds,
 						   &remote_join_conds, &local_join_conds);
 
+		remote_conds = copyObject(fpinfo->remote_conds);
+		remote_conds = list_concat(remote_conds, remote_join_conds);
+
 		/*
 		 * Construct EXPLAIN query including the desired SELECT, FROM, and
 		 * WHERE clauses.  Params and other-relation Vars are replaced by
 		 * dummy values.
+		 * The params_list and fdw_ps_tlist outputs are unnecessary for
+		 * EXPLAIN, so we do not request them here.
 		 */
 		initStringInfo(&sql);
 		appendStringInfoString(&sql, "EXPLAIN ");
-		deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
-						 &retrieved_attrs);
-		if (fpinfo->remote_conds)
-			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
-		if (remote_join_conds)
-			appendWhereClause(&sql, root, baserel, remote_join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+		deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
+						 NULL, NULL, &retrieved_attrs, NULL);
 
 		/* Get the remote estimate */
-		conn = GetConnection(fpinfo->server, fpinfo->user, false);
+		user = GetUserMapping(fpinfo->userid, fpinfo->server->serverid);
+		conn = GetConnection(fpinfo->server, user, false);
 		get_remote_estimate(sql.data, conn, &rows, &width,
 							&startup_cost, &total_cost);
 		ReleaseConnection(conn);
@@ -2055,7 +2046,9 @@ fetch_more_data(ForeignScanState *node)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   fsstate->relname,
+										   fsstate->query,
+										   fsstate->tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2273,7 +2266,9 @@ store_returning_result(PgFdwModifyState *fmstate,
 		HeapTuple	newtup;
 
 		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
+										RelationGetRelationName(fmstate->rel),
+											fmstate->query,
+											RelationGetDescr(fmstate->rel),
 											fmstate->attinmeta,
 											fmstate->retrieved_attrs,
 											fmstate->temp_cxt);
@@ -2423,6 +2418,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	initStringInfo(&sql);
 	appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
 	deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
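+	/* remember the query text; it is passed to make_tuple_from_result_row() */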
+	astate.query = sql.data;
 
 	/* In what follows, do not risk leaking any PGresults. */
 	PG_TRY();
@@ -2565,7 +2561,9 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
 		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
+										   RelationGetRelationName(astate->rel),
+													   astate->query,
+											   RelationGetDescr(astate->rel),
 													   astate->attinmeta,
 													 astate->retrieved_attrs,
 													   astate->temp_cxt);
@@ -2839,6 +2837,282 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 }
 
 /*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(RelOptInfo *outerrel,
+			 RelOptInfo *innerrel,
+			 JoinType jointype,
+			 double rows,
+			 int width)
+{
+	PgFdwRelationInfo *fpinfo_o;
+	PgFdwRelationInfo *fpinfo_i;
+	PgFdwRelationInfo *fpinfo;
+
+	fpinfo_o = (PgFdwRelationInfo *) outerrel->fdw_private;
+	fpinfo_i = (PgFdwRelationInfo *) innerrel->fdw_private;
+
+	fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+
+	/* The join relation inherits the conditions of its source relations */
+	fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+									   copyObject(fpinfo_i->remote_conds));
+	fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+									  copyObject(fpinfo_i->local_conds));
+
+	/* attrs_used is used only for a simple foreign table scan */
+	fpinfo->attrs_used = NULL;
+
+	/* rows and width are taken from the caller's estimates */
+	fpinfo->rows = rows;
+	fpinfo->width = width;
+
+	/* A join has local conditions from both outer and inner, so sum them up. */
+	fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+									   fpinfo_i->local_conds_cost.startup;
+	fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+										 fpinfo_i->local_conds_cost.per_tuple;
+
+	/* Don't consider correlation between local filters. */
+	fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+							  fpinfo_i->local_conds_sel;
+
+	fpinfo->use_remote_estimate = false;
+
+	/*
+	 * These two come from the default or per-server settings, so outer and
+	 * inner must have the same values.
+	 */
+	fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+	fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+	/*
+	 * TODO estimate more accurately
+	 */
+	fpinfo->startup_cost = fpinfo->fdw_startup_cost +
+						   fpinfo->local_conds_cost.startup;
+	fpinfo->total_cost = fpinfo->startup_cost +
+						 (fpinfo->fdw_tuple_cost +
+						  fpinfo->local_conds_cost.per_tuple +
+						  cpu_tuple_cost) * fpinfo->rows;
+
+	/* The server and effective userid are identical on both sides */
+	fpinfo->server = fpinfo_o->server;
+	fpinfo->userid = fpinfo_o->userid;
+
+	fpinfo->outerrel = outerrel;
+	fpinfo->innerrel = innerrel;
+	fpinfo->jointype = jointype;
+
+	/* joinclauses and otherclauses will be set later */
+
+	return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPaths
+ *		Add possible ForeignPath to joinrel.
+ *
+ * Joins that satisfy the conditions below can be pushed down to the remote
+ * PostgreSQL server.
+ *
+ * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
+ * 2) Both outer and inner portions are safe to push-down
+ * 3) All foreign tables in the join belong to the same foreign server
+ * 4) All foreign tables are accessed with identical user
+ * 5) All join conditions are safe to push down
+ * 6) No relation has a local filter (this could be relaxed for INNER JOIN
+ *    with no volatile functions/operators, but for now we take the safer way)
+ */
+static void
+postgresGetForeignJoinPaths(PlannerInfo *root,
+							RelOptInfo *joinrel,
+							RelOptInfo *outerrel,
+							RelOptInfo *innerrel,
+							List *restrictlist,
+							JoinType jointype,
+							SpecialJoinInfo *sjinfo,
+							SemiAntiJoinFactors *semifactors,
+							Relids param_source_rels,
+							Relids extra_lateral_rels)
+{
+	PgFdwRelationInfo *fpinfo;
+	PgFdwRelationInfo *fpinfo_o;
+	PgFdwRelationInfo *fpinfo_i;
+	ForeignPath	   *joinpath;
+	double			rows;
+	Cost			startup_cost;
+	Cost			total_cost;
+
+	ListCell	   *lc;
+	List		   *joinclauses;
+	List		   *otherclauses;
+
+	/*
+	 * Skip if this join combination has been considered already.
+	 */
+	if (joinrel->fdw_private)
+	{
+		ereport(DEBUG3, (errmsg("combination already considered")));
+		return;
+	}
+
+	/*
+	 * We support all outer joins in addition to inner joins.  A CROSS JOIN
+	 * is internally an INNER JOIN with no conditions, so it is checked later.
+	 */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+	{
+		ereport(DEBUG3, (errmsg("unsupported join type (SEMI, ANTI)")));
+		return;
+	}
+
+	/*
+	 * A valid PgFdwRelationInfo in RelOptInfo's fdw_private indicates that a
+	 * scan of the relation can be pushed down.  If either side lacks one,
+	 * give up on pushing down this join relation.
+	 */
+	if (!outerrel->fdw_private)
+	{
+		ereport(DEBUG3, (errmsg("outer is not safe to push-down")));
+		return;
+	}
+	if (!innerrel->fdw_private)
+	{
+		ereport(DEBUG3, (errmsg("inner is not safe to push-down")));
+		return;
+	}
+	fpinfo_o = (PgFdwRelationInfo *) outerrel->fdw_private;
+	fpinfo_i = (PgFdwRelationInfo *) innerrel->fdw_private;
+
+	/*
+	 * All relations in the join must belong to the same server.  A valid
+	 * fdw_private means that every relation underneath it belongs to the
+	 * server recorded there, so it is sufficient to compare the serverid of
+	 * the outer and inner relations.
+	 */
+	if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+	{
+		ereport(DEBUG3, (errmsg("servers do not match")));
+		return;
+	}
+
+	/*
+	 * The effective userid of all source relations must be identical.  A
+	 * valid fdw_private means that every relation underneath it is accessed
+	 * as the same user, so it is sufficient to compare the userid of the
+	 * outer and inner relations.
+	 */
+	if (fpinfo_o->userid != fpinfo_i->userid)
+	{
+		ereport(DEBUG3, (errmsg("userids do not match")));
+		return;
+	}
+
+	/*
+	 * No source relation can have local conditions.  This could be relaxed
+	 * for an inner join whose local conditions contain no volatile
+	 * functions/operators, but for now we leave that as a future enhancement.
+	 */
+	if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+	{
+		ereport(DEBUG3, (errmsg("join with local filter")));
+		return;
+	}
+
+	/*
+	 * Separate restrictlist into two lists, join conditions and remote filters.
+	 */
+	joinclauses = restrictlist;
+	if (IS_OUTER_JOIN(jointype))
+	{
+		extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
+	}
+	else
+	{
+		joinclauses = extract_actual_clauses(joinclauses, false);
+		otherclauses = NIL;
+	}
+
+	/*
+	 * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+	 * with empty joinclauses.  Pushing down a CROSS JOIN usually produces a
+	 * larger result than retrieving each table separately, so we don't push
+	 * down such joins.
+	 */
+	if (jointype == JOIN_INNER && joinclauses == NIL)
+	{
+		ereport(DEBUG3, (errmsg("unsupported join type (CROSS)")));
+		return;
+	}
+
+	/*
+	 * All join conditions must be safe to push down.
+	 */
+	foreach(lc, joinclauses)
+	{
+		Expr *expr = (Expr *) lfirst(lc);
+
+		if (!is_foreign_expr(root, joinrel, expr))
+		{
+			ereport(DEBUG3, (errmsg("join quals contain unsafe conditions")));
+			return;
+		}
+	}
+
+	/*
+	 * All other conditions for the join must be safe to push down.
+	 */
+	foreach(lc, otherclauses)
+	{
+		Expr *expr = (Expr *) lfirst(lc);
+
+		if (!is_foreign_expr(root, joinrel, expr))
+		{
+			ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
+			return;
+		}
+	}
+
+	/* Here we know that this join can be pushed-down to remote side. */
+
+	/* Construct fpinfo for the join relation */
+	fpinfo = merge_fpinfo(outerrel, innerrel, jointype, joinrel->rows,
+						  joinrel->width); 
+	fpinfo->joinclauses = joinclauses;
+	fpinfo->otherclauses = otherclauses;
+	joinrel->fdw_private = fpinfo;
+
+	/* TODO determine more accurate cost and rows of the join. */
+	rows = joinrel->rows;
+	startup_cost = fpinfo->startup_cost;
+	total_cost = fpinfo->total_cost;
+
+	/*
+	 * Create a new foreign path representing a remote join between the
+	 * foreign tables, and add it to the joinrel.
+	 */
+	joinpath = create_foreignscan_path(root,
+									   joinrel,
+									   rows,
+									   startup_cost,
+									   total_cost,
+									   NIL,		/* no pathkeys */
+									   NULL,	/* no required_outer */
+									   NIL);	/* no fdw_private */
+
+	/* Add generated path into joinrel by add_path(). */
+	add_path(joinrel, (Path *) joinpath);
+	elog(DEBUG3, "join path added for (%s) join (%s)",
+		 bms_to_str(outerrel->relids), bms_to_str(innerrel->relids));
+
+	/* TODO consider parameterized paths */
+}
+
+/*
  * Create a tuple from the specified row of the PGresult.
  *
  * rel is the local representation of the foreign table, attinmeta is
@@ -2849,13 +3123,14 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   const char *query,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2882,7 +3157,9 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.query = query;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2966,11 +3243,39 @@ make_tuple_from_result_row(PGresult *res,
 static void
 conversion_error_callback(void *arg)
 {
+	const char *attname;
+	const char *relname;
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
+	TupleDesc	tupdesc = errpos->tupdesc;
+	StringInfoData buf;
+
+	if (errpos->relname)
+	{
+		/* error occurred in a scan against a foreign table */ 
+		initStringInfo(&buf);
+		if (errpos->cur_attno > 0)
+			appendStringInfo(&buf, "column \"%s\"",
+					 NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname));
+		else if (errpos->cur_attno == SelfItemPointerAttributeNumber)
+			appendStringInfoString(&buf, "column \"ctid\"");
+		attname = buf.data;
+
+		initStringInfo(&buf);
+		appendStringInfo(&buf, "foreign table \"%s\"", errpos->relname);
+		relname = buf.data;
+	}
+	else
+	{
+		/* error occurred in a scan against a foreign join */ 
+		initStringInfo(&buf);
+		appendStringInfo(&buf, "column %d", errpos->cur_attno - 1);
+		attname = buf.data;
+
+		initStringInfo(&buf);
+		appendStringInfo(&buf, "foreign join \"%s\"", errpos->query);
+		relname = buf.data;
+	}
 
 	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+		errcontext("%s of %s", attname, relname);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..d6b16d8 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,10 +16,52 @@
 #include "foreign/foreign.h"
 #include "lib/stringinfo.h"
 #include "nodes/relation.h"
+#include "nodes/plannodes.h"
 #include "utils/relcache.h"
 
 #include "libpq-fe.h"
 
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table or a foreign join.  This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignServer *server;
+	Oid			userid;
+
+	/* Join information */
+	RelOptInfo *outerrel;
+	RelOptInfo *innerrel;
+	JoinType	jointype;
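+	/* join quals, and other pushed-down quals applied to the join result */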
+	List	   *joinclauses;
+	List	   *otherclauses;
+} PgFdwRelationInfo;
+
 /* in postgres_fdw.c */
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
@@ -51,13 +93,31 @@ extern void deparseSelectSql(StringInfo buf,
 				 PlannerInfo *root,
 				 RelOptInfo *baserel,
 				 Bitmapset *attrs_used,
-				 List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+				 List *remote_conds,
+				 List **params_list,
+				 List **fdw_ps_tlist,
+				 List **retrieved_attrs,
+				 StringInfo relations);
+extern void appendConditions(StringInfo buf,
 				  PlannerInfo *root,
 				  RelOptInfo *baserel,
+				  List *outertlist,
+				  List *innertlist,
 				  List *exprs,
-				  bool is_first,
+				  const char *prefix,
 				  List **params);
+extern void deparseJoinSql(StringInfo sql,
+			   PlannerInfo *root,
+			   RelOptInfo *baserel,
+			   RelOptInfo *outerrel,
+			   RelOptInfo *innerrel,
+			   const char *sql_o,
+			   const char *sql_i,
+			   JoinType jointype,
+			   List *joinclauses,
+			   List *otherclauses,
+			   List **fdw_ps_tlist,
+			   List **retrieved_attrs);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *targetAttrs, List *returningList,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4a23457..126ae04 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -11,12 +11,17 @@ DO $d$
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
+        EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+            OPTIONS (dbname '$$||current_database()||$$',
+                     port '$$||current_setting('port')||$$'
+            )$$;
     END;
 $d$;
 
 CREATE USER MAPPING FOR public SERVER testserver1
 	OPTIONS (user 'value', password 'value');
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
+CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
 
 -- ===================================================================
 -- create objects used through FDW loopback server
@@ -39,6 +44,18 @@ CREATE TABLE "S 1"."T 2" (
 	c2 text,
 	CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
+CREATE TABLE "S 1"."T 3" (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text,
+	CONSTRAINT t3_pkey PRIMARY KEY (c1)
+);
+CREATE TABLE "S 1"."T 4" (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c4 text,
+	CONSTRAINT t4_pkey PRIMARY KEY (c1)
+);
 
 INSERT INTO "S 1"."T 1"
 	SELECT id,
@@ -54,9 +71,23 @@ INSERT INTO "S 1"."T 2"
 	SELECT id,
 	       'AAA' || to_char(id, 'FM000')
 	FROM generate_series(1, 100) id;
+INSERT INTO "S 1"."T 3"
+	SELECT id,
+	       id + 1,
+	       'AAA' || to_char(id, 'FM000')
+	FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 3" WHERE c1 % 2 != 0;	-- delete for outer join tests
+INSERT INTO "S 1"."T 4"
+	SELECT id,
+	       id + 1,
+	       'AAA' || to_char(id, 'FM000')
+	FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 4" WHERE c1 % 3 != 0;	-- delete for outer join tests
 
 ANALYZE "S 1"."T 1";
 ANALYZE "S 1"."T 2";
+ANALYZE "S 1"."T 3";
+ANALYZE "S 1"."T 4";
 
 -- ===================================================================
 -- create foreign tables
@@ -87,6 +118,29 @@ CREATE FOREIGN TABLE ft2 (
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 
+CREATE FOREIGN TABLE ft4 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
+
+CREATE FOREIGN TABLE ft5 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+
+CREATE FOREIGN TABLE ft6 (
+	c1 int NOT NULL,
+	c2 int NOT NULL,
+	c3 text
+) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE USER view_owner;
+GRANT ALL ON ft5 TO view_owner;
+CREATE VIEW v_ft5 AS SELECT * FROM ft5;
+ALTER VIEW v_ft5 OWNER TO view_owner;
+CREATE USER MAPPING FOR view_owner SERVER loopback;
+
 -- ===================================================================
 -- tests for validator
 -- ===================================================================
@@ -158,8 +212,6 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
 SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
 -- aggregate
 SELECT COUNT(*) FROM ft1 t1;
--- join two tables
-SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
@@ -216,6 +268,90 @@ SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5));
 SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
 
 -- ===================================================================
+-- JOIN queries
+-- ===================================================================
+-- join two tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- join three tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+-- left outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+-- right outer join
+SET enable_mergejoin = off; -- planner choose MergeJoin even it has higher costs, so disable it for testing.
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+SET enable_mergejoin = on;
+-- full outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+-- full outer join + WHERE clause, only matched rows
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+-- join at WHERE clause 
+SET enable_mergejoin = off; -- planner choose MergeJoin even it has higher costs, so disable it for testing.
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SET enable_mergejoin = on;
+-- join in CTE
+EXPLAIN (COSTS false, VERBOSE)
+WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+-- ctid with whole-row reference
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- partially unsafe to push down, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+-- SEMI JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- ANTI JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- CROSS JOIN, not pushed down
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+-- different server
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+-- different effective user for permission check
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+-- unsafe join conditions
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- local filter (unsafe conditions on one side)
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- Aggregate after UNION, for testing setrefs
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+-- join two foreign tables and two local tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+
+-- ===================================================================
 -- parameterized queries
 -- ===================================================================
 -- simple join
@@ -831,3 +967,7 @@ DROP TYPE "Colors" CASCADE;
 IMPORT FOREIGN SCHEMA import_source LIMIT TO (t5)
   FROM SERVER loopback INTO import_dest5;  -- ERROR
 ROLLBACK;
+
+-- Cleanup
+DROP OWNED BY view_owner;
+DROP USER view_owner;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 43adb61..fb39c38 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -406,11 +406,27 @@
   <title>Remote Query Optimization</title>
 
   <para>
-   <filename>postgres_fdw</> attempts to optimize remote queries to reduce
-   the amount of data transferred from foreign servers.  This is done by
-   sending query <literal>WHERE</> clauses to the remote server for
-   execution, and by not retrieving table columns that are not needed for
-   the current query.  To reduce the risk of misexecution of queries,
+   <filename>postgres_fdw</filename> attempts to optimize remote queries to
+   reduce the amount of data transferred from foreign servers.
+   This is done in several ways.
+  </para>
+
+  <para>
+   In the <literal>SELECT</> list, <filename>postgres_fdw</filename> retrieves
+   only the columns that are actually needed for the current query.
+  </para>
+
+  <para>
+   If the <literal>FROM</> clause contains multiple foreign tables managed
+   by the same server and accessed as the same user,
+   <filename>postgres_fdw</> tries to perform the joins on the remote side
+   as far as possible.
+   To reduce the risk of misexecution, <filename>postgres_fdw</> gives up
+   sending a join to the remote server when its join conditions might have
+   different semantics on the remote side.
+  </para>
+
+  <para>
    <literal>WHERE</> clauses are not sent to the remote server unless they use
    only built-in data types, operators, and functions.  Operators and
    functions in the clauses must be <literal>IMMUTABLE</> as well.
#61Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#59)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, Apr 26, 2015 at 10:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch v13 is revised one according to the suggestion
by Robert.

Thanks.

The last hunk in foreign.c is a useless whitespace change.

+ /* actually, not shift members */

Change to: "shift of 0 is the same as copying"

But actually, do we really need all of this? I think you could reduce
the size of this function to three lines of code if you just did this:

x = -1;
while ((x = bms_next_member(inputset, x)) >= 0)
outputset = bms_add_member(outputset, x + shift);

It might be very slightly slower, but I think it would be worth it to
reduce the amount of code needed.
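
Spelled out as a self-contained helper, just as a sketch (the function name here
is made up, and the patch ultimately inlines this loop at each call site rather
than keeping a helper), that could look like:

#include "postgres.h"
#include "nodes/bitmapset.h"

static Bitmapset *
shift_relids(const Bitmapset *inputset, int shift)
{
	Bitmapset  *outputset = NULL;
	int			x = -1;

	/* copy each member, offset by 'shift'; a shift of 0 is a plain copy */
	while ((x = bms_next_member(inputset, x)) >= 0)
		outputset = bms_add_member(outputset, x + shift);

	return outputset;
}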

+        * 5. Consider paths added by FDW, in case when both of outer and
+        * inner relations are managed by the same driver.

Change to: "If both inner and outer relations are managed by the same
FDW, give it a chance to push down joins."

+        * 6. At the last, consider paths added by extension, in addition to the
+        * built-in paths.

Change to: "Finally, give extensions a chance to manipulate the path list."

+        * Fetch relation-id, if this foreign-scan node actuall scans on
+        * a particular real relation. Elsewhere, InvalidOid shall be
+        * informed to the FDW driver.

Change to: "If we're scanning a base relation, look up the OID. (We
can skip this if scanning a join relation.)"

+        * Sanity check. Pseudo scan tuple-descriptor shall be constructed
+        * based on the fdw_ps_tlist, excluding resjunk=true, so we need to
+        * ensure all valid TLEs have to locate prior to junk ones.

Is the goal here to make attribute numbers match up? If so, between
where and where? If not, please explain further.

+                               if (splan->scan.scanrelid == 0)
+                               {
...
+                               }
                                splan->scan.scanrelid += rtoffset;

Does this need an "else"? It seems surprising that you would offset
scanrelid even if it's starting out as zero.

(Note that there are two instances of this pattern.)

+ * 'found' : indicates whether RelOptInfo is actually constructed.
+ *             true, if it was already built and on the cache.

Leftover hunk. Revert this.

+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,

Whitespace is wrong, still.

+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.

on -> onto

+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this.

Change to: "When fdw_ps_tlist is used, this represents a remote join,
and the FDW driver is responsible for setting this field to an
appropriate value."

If FDW returns records as foreign-
+ * table definition, just put NIL here.

I think this is just referring to the non-join case; if so, just drop
it. Otherwise, I'm confused and need a further explanation.

+ * Note that since Plan trees can be copied, custom scan providers *must*

Extra space before "Note"

+       Bitmapset  *custom_relids;      /* set of relid (index of range-tables)
+                                                                *
represented by this node */

Maybe "RTIs this node generates"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#61)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, Apr 26, 2015 at 10:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch v13 is revised one according to the suggestion
by Robert.

Thanks.

The last hunk in foreign.c is a useless whitespace change.

Sorry, my oversight.

+ /* actually, not shift members */

Change to: "shift of 0 is the same as copying"

But actually, do we really need all of this? I think you could reduce
the size of this function to three lines of code if you just did this:

x = -1;
while ((x = bms_next_member(inputset, x)) >= 0)
outputset = bms_add_member(outputset, x + shift);

It might be very slightly slower, but I think it would be worth it to
reduce the amount of code needed.

OK, I reverted the bms_shift_members().

It seems to me the code blocks for T_ForeignScan and T_CustomScan in
setrefs.c are a bit large. It may be better to have separate
functions, as for T_IndexOnlyScan.
What do you think?

+        * 5. Consider paths added by FDW, in case when both of outer and
+        * inner relations are managed by the same driver.

Change to: "If both inner and outer relations are managed by the same
FDW, give it a chance to push down joins."

OK,

+        * 6. At the last, consider paths added by extension, in addition to the
+        * built-in paths.

Change to: "Finally, give extensions a chance to manipulate the path list."

OK,

+        * Fetch relation-id, if this foreign-scan node actuall scans on
+        * a particular real relation. Elsewhere, InvalidOid shall be
+        * informed to the FDW driver.

Change to: "If we're scanning a base relation, look up the OID. (We
can skip this if scanning a join relation.)"

OK,

+        * Sanity check. Pseudo scan tuple-descriptor shall be constructed
+        * based on the fdw_ps_tlist, excluding resjunk=true, so we need to
+        * ensure all valid TLEs have to locate prior to junk ones.

Is the goal here to make attribute numbers match up? If so, between
where and where? If not, please explain further.

No, its purpose is to reduce unnecessary projection.

The *_ps_tlist is not only used to construct the tuple descriptor of a
Foreign/CustomScan with scanrelid==0, but also to resolve var-nodes
with varno==INDEX_VAR in the EXPLAIN command.

For example,
SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a;

If "t1.x = t2.a" is executable on an external computing resource (like
a remote RDBMS or a GPU device), neither t1.x nor t2.a needs to
appear on the targetlist of the joinrel.
In this case, the best *_ps_tlist consists of the two var-nodes t1.y
and t2.b, because it then matches the tuple descriptor of the result
tuple slot and per-tuple projection can be skipped.

On the other hand, we may want to print out the expression clause that
shall be executed on the external resource; "t1.x = t2.a" in this case.
If the FDW/CSP keeps this clause in expression form, its var-nodes
shall be rewritten to pairs of INDEX_VAR and the resno on the
*_ps_tlist. So, deparse_expression() needs to be able to find "t1.x"
and "t2.a" on the *_ps_tlist. However, it does not make sense to
include these variables in the scan tuple-descriptor.

ExecInitForeignScan() and ExecInitCustomScan() build their scan tuple-
descriptors using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit
these unreferenced variables from the *_ps_tlist. Because all var-nodes
with INDEX_VAR are identified by their offset from the head of the
list, we cannot allow any target-entry with resjunk=false after ones
with resjunk=true; otherwise the expected varattno would not match.

This sanity check ensures that no target-entry with resjunk=false
appears after one with resjunk=true. It distinguishes attributes to be
included in the result tuple from those kept only for EXPLAIN.
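
As a rough illustration only (this is not from the patch; the relation
indexes, attribute numbers, and type OIDs below are made up), such a
*_ps_tlist for the example query might be built like this:

/*
 * Illustrative sketch: an fdw_ps_tlist / custom_ps_tlist for
 * "SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a", assuming both
 * columns are int4 and t1/t2 have range-table indexes t1_rti/t2_rti.
 */
#include "postgres.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

static List *
example_ps_tlist(Index t1_rti, Index t2_rti)
{
	List	   *tlist = NIL;

	/* non-junk entries first: they define the scan tuple descriptor */
	tlist = lappend(tlist,
					makeTargetEntry((Expr *) makeVar(t1_rti, 2, INT4OID, -1,
													 InvalidOid, 0),
									1, "y", false));
	tlist = lappend(tlist,
					makeTargetEntry((Expr *) makeVar(t2_rti, 2, INT4OID, -1,
													 InvalidOid, 0),
									2, "b", false));

	/* resjunk entries last: referenced only when EXPLAIN deparses "t1.x = t2.a" */
	tlist = lappend(tlist,
					makeTargetEntry((Expr *) makeVar(t1_rti, 1, INT4OID, -1,
													 InvalidOid, 0),
									3, "x", true));
	tlist = lappend(tlist,
					makeTargetEntry((Expr *) makeVar(t2_rti, 1, INT4OID, -1,
													 InvalidOid, 0),
									4, "a", true));
	return tlist;
}

The two non-junk entries define the scan tuple descriptor, while the two
resjunk entries exist only so that EXPLAIN can deparse the pushed-down
join clause.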

Did my explanation above make the reason for this sanity check clear?

+                               if (splan->scan.scanrelid == 0)
+                               {
...
+                               }
splan->scan.scanrelid += rtoffset;

Does this need an "else"? It seems surprising that you would offset
scanrelid even if it's starting out as zero.

(Note that there are two instances of this pattern.)

A 'break' was put at the tail of the if-block; however, it may lead to
potential bugs in the future. I'll use the usual if-else style.

+ * 'found' : indicates whether RelOptInfo is actually constructed.
+ *             true, if it was already built and on the cache.

Leftover hunk. Revert this.

Fixed,

+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,

Whitespace is wrong, still.

Fixed,

+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.

on -> onto

OK,

+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this.

Change to: "When fdw_ps_tlist is used, this represents a remote join,
and the FDW driver is responsible for setting this field to an
appropriate value."

OK,

If FDW returns records as foreign-
+ * table definition, just put NIL here.

I think this is just referring to the non-join case; if so, just drop
it. Otherwise, I'm confused and need a further explanation.

OK, it is just saying to put NIL in the non-join case.

+ * Note that since Plan trees can be copied, custom scan providers *must*

Extra space before "Note"

OK,

+       Bitmapset  *custom_relids;      /* set of relid (index of range-tables)
+                                                                *
represented by this node */

Maybe "RTIs this node generates"?

OK,

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#63Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#62)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Thu, Apr 30, 2015 at 9:16 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It seems to me the code blocks for T_ForeignScan and T_CustomScan in
setrefs.c are a bit large. It may be better to have separate
functions, as for T_IndexOnlyScan.
What do you think?

Either way is OK with me. Please do as you think best.

+        * Sanity check. Pseudo scan tuple-descriptor shall be constructed
+        * based on the fdw_ps_tlist, excluding resjunk=true, so we need to
+        * ensure all valid TLEs have to locate prior to junk ones.

Is the goal here to make attribute numbers match up? If so, between
where and where? If not, please explain further.

No, its purpose is to reduce unnecessary projection.

The *_ps_tlist is not only used to construct the tuple descriptor of a
Foreign/CustomScan with scanrelid==0, but also to resolve var-nodes
with varno==INDEX_VAR in the EXPLAIN command.

For example,
SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a;

If "t1.x = t2.a" is executable on an external computing resource (like
a remote RDBMS or a GPU device), neither t1.x nor t2.a needs to
appear on the targetlist of the joinrel.
In this case, the best *_ps_tlist consists of the two var-nodes t1.y
and t2.b, because it then matches the tuple descriptor of the result
tuple slot and per-tuple projection can be skipped.

On the other hand, we may want to print out the expression clause that
shall be executed on the external resource; "t1.x = t2.a" in this case.
If the FDW/CSP keeps this clause in expression form, its var-nodes
shall be rewritten to pairs of INDEX_VAR and the resno on the
*_ps_tlist. So, deparse_expression() needs to be able to find "t1.x"
and "t2.a" on the *_ps_tlist. However, it does not make sense to
include these variables in the scan tuple-descriptor.

ExecInitForeignScan() and ExecInitCustomScan() build their scan tuple-
descriptors using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit
these unreferenced variables from the *_ps_tlist. Because all var-nodes
with INDEX_VAR are identified by their offset from the head of the
list, we cannot allow any target-entry with resjunk=false after ones
with resjunk=true; otherwise the expected varattno would not match.

This sanity check ensures that no target-entry with resjunk=false
appears after one with resjunk=true. It distinguishes attributes to be
included in the result tuple from those kept only for EXPLAIN.

Did my explanation above make the reason for this sanity check clear?

Yeah, I think so. So what we want to do in this comment is summarize
all of that briefly. Maybe something like this:

"Sanity check. There may be resjunk entries in fdw_ps_tlist that are
included only to help EXPLAIN deparse plans properly. We require that
these are at the end, so that when the executor builds the scan
descriptor based on the non-junk entries, it gets the attribute
numbers correct."

+                               if (splan->scan.scanrelid == 0)
+                               {
...
+                               }
splan->scan.scanrelid += rtoffset;

Does this need an "else"? It seems surprising that you would offset
scanrelid even if it's starting out as zero.

(Note that there are two instances of this pattern.)

A 'break' was put at the tail of the if-block; however, it may lead to
potential bugs in the future. I'll use the usual if-else style.

Ah, OK, I missed that. Yeah, that's probably a good change.

I assume you realize you did not attach an updated patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#63)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Thu, Apr 30, 2015 at 9:16 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It seems to me the code blocks for T_ForeignScan and T_CustomScan in
setrefs.c are a bit large. It may be better to have separate
functions, as for T_IndexOnlyScan.
What do you think?

Either way is OK with me. Please do as you think best.

OK, in setrefs.c, I moved the code blocks for T_ForeignScan and T_CustomScan
into set_foreignscan_references() and set_customscan_references() respectively.
Their nesting is a bit deep in order to keep everything within 80-character rows.
They also use bms_add_member() instead of the now-removed bms_shift_members().

+        * Sanity check. Pseudo scan tuple-descriptor shall be constructed
+        * based on the fdw_ps_tlist, excluding resjunk=true, so we need to
+        * ensure all valid TLEs have to locate prior to junk ones.

Is the goal here to make attribute numbers match up? If so, between
where and where? If not, please explain further.

No, its purpose is to reduce unnecessary projection.

The *_ps_tlist is not only used to construct the tuple descriptor of a
Foreign/CustomScan with scanrelid==0, but also to resolve var-nodes
with varno==INDEX_VAR in the EXPLAIN command.

For example,
SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a;

If "t1.x = t2.a" is executable on an external computing resource (like
a remote RDBMS or a GPU device), neither t1.x nor t2.a needs to
appear on the targetlist of the joinrel.
In this case, the best *_ps_tlist consists of the two var-nodes t1.y
and t2.b, because it then matches the tuple descriptor of the result
tuple slot and per-tuple projection can be skipped.

On the other hand, we may want to print out the expression clause that
shall be executed on the external resource; "t1.x = t2.a" in this case.
If the FDW/CSP keeps this clause in expression form, its var-nodes
shall be rewritten to pairs of INDEX_VAR and the resno on the
*_ps_tlist. So, deparse_expression() needs to be able to find "t1.x"
and "t2.a" on the *_ps_tlist. However, it does not make sense to
include these variables in the scan tuple-descriptor.

ExecInitForeignScan() and ExecInitCustomScan() build their scan tuple-
descriptors using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit
these unreferenced variables from the *_ps_tlist. Because all var-nodes
with INDEX_VAR are identified by their offset from the head of the
list, we cannot allow any target-entry with resjunk=false after ones
with resjunk=true; otherwise the expected varattno would not match.

This sanity check ensures that no target-entry with resjunk=false
appears after one with resjunk=true. It distinguishes attributes to be
included in the result tuple from those kept only for EXPLAIN.

Did my explanation above make the reason for this sanity check clear?

Yeah, I think so. So what we want to do in this comment is summarize
all of that briefly. Maybe something like this:

"Sanity check. There may be resjunk entries in fdw_ps_tlist that are
included only to help EXPLAIN deparse plans properly. We require that
these are at the end, so that when the executor builds the scan
descriptor based on the non-junk entries, it gets the attribute
numbers correct."

Thanks, I used this sentence as is.

+                               if (splan->scan.scanrelid == 0)
+                               {
...
+                               }
splan->scan.scanrelid += rtoffset;

Does this need an "else"? It seems surprising that you would offset
scanrelid even if it's starting out as zero.

(Note that there are two instances of this pattern.)

A 'break' was put at the tail of the if-block; however, it may lead to
potential bugs in the future. I'll use the usual if-else style.

Ah, OK, I missed that. Yeah, that's probably a good change.

set_foreignscan_references() and set_customscan_references() are
split into two portions in the manner above: one code block for
scanrelid==0 and another for everything else.

I assume you realize you did not attach an updated patch?

I wanted to submit v14 after the above items were clarified.
The attached patch (v14) includes everything you suggested in the previous
message.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v14.patchapplication/octet-stream; name=pgsql-v9.5-custom-join.v14.patchDownload
 doc/src/sgml/custom-scan.sgml           |  43 ++++++++++
 doc/src/sgml/fdwhandler.sgml            |  55 ++++++++++++
 src/backend/commands/explain.c          |  15 +++-
 src/backend/executor/execScan.c         |   6 ++
 src/backend/executor/nodeCustom.c       |  41 ++++++---
 src/backend/executor/nodeForeignscan.c  |  37 +++++---
 src/backend/foreign/foreign.c           |  21 +++--
 src/backend/nodes/copyfuncs.c           |   5 ++
 src/backend/nodes/outfuncs.c            |   5 ++
 src/backend/optimizer/path/joinpath.c   |  24 ++++++
 src/backend/optimizer/plan/createplan.c |  83 ++++++++++++++----
 src/backend/optimizer/plan/setrefs.c    | 145 +++++++++++++++++++++++++++-----
 src/backend/optimizer/util/plancat.c    |   7 +-
 src/backend/optimizer/util/relnode.c    |  14 +++
 src/backend/utils/adt/ruleutils.c       |   4 +
 src/include/foreign/fdwapi.h            |  15 ++++
 src/include/nodes/plannodes.h           |  21 +++--
 src/include/nodes/relation.h            |   2 +
 src/include/optimizer/paths.h           |  13 +++
 src/include/optimizer/planmain.h        |   1 +
 20 files changed, 485 insertions(+), 72 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 8a4a3df..b1400ae 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -48,6 +48,27 @@ extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
   </para>
 
   <para>
+   A custom scan provider can also add paths by setting the following
+   hook, which allows it to replace built-in join paths with a custom scan
+   that behaves as if it scans an already-joined relation.  The hook is
+   called after the core code has generated what it believes to be the
+   complete and correct set of access paths for the join.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+                                             RelOptInfo *joinrel,
+                                             RelOptInfo *outerrel,
+                                             RelOptInfo *innerrel,
+                                             List *restrictlist,
+                                             JoinType jointype,
+                                             SpecialJoinInfo *sjinfo,
+                                             SemiAntiJoinFactors *semifactors,
+                                             Relids param_source_rels,
+                                             Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+  </para>
+
+  <para>
     Although this hook function can be used to examine, modify, or remove
     paths generated by the core system, a custom scan provider will typically
     confine itself to generating <structname>CustomPath</> objects and adding
@@ -124,7 +145,9 @@ typedef struct CustomScan
     Scan      scan;
     uint32    flags;
     List     *custom_exprs;
+    List     *custom_ps_tlist;
     List     *custom_private;
+    List     *custom_relids;
     const CustomScanMethods *methods;
 } CustomScan;
 </programlisting>
@@ -141,10 +164,30 @@ typedef struct CustomScan
     is only used by the custom scan provider itself.  Plan trees must be able
     to be duplicated using <function>copyObject</>, so all the data stored
     within these two fields must consist of nodes that function can handle.
+    <literal>custom_relids</> tracks the underlying relations represented
+    by this custom-scan node; it is set by the backend, so the custom scan
+    provider does not need to touch it.
     <structfield>methods</> must point to a (usually statically allocated)
     object implementing the required custom scan methods, which are further
     detailed below.
   </para>
+  <para>
+   When a <structname>CustomScan</> replaces built-in join paths, the
+   custom scan provider must set it up in two characteristic ways.
+   First, <structfield>scan.scanrelid</>, which would normally be a
+   range-table index, must be zero.  This tells the backend that the
+   <structname>CustomScan</> node is not associated with a particular
+   table.  Second, a valid list of <structname>TargetEntry</> nodes
+   must be stored in <structfield>custom_ps_tlist</>.  Such a node
+   looks to the backend like a plain scan, but on a relation that is
+   the result of joining other relations.  Its tuple descriptor cannot
+   be constructed from a table definition, so the custom scan provider
+   must describe the expected record type of the tuples itself.
+   The tuple descriptor of the scan slot is constructed from the
+   <structfield>custom_ps_tlist</> and assigned at executor
+   initialization.  It is also referenced by <command>EXPLAIN</> to
+   resolve the names of the underlying columns and relations.
+  </para>
 
   <sect2 id="custom-scan-plan-callbacks">
    <title>Custom Scan Callbacks</title>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index c1daa4b..ef21215 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -598,6 +598,61 @@ IsForeignRelUpdatable (Relation rel);
 
    </sect2>
 
+   <sect2>
+    <title>FDW Routines for remote join</title>
+    <para>
+<programlisting>
+void
+GetForeignJoinPaths(PlannerInfo *root,
+                    RelOptInfo *joinrel,
+                    RelOptInfo *outerrel,
+                    RelOptInfo *innerrel,
+                    List *restrictlist,
+                    JoinType jointype,
+                    SpecialJoinInfo *sjinfo,
+                    SemiAntiJoinFactors *semifactors,
+                    Relids param_source_rels,
+                    Relids extra_lateral_rels);
+</programlisting>
+     Create possible access paths for a join of two foreign tables or
+     join relations, both of which must be managed by the same
+     FDW driver.
+     This optional function is called during query planning.
+    </para>
+    <para>
+     This function allows the FDW driver to add <literal>ForeignScan</>
+     paths for the supplied <literal>joinrel</>. From the standpoint of the
+     query planner, it looks as if a scan node were added for the join
+     relation.  It means that a <literal>ForeignScan</> path added in place
+     of the built-in local join logic has to generate tuples as if it were
+     scanning a joined and materialized relation.
+    </para>
+    <para>
+     Usually, we expect the FDW driver to issue a remote query that joins
+     the tables on the remote side and then to fetch the joined result on
+     the local side.
+     Unlike a simple table scan, the slot descriptor of the joined
+     relations is determined on the fly, so its definition cannot be
+     looked up in the system catalogs.
+     The FDW driver is therefore responsible for telling the query planner
+     the expected form of the joined relations.  When a <literal>ForeignScan</>
+     replaces a join of relations, <literal>scanrelid</> of the generated
+     plan node shall be zero, to mark that this <literal>ForeignScan</> node
+     is not associated with a particular foreign table.
+     It also needs to construct a pseudo-scan tlist (<literal>fdw_ps_tlist</>)
+     that indicates the expected tuple definition.
+    </para>
+    <para>
+     When <literal>scanrelid</> is zero, the executor initializes the scan
+     slot according to <literal>fdw_ps_tlist</>, excluding junk entries.
+     This list is also used to resolve the names of the original relations
+     and columns, so the FDW can keep expression nodes that are not actually
+     executed on the local side, such as a join clause to be executed on
+     the remote side; the target entries for such expressions must be
+     marked <literal>resjunk=true</>.
+    </para>
+   </sect2>
+
    <sect2 id="fdw-callbacks-explain">
     <title>FDW Routines for <command>EXPLAIN</></title>
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 315a528..f4cc901 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -730,11 +730,17 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
-		case T_ForeignScan:
-		case T_CustomScan:
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_ForeignScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((ForeignScan *) plan)->fdw_relids);
+			break;
+		case T_CustomScan:
+			*rels_used = bms_add_members(*rels_used,
+										 ((CustomScan *) plan)->custom_relids);
+			break;
 		case T_ModifyTable:
 			*rels_used = bms_add_member(*rels_used,
 									((ModifyTable *) plan)->nominalRelation);
@@ -1072,9 +1078,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_WorkTableScan:
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
-			ExplainScanTarget((Scan *) plan, es);
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
 			{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..85ce932 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,12 @@ ExecAssignScanProjectionInfo(ScanState *node)
 	/* Vars in an index-only scan's tlist should be INDEX_VAR */
 	if (IsA(scan, IndexOnlyScan))
 		varno = INDEX_VAR;
+	/* Also foreign-/custom-scan on pseudo relation should be INDEX_VAR */
+	else if (scan->scanrelid == 0)
+	{
+		Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
+		varno = INDEX_VAR;
+	}
 	else
 		varno = scan->scanrelid;
 
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..80851de 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,7 +23,7 @@ CustomScanState *
 ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 {
 	CustomScanState    *css;
-	Relation			scan_rel;
+	Index				scan_relid = cscan->scan.scanrelid;
 
 	/* populate a CustomScanState according to the CustomScan */
 	css = (CustomScanState *) cscan->methods->CreateCustomScanState(cscan);
@@ -48,12 +48,33 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &css->ss);
 	ExecInitResultTupleSlot(estate, &css->ss.ps);
 
-	/* initialize scan relation */
-	scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
-	css->ss.ss_currentRelation = scan_rel;
-	css->ss.ss_currentScanDesc = NULL;	/* set by provider */
-	ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+	/*
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this custom
+	 * scan is on actual relations.
+	 *
+	 * on the other hand, a custom scan may scan a pseudo relation;
+	 * that is usually the result set of a join of relations performed
+	 * by an external computing resource. In that case, the scan type is
+	 * taken from the pseudo-scan target list that should be assigned by
+	 * the custom-scan provider.
+	 */
+	if (scan_relid > 0)
+	{
+		Relation		scan_rel;
+
+		scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+		css->ss.ss_currentRelation = scan_rel;
+		css->ss.ss_currentScanDesc = NULL;	/* set by provider */
+		ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ExecAssignScanType(&css->ss, ps_tupdesc);
+	}
 	css->ss.ps.ps_TupFromTlist = false;
 
 	/*
@@ -89,11 +110,11 @@ ExecEndCustomScan(CustomScanState *node)
 
 	/* Clean out the tuple table */
 	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	if (node->ss.ss_ScanTupleSlot)
-		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* Close the heap relation */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..8f69cd4 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,7 +102,7 @@ ForeignScanState *
 ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 {
 	ForeignScanState *scanstate;
-	Relation	currentRelation;
+	Index		scanrelid = node->scan.scanrelid;
 	FdwRoutine *fdwroutine;
 
 	/* check for unsupported flags */
@@ -141,16 +141,30 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	ExecInitScanTupleSlot(estate, &scanstate->ss);
 
 	/*
-	 * open the base relation and acquire appropriate lock on it.
+	 * open the base relation and acquire appropriate lock on it, then
+	 * get the scan type from the relation descriptor, if this foreign
+	 * scan is on actual foreign-table.
+	 *
+	 * on the other hand, a foreign scan may scan a pseudo relation;
+	 * that is usually the result set of a remote join. In that case,
+	 * the scan type is taken from the pseudo-scan target list that
+	 * should be assigned by the FDW driver.
 	 */
-	currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
-	scanstate->ss.ss_currentRelation = currentRelation;
+	if (scanrelid > 0)
+	{
+		Relation	currentRelation;
 
-	/*
-	 * get the scan type from the relation descriptor.  (XXX at some point we
-	 * might want to let the FDW editorialize on the scan tupdesc.)
-	 */
-	ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+		currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+		scanstate->ss.ss_currentRelation = currentRelation;
+		ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+	}
+	else
+	{
+		TupleDesc	ps_tupdesc;
+
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+	}
 
 	/*
 	 * Initialize result tuple type and projection info.
@@ -161,7 +175,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+	fdwroutine = GetFdwRoutine(node->fdw_handler);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
@@ -193,7 +207,8 @@ ExecEndForeignScan(ForeignScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/* close the relation. */
-	ExecCloseScanRelation(node->ss.ss_currentRelation);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..cdbd550 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -304,11 +304,11 @@ GetFdwRoutine(Oid fdwhandler)
 
 
 /*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
  */
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+Oid
+GetFdwHandlerByRelId(Oid relid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
@@ -350,7 +350,18 @@ GetFdwRoutineByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	/* And finally, call the handler function. */
+	return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+
 	return GetFdwRoutine(fdwhandler);
 }
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 029761e..61379a7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,8 +592,11 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
+	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
+	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
+	COPY_BITMAPSET_FIELD(fdw_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
 
 	return newnode;
@@ -617,7 +620,9 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_ps_tlist);
 	COPY_NODE_FIELD(custom_private);
+	COPY_BITMAPSET_FIELD(custom_relids);
 
 	/*
 	 * NOTE: The method field of CustomScan is required to be a pointer to a
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..a178132 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,8 +558,11 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
+	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
+	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
+	WRITE_BITMAPSET_FIELD(fdw_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
@@ -572,7 +575,9 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_ps_tlist);
 	WRITE_NODE_FIELD(custom_private);
+	WRITE_BITMAPSET_FIELD(custom_relids);
 	appendStringInfoString(str, " :methods ");
 	_outToken(str, node->methods->CustomName);
 	if (node->methods->TextOutCustomScan)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..dabef3c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
 #include <math.h>
 
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -260,6 +263,27 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. If both inner and outer relations are managed by the same FDW,
+	 * give it a chance to push down joins.
+	 */
+	if (joinrel->fdwroutine &&
+		joinrel->fdwroutine->GetForeignJoinPaths)
+		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
+												 outerrel, innerrel,
+												 restrictlist, jointype, sjinfo,
+												 &semifactors,
+												 param_source_rels,
+												 extra_lateral_rels);
+	/*
+	 * 6. Finally, give extensions a chance to manipulate the path list.
+	 */
+	if (set_join_pathlist_hook)
+		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+							   restrictlist, jointype,
+							   sjinfo, &semifactors,
+							   param_source_rels, extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..eeb2a41 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -1961,16 +1960,25 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	ForeignScan *scan_plan;
 	RelOptInfo *rel = best_path->path.parent;
 	Index		scan_relid = rel->relid;
-	RangeTblEntry *rte;
+	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
 	ListCell   *lc;
 	int			i;
 
-	/* it should be a base rel... */
-	Assert(scan_relid > 0);
-	Assert(rel->rtekind == RTE_RELATION);
-	rte = planner_rt_fetch(scan_relid, root);
-	Assert(rte->rtekind == RTE_RELATION);
+	/*
+	 * If we're scanning a base relation, look up the OID.
+	 * (We can skip this if scanning a join relation.)
+	 */
+	if (scan_relid > 0)
+	{
+		RangeTblEntry *rte;
+
+		Assert(rel->rtekind == RTE_RELATION);
+		rte = planner_rt_fetch(scan_relid, root);
+		Assert(rte->rtekind == RTE_RELATION);
+		rel_oid = rte->relid;
+	}
+	Assert(rel->fdwroutine != NULL);
 
 	/*
 	 * Sort clauses into best execution order.  We do this first since the FDW
@@ -1985,13 +1993,39 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * has selected some join clauses for remote use but also wants them
 	 * rechecked locally).
 	 */
-	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
 												tlist, scan_clauses);
+	/*
+	 * Sanity check.  There may be resjunk entries in fdw_ps_tlist that
+	 * are included only to help EXPLAIN deparse plans properly. We require
+	 * that these are at the end, so that when the executor builds the scan
+	 * descriptor based on the non-junk entries, it gets the attribute
+	 * numbers correct.
+	 */
+	if (scan_plan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, scan_plan->fdw_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this foreign scan for Explain */
+	scan_plan->fdw_relids = best_path->path.parent->relids;
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
+	/* Track FDW server-id; no need to make FDW do this */
+	scan_plan->fdw_handler = rel->fdw_handler;
+
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
@@ -2053,12 +2087,7 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
-
-	/*
-	 * Right now, all we can support is CustomScan node which is associated
-	 * with a particular base relation to be scanned.
-	 */
-	Assert(rel && rel->reloptkind == RELOPT_BASEREL);
+	ListCell   *lc;
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2078,6 +2107,30 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
+	 * Sanity check.  There may be resjunk entries in custom_ps_tlist that
+	 * are included only to help EXPLAIN deparse plans properly. We require
+	 * that these are at the end, so that when the executor builds the scan
+	 * descriptor based on the non-junk entries, it gets the attribute
+	 * numbers correct.
+	 */
+	if (cplan->scan.scanrelid == 0)
+	{
+		bool	found_resjunk = false;
+
+		foreach (lc, cplan->custom_ps_tlist)
+		{
+			TargetEntry	   *tle = lfirst(lc);
+
+			if (tle->resjunk)
+				found_resjunk = true;
+			else if (found_resjunk)
+				elog(ERROR, "junk TLE should not appear prior to valid one");
+		}
+	}
+	/* Set the relids that are represented by this custom scan for Explain */
+	cplan->custom_relids = best_path->path.parent->relids;
+
+	/*
 	 * Copy cost data from Path to Plan; no need to make custom-plan providers
 	 * do this
 	 */
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 94b12ab..69ed2a5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -86,6 +86,12 @@ static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
 static bool flatten_rtes_walker(Node *node, PlannerGlobal *glob);
 static void add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte);
 static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_foreignscan_references(PlannerInfo *root,
+									   ForeignScan *fscan,
+									   int rtoffset);
+static void set_customscan_references(PlannerInfo *root,
+									  CustomScan *cscan,
+									  int rtoffset);
 static Plan *set_indexonlyscan_references(PlannerInfo *root,
 							 IndexOnlyScan *plan,
 							 int rtoffset);
@@ -565,31 +571,11 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 		case T_ForeignScan:
-			{
-				ForeignScan *splan = (ForeignScan *) plan;
-
-				splan->scan.scanrelid += rtoffset;
-				splan->scan.plan.targetlist =
-					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
-				splan->scan.plan.qual =
-					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
-				splan->fdw_exprs =
-					fix_scan_list(root, splan->fdw_exprs, rtoffset);
-			}
+			set_foreignscan_references(root, (ForeignScan *) plan, rtoffset);
 			break;
 
 		case T_CustomScan:
-			{
-				CustomScan *splan = (CustomScan *) plan;
-
-				splan->scan.scanrelid += rtoffset;
-				splan->scan.plan.targetlist =
-					fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
-				splan->scan.plan.qual =
-					fix_scan_list(root, splan->scan.plan.qual, rtoffset);
-				splan->custom_exprs =
-					fix_scan_list(root, splan->custom_exprs, rtoffset);
-			}
+			set_customscan_references(root, (CustomScan *) plan, rtoffset);
 			break;
 
 		case T_NestLoop:
@@ -877,6 +863,121 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 }
 
 /*
+ * set_foreignscan_references
+ *     Do set_plan_references processing on a ForeignScan
+ */
+static void
+set_foreignscan_references(PlannerInfo *root,
+						   ForeignScan *fscan,
+						   int rtoffset)
+{
+	if (rtoffset > 0)
+	{
+		Bitmapset  *tempset = NULL;
+		int			x = -1;
+
+		while ((x = bms_next_member(fscan->fdw_relids, x)) >= 0)
+			tempset = bms_add_member(tempset, x + rtoffset);
+		fscan->fdw_relids = tempset;
+	}
+
+	if (fscan->scan.scanrelid == 0)
+	{
+		indexed_tlist *pscan_itlist = build_tlist_index(fscan->fdw_ps_tlist);
+
+		fscan->scan.plan.targetlist = (List *)
+			fix_upper_expr(root,
+						   (Node *) fscan->scan.plan.targetlist,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		fscan->scan.plan.qual = (List *)
+			fix_upper_expr(root,
+						   (Node *) fscan->scan.plan.qual,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		fscan->fdw_exprs = (List *)
+			fix_upper_expr(root,
+						   (Node *) fscan->fdw_exprs,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		fscan->fdw_ps_tlist =
+			fix_scan_list(root, fscan->fdw_ps_tlist, rtoffset);
+		pfree(pscan_itlist);
+	}
+	else
+	{
+		fscan->scan.scanrelid += rtoffset;
+		fscan->scan.plan.targetlist =
+			fix_scan_list(root, fscan->scan.plan.targetlist, rtoffset);
+		fscan->scan.plan.qual =
+			fix_scan_list(root, fscan->scan.plan.qual, rtoffset);
+		fscan->fdw_exprs =
+			fix_scan_list(root, fscan->fdw_exprs, rtoffset);
+	}
+}
+
+/*
+ * set_customscan_references
+ *     Do set_plan_references processing on a CustomScan
+ */
+static void
+set_customscan_references(PlannerInfo *root,
+						  CustomScan *cscan,
+						  int rtoffset)
+{
+	if (rtoffset > 0)
+	{
+		Bitmapset  *tempset = NULL;
+		int			x = -1;
+
+		while ((x = bms_next_member(cscan->custom_relids, x)) >= 0)
+			tempset = bms_add_member(tempset, x + rtoffset);
+		cscan->custom_relids = tempset;
+	}
+
+	if (cscan->scan.scanrelid == 0)
+	{
+		indexed_tlist *pscan_itlist =
+			build_tlist_index(cscan->custom_ps_tlist);
+
+		cscan->scan.plan.targetlist = (List *)
+			fix_upper_expr(root,
+						   (Node *) cscan->scan.plan.targetlist,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		cscan->scan.plan.qual = (List *)
+			fix_upper_expr(root,
+						   (Node *) cscan->scan.plan.qual,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		cscan->custom_exprs = (List *)
+			fix_upper_expr(root,
+						   (Node *) cscan->custom_exprs,
+						   pscan_itlist,
+						   INDEX_VAR,
+						   rtoffset);
+		cscan->custom_ps_tlist =
+			fix_scan_list(root, cscan->custom_ps_tlist, rtoffset);
+		pfree(pscan_itlist);
+	}
+	else
+	{
+		cscan->scan.scanrelid += rtoffset;
+		cscan->scan.plan.targetlist =
+			fix_scan_list(root, cscan->scan.plan.targetlist, rtoffset);
+		cscan->scan.plan.qual =
+			fix_scan_list(root, cscan->scan.plan.qual, rtoffset);
+		cscan->custom_exprs =
+			fix_scan_list(root, cscan->custom_exprs, rtoffset);
+	}
+}
+
+/*
  * set_indexonlyscan_references
  *		Do set_plan_references processing on an IndexOnlyScan
  *
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8abed2a..068ab39 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -379,10 +379,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		rel->fdw_handler = GetFdwHandlerByRelId(RelationGetRelid(relation));
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+	}
 	else
+	{
+		rel->fdw_handler = InvalidOid;
 		rel->fdwroutine = NULL;
-
+	}
 	heap_close(relation, NoLock);
 
 	/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..5623566 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "foreign/fdwapi.h"
 #include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -122,6 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
+	rel->fdw_handler = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -427,6 +429,18 @@ build_join_rel(PlannerInfo *root,
 							   sjinfo, restrictlist);
 
 	/*
+	 * Set FDW handler and routine if both outer and inner relations
+	 * are managed by the same FDW driver.
+	 */
+	if (OidIsValid(outer_rel->fdw_handler) &&
+		OidIsValid(inner_rel->fdw_handler) &&
+		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	{
+		joinrel->fdw_handler = outer_rel->fdw_handler;
+		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+	}
+
+	/*
 	 * Add the joinrel to the query's joinrel list, and store it into the
 	 * auxiliary hashtable if there is one.  NB: GEQO requires us to append
 	 * the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29b5b1b..82bb438 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3843,6 +3843,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	/* index_tlist is set only if it's an IndexOnlyScan */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+	else if (IsA(ps->plan, ForeignScan))
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+	else if (IsA(ps->plan, CustomScan))
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..c683d92 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,17 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
+typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
+											  RelOptInfo *joinrel,
+											  RelOptInfo *outerrel,
+											  RelOptInfo *innerrel,
+											  List *restrictlist,
+											  JoinType jointype,
+											  SpecialJoinInfo *sjinfo,
+											  SemiAntiJoinFactors *semifactors,
+											  Relids param_source_rels,
+											  Relids extra_lateral_rels);
+
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
 
@@ -150,10 +161,14 @@ typedef struct FdwRoutine
 
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
+
+	/* Support functions for join push-down */
+	GetForeignJoinPaths_function GetForeignJoinPaths;
 } FdwRoutine;
 
 
 /* Functions in foreign/foreign.c */
+extern Oid GetFdwHandlerByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..91b10cf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,7 +471,11 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) onto a pair of INDEX_VAR and alternative varattno.
+ * When fdw_ps_tlist is used, this represents a remote join, and the FDW
+ * driver is responsible for setting this field to an appropriate value.
+ * Note that everything in above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
  * Const.  Usually, though, you'll be better off choosing a representation
  * that can be dumped usefully by nodeToString().
@@ -480,18 +484,23 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
+	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
+	List	   *fdw_ps_tlist;	/* optional pseudo-scan tlist for FDW */
 	List	   *fdw_private;	/* private data for FDW */
+	Bitmapset  *fdw_relids;		/* set of relid (index of range-tables)
+								 * represented by this node */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
 /* ----------------
  *	   CustomScan node
  *
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private.  Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
  * ----------------
  */
 struct CustomScan;
@@ -512,7 +521,9 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
+	List	   *custom_ps_tlist;/* optional pseudo-scan target list */
 	List	   *custom_private; /* private data for custom code */
+	Bitmapset  *custom_relids;	/* RTIs this node generates */
 	const CustomScanMethods *methods;
 } CustomScan;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 401a686..1713d29 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
+	Oid			fdw_handler;	/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+											 RelOptInfo *joinrel,
+											 RelOptInfo *outerrel,
+											 RelOptInfo *innerrel,
+											 List *restrictlist,
+											 JoinType jointype,
+											 SpecialJoinInfo *sjinfo,
+											 SemiAntiJoinFactors *semifactors,
+											 Relids param_source_rels,
+											 Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
 /* Hook for plugins to replace standard_join_search() */
 typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..0c8cbcd 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
#65Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#64)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

I wanted to submit the v14 after the above items get clarified.
The attached patch (v14) includes all of what you suggested in the previous
message.

Committed, after heavily working over the documentation, and with some
more revisions to the comments as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#66Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#65)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

I wanted to submit the v14 after the above items get clarified.
The attached patch (v14) includes all of what you suggested in the previous
message.

Committed, after heavily working over the documentation, and with some
more revisions to the comments as well.

I've been trying to code-review this patch, because the documentation
seemed several bricks shy of a load, and I find myself entirely confused
by the fdw_ps_tlist and custom_ps_tlist fields. The names, along with
some of the comments, imply that these are just targetlists for the join
nodes; but if that is the case then we don't need them, because surely
scan.targetlist would serve the purpose. There is some other, utterly
uncommented, code in setrefs.c and ruleutils.c that suggests these fields
are supposed to serve a purpose more like IndexOnlyScan.indextlist; but
if that's what they are the comments are woefully inadequate/misleading,
and I'm really unsure that the associated code actually works. Also,
if that is what they're for (ie, to allow the FDW to redefine the scan
tuple contents) it would likely be better to decouple that feature from
whether the plan node is for a simple scan or a join. The business about
resjunk columns in that list also seems a bit half baked, or at least
underdocumented.

I do not think that this should have gotten committed without an attendant
proof-of-concept patch to postgres_fdw, so that the logic could be tested.

regards, tom lane

#67Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#66)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

I wanted to submit the v14 after the above items get clarified.
The attached patch (v14) includes all of what you suggested in the previous
message.

Committed, after heavily working over the documentation, and with some
more revisions to the comments as well.

I've been trying to code-review this patch, because the documentation
seemed several bricks shy of a load, and I find myself entirely confused
by the fdw_ps_tlist and custom_ps_tlist fields. The names, along with
some of the comments, imply that these are just targetlists for the join
nodes; but if that is the case then we don't need them, because surely
scan.targetlist would serve the purpose. There is some other, utterly
uncommented, code in setrefs.c and ruleutils.c that suggests these fields
are supposed to serve a purpose more like IndexOnlyScan.indextlist; but
if that's what they are the comments are woefully inadequate/misleading,
and I'm really unsure that the associated code actually works.

The main point of your concern is the lack of documentation/comments to
introduce how the pseudo-scan targetlist works here, isn't it?

Also,
if that is what they're for (ie, to allow the FDW to redefine the scan
tuple contents) it would likely be better to decouple that feature from
whether the plan node is for a simple scan or a join.

In this version, we don't intend FDW/CSP to redefine the contents of
scan tuples, even though I want to off-load heavy targetlist calculation
workloads to external computing resources in *the future version*.

The business about
resjunk columns in that list also seems a bit half baked, or at least
underdocumented.

I'll add source code comments to introduce how it works and when it has
resjunk=true. It is a bit too deep to be introduced in the SGML file.

I do not think that this should have gotten committed without an attendant
proof-of-concept patch to postgres_fdw, so that the logic could be tested.

Hanada-san is now working on it according to the comments from Robert.
The overall design was already discussed upthread, and the latest
implementation follows the consensus reached there.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kouhei Kaigai (#67)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I've been trying to code-review this patch, because the documentation
seemed several bricks shy of a load, and I find myself entirely confused
by the fdw_ps_tlist and custom_ps_tlist fields.

The main point of your concern is the lack of documentation/comments to
introduce how the pseudo-scan targetlist works here, isn't it?

Well, there's a bunch of omissions and outright errors in the docs and
comments, but this is the main issue that I was uncertain how to fix
from looking at the patch.

Also,
if that is what they're for (ie, to allow the FDW to redefine the scan
tuple contents) it would likely be better to decouple that feature from
whether the plan node is for a simple scan or a join.

In this version, we don't intend FDW/CSP to redefine the contents of
scan tuples, even though I want to off-load heavy targetlist calculation
workloads to external computing resources in *the future version*.

I do not think it's a good idea to introduce such a field now and then
redefine how it works and what it's for in a future version. We should
not be moving the FDW APIs around more than we absolutely have to,
especially not in ways that wouldn't throw an obvious compile error
for un-updated code. Also, the longer we wait to make a change that
we know we want, the more pain we inflict on FDW authors (simply because
there will be more of them a year from now than there are today).

The business about
resjunk columns in that list also seems a bit half baked, or at least
underdocumented.

I'll add source code comments to introduce how it works and when it has
resjunk=true. It is a bit too deep to be introduced in the SGML file.

I don't actually see a reason for resjunk marking in that list at all,
if what it's for is to define the contents of the scan tuple. I think we
should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and
nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan
(which is pretty pointless anyway, considering the number of other ways
you could screw up that tlist without it being detected).
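
As a rough sketch (illustrative only, not the committed code), the relevant
part of ExecInitForeignScan would then reduce to something like:

    /* for a pushed-down join, build the scan tuple from the pseudo tlist */
    if (node->scan.scanrelid == 0)
    {
        TupleDesc   scan_tupdesc;

        /* keep every entry, junk or not, so no later projection is needed */
        scan_tupdesc = ExecTypeFromTL(node->fdw_ps_tlist, false);
        ExecAssignScanType(&scanstate->ss, scan_tupdesc);
    }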

I'm also inclined to rename the fields to
fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do,
and to change the API of make_foreignscan() to add a parameter
corresponding to the scan tlist. It's utterly bizarre and error-prone
that this patch has added a field that the FDW is supposed to set and
not changed make_foreignscan to match.

I do not think that this should have gotten committed without an attendant
proof-of-concept patch to postgres_fdw, so that the logic could be tested.

Hanada-san is now working on it according to the comments from Robert.

That's nice, but 9.5 feature freeze is only a week away. I don't have a
lot of confidence that this stuff is actually in a state where we won't
regret shipping it in 9.5.

regards, tom lane

#69Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#68)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's nice, but 9.5 feature freeze is only a week away. I don't have a
lot of confidence that this stuff is actually in a state where we won't
regret shipping it in 9.5.

Yeah. The POC you were asking for upthread certainly exists and has
for a while, or I would not have committed this. But I do not think
it likely that the postgres_fdw support will be ready for 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#69)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's nice, but 9.5 feature freeze is only a week away. I don't have a
lot of confidence that this stuff is actually in a state where we won't
regret shipping it in 9.5.

Yeah. The POC you were asking for upthread certainly exists and has
for a while, or I would not have committed this. But I do not think
it likely that the postgres_fdw support will be ready for 9.5.

Well, we have two alternatives. I can keep hacking on this and get it
to a state where it seems credible to me, but we won't have any proof
that it actually works (though perhaps we could treat any problems
as bugs that should hopefully get found before 9.5 ships, if a
postgres_fdw patch shows up in the next few months). Or we could
revert the whole thing and bounce it to the 9.6 cycle. I don't really
like doing the latter, but I'm pretty uncomfortable with committing to
published FDW APIs that are (a) as messy as this and (b) practically
untested. The odds that something slipped through the cracks are high.

Aside from the other gripes I raised, I'm exceedingly unhappy with the
ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook.
It's okay for internal calls in joinpath.c to look like that, but
exporting that set of parameters seems like pure folly. We've changed
those parameter lists repeatedly (for instance in 9.2 and again in 9.3);
the odds that they'll need to change again in future approach 100%.

One way we could reduce the risk of code breakage there is to stuff all
or most of those parameters into a struct. This might result in a small
slowdown for the internal calls, or then again maybe not --- there
probably aren't many architectures that can pass 10 parameters in
registers anyway.

regards, tom lane

#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#70)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

... btw, I just noticed something that had escaped me because it seems so
obviously wrong that I had not even stopped to consider the possibility
that the code was doing what it's doing. To wit, that the planner
supposes that two foreign tables are potentially remote-joinable if they
share the same underlying FDW handler function. Not the same server, and
not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for
the handler function. I think this is fundamentally bogus. Under what
circumstances are we not just laying off the need to check same server
origin onto the FDW? How is it that the urgent need for the FDW to check
for that isn't even mentioned in the documentation?

I think that we'd really be better off insisting on same server (as in
same pg_foreign_server OID), hence automatically same FDW, and what's
even more important, same user mapping for any possible query execution
context. The possibility that there are some corner cases where some FDWs
could optimize other scenarios seems to me to be poor return for the bugs
and security holes that will arise any time typical FDWs forget to check
this.
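
To make the cost concrete, the check that every FDW would otherwise have to
repeat at the top of its GetForeignJoinPaths callback looks roughly like this
(a sketch only; how the two relation OIDs are fetched from the outer/inner
RelOptInfo is glossed over here):

    ForeignTable *ft_outer = GetForeignTable(outer_relid);
    ForeignTable *ft_inner = GetForeignTable(inner_relid);

    if (ft_outer->serverid != ft_inner->serverid)
        return;                 /* different servers: no join push-down */

    /*
     * A complete check would also have to verify that both sides resolve
     * to the same user mapping for the user actually running the query.
     */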

regards, tom lane

#72Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#71)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 6:48 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

... btw, I just noticed something that had escaped me because it seems so
obviously wrong that I had not even stopped to consider the possibility
that the code was doing what it's doing. To wit, that the planner
supposes that two foreign tables are potentially remote-joinable if they
share the same underlying FDW handler function. Not the same server, and
not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for
the handler function. I think this is fundamentally bogus. Under what
circumstances are we not just laying off the need to check same server
origin onto the FDW? How is it that the urgent need for the FDW to check
for that isn't even mentioned in the documentation?

Indeed. Comparison of fdw_handler may cause unexpected behavior.
I agree it needs to be fixed up.

I think that we'd really be better off insisting on same server (as in
same pg_foreign_server OID), hence automatically same FDW, and what's
even more important, same user mapping for any possible query execution
context. The possibility that there are some corner cases where some FDWs
could optimize other scenarios seems to me to be poor return for the bugs
and security holes that will arise any time typical FDWs forget to check
this.

The former version of the foreign/custom-join patch did check for joinable
relations using the FDW server id; however, it was changed to the current
form because it might allow additional optimization opportunities - for
example, when multiple foreign servers share the same remote host, access
credentials and so on...
Also, I understand your concern about potential security holes caused by
oversight. It is a trade-off, but it seems to me the benefit from the
potential optimization cases does not outweigh the security-hole risk.
So, I'll make a patch to change the logic that checks whether foreign
tables are joinable.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#73Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#68)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 2:46 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I've been trying to code-review this patch, because the documentation
seemed several bricks shy of a load, and I find myself entirely confused
by the fdw_ps_tlist and custom_ps_tlist fields.

The main point of your concern is the lack of documentation/comments to
introduce how the pseudo-scan targetlist works here, isn't it?

Well, there's a bunch of omissions and outright errors in the docs and
comments, but this is the main issue that I was uncertain how to fix
from looking at the patch.

Also,
if that is what they're for (ie, to allow the FDW to redefine the scan
tuple contents) it would likely be better to decouple that feature from
whether the plan node is for a simple scan or a join.

In this version, we don't intend FDW/CSP to redefine the contents of
scan tuples, even though I want to off-load heavy targetlist calculation
workloads to external computing resources in *the future version*.

I do not think it's a good idea to introduce such a field now and then
redefine how it works and what it's for in a future version. We should
not be moving the FDW APIs around more than we absolutely have to,
especially not in ways that wouldn't throw an obvious compile error
for un-updated code. Also, the longer we wait to make a change that
we know we want, the more pain we inflict on FDW authors (simply because
there will be more of them a year from now than there are today).

Ah, my sentence above doesn't intend to reuse the existing field for
different work in a future version. It's just what I want to support in
a future version.
Yep, I see. It is not a good idea to redefine an existing field for a
different purpose silently. That's not my plan.

The business about
resjunk columns in that list also seems a bit half baked, or at least
underdocumented.

I'll add source code comments to introduce how it works and when it has
resjunk=true. It is a bit too deep to be introduced in the SGML file.

I don't actually see a reason for resjunk marking in that list at all,
if what it's for is to define the contents of the scan tuple. I think we
should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and
nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan
(which is pretty pointless anyway, considering the number of other ways
you could screw up that tlist without it being detected).

/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010D7E24@BPXM15GP.gisp.nec.co.jp

Does the introduction in the above post make sense?
The *_ps_tlist is not only used as the basis of the scan-tuple descriptor,
but is also used to resolve Var nodes with varno==INDEX_VAR in the EXPLAIN
command.
On the other hand, the existence of junk entries (which are referenced only
on external computing resources) may cause unnecessary projection.
So, I want to distinguish the target entries that form the basis of the
scan-tuple descriptor from the ones that exist just for the EXPLAIN command.
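
As an illustrative sketch (not taken from the patch) of how an FDW might fill
*_ps_tlist when it replaces a join: regular entries first, then remote-only
columns as resjunk entries so that EXPLAIN can still deparse them; the
remote_only_vars list below is hypothetical:

    List       *fdw_ps_tlist = NIL;
    AttrNumber  resno = 1;
    ListCell   *lc;

    /* columns actually delivered in the scan tuple */
    foreach(lc, joinrel->reltargetlist)
    {
        Var    *var = (Var *) lfirst(lc);

        fdw_ps_tlist = lappend(fdw_ps_tlist,
                               makeTargetEntry((Expr *) var, resno++,
                                               NULL, false));
    }

    /* columns referenced only on the remote side: resjunk, EXPLAIN-only */
    foreach(lc, remote_only_vars)
    {
        Var    *var = (Var *) lfirst(lc);

        fdw_ps_tlist = lappend(fdw_ps_tlist,
                               makeTargetEntry((Expr *) var, resno++,
                                               NULL, true));
    }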

I'm also inclined to rename the fields to
fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do,
and to change the API of make_foreignscan() to add a parameter
corresponding to the scan tlist. It's utterly bizarre and error-prone
that this patch has added a field that the FDW is supposed to set and
not changed make_foreignscan to match.

OK, I'll make both of those changes. The name ps_tlist is short for
"pseudo-scan target-list", so fdw_scan_tlist/custom_scan_tlist convey
almost the same intention.
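
Just to sketch it, a make_foreignscan() call with the scan tlist as an
explicit parameter might end up looking like the following (the exact
position of the new parameter is an assumption here, not a decision):

    fscan = make_foreignscan(tlist,
                             local_quals,
                             0,                  /* scanrelid == 0 for a join */
                             fdw_exprs,
                             fdw_private,
                             fdw_scan_tlist);    /* new parameter */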

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#74Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#70)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 3:51 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's nice, but 9.5 feature freeze is only a week away. I don't have a
lot of confidence that this stuff is actually in a state where we won't
regret shipping it in 9.5.

Yeah. The POC you were asking for upthread certainly exists and has
for a while, or I would not have committed this. But I do not think
it likely that the postgres_fdw support will be ready for 9.5.

Well, we have two alternatives. I can keep hacking on this and get it
to a state where it seems credible to me, but we won't have any proof
that it actually works (though perhaps we could treat any problems
as bugs that should hopefully get found before 9.5 ships, if a
postgres_fdw patch shows up in the next few months). Or we could
revert the whole thing and bounce it to the 9.6 cycle. I don't really
like doing the latter, but I'm pretty uncomfortable with committing to
published FDW APIs that are (a) as messy as this and (b) practically
untested. The odds that something slipped through the cracks are high.

Aside from the other gripes I raised, I'm exceedingly unhappy with the
ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook.
It's okay for internal calls in joinpath.c to look like that, but
exporting that set of parameters seems like pure folly. We've changed
those parameter lists repeatedly (for instance in 9.2 and again in 9.3);
the odds that they'll need to change again in future approach 100%.

One way we could reduce the risk of code breakage there is to stuff all
or most of those parameters into a struct. This might result in a small
slowdown for the internal calls, or then again maybe not --- there
probably aren't many architectures that can pass 10 parameters in
registers anyway.

Is it like the following structure definition?

typedef struct
{
    PlannerInfo *root;
    RelOptInfo  *joinrel;
    RelOptInfo  *outerrel;
    RelOptInfo  *innerrel;
    List        *restrictlist;
    JoinType     jointype;
    SpecialJoinInfo *sjinfo;
    SemiAntiJoinFactors *semifactors;
    Relids       param_source_rels;
    Relids       extra_lateral_rels;
} SetJoinPathListArgs;

I agree with the idea. It also helps CSP driver implementations, where a
driver calls the next driver that was already chained at installation time.

if (set_join_pathlist_next)
    set_join_pathlist_next(args);

is a more stable manner than

if (set_join_pathlist_next)
    set_join_pathlist_next(root,
                           joinrel,
                           outerrel,
                           innerrel,
                           restrictlist,
                           jointype,
                           sjinfo,
                           semifactors,
                           param_source_rels,
                           extra_lateral_rels);

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#75Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#70)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, May 8, 2015 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, we have two alternatives. I can keep hacking on this and get it
to a state where it seems credible to me, but we won't have any proof
that it actually works (though perhaps we could treat any problems
as bugs that should hopefully get found before 9.5 ships, if a
postgres_fdw patch shows up in the next few months). Or we could
revert the whole thing and bounce it to the 9.6 cycle. I don't really
like doing the latter, but I'm pretty uncomfortable with committing to
published FDW APIs that are (a) as messy as this and (b) practically
untested. The odds that something slipped through the cracks are high.

A lot of work went into this patch. I think it would be a shame to
revert it. I'd even rather ship something imperfect or somewhat
unstable and change it later than give up and roll it all back.

Aside from the other gripes I raised, I'm exceedingly unhappy with the
ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook.
It's okay for internal calls in joinpath.c to look like that, but
exporting that set of parameters seems like pure folly. We've changed
those parameter lists repeatedly (for instance in 9.2 and again in 9.3);
the odds that they'll need to change again in future approach 100%.

One way we could reduce the risk of code breakage there is to stuff all
or most of those parameters into a struct. This might result in a small
slowdown for the internal calls, or then again maybe not --- there
probably aren't many architectures that can pass 10 parameters in
registers anyway.

Putting it into a structure certainly seems fine. I think it's pretty
silly to assume that the FDW APIs are frozen or we're never going to
change them. There was much discussion of the merits of exposing that
information or not, and I was (and am) convinced that the FDWs need
access to most if not all of that stuff, and that removing access to
it will cripple the facility and result in mountains of duplicated and
inefficient code. If in the future we compute more or different stuff
there, I expect there's a good chance that FDWs will need to be
updated to look at that stuff too. Of course, I don't object to
maximizing our chances of not needing an API break, but I will be
neither surprised nor disappointed if such efforts fail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#76Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#71)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... btw, I just noticed something that had escaped me because it seems so
obviously wrong that I had not even stopped to consider the possibility
that the code was doing what it's doing. To wit, that the planner
supposes that two foreign tables are potentially remote-joinable if they
share the same underlying FDW handler function. Not the same server, and
not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for
the handler function. I think this is fundamentally bogus. Under what
circumstances are we not just laying off the need to check same server
origin onto the FDW? How is it that the urgent need for the FDW to check
for that isn't even mentioned in the documentation?

I think that we'd really be better off insisting on same server (as in
same pg_foreign_server OID), hence automatically same FDW, and what's
even more important, same user mapping for any possible query execution
context. The possibility that there are some corner cases where some FDWs
could optimize other scenarios seems to me to be poor return for the bugs
and security holes that will arise any time typical FDWs forget to check
this.

I originally wanted to go quite the other way with this and check for
join pushdown via handler X any time at least one of the two relations
involved used handler X, even if the other one used some other handler
or was a plain table. In particular, it seems to me quite plausible
to want to teach an FDW that a certain local table is replicated on a
remote node, allowing a join between a foreign table and a plain table
to be pushed down. This infrastructure can't be used that way anyhow,
so maybe there's no harm in tightening it up, but I'm wary of
circumscribing what FDW authors can do. I think it's better to be
rather expansive in terms of when we call them and let them return
without doing anything some of them time than to define the situations
in which we call them too narrowly and end up ruling out interesting
use cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#77Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Robert Haas (#76)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 11:21 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... btw, I just noticed something that had escaped me because it seems so
obviously wrong that I had not even stopped to consider the possibility
that the code was doing what it's doing. To wit, that the planner
supposes that two foreign tables are potentially remote-joinable if they
share the same underlying FDW handler function. Not the same server, and
not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for
the handler function. I think this is fundamentally bogus. Under what
circumstances are we not just laying off the need to check same server
origin onto the FDW? How is it that the urgent need for the FDW to check
for that isn't even mentioned in the documentation?

I think that we'd really be better off insisting on same server (as in
same pg_foreign_server OID), hence automatically same FDW, and what's
even more important, same user mapping for any possible query execution
context. The possibility that there are some corner cases where some FDWs
could optimize other scenarios seems to me to be poor return for the bugs
and security holes that will arise any time typical FDWs forget to check
this.

I originally wanted to go quite the other way with this and check for
join pushdown via handler X any time at least one of the two relations
involved used handler X, even if the other one used some other handler
or was a plain table. In particular, it seems to me quite plausible
to want to teach an FDW that a certain local table is replicated on a
remote node, allowing a join between a foreign table and a plain table
to be pushed down. This infrastructure can't be used that way anyhow,
so maybe there's no harm in tightening it up, but I'm wary of
circumscribing what FDW authors can do. I think it's better to be
rather expansive in terms of when we call them and let them return
without doing anything some of them time than to define the situations
in which we call them too narrowly and end up ruling out interesting
use cases.

Probably, joining a foreign table with a locally replicated relation on
the remote side is a relatively minor case. Even if the rough check based
on matching foreign server ids does not invoke GetForeignJoinPaths, an FDW
driver can still implement its own arbitrary logic using
set_join_pathlist_hook at its own risk, can't it?
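
For example, such an extension could still hook in with a skeleton like the
one below (illustrative only, using the hook signature as it stands at this
point; prev_set_join_pathlist_hook stands for whatever hook was previously
installed and saved at load time):

    static void
    my_join_pathlist_hook(PlannerInfo *root, RelOptInfo *joinrel,
                          RelOptInfo *outerrel, RelOptInfo *innerrel,
                          List *restrictlist, JoinType jointype,
                          SpecialJoinInfo *sjinfo,
                          SemiAntiJoinFactors *semifactors,
                          Relids param_source_rels, Relids extra_lateral_rels)
    {
        if (prev_set_join_pathlist_hook)
            prev_set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                        restrictlist, jointype, sjinfo,
                                        semifactors, param_source_rels,
                                        extra_lateral_rels);

        /*
         * Inspect outerrel/innerrel here and add a CustomPath (or
         * ForeignPath) if this extension can compute the join by itself,
         * e.g. a foreign table joined with a locally replicated table.
         */
    }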

The attached patch changes the logic that checks the joinability of two
foreign relations. As discussed upthread, it checks the foreign server id
instead of the handler function.
build_join_rel() sets fdw_server of the RelOptInfo if the inner and outer
foreign relations have the same value, which eventually allows
GetForeignJoinPaths to be kicked from add_paths_to_joinrel().

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

custom-join-fdw-pushdown-check-by-server.patch (application/octet-stream)
 src/backend/executor/nodeForeignscan.c  |  2 +-
 src/backend/foreign/foreign.c           | 50 +++++++++++++++++++++------------
 src/backend/nodes/copyfuncs.c           |  2 +-
 src/backend/nodes/outfuncs.c            |  2 +-
 src/backend/optimizer/plan/createplan.c |  2 +-
 src/backend/optimizer/util/plancat.c    |  4 +--
 src/backend/optimizer/util/relnode.c    | 12 ++++----
 src/include/foreign/fdwapi.h            |  3 +-
 src/include/nodes/plannodes.h           |  2 +-
 src/include/nodes/relation.h            |  4 +--
 10 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index fa553ac..17f7fb8 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -169,7 +169,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	/*
 	 * Acquire function pointers from the FDW's handler, and init fdw_state.
 	 */
-	fdwroutine = GetFdwRoutine(node->fdw_handler);
+	fdwroutine = GetFdwRoutineByServer(node->fdw_server);
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cdbd550..ea293ca 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -281,6 +281,27 @@ GetForeignColumnOptions(Oid relid, AttrNumber attnum)
 	return options;
 }
 
+/*
+ * GetFdwServerByRelId - look up the OID of the foreign server for the
+ * given foreign table.
+ */
+Oid
+GetFdwServerByRelId(Oid relid)
+{
+	HeapTuple	tp;
+	Form_pg_foreign_table tableform;
+	Oid			serverid;
+
+	/* Get server OID for the foreign table. */
+	tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for foreign table %u", relid);
+	tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
+	serverid = tableform->ftserver;
+	ReleaseSysCache(tp);
+
+	return serverid;
+}
 
 /*
  * GetFdwRoutine - call the specified foreign-data wrapper handler routine
@@ -302,30 +323,19 @@ GetFdwRoutine(Oid fdwhandler)
 	return routine;
 }
 
-
 /*
- * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table
+ * GetFdwRoutineByServer - look up the handler of the foreign-data wrapper
+ * for the given foreign server, and retrieve its FdwRoutine struct.
  */
-Oid
-GetFdwHandlerByRelId(Oid relid)
+FdwRoutine *
+GetFdwRoutineByServer(Oid serverid)
 {
 	HeapTuple	tp;
 	Form_pg_foreign_data_wrapper fdwform;
 	Form_pg_foreign_server serverform;
-	Form_pg_foreign_table tableform;
-	Oid			serverid;
 	Oid			fdwid;
 	Oid			fdwhandler;
 
-	/* Get server OID for the foreign table. */
-	tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
-	if (!HeapTupleIsValid(tp))
-		elog(ERROR, "cache lookup failed for foreign table %u", relid);
-	tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
-	serverid = tableform->ftserver;
-	ReleaseSysCache(tp);
-
 	/* Get foreign-data wrapper OID for the server. */
 	tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
 	if (!HeapTupleIsValid(tp))
@@ -350,7 +360,8 @@ GetFdwHandlerByRelId(Oid relid)
 
 	ReleaseSysCache(tp);
 
-	return fdwhandler;
+	/* And finally, call the handler function. */
+	return GetFdwRoutine(fdwhandler);
 }
 
 /*
@@ -360,9 +371,12 @@ GetFdwHandlerByRelId(Oid relid)
 FdwRoutine *
 GetFdwRoutineByRelId(Oid relid)
 {
-	Oid			fdwhandler = GetFdwHandlerByRelId(relid);
+	Oid			serverid;
 
-	return GetFdwRoutine(fdwhandler);
+	/* Get server OID for the foreign table. */
+	serverid = GetFdwServerByRelId(relid);
+	/* Then retrieve its FdwRoutine struct */
+	return GetFdwRoutineByServer(serverid);
 }
 
 /*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 805045d..aa05590 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -592,7 +592,7 @@ _copyForeignScan(const ForeignScan *from)
 	/*
 	 * copy remainder of node
 	 */
-	COPY_SCALAR_FIELD(fdw_handler);
+	COPY_SCALAR_FIELD(fdw_server);
 	COPY_NODE_FIELD(fdw_exprs);
 	COPY_NODE_FIELD(fdw_ps_tlist);
 	COPY_NODE_FIELD(fdw_private);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f9f948e..5a94c5e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -558,7 +558,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	_outScanInfo(str, (const Scan *) node);
 
-	WRITE_OID_FIELD(fdw_handler);
+	WRITE_OID_FIELD(fdw_server);
 	WRITE_NODE_FIELD(fdw_exprs);
 	WRITE_NODE_FIELD(fdw_ps_tlist);
 	WRITE_NODE_FIELD(fdw_private);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eeb2a41..a3f23ad 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2024,7 +2024,7 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
 
 	/* Track FDW server-id; no need to make FDW do this */
-	scan_plan->fdw_handler = rel->fdw_handler;
+	scan_plan->fdw_server = rel->fdw_server;
 
 	/*
 	 * Replace any outer-relation variables with nestloop params in the qual
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 068ab39..d81736c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -380,12 +380,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Grab the fdwroutine info using the relcache, while we have it */
 	if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 	{
-		rel->fdw_handler = GetFdwHandlerByRelId(RelationGetRelid(relation));
+		rel->fdw_server = GetFdwServerByRelId(RelationGetRelid(relation));
 		rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
 	}
 	else
 	{
-		rel->fdw_handler = InvalidOid;
+		rel->fdw_server = InvalidOid;
 		rel->fdwroutine = NULL;
 	}
 	heap_close(relation, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5623566..e6a5349 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -123,7 +123,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->subroot = NULL;
 	rel->subplan_params = NIL;
 	rel->fdwroutine = NULL;
-	rel->fdw_handler = InvalidOid;
+	rel->fdw_server = InvalidOid;
 	rel->fdw_private = NULL;
 	rel->baserestrictinfo = NIL;
 	rel->baserestrictcost.startup = 0;
@@ -432,12 +432,12 @@ build_join_rel(PlannerInfo *root,
 	 * Set FDW handler and routine if both outer and inner relation
 	 * are managed by same FDW driver.
 	 */
-	if (OidIsValid(outer_rel->fdw_handler) &&
-		OidIsValid(inner_rel->fdw_handler) &&
-		outer_rel->fdw_handler == inner_rel->fdw_handler)
+	if (OidIsValid(outer_rel->fdw_server) &&
+		OidIsValid(inner_rel->fdw_server) &&
+		outer_rel->fdw_server == inner_rel->fdw_server)
 	{
-		joinrel->fdw_handler = outer_rel->fdw_handler;
-		joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+		joinrel->fdw_server = outer_rel->fdw_server;
+		joinrel->fdwroutine = GetFdwRoutineByServer(joinrel->fdw_server);
 	}
 
 	/*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c683d92..ff91d0f 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -168,8 +168,9 @@ typedef struct FdwRoutine
 
 
 /* Functions in foreign/foreign.c */
-extern Oid GetFdwHandlerByRelId(Oid relid);
+extern Oid GetFdwServerByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
+extern FdwRoutine *GetFdwRoutineByServer(Oid server_id);
 extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
 extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
 extern bool IsImportableForeignTable(const char *tablename,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index baeba2d..bd23d24 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -484,7 +484,7 @@ typedef struct WorkTableScan
 typedef struct ForeignScan
 {
 	Scan		scan;
-	Oid			fdw_handler;	/* OID of FDW handler */
+	Oid			fdw_server;		/* OID of FDW server */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
 	List	   *fdw_ps_tlist;	/* tlist, if replacing a join */
 	List	   *fdw_private;	/* private data for FDW */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1713d29..e0ca04f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,7 +366,7 @@ typedef struct PlannerInfo
  *		subroot - PlannerInfo for subquery (NULL if it's not a subquery)
  *		subplan_params - list of PlannerParamItems to be passed to subquery
  *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
- *		fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
+ *		fdw_server - OID of FDW server, if foreign table (else InvalidOid)
  *		fdw_private - private state for FDW, if foreign table (else NULL)
  *
  *		Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -462,7 +462,7 @@ typedef struct RelOptInfo
 	List	   *subplan_params; /* if subquery */
 	/* use "struct FdwRoutine" to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;		/* if foreign table */
-	Oid			fdw_handler;	/* if foreign table */
+	Oid			fdw_server;		/* if foreign table */
 	void	   *fdw_private;	/* if foreign table */
 
 	/* used by various scans and joins: */
#78Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Kohei KaiGai (#74)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 8:32 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2015-05-09 3:51 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's nice, but 9.5 feature freeze is only a week away. I don't have a
lot of confidence that this stuff is actually in a state where we won't
regret shipping it in 9.5.

Yeah. The POC you were asking for upthread certainly exists and has
for a while, or I would not have committed this. But I do not think
it likely that the postgres_fdw support will be ready for 9.5.

Well, we have two alternatives. I can keep hacking on this and get it
to a state where it seems credible to me, but we won't have any proof
that it actually works (though perhaps we could treat any problems
as bugs that should hopefully get found before 9.5 ships, if a
postgres_fdw patch shows up in the next few months). Or we could
revert the whole thing and bounce it to the 9.6 cycle. I don't really
like doing the latter, but I'm pretty uncomfortable with committing to
published FDW APIs that are (a) as messy as this and (b) practically
untested. The odds that something slipped through the cracks are high.

Aside from the other gripes I raised, I'm exceedingly unhappy with the
ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook.
It's okay for internal calls in joinpath.c to look like that, but
exporting that set of parameters seems like pure folly. We've changed
those parameter lists repeatedly (for instance in 9.2 and again in 9.3);
the odds that they'll need to change again in future approach 100%.

One way we could reduce the risk of code breakage there is to stuff all
or most of those parameters into a struct. This might result in a small
slowdown for the internal calls, or then again maybe not --- there
probably aren't many architectures that can pass 10 parameters in
registers anyway.

Is it like the following structure definition?

typedef struct
{
    PlannerInfo *root;
    RelOptInfo  *joinrel;
    RelOptInfo  *outerrel;
    RelOptInfo  *innerrel;
    List        *restrictlist;
    JoinType     jointype;
    SpecialJoinInfo *sjinfo;
    SemiAntiJoinFactors *semifactors;
    Relids       param_source_rels;
    Relids       extra_lateral_rels;
} SetJoinPathListArgs;

I agree with the idea. It also helps CSP driver implementations, where a
driver calls the next driver that was already chained at installation time.

if (set_join_pathlist_next)
    set_join_pathlist_next(args);

is a more stable manner than

if (set_join_pathlist_next)
    set_join_pathlist_next(root,
                           joinrel,
                           outerrel,
                           innerrel,
                           restrictlist,
                           jointype,
                           sjinfo,
                           semifactors,
                           param_source_rels,
                           extra_lateral_rels);

The attached patch newly defines an ExtraJoinPathArgs struct to pack all
the necessary information to be delivered to GetForeignJoinPaths and
set_join_pathlist_hook.

Its definition is below:

typedef struct
{
    PlannerInfo *root;
    RelOptInfo  *joinrel;
    RelOptInfo  *outerrel;
    RelOptInfo  *innerrel;
    List        *restrictlist;
    JoinType     jointype;
    SpecialJoinInfo *sjinfo;
    SemiAntiJoinFactors *semifactors;
    Relids       param_source_rels;
    Relids       extra_lateral_rels;
} ExtraJoinPathArgs;

Then, the hook invocation gets much simpler, like:

/*
 * 6. Finally, give extensions a chance to manipulate the path list.
 */
if (set_join_pathlist_hook)
    set_join_pathlist_hook(&jargs);

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

custom-join-argument-by-struct.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml         | 12 ++------
 doc/src/sgml/fdwhandler.sgml          | 11 +-------
 src/backend/optimizer/path/joinpath.c | 52 +++++++++++++++++++++++------------
 src/include/foreign/fdwapi.h          | 12 ++------
 src/include/optimizer/paths.h         | 33 ++++++++++++++--------
 5 files changed, 61 insertions(+), 59 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 9fd1db6..9c14d87 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -89,16 +89,8 @@ typedef struct CustomPath
    different combinations of inner and outer relations; it is the
    responsibility of the hook to minimize duplicated work.
 <programlisting>
-typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
-                                             RelOptInfo *joinrel,
-                                             RelOptInfo *outerrel,
-                                             RelOptInfo *innerrel,
-                                             List *restrictlist,
-                                             JoinType jointype,
-                                             SpecialJoinInfo *sjinfo,
-                                             SemiAntiJoinFactors *semifactors,
-                                             Relids param_source_rels,
-                                             Relids extra_lateral_rels);
+typedef void (*set_join_pathlist_hook_type) (ExtraJoinPathArgs *jargs);
+
 extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
 </programlisting>
   </para>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04f3c22..eb49eaa 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -603,16 +603,7 @@ IsForeignRelUpdatable (Relation rel);
     <para>
 <programlisting>
 void
-GetForeignJoinPaths(PlannerInfo *root,
-                    RelOptInfo *joinrel,
-                    RelOptInfo *outerrel,
-                    RelOptInfo *innerrel,
-                    List *restrictlist,
-                    JoinType jointype,
-                    SpecialJoinInfo *sjinfo,
-                    SemiAntiJoinFactors *semifactors,
-                    Relids param_source_rels,
-                    Relids extra_lateral_rels);
+GetForeignJoinPaths(ExtraJoinPathArgs *jargs);
 </programlisting>
      Create possible access paths for a join of two foreign tables managed
      by the same foreign data wrapper.
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dabef3c..54a238f 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -265,25 +265,41 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 param_source_rels, extra_lateral_rels);
 
 	/*
-	 * 5. If both inner and outer relations are managed by the same FDW,
-	 * give it a chance to push down joins.
+	 * If we may need to consider the last two cases (join paths provided
+	 * by extensions), pack all the arguments into an ExtraJoinPathArgs
+	 * struct, for simplification and stability of the interface.
 	 */
-	if (joinrel->fdwroutine &&
-		joinrel->fdwroutine->GetForeignJoinPaths)
-		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
-												 outerrel, innerrel,
-												 restrictlist, jointype, sjinfo,
-												 &semifactors,
-												 param_source_rels,
-												 extra_lateral_rels);
-	/*
-	 * 6. Finally, give extensions a chance to manipulate the path list.
-	 */
-	if (set_join_pathlist_hook)
-		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
-							   restrictlist, jointype,
-							   sjinfo, &semifactors,
-							   param_source_rels, extra_lateral_rels);
+	if ((joinrel->fdwroutine &&
+		 joinrel->fdwroutine->GetForeignJoinPaths) ||
+		set_join_pathlist_hook)
+	{
+		ExtraJoinPathArgs	jargs;
+
+		jargs.root = root;
+		jargs.joinrel = joinrel;
+		jargs.outerrel = outerrel;
+		jargs.innerrel = innerrel;
+		jargs.restrictlist = restrictlist;
+		jargs.jointype = jointype;
+		jargs.sjinfo = sjinfo;
+		jargs.semifactors = &semifactors;
+		jargs.param_source_rels = param_source_rels;
+		jargs.extra_lateral_rels = extra_lateral_rels;
+
+		/*
+		 * 5. If both inner and outer relations are managed by the same
+		 * foreign server, give it a chance to push down joins.
+		 */
+		if (joinrel->fdwroutine &&
+			joinrel->fdwroutine->GetForeignJoinPaths)
+			joinrel->fdwroutine->GetForeignJoinPaths(&jargs);
+
+		/*
+		 * 6. Finally, give extensions a chance to manipulate the path list.
+		 */
+		if (set_join_pathlist_hook)
+			set_join_pathlist_hook(&jargs);
+	}
 }
 
 /*
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index c683d92..c674f8f 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -14,6 +14,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/relation.h"
+#include "optimizer/paths.h"
 
 /* To avoid including explain.h here, reference ExplainState thus: */
 struct ExplainState;
@@ -82,16 +83,7 @@ typedef void (*EndForeignModify_function) (EState *estate,
 
 typedef int (*IsForeignRelUpdatable_function) (Relation rel);
 
-typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
-											  RelOptInfo *joinrel,
-											  RelOptInfo *outerrel,
-											  RelOptInfo *innerrel,
-											  List *restrictlist,
-											  JoinType jointype,
-											  SpecialJoinInfo *sjinfo,
-											  SemiAntiJoinFactors *semifactors,
-											  Relids param_source_rels,
-											  Relids extra_lateral_rels);
+typedef void (*GetForeignJoinPaths_function) (ExtraJoinPathArgs *jargs);
 
 typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
 													struct ExplainState *es);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index c42c69d..10bace7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,17 +30,28 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
 														RangeTblEntry *rte);
 extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
 
-/* Hook for plugins to get control in add_paths_to_joinrel() */
-typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
-											 RelOptInfo *joinrel,
-											 RelOptInfo *outerrel,
-											 RelOptInfo *innerrel,
-											 List *restrictlist,
-											 JoinType jointype,
-											 SpecialJoinInfo *sjinfo,
-											 SemiAntiJoinFactors *semifactors,
-											 Relids param_source_rels,
-											 Relids extra_lateral_rels);
+/*
+ * Hook for plugins to get control in add_paths_to_joinrel()
+ *
+ * ExtraJoinPathArgs packs all the information needed by extensions to
+ * construct extra paths. Its definition may be adjusted in future versions.
+ */
+typedef struct
+{
+	PlannerInfo	   *root;
+	RelOptInfo	   *joinrel;
+	RelOptInfo	   *outerrel;
+	RelOptInfo	   *innerrel;
+	List		   *restrictlist;
+	JoinType		jointype;
+	SpecialJoinInfo *sjinfo;
+	SemiAntiJoinFactors *semifactors;
+	Relids			param_source_rels;
+	Relids			extra_lateral_rels;
+} ExtraJoinPathArgs;
+
+typedef void (*set_join_pathlist_hook_type) (ExtraJoinPathArgs *jargs);
+
 extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
 
 /* Hook for plugins to replace standard_join_search() */
#79Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Kohei KaiGai (#73)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-09 8:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2015-05-09 2:46 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I've been trying to code-review this patch, because the documentation
seemed several bricks shy of a load, and I find myself entirely confused
by the fdw_ps_tlist and custom_ps_tlist fields.

The main point of your concern is the lack of documentation/comments to
introduce how the pseudo-scan targetlist works here, isn't it?

Well, there's a bunch of omissions and outright errors in the docs and
comments, but this is the main issue that I was uncertain how to fix
from looking at the patch.

Also,
if that is what they're for (ie, to allow the FDW to redefine the scan
tuple contents) it would likely be better to decouple that feature from
whether the plan node is for a simple scan or a join.

In this version, we don't intend FDW/CSP to redefine the contents of
scan tuples, even though I want to off-load heavy targetlist calculation
workloads to external computing resources in *the future version*.

I do not think it's a good idea to introduce such a field now and then
redefine how it works and what it's for in a future version. We should
not be moving the FDW APIs around more than we absolutely have to,
especially not in ways that wouldn't throw an obvious compile error
for un-updated code. Also, the longer we wait to make a change that
we know we want, the more pain we inflict on FDW authors (simply because
there will be more of them a year from now than there are today).

Ah, my sentence above doesn't intend to reuse the existing field for
different work in a future version. It's just what I want to support in
a future version.
Yep, I see. It is not a good idea to redefine an existing field for a
different purpose silently. That's not my plan.

The business about
resjunk columns in that list also seems a bit half baked, or at least
underdocumented.

I'll add source code comments to introduce how it works and when it has
resjunk=true. It is a bit too deep to be introduced in the SGML file.

I don't actually see a reason for resjunk marking in that list at all,
if what it's for is to define the contents of the scan tuple. I think we
should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and
nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan
(which is pretty pointless anyway, considering the number of other ways
you could screw up that tlist without it being detected).

/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010D7E24@BPXM15GP.gisp.nec.co.jp

Does the explanation in the post above make sense?
The *_ps_tlist is used not only as the basis of the scan-tuple descriptor, but
also to resolve Var nodes with varno==INDEX_VAR in the EXPLAIN command.
On the other hand, the presence of junk entries (which are referenced only by
external computing resources) may cause unnecessary projection.
So, I want to distinguish the target entries that form the basis of the
scan-tuple descriptor from the ones that exist only for the EXPLAIN command.
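
To illustrate that separation, here is a minimal sketch (not part of the attached patch; the function and list names are hypothetical) of building such a pseudo-scan tlist, keeping the EXPLAIN-only entries at the end as resjunk:

#include "postgres.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

static List *
build_pseudo_scan_tlist(List *fetched_vars, List *explain_only_vars)
{
	List	   *tlist = NIL;
	AttrNumber	resno = 1;
	ListCell   *lc;

	/* non-junk entries: these define the scan-tuple descriptor */
	foreach(lc, fetched_vars)
	{
		Expr   *expr = (Expr *) lfirst(lc);

		tlist = lappend(tlist, makeTargetEntry(expr, resno++, NULL, false));
	}

	/* junk entries: referenced only when EXPLAIN deparses expressions */
	foreach(lc, explain_only_vars)
	{
		Expr   *expr = (Expr *) lfirst(lc);

		tlist = lappend(tlist, makeTargetEntry(expr, resno++, NULL, true));
	}
	return tlist;
}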

I'm also inclined to rename the fields to
fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do,
and to change the API of make_foreignscan() to add a parameter
corresponding to the scan tlist. It's utterly bizarre and error-prone
that this patch has added a field that the FDW is supposed to set and
not changed make_foreignscan to match.

OK, I'll make both changes. The name ps_tlist is short for
"pseudo-scan target-list", so fdw_scan_tlist/custom_scan_tlist carry
almost the same intention.

The attached patch renames *_ps_tlist to *_scan_tlist according to
the suggestion.
It also adds a few detailed source code comments around this alternative
scan_tlist.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

custom-join-rename-ps_tlist.patchapplication/octet-stream; name=custom-join-rename-ps_tlist.patchDownload
 doc/src/sgml/custom-scan.sgml           |  6 ++---
 doc/src/sgml/fdwhandler.sgml            |  2 +-
 src/backend/executor/nodeCustom.c       |  2 +-
 src/backend/executor/nodeForeignscan.c  |  2 +-
 src/backend/nodes/copyfuncs.c           |  4 +--
 src/backend/nodes/outfuncs.c            |  4 +--
 src/backend/optimizer/plan/createplan.c | 27 +++++++++++++++++---
 src/backend/optimizer/plan/setrefs.c    | 45 ++++++++++++++++++++++++++++-----
 src/backend/utils/adt/ruleutils.c       |  9 ++++---
 src/include/nodes/plannodes.h           | 10 ++++----
 src/include/optimizer/planmain.h        |  3 ++-
 11 files changed, 85 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 9fd1db6..920010c 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -146,7 +146,7 @@ typedef struct CustomScan
     Scan      scan;
     uint32    flags;
     List     *custom_exprs;
-    List     *custom_ps_tlist;
+    List     *custom_scan_tlist;
     List     *custom_private;
     List     *custom_relids;
     const CustomScanMethods *methods;
@@ -176,9 +176,9 @@ typedef struct CustomScan
   <para>
    When a <structname>CustomScan</> scans a single relation,
    <structfield>scan.scanrelid</> should be the range table index of the table
-   to be scanned, and <structfield>custom_ps_tlist</> should be
+   to be scanned, and <structfield>custom_scan_tlist</> should be
    <literal>NULL</>.  When it replaces a join, <structfield>scan.scanrelid</>
-   should be zero, and <structfield>custom_ps_tlist</> should be a list of
+   should be zero, and <structfield>custom_scan_tlist</> should be a list of
    <structname>TargetEntry</> nodes.  This is necessary because, when a join
    is replaced, the target list cannot be constructed from the table
    definition.  At execution time, this list will be used to initialize the
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04f3c22..62c8b1e 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -627,7 +627,7 @@ GetForeignJoinPaths(PlannerInfo *root,
     <para>
      Since we cannot construct the slot descriptor for a remote join from
      the catalogs, the FDW should set the <structfield>scanrelid</> of the
-     <structname>ForeignScan</> to zero and <structfield>fdw_ps_tlist</>
+     <structname>ForeignScan</> to zero and <structfield>fdw_scan_tlist</>
      to an appropriate list of <structfield>TargetEntry</> nodes.
      Junk entries will be ignored, but can be present for the benefit of
      deparsing performed by <command>EXPLAIN</>.
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index db1b4f2..71f9352 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -65,7 +65,7 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
 	{
 		TupleDesc	ps_tupdesc;
 
-		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_ps_tlist, false);
+		ps_tupdesc = ExecCleanTypeFromTL(cscan->custom_scan_tlist, false);
 		ExecAssignScanType(&css->ss, ps_tupdesc);
 	}
 	css->ss.ps.ps_TupFromTlist = false;
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index fa553ac..4f1c783 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -156,7 +156,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	{
 		TupleDesc	ps_tupdesc;
 
-		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_ps_tlist, false);
+		ps_tupdesc = ExecCleanTypeFromTL(node->fdw_scan_tlist, false);
 		ExecAssignScanType(&scanstate->ss, ps_tupdesc);
 	}
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 805045d..07baa0e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -594,7 +594,7 @@ _copyForeignScan(const ForeignScan *from)
 	 */
 	COPY_SCALAR_FIELD(fdw_handler);
 	COPY_NODE_FIELD(fdw_exprs);
-	COPY_NODE_FIELD(fdw_ps_tlist);
+	COPY_NODE_FIELD(fdw_scan_tlist);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_BITMAPSET_FIELD(fdw_relids);
 	COPY_SCALAR_FIELD(fsSystemCol);
@@ -620,7 +620,7 @@ _copyCustomScan(const CustomScan *from)
 	 */
 	COPY_SCALAR_FIELD(flags);
 	COPY_NODE_FIELD(custom_exprs);
-	COPY_NODE_FIELD(custom_ps_tlist);
+	COPY_NODE_FIELD(custom_scan_tlist);
 	COPY_NODE_FIELD(custom_private);
 	COPY_BITMAPSET_FIELD(custom_relids);
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f9f948e..d6b1a9c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -560,7 +560,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 
 	WRITE_OID_FIELD(fdw_handler);
 	WRITE_NODE_FIELD(fdw_exprs);
-	WRITE_NODE_FIELD(fdw_ps_tlist);
+	WRITE_NODE_FIELD(fdw_scan_tlist);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_BITMAPSET_FIELD(fdw_relids);
 	WRITE_BOOL_FIELD(fsSystemCol);
@@ -575,7 +575,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
 
 	WRITE_UINT_FIELD(flags);
 	WRITE_NODE_FIELD(custom_exprs);
-	WRITE_NODE_FIELD(custom_ps_tlist);
+	WRITE_NODE_FIELD(custom_scan_tlist);
 	WRITE_NODE_FIELD(custom_private);
 	WRITE_BITMAPSET_FIELD(custom_relids);
 	appendStringInfoString(str, " :methods ");
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eeb2a41..9e2777f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1997,17 +1997,31 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 												best_path,
 												tlist, scan_clauses);
 	/*
-	 * Sanity check.  There may be resjunk entries in fdw_ps_tlist that
+	 * Sanity check.  There may be resjunk entries in fdw_scan_tlist that
 	 * are included only to help EXPLAIN deparse plans properly. We require
 	 * that these are at the end, so that when the executor builds the scan
 	 * descriptor based on the non-junk entries, it gets the attribute
 	 * numbers correct.
+	 *
+	 * NOTE: The fdw_scan_tlist is used not only to construct the tuple
+	 * descriptor of the scan tuples, but also to resolve Var nodes with
+	 * varno==INDEX_VAR in the EXPLAIN command.
+	 * If the FDW driver wants to print an expression that includes a Var
+	 * node that should not appear in the scan tuple of this ForeignScan,
+	 * the FDW driver may add junk entries to the fdw_scan_tlist:
+	 * ExecCleanTypeFromTL() constructs the tuple descriptor from the
+	 * fdw_scan_tlist while ignoring junk entries, but those entries are
+	 * still available when the EXPLAIN command looks up the actual
+	 * column name referenced by a Var node with varno==INDEX_VAR.
+	 * This can avoid unnecessary projection for each tuple, and thus
+	 * improve the throughput of a ForeignScan on an externally
+	 * materialized relation.
 	 */
 	if (scan_plan->scan.scanrelid == 0)
 	{
 		bool	found_resjunk = false;
 
-		foreach (lc, scan_plan->fdw_ps_tlist)
+		foreach (lc, scan_plan->fdw_scan_tlist)
 		{
 			TargetEntry	   *tle = lfirst(lc);
 
@@ -2107,17 +2121,20 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 	Assert(IsA(cplan, CustomScan));
 
 	/*
-	 * Sanity check.  There may be resjunk entries in custom_ps_tlist that
+	 * Sanity check.  There may be resjunk entries in custom_scan_tlist that
 	 * are included only to help EXPLAIN deparse plans properly. We require
 	 * that these are at the end, so that when the executor builds the scan
 	 * descriptor based on the non-junk entries, it gets the attribute
 	 * numbers correct.
+	 *
+	 * See the comment in create_foreignscan_plan() for how an extension
+	 * can use junk entries for performance optimization.
 	 */
 	if (cplan->scan.scanrelid == 0)
 	{
 		bool	found_resjunk = false;
 
-		foreach (lc, cplan->custom_ps_tlist)
+		foreach (lc, cplan->custom_scan_tlist)
 		{
 			TargetEntry	   *tle = lfirst(lc);
 
@@ -3611,6 +3628,7 @@ make_foreignscan(List *qptlist,
 				 List *qpqual,
 				 Index scanrelid,
 				 List *fdw_exprs,
+				 List *fdw_scan_tlist,
 				 List *fdw_private)
 {
 	ForeignScan *node = makeNode(ForeignScan);
@@ -3623,6 +3641,7 @@ make_foreignscan(List *qptlist,
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	node->fdw_exprs = fdw_exprs;
+	node->fdw_scan_tlist = fdw_scan_tlist;
 	node->fdw_private = fdw_private;
 	/* fsSystemCol will be filled in by create_foreignscan_plan */
 	node->fsSystemCol = false;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 69ed2a5..e10a033 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -881,9 +881,27 @@ set_foreignscan_references(PlannerInfo *root,
 		fscan->fdw_relids = tempset;
 	}
 
+	/*
+	 * NOTE: If scanrelid == 0, the ForeignScan node behaves like a scan
+	 * on a relation that joins multiple foreign tables on an external
+	 * computing resource; usually, that is a remote server.
+	 * In this case, fdw_scan_tlist informs the planner/executor of the
+	 * expected scan-tuple layout, instead of a base relation's definition.
+	 * Also, the executor expects fetched tuples to be stored in the
+	 * ecxt_scantuple of the ExprContext when local expressions are
+	 * evaluated, so setrefs.c puts INDEX_VAR in the varno of each Var
+	 * node as a synonym of ecxt_scantuple (see ExecEvalScalarVar() to
+	 * understand how a Var node is evaluated).  It also puts the index
+	 * within fdw_scan_tlist into varattno, because the FDW driver shall
+	 * return tuples laid out as described by that list.
+	 * Contents of the fdw_scan_tlist are adjusted by rtoffset as usual;
+	 * eventually, every variable that can be evaluated locally ends up
+	 * with a pair of INDEX_VAR and an index into fdw_scan_tlist, even if
+	 * this ForeignScan replaced two or more relations.
+	 */
 	if (fscan->scan.scanrelid == 0)
 	{
-		indexed_tlist *pscan_itlist = build_tlist_index(fscan->fdw_ps_tlist);
+		indexed_tlist *pscan_itlist = build_tlist_index(fscan->fdw_scan_tlist);
 
 		fscan->scan.plan.targetlist = (List *)
 			fix_upper_expr(root,
@@ -903,12 +921,18 @@ set_foreignscan_references(PlannerInfo *root,
 						   pscan_itlist,
 						   INDEX_VAR,
 						   rtoffset);
-		fscan->fdw_ps_tlist =
-			fix_scan_list(root, fscan->fdw_ps_tlist, rtoffset);
+		/* fdw_scan_tlist must NOT be transformed to reference itself! */
+		fscan->fdw_scan_tlist =
+			fix_scan_list(root, fscan->fdw_scan_tlist, rtoffset);
 		pfree(pscan_itlist);
 	}
 	else
 	{
+		/*
+		 * Otherwise, nothing special needs to happen when
+		 * scanrelid > 0; that implies the ForeignScan produces scan
+		 * tuples according to the relation's definition.
+		 */
 		fscan->scan.scanrelid += rtoffset;
 		fscan->scan.plan.targetlist =
 			fix_scan_list(root, fscan->scan.plan.targetlist, rtoffset);
@@ -938,10 +962,13 @@ set_customscan_references(PlannerInfo *root,
 		cscan->custom_relids = tempset;
 	}
 
+	/*
+	 * See the comments in set_foreignscan_references().
+	 */
 	if (cscan->scan.scanrelid == 0)
 	{
 		indexed_tlist *pscan_itlist =
-			build_tlist_index(cscan->custom_ps_tlist);
+			build_tlist_index(cscan->custom_scan_tlist);
 
 		cscan->scan.plan.targetlist = (List *)
 			fix_upper_expr(root,
@@ -961,12 +988,18 @@ set_customscan_references(PlannerInfo *root,
 						   pscan_itlist,
 						   INDEX_VAR,
 						   rtoffset);
-		cscan->custom_ps_tlist =
-			fix_scan_list(root, cscan->custom_ps_tlist, rtoffset);
+		/* custom_scan_tlist must NOT be transformed to reference itself! */
+		cscan->custom_scan_tlist =
+			fix_scan_list(root, cscan->custom_scan_tlist, rtoffset);
 		pfree(pscan_itlist);
 	}
 	else
 	{
+		/*
+		 * Otherwise, nothing special needs to happen when
+		 * scanrelid > 0; that implies the CustomScan produces scan
+		 * tuples according to the relation's definition.
+		 */
 		cscan->scan.scanrelid += rtoffset;
 		cscan->scan.plan.targetlist =
 			fix_scan_list(root, cscan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index fea8db6..7725a0b 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3859,13 +3859,16 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 	else
 		dpns->inner_tlist = NIL;
 
-	/* index_tlist is set only if it's an IndexOnlyScan */
+	/*
+	 * index_tlist is set only if it's an IndexOnlyScan, ForeignScan or
+	 * CustomScan with scanrelid == 0.
+	 */
 	if (IsA(ps->plan, IndexOnlyScan))
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else if (IsA(ps->plan, ForeignScan))
-		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+		dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_scan_tlist;
 	else if (IsA(ps->plan, CustomScan))
-		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
+		dpns->index_tlist = ((CustomScan *) ps->plan)->custom_scan_tlist;
 	else
 		dpns->index_tlist = NIL;
 }
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index baeba2d..d756e6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -471,9 +471,9 @@ typedef struct WorkTableScan
  * fdw_exprs and fdw_private are both under the control of the foreign-data
  * wrapper, but fdw_exprs is presumed to contain expression trees and will
  * be post-processed accordingly by the planner; fdw_private won't be.
- * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * An optional fdw_scan_tlist is used to map a reference to an attribute of
  * underlying relation(s) onto a pair of INDEX_VAR and alternative varattno.
- * When fdw_ps_tlist is used, this represents a remote join, and the FDW
+ * When fdw_scan_tlist is used, this represents a remote join, and the FDW
  * is responsible for setting this field to an appropriate value.
  * Note that everything in above lists must be copiable by copyObject().
  * One way to store an arbitrary blob of bytes is to represent it as a bytea
@@ -486,7 +486,7 @@ typedef struct ForeignScan
 	Scan		scan;
 	Oid			fdw_handler;	/* OID of FDW handler */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
-	List	   *fdw_ps_tlist;	/* tlist, if replacing a join */
+	List	   *fdw_scan_tlist;	/* tlist, if replacing a join */
 	List	   *fdw_private;	/* private data for FDW */
 	Bitmapset  *fdw_relids;		/* RTIs generated by this scan */
 	bool		fsSystemCol;	/* true if any "system column" is needed */
@@ -496,7 +496,7 @@ typedef struct ForeignScan
  *	   CustomScan node
  *
  * The comments for ForeignScan's fdw_exprs, fdw_varmap and fdw_private fields
- * apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * apply equally to custom_exprs, custom_scan_tlist and custom_private.
  * Note that since Plan trees can be copied, custom scan providers *must*
  * fit all plan data they need into those fields; embedding CustomScan in
  * a larger struct will not work.
@@ -520,7 +520,7 @@ typedef struct CustomScan
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
-	List	   *custom_ps_tlist;/* tlist, if replacing a join */
+	List	   *custom_scan_tlist;/* tlist, if replacing a join */
 	List	   *custom_private; /* private data for custom code */
 	Bitmapset  *custom_relids;	/* RTIs generated by this scan */
 	const CustomScanMethods *methods;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0c8cbcd..2b75579 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -45,7 +45,8 @@ extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
-				 Index scanrelid, List *fdw_exprs, List *fdw_private);
+									 Index scanrelid, List *fdw_exprs,
+									 List *fdw_scan_tlist, List *fdw_private);
 extern Append *make_append(List *appendplans, List *tlist);
 extern RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree, Plan *righttree, int wtParam,
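
To show how the new parameter would be used, here is a hypothetical GetForeignPlan callback for an FDW that replaces a join (a sketch only; how the remote expressions and the pseudo-scan tlist get stashed in the path's fdw_private at path-creation time is assumed):

#include "postgres.h"
#include "foreign/fdwapi.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"

static ForeignScan *
example_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel,
					   Oid foreigntableid, ForeignPath *best_path,
					   List *tlist, List *scan_clauses)
{
	/* assume these lists were built and stored at path-creation time */
	List	   *remote_exprs = (List *) linitial(best_path->fdw_private);
	List	   *fdw_scan_tlist = (List *) lsecond(best_path->fdw_private);

	return make_foreignscan(tlist,
							extract_actual_clauses(scan_clauses, false),
							0,				/* scanrelid == 0: replaces a join */
							remote_exprs,	/* fdw_exprs */
							fdw_scan_tlist, /* the new argument */
							NIL);			/* fdw_private of the plan node */
}
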
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#76)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think that we'd really be better off insisting on same server (as in
same pg_foreign_server OID), hence automatically same FDW, and what's
even more important, same user mapping for any possible query execution
context. The possibility that there are some corner cases where some FDWs
could optimize other scenarios seems to me to be poor return for the bugs
and security holes that will arise any time typical FDWs forget to check
this.

I originally wanted to go quite the other way with this and check for
join pushdown via handler X any time at least one of the two relations
involved used handler X, even if the other one used some other handler
or was a plain table. In particular, it seems to me quite plausible
to want to teach an FDW that a certain local table is replicated on a
remote node, allowing a join between a foreign table and a plain table
to be pushed down.

If we did do something like that, I think a saner implementation would
involve substituting a foreign table for the local one earlier, via view
expansion. So by the time we are doing join planning, there would be no
need to consider cross-server joins anyway.

This infrastructure can't be used that way anyhow,
so maybe there's no harm in tightening it up, but I'm wary of
circumscribing what FDW authors can do.

Somebody who's really intent on shooting themselves in the foot can always
use the set_join_pathlist_hook to inject paths for arbitrary joins.
The FDW mechanism should support reasonable use cases without undue
complication, and I doubt that what we've got now is adding anything
except complication and risk of bugs.

For the archives' sake, let me lay out a couple of reasons why an FDW
that tried to allow cross-server joins would almost certainly be broken,
and broken in security-relevant ways. Suppose for instance that
postgres_fdw tried to be smart and drill down into foreign tables' server
IDs to allow joining of any two tables that have the same effective host
name, port, database name, user name, and anything else you think would be
relevant to its choice of connections. The trouble with that is that the
user mapping is context dependent, in particular one local userID might
map to the same remote user name for two different server OIDs, while
another might map to different user names. So if we plan a query under
the first userID we might decide it's okay to do the join remotely.
Then, if we re-use that plan while executing as another userID (which is
entirely possible) what will probably happen is that the remote join
query will get sent off under one or the other of the remote usernames
associated with the second local userID. This could lead to either a
permission failure, or a remote table access that should not be allowed
to the current local userID. Admittedly, such cases might be rare in
practice, but it's still a security hole. Also, even if the FDW is
defensive enough to recheck full matching of the tables' connection
properties at execution time, there's not much it could do about the
situation except fail; it couldn't cause a re-plan to occur.

For another case, we do not have any mechanism whereby operations like
ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if
the two tables' connection properties matched at plan time, there's no
guarantee that they still match at execution. This is probably not a
security hole (at least not if you assume foreign-server owners are
trusted users), but it's still a bug that exists only if someone tries
to allow cross-server joins.

For these reasons, I think that if an FDW tried to be laxer than "tables
must be on the same pg_foreign_server entry to be joined", the odds
approach unity that it would be broken, and probably dangerously broken.
So we should just make that check for the FDWs. Anybody who thinks
they're smarter than the average bear can use set_join_pathlist_hook,
but they are probably wrong.

regards, tom lane


#81Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#80)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sat, May 9, 2015 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I originally wanted to go quite the other way with this and check for
join pushdown via handler X any time at least one of the two relations
involved used handler X, even if the other one used some other handler
or was a plain table. In particular, it seems to me quite plausible
to want to teach an FDW that a certain local table is replicated on a
remote node, allowing a join between a foreign table and a plain table
to be pushed down.

If we did do something like that, I think a saner implementation would
involve substituting a foreign table for the local one earlier, via view
expansion. So by the time we are doing join planning, there would be no
need to consider cross-server joins anyway.

Huh? You can't do this at rewrite time; it's very fundamentally a
planning problem. Suppose the user wants to join A, B, and C, where A
is a plain table, B is a plain table that is replicated on a foreign
server, and C is a foreign table on that same foreign server. If we
decide to join B to C first, we probably want to push down the join,
although maybe not, if we estimate that B JOIN C will have more rows
than C. If we decide to join A to B first, we want to use the local
copy of B.

This infrastructure can't be used that way anyhow,
so maybe there's no harm in tightening it up, but I'm wary of
circumscribing what FDW authors can do.

Somebody who's really intent on shooting themselves in the foot can always
use the set_join_pathlist_hook to inject paths for arbitrary joins.
The FDW mechanism should support reasonable use cases without undue
complication, and I doubt that what we've got now is adding anything
except complication and risk of bugs.

For the archives' sake, let me lay out a couple of reasons why an FDW
that tried to allow cross-server joins would almost certainly be broken,
and broken in security-relevant ways. Suppose for instance that
postgres_fdw tried to be smart and drill down into foreign tables' server
IDs to allow joining of any two tables that have the same effective host
name, port, database name, user name, and anything else you think would be
relevant to its choice of connections. The trouble with that is that the
user mapping is context dependent, in particular one local userID might
map to the same remote user name for two different server OIDs, while
another might map to different user names. So if we plan a query under
the first userID we might decide it's okay to do the join remotely.
Then, if we re-use that plan while executing as another userID (which is
entirely possible) what will probably happen is that the remote join
query will get sent off under one or the other of the remote usernames
associated with the second local userID. This could lead to either a
permission failure, or a remote table access that should not be allowed
to the current local userID. Admittedly, such cases might be rare in
practice, but it's still a security hole. Also, even if the FDW is
defensive enough to recheck full matching of the tables' connection
properties at execution time, there's not much it could do about the
situation except fail; it couldn't cause a re-plan to occur.

For another case, we do not have any mechanism whereby operations like
ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if
the two tables' connection properties matched at plan time, there's no
guarantee that they still match at execution. This is probably not a
security hole (at least not if you assume foreign-server owners are
trusted users), but it's still a bug that exists only if someone tries
to allow cross-server joins.

For these reasons, I think that if an FDW tried to be laxer than "tables
must be on the same pg_foreign_server entry to be joined", the odds
approach unity that it would be broken, and probably dangerously broken.
So we should just make that check for the FDWs. Anybody who thinks
they're smarter than the average bear can use set_join_pathlist_hook,
but they are probably wrong.

Drilling down into postgres_fdw's connection properties seems pretty
silly; the user isn't likely to create two SERVER objects that are
identical and then choose which one to use at random, and if they do,
they deserve what they get. The universe of FDWs, however, is
potentially bigger than that. What does a server even mean for
file_fdw, for example? I can't think of any reason why somebody would
want to implement joins inside file_fdw, but if they did, all the
things being joined would be local files, so the server ID doesn't
really matter. Now you may say that's a silly use case, but it's less
obviously silly if the files contain structured data, as with
cstore_fdw, yet the server ID could still be not especially relevant.
Maybe you've got servers representing filesystem directories; that
shouldn't preclude cross "server" joins.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#81)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, May 9, 2015 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

For these reasons, I think that if an FDW tried to be laxer than "tables
must be on the same pg_foreign_server entry to be joined", the odds
approach unity that it would be broken, and probably dangerously broken.
So we should just make that check for the FDWs. Anybody who thinks
they're smarter than the average bear can use set_join_pathlist_hook,
but they are probably wrong.

Drilling down into postgres_fdw's connection properties seems pretty
silly; the user isn't likely to create two SERVER objects that are
identical and then choose which one to use at random, and if they do,
they deserve what they get. The universe of FDWs, however, is
potentially bigger than that. What does a server even mean for
file_fdw, for example?

Nothing, which is why you'd only ever create one per database, and so the
issue doesn't arise anyway. It would only be sane to create multiple
servers per FDW if there were a meaningful difference between them.

In any case, since the existing code doesn't support "remote" joins
involving a local table unless you use the join-path hook, this argument
seems pretty academic. If we tighten the test to be same-server, we will
benefit all but very weirdly designed FDWs. Anybody who's not happy with
that can still use the hook (and I continue to maintain that they will
probably have subtle bugs, but whatever).

Another point worth making is that the coding I have in mind doesn't
really do anything with RelOptInfo.serverid except compare it for
equality. So an FDW that wants to consider some servers interchangeable
for joining purposes could override the value at GetForeignPaths time
(ie replace "serverid" with the OID of a preferred server), and then it
would get GetForeignJoinPaths calls as desired.
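
A sketch of that idea (the helper that picks the preferred server is hypothetical):

#include "postgres.h"
#include "foreign/fdwapi.h"

/* hypothetical FDW-specific helper */
extern Oid	pick_preferred_server(Oid serverid);

static void
example_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel,
						Oid foreigntableid)
{
	/*
	 * Treat a group of servers as interchangeable for join purposes by
	 * overriding serverid with a canonical value; the core code compares
	 * serverid only for equality when deciding whether to ask the FDW
	 * for join paths.
	 */
	baserel->serverid = pick_preferred_server(baserel->serverid);

	/* ... then build and add ForeignPaths for the base relation as usual ... */
}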

regards, tom lane


#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#82)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

I've committed a cleanup patch along the lines discussed.

One thought is that we could test the nondefault-scan-tuple logic without
a lot of work by modifying the way postgres_fdw deals with columns it
decides don't need to be fetched. Right now it injects NULL columns so
that the remote query results still match the foreign table's rowtype,
but that's not especially desirable or efficient. What we could do
instead is generate an fdw_scan_tlist that just lists the columns we
intend to fetch.
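
A sketch of what generating such a list could look like, using existing planner helpers rather than postgres_fdw internals (the function name is hypothetical, and the 9.5-era helper signatures are assumed):

#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"

static List *
build_minimal_scan_tlist(PlannerInfo *root, RelOptInfo *rel,
						 List *local_quals)
{
	List	   *tlist;

	/* columns the rest of the plan needs from this rel */
	tlist = add_to_flat_tlist(NIL,
							  pull_var_clause((Node *) rel->reltargetlist,
											  PVC_RECURSE_AGGREGATES,
											  PVC_RECURSE_PLACEHOLDERS));
	/* plus whatever the locally checked quals reference */
	tlist = add_to_flat_tlist(tlist,
							  pull_var_clause((Node *) local_quals,
											  PVC_RECURSE_AGGREGATES,
											  PVC_RECURSE_PLACEHOLDERS));
	return tlist;
}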

I don't have time to pursue this idea right now, but I think it would be
a good change to squeeze into 9.5, just so that we could have some test
coverage on those parts of this patch.

regards, tom lane


#84Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Tom Lane (#83)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Tom,

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() to a static function. It means an
extension cannot have child nodes, even if it wants to add custom join logic.
Please assume a simple case below:
SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, and its PlanCustomPath method is then
called back to create a plan node once that path is chosen by the planner.
The role of PlanCustomPath is to construct the plan node for itself, and the
plan nodes for the source relations as well.
If create_plan_recurse() is static, we have no way to initialize the plan
nodes for the t0 and t1 scans, even if the join logic itself is more powerful
than the built-in ones.

It was already discussed upthread, and was the consensus there.
Please make create_plan_recurse() non-static again, as in the initial commit.
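
To make the dependency concrete, a sketch of such a PlanCustomPath callback (using the callback signature as of this point in the thread; stashing the two child paths in custom_private, and the elided fields, are assumptions):

#include "postgres.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"

static Plan *
example_PlanCustomPath(PlannerInfo *root, RelOptInfo *rel,
					   CustomPath *best_path, List *tlist, List *clauses)
{
	CustomScan *cscan = makeNode(CustomScan);
	Path	   *outer_path = (Path *) linitial(best_path->custom_private);
	Path	   *inner_path = (Path *) lsecond(best_path->custom_private);

	cscan->flags = best_path->flags;
	cscan->scan.scanrelid = 0;		/* replaces a join */
	cscan->scan.plan.targetlist = tlist;
	cscan->scan.plan.qual = extract_actual_clauses(clauses, false);
	/* methods, custom_exprs, custom_scan_tlist, etc. elided */

	/* this is the step that needs create_plan_recurse() to be callable */
	cscan->scan.plan.lefttree = create_plan_recurse(root, outer_path);
	cscan->scan.plan.righttree = create_plan_recurse(root, inner_path);

	return &cscan->scan.plan;
}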

Also, regarding the *_scan_tlist:

I don't have time to pursue this idea right now, but I think it would be
a good change to squeeze into 9.5, just so that we could have some test
coverage on those parts of this patch.

Do you want just a module for testing purposes and regression test cases?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kohei KaiGai (#84)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

It means extension
cannot have child node, even if it wants to add a custom-join logic.
Please assume a simple case below:
SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, then its PlanCustomPath method will be
called back to create a plan node once it gets chosen by planner.
The role of PlanCustomPath is to construct a plan-node of itself, and plan-nodes
of the source relations also.
If create_plan_recurse() is static, we have no way to initialize
plan-node for t0
and t1 scan even if join-logic itself is powerful than built-in ones.

I find this argument quite unconvincing, because even granting that this
is an appropriate way to create child nodes of a CustomScan, there is a
lot of core code besides createplan.c that would not know about those
child nodes either.

As a counterexample, suppose that your cool-new-join-method is capable of
joining three inputs at once. You could stick two of the children into
lefttree and righttree perhaps, but where are you gonna put the other?

This comes back to the fact that trying to wedge join behavior into scan
node types was a pretty bad idea (as evidenced by the entirely bogus
decision that now scanrelid can be zero, which I rather doubt you've found
all the places that that broke). Really there should have been a new
CustomJoin node or something like that. If we'd done that, it would be
possible to design that node type more like Append, with any number of
child nodes. And we could have set things up so that createplan.c knows
about those child nodes and takes care of the recursion for you; it would
still not be a good idea to expose create_plan_recurse and hope that
callers of that would know how to use it correctly.

I am still more than half tempted to say we should revert this entire
patch series and hope for a better design to be submitted for 9.6.
In the meantime, though, poking random holes in the modularity of core
code is a poor substitute for having designed a well-thought-out API.

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children. I find that
uglier than a separate join node type, but it would be tolerable I guess.
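
Roughly, that compromise might look like the following sketch (hypothetical; field names and ordering are not settled at this point):

typedef struct CustomPath
{
	Path		path;
	uint32		flags;
	List	   *custom_paths;	/* child Path nodes, built by the provider */
	List	   *custom_private;
	const CustomPathMethods *methods;
} CustomPath;

typedef struct CustomScan
{
	Scan		scan;
	uint32		flags;
	List	   *custom_plans;	/* child Plans, built by createplan.c */
	List	   *custom_exprs;
	List	   *custom_scan_tlist;
	List	   *custom_private;
	Bitmapset  *custom_relids;
	const CustomScanMethods *methods;
} CustomScan;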

regards, tom lane


#86Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#85)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

Hmm. I got it is intentional change.

It means extension
cannot have child node, even if it wants to add a custom-join logic.
Please assume a simple case below:
SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, then its PlanCustomPath method will be
called back to create a plan node once it gets chosen by planner.
The role of PlanCustomPath is to construct a plan-node of itself, and plan-nodes
of the source relations also.
If create_plan_recurse() is static, we have no way to initialize
plan-node for t0
and t1 scan even if join-logic itself is powerful than built-in ones.

I find this argument quite unconvincing, because even granting that this
is an appropriate way to create child nodes of a CustomScan, there is a
lot of core code besides createplan.c that would not know about those
child nodes either.

As a counterexample, suppose that your cool-new-join-method is capable of
joining three inputs at once. You could stick two of the children into
lefttree and righttree perhaps, but where are you gonna put the other?

This comes back to the fact that trying to wedge join behavior into scan
node types was a pretty bad idea (as evidenced by the entirely bogus
decision that now scanrelid can be zero, which I rather doubt you've found
all the places that that broke). Really there should have been a new
CustomJoin node or something like that. If we'd done that, it would be
possible to design that node type more like Append, with any number of
child nodes. And we could have set things up so that createplan.c knows
about those child nodes and takes care of the recursion for you; it would
still not be a good idea to expose create_plan_recurse and hope that
callers of that would know how to use it correctly.

I am still more than half tempted to say we should revert this entire
patch series and hope for a better design to be submitted for 9.6.
In the meantime, though, poking random holes in the modularity of core
code is a poor substitute for having designed a well-thought-out API.

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children. I find that
uglier than a separate join node type, but it would be tolerable I guess.

At this moment, my custom-join logic adds a dummy node so that each node has
only two children when it tries to join three or more relations.
Yep, if the CustomPath node (ForeignPath also?) can have a list of child Path
nodes and the core backend handles their initialization, it will be more
comfortable for extensions.
I don't just agree; I prefer this idea.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#87Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#85)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

I really think that reverting somebody else's committed change without
discussion is inappropriate. If I don't like the fact that you
reverted this change, can I go revert it back?

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

If you want to improve the design of this so that it does the same
things more elegantly, fine: I'll get out of the way. If you just
want to make things impossible that the patch previously made
possible, I strongly object to that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#87)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Robert Haas <robertmhaas@gmail.com> writes:

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

Sure, we could export every last static function in the core code,
and extension authors would rejoice ... while development on the core
code basically stops for fear of breaking extensions. It's important
not to export things that we don't have to, especially when doing so
is really just a quick-n-dirty substitute for doing things properly.

regards, tom lane


#89Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#88)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, May 10, 2015 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

Sure, we could export every last static function in the core code,
and extension authors would rejoice ... while development on the core
code basically stops for fear of breaking extensions. It's important
not to export things that we don't have to, especially when doing so
is really just a quick-n-dirty substitute for doing things properly.

Please name EVEN ONE instance in which core development has been
prevented for fear of changing a function API. Sure, we take those
things into consideration, like trying to ensure that there will be
type conflicts until people update their code, but I cannot recall a
single instance in six and a half years of working on this project
where that's been a real problem. I think this concern is entirely
hypothetical. Besides, no one has ever proposed making every static
function public. It's been proposed a handful of times for limited
classes of functions - in this case ONE - and you've fought it every
time despite clear consensus on the other side. I find that highly
regrettable and I'm very sure I'm not the only one.

I notice that you carefully didn't answer the other part of my
question: what gives you the right to revert my commits without
discussion or consensus, and do I have an equal right to change it
back?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#90Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#87)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 2015-05-10 21:26:26 -0400, Robert Haas wrote:

On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some
pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is
related to "having gotten committed without submitted use-cases". Even
if the submitted code, presumably kept as simple as possible, doesn't use it,
that's no proof that less simple code does not need it.

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

While I don't find the tone of the characterization super helpful, I do
tend to agree that we're *far* too conservative on that end. I've now
seen a significant number of extensions that copied large swathes of code
just to cope with individual functions not being available, and even
cases where that led to minor forks with such details changed.

I know that I'm "fearful" of asking for functions being made
public. Because it'll invariably get into a discussion of merits that's
completely out of proportion with the size of the change. And if I, who
has been on the list for a while now, am "afraid" in that way, you can
be sure that others won't even dare to ask, let alone argue their way
through.

I think the problem is that during development the default often is to
create function as static if they're used only in one file. Which is
fine. But it really doesn't work if it's a larger battle to change
single incidences. Besides the pain of having to wait for the next
major release...

Greetings,

Andres Freund


#91Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#89)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 2015-05-10 21:53:45 -0400, Robert Haas wrote:

Please name EVEN ONE instance in which core development has been
prevented for fear of changing a function API.

Even *moving* function declarations to a different file has been loudly
and repeatedly complained about... And there's definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

But.

I don't think that's a reason to not expose more functions
externally. Because the usual consequence of not exposing them is that
either ugly workarounds will be found, or code will just copy pasted
around. That's not in any way better, and much likely to be worse.

I'm not saying that we shouldn't use judgement, but I do think that the
current approach ridicules our vaunted extensibility in many cases.

Greetings,

Andres Freund


#92Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#91)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, May 10, 2015 at 10:37 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-10 21:53:45 -0400, Robert Haas wrote:

Please name EVEN ONE instance in which core development has been
prevented for fear of changing a function API.

Even *moving* function declarations to a different file has been loudly
and repeatedly complained about...

Moving declarations is a lot more likely to break compiles than adding
declarations. But even the 9.3 header file reorganizations, which
broke enough compiles to be annoying, were only annoying, not a
serious problem for anyone. I doubted whether that stuff was worth
changing, but that's just because I don't really get excited about
partial recompiles.

And there's definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

Such as what?

But.

I don't think that's a reason to not expose more functions
externally. Because the usual consequence of not exposing them is that
either ugly workarounds will be found, or code will just copy pasted
around. That's not in any way better, and much likely to be worse.

Yes.

I'm not saying that we shouldn't use judgement, but I do think that the
current approach ridicules our vaunted extensibility in many cases.

Double yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#93Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#92)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 2015-05-10 22:51:33 -0400, Robert Haas wrote:

And there's definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

Such as what?

Without even thinking about it:
* linitial vs lfirst vs lnext. That thing still induces an impedance
mismatch when reading code for me, and I believe a good number of
other people.
* Two 'string buffer' APIs with essentially only minor differences.
* A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
lots of backend only things.
* The whole V0 calling convention that makes it so much easier to get
odd crashes.

Admittedly that's all I could come up without having to think. But I do
vaguely remember a lot of things we did not do because of bwcompat
concerns.


#94Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Andres Freund (#90)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On 2015-05-10 21:26:26 -0400, Robert Haas wrote:

On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some
pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is
related to "having gotten committed without submitted use-cases". Even
if submitted, presumably as simple as possible code, doesn't use it,
that's not a proof that less simple code does not need it.

Yes, the PG-Strom code uses create_plan_recurse() to construct the child plan
nodes of the GPU-accelerated custom-join logic, once it gets chosen.
There is nothing special here: it calls create_plan_recurse() on the
underlying inner/outer paths just as the built-in join paths do.
It is not difficult to submit it as a working example; however, its total
code size (excluding GPU code) is 25KL at this moment.

I'm not certain whether that counts as a simple example.

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

While I don't find the tone of the characterization super helpful, I do
tend to agree that we're *far* too conservative on that end. I've now
seen a significant number of extension that copied large swathes of code
just to cope with individual functions not being available. And even
cases where that lead to minor forks with such details changed.

I may have to join them?

I know that I'm "fearful" of asking for functions being made
public. Because it'll invariably get into a discussion of merits that's
completely out of proportion with the size of the change. And if I, who
has been on the list for a while now, am "afraid" in that way, you can
be sure that others won't even dare to ask, lest argue their way
through.

I think the problem is that during development the default often is to
create function as static if they're used only in one file. Which is
fine. But it really doesn't work if it's a larger battle to change
single incidences. Besides the pain of having to wait for the next
major release...

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#95Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#93)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

On Sun, May 10, 2015 at 11:07 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-05-10 22:51:33 -0400, Robert Haas wrote:

And there's definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

Such as what?

Without even thinking about it:
* linitial vs lfirst vs lnext. That thing still induces an impedance
mismatch when reading code for me, and I believe a good number of
other people.
* Two 'string buffer' APIs with essentially only minor differences.
* A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
lots of backend only things.
* The whole V0 calling convention that makes it so much easier to get
odd crashes.

Admittedly that's all I could come up without having to think. But I do
vaguely remember a lot of things we did not do because of bwcompat
concerns.

I see your point, but I don't think it really detracts from mine. The
fact that we have a few inconsistently-named list functions is not
causing any core development project that would otherwise get
completed to instead not get completed. Nor is any of that other
stuff, except maybe the libpq API, but that's a lot (not just a bit)
more exposed.

Also, I'd actually be in favor of looking for a way to unify the
StringInfo and PQexpBuffer stuff - and of partially deprecating the V0
calling convention.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#96Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#70)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Hi,

Let me get back to the original concern and show the possible options
we can take here. At least, the latest master is not usable for
implementing custom-join logic without one of these options.

option-1)
Make create_plan_recurse() a non-static function again, for extensions.
It is the simplest solution; however, it is still a gray zone which
functions shall be public and whether we treat non-static functions
as a stable API or not.
IMO, we shouldn't treat non-static functions as stable APIs, even
if they can be called by extensions and not only by code in another source
file. In fact, we usually change the definition of non-static functions
even when we know extensions use them. It is the role of the extension to
follow such changes across major versions.

option-2)
Tom's suggestion. Add a new list field of Path nodes to CustomPath;
create_customscan_plan() will then call the static create_plan_recurse()
function to construct the child plan nodes (see the sketch below option-3).
The attached patch is probably an image of this enhancement, though it is
not tested yet, of course. Once we adopt this approach, I'll adjust my
PG-Strom code to the new interface within 2 weeks and report any problems.

option-3)
Force authors of custom-scan providers to copy and paste createplan.c.
I really don't want this option, and nobody would be happy.
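
For option-2, the createplan.c side could be a small helper like the sketch below (the "custom_children" field name is hypothetical; the attached patch is the real proposal). It would live in createplan.c itself, where create_plan_recurse() remains static:

static List *
build_custom_child_plans(PlannerInfo *root, CustomPath *best_path)
{
	List	   *custom_plans = NIL;
	ListCell   *lc;

	/* "custom_children" is the hypothetical new list of child Paths */
	foreach(lc, best_path->custom_children)
	{
		Path   *child_path = (Path *) lfirst(lc);

		custom_plans = lappend(custom_plans,
							   create_plan_recurse(root, child_path));
	}
	return custom_plans;		/* handed to the provider's PlanCustomPath */
}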

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: Kaigai Kouhei(海外 浩平)
Sent: Monday, May 11, 2015 12:48 PM
To: 'Andres Freund'; Robert Haas
Cc: Tom Lane; Kohei KaiGai; Thom Brown; Shigeru Hanada;
pgsql-hackers@postgreSQL.org
Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

On 2015-05-10 21:26:26 -0400, Robert Haas wrote:

On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This commit reverts create_plan_recurse() as static function.

Yes. I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful. (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some
pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is
related to "having gotten committed without submitted use-cases". Even
if submitted, presumably as simple as possible code, doesn't use it,
that's not a proof that less simple code does not need it.

Yes, PG-Strom code uses create_plan_recurse() to construct child plan
node of the GPU accelerated custom-join logic, once it got chosen.
Here is nothing special. It calls create_plan_recurse() as built-in
join path doing on the underlying inner/outer paths.
It is not difficult to submit as a working example, however, its total
code size (excludes GPU code) is 25KL at this moment.

I'm not certain whether it is a simple example.

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors. Many people have complained about
that on multiple occasions. Frankly, I find it obstructionist and
petty.

While I don't find the tone of the characterization super helpful, I do
tend to agree that we're *far* too conservative on that end. I've now
seen a significant number of extension that copied large swathes of code
just to cope with individual functions not being available. And even
cases where that lead to minor forks with such details changed.

I may have to join the members?

I know that I'm "fearful" of asking for functions being made
public. Because it'll invariably get into a discussion of merits that's
completely out of proportion with the size of the change. And if I, who
has been on the list for a while now, am "afraid" in that way, you can
be sure that others won't even dare to ask, lest argue their way
through.

I think the problem is that during development the default often is to
create function as static if they're used only in one file. Which is
fine. But it really doesn't work if it's a larger battle to change
single incidences. Besides the pain of having to wait for the next
major release...

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

custom-join-children.patch (application/octet-stream)
 src/backend/commands/explain.c          | 21 +++++++++++++++++++++
 src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++++-
 src/include/nodes/execnodes.h           |  2 ++
 src/include/nodes/plannodes.h           |  1 +
 src/include/nodes/relation.h            |  4 +++-
 5 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index eeb8f19..ce42388 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -109,6 +109,8 @@ static void ExplainMemberNodes(List *plans, PlanState **planstates,
 				   List *ancestors, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 				const char *relationship, ExplainState *es);
+static void ExplainCustomChildren(CustomScanState *css,
+								  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
 				bool numeric, ExplainState *es);
 static void ExplainOpenGroup(const char *objtype, const char *labelname,
@@ -1596,6 +1598,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(planstate, CustomScanState) &&
+		 ((CustomScanState *) planstate)->num_children > 0) ||
 		planstate->subPlan;
 	if (haschildren)
 	{
@@ -1650,6 +1654,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_CustomScan:
+			ExplainCustomChildren((CustomScanState *) planstate,
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -2521,6 +2529,19 @@ ExplainSubPlans(List *plans, List *ancestors,
 }
 
 /*
+ * Explain underlying child nodes of CustomScanState, if any
+ */
+static void
+ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
+{
+	const char *label = (css->num_children > 1 ? "children" : "child");
+	int			i;
+
+	for (i=0; i < css->num_children; i++)
+		ExplainNode(css->custom_children[i], ancestors, label, NULL, es);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c809237..61d50c2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2069,6 +2069,21 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
+	List	   *custom_children = NIL;
+	ListCell   *lc;
+
+	/*
+	 * If CustomPath takes underlying child nodes, we recursively transform
+	 * these Path nodes to Plan node.
+	 * Custom-scan provider will attach these plans on lefttree, righttree
+	 * or custom_children list of CustomScan node.
+	 */
+	foreach (lc, best_path->custom_children)
+	{
+		Plan   *child = create_plan_recurse(root, (Path *) lfirst(lc));
+
+		custom_children = lappend(custom_children, child);
+	}
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2084,7 +2099,8 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 															  rel,
 															  best_path,
 															  tlist,
-															  scan_clauses);
+															  scan_clauses,
+															  custom_children);
 	Assert(IsA(cplan, CustomScan));
 
 	/*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9de6d14..24377a1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1561,6 +1561,8 @@ typedef struct CustomScanState
 {
 	ScanState	ss;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	uint32		num_children;	/* length of child nodes array */
+	PlanState **custom_children;/* array of child PlanState, if any */
 	const CustomExecMethods *methods;
 } CustomScanState;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 9313292..a928771 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -541,6 +541,7 @@ typedef struct CustomScan
 {
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	List	   *custom_children;/* list of Plan nodes, if any */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
 	List	   *custom_private; /* private data for custom code */
 	List	   *custom_scan_tlist;		/* optional tlist describing scan
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d3ee61c..b9bdeff 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -924,7 +924,8 @@ typedef struct CustomPathMethods
 												RelOptInfo *rel,
 												struct CustomPath *best_path,
 												List *tlist,
-												List *clauses);
+												List *clauses,
+												List *custom_children);
 	/* Optional: print additional fields besides "private" */
 	void		(*TextOutCustomPath) (StringInfo str,
 											  const struct CustomPath *node);
@@ -934,6 +935,7 @@ typedef struct CustomPath
 {
 	Path		path;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see above */
+	List	   *custom_children;/* list of child Path nodes, if any */
 	List	   *custom_private;
 	const CustomPathMethods *methods;
 } CustomPath;
#97Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#70)
3 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Hello,

I tried to make patches for the three approaches.
Please don't take option-3 as a serious proposition; unfortunately,
however, it is the only workaround available at this moment.

In my understanding, we don't guarantee interface compatibility across
major version upgrades, and that includes the definitions of non-static
functions. It is the extension author's role to follow up on a new
major version (and to raise a problem report during the development
cycle if a feature update causes problems without an alternative).
In fact, people regularly submit patches, some of which change the
definitions of non-static functions, and nobody can guarantee that no
extension uses those functions, so compatibility does get broken.
This is supporting evidence that we do not treat non-static functions
as a stable interface for extensions, and it should not be a reason to
keep a function static in spite of its necessity.

On the other hand, I understand this is not only an issue around
createplan.c, but also a (philosophical) issue about the criteria, and
the human decision, of which functions should be static or non-static,
so it usually takes time to reach an overall consensus.
If we keep create_plan_recurse() static, option-2 is a solution that
balances both opinions.

Anyway, I really dislike option-3 and want to have a solution.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachments:

custom-join-problem-option-2.v1.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 12 +++++++++++-
 src/backend/commands/explain.c          | 21 +++++++++++++++++++++
 src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++++-
 src/backend/optimizer/plan/setrefs.c    |  8 ++++++++
 src/backend/optimizer/plan/subselect.c  | 25 +++++++++++++++++++++----
 src/include/nodes/execnodes.h           |  2 ++
 src/include/nodes/plannodes.h           |  1 +
 src/include/nodes/relation.h            |  4 +++-
 8 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 62a8a33..c7187c7 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -60,6 +60,7 @@ typedef struct CustomPath
 {
     Path      path;
     uint32    flags;
+    List     *custom_children;
     List     *custom_private;
     const CustomPathMethods *methods;
 } CustomPath;
@@ -73,6 +74,10 @@ typedef struct CustomPath
     <literal>CUSTOMPATH_SUPPORT_BACKWARD_SCAN</> if the custom path can support
     a backward scan and <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> if it
     can support mark and restore.  Both capabilities are optional.
+    An optional <structfield>custom_children</> is a list of underlying
+    <structname>Path</> nodes that can be executed as input data stream of
+    this custom-path node. If valid list is given, it shall be transformed
+    to the relevant <structname>Plan</> nodes by planner.
     <structfield>custom_private</> can be used to store the custom path's
     private data.  Private data should be stored in a form that can be handled
     by <literal>nodeToString</>, so that debugging routines that attempt to
@@ -112,7 +117,8 @@ Plan *(*PlanCustomPath) (PlannerInfo *root,
                          RelOptInfo *rel,
                          CustomPath *best_path,
                          List *tlist,
-                         List *clauses);
+                         List *clauses,
+                         List *custom_children);
 </programlisting>
     Convert a custom path to a finished plan.  The return value will generally
     be a <literal>CustomScan</> object, which the callback must allocate and
@@ -145,6 +151,7 @@ typedef struct CustomScan
 {
     Scan      scan;
     uint32    flags;
+    List     *custom_children;
     List     *custom_exprs;
     List     *custom_private;
     List     *custom_scan_tlist;
@@ -159,6 +166,9 @@ typedef struct CustomScan
     estimated costs, target lists, qualifications, and so on.
     <structfield>flags</> is a bitmask with the same meaning as in
     <structname>CustomPath</>.
+    <structfield>custom_children</> can be used to store child
+    <structname>Plan</> nodes, if custom-scan provider takes multiple
+    (more than two) underlying query execution plans.
     <structfield>custom_exprs</> should be used to
     store expression trees that will need to be fixed up by
     <filename>setrefs.c</> and <filename>subselect.c</>, while
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index eeb8f19..ce42388 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -109,6 +109,8 @@ static void ExplainMemberNodes(List *plans, PlanState **planstates,
 				   List *ancestors, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 				const char *relationship, ExplainState *es);
+static void ExplainCustomChildren(CustomScanState *css,
+								  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
 				bool numeric, ExplainState *es);
 static void ExplainOpenGroup(const char *objtype, const char *labelname,
@@ -1596,6 +1598,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(planstate, CustomScanState) &&
+		 ((CustomScanState *) planstate)->num_children > 0) ||
 		planstate->subPlan;
 	if (haschildren)
 	{
@@ -1650,6 +1654,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_CustomScan:
+			ExplainCustomChildren((CustomScanState *) planstate,
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -2521,6 +2529,19 @@ ExplainSubPlans(List *plans, List *ancestors,
 }
 
 /*
+ * Explain underlying child nodes of CustomScanState, if any
+ */
+static void
+ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
+{
+	const char *label = (css->num_children > 1 ? "children" : "child");
+	int			i;
+
+	for (i=0; i < css->num_children; i++)
+		ExplainNode(css->custom_children[i], ancestors, label, NULL, es);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c809237..61d50c2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2069,6 +2069,21 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
+	List	   *custom_children = NIL;
+	ListCell   *lc;
+
+	/*
+	 * If CustomPath takes underlying child nodes, we recursively transform
+	 * these Path nodes to Plan node.
+	 * Custom-scan provider will attach these plans on lefttree, righttree
+	 * or custom_children list of CustomScan node.
+	 */
+	foreach (lc, best_path->custom_children)
+	{
+		Plan   *child = create_plan_recurse(root, (Path *) lfirst(lc));
+
+		custom_children = lappend(custom_children, child);
+	}
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2084,7 +2099,8 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 															  rel,
 															  best_path,
 															  tlist,
-															  scan_clauses);
+															  scan_clauses,
+															  custom_children);
 	Assert(IsA(cplan, CustomScan));
 
 	/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fac51c9..86f2dcb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1136,6 +1136,8 @@ set_customscan_references(PlannerInfo *root,
 						  CustomScan *cscan,
 						  int rtoffset)
 {
+	ListCell   *lc;
+
 	/* Adjust scanrelid if it's valid */
 	if (cscan->scan.scanrelid > 0)
 		cscan->scan.scanrelid += rtoffset;
@@ -1179,6 +1181,12 @@ set_customscan_references(PlannerInfo *root,
 			fix_scan_list(root, cscan->custom_exprs, rtoffset);
 	}
 
+	/* Adjust child plan-nodes recursively, if needed */
+	foreach (lc, cscan->custom_children)
+	{
+		lfirst(lc) = set_plan_refs(root, (Plan *) lfirst(lc), rtoffset);
+	}
+
 	/* Adjust custom_relids if needed */
 	if (rtoffset > 0)
 	{
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index afccee5..af0fb85 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2323,10 +2323,27 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_CustomScan:
-			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
-							  &context);
-			/* We assume custom_scan_tlist cannot contain Params */
-			context.paramids = bms_add_members(context.paramids, scan_params);
+			{
+				CustomScan *cscan = (CustomScan *) plan;
+				ListCell   *lc;
+
+				finalize_primnode((Node *) cscan->custom_exprs,
+								  &context);
+				/* We assume custom_scan_tlist cannot contain Params */
+				context.paramids =
+					bms_add_members(context.paramids, scan_params);
+
+				/* child nodes if any */
+				foreach (lc, cscan->custom_children)
+				{
+					context.paramids =
+						bms_add_members(context.paramids,
+										finalize_plan(root,
+													  (Plan *) lfirst(lc),
+													  valid_params,
+													  scan_params));
+				}
+			}
 			break;
 
 		case T_ModifyTable:
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9de6d14..24377a1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1561,6 +1561,8 @@ typedef struct CustomScanState
 {
 	ScanState	ss;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	uint32		num_children;	/* length of child nodes array */
+	PlanState **custom_children;/* array of child PlanState, if any */
 	const CustomExecMethods *methods;
 } CustomScanState;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 9313292..a928771 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -541,6 +541,7 @@ typedef struct CustomScan
 {
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	List	   *custom_children;/* list of Plan nodes, if any */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
 	List	   *custom_private; /* private data for custom code */
 	List	   *custom_scan_tlist;		/* optional tlist describing scan
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d3ee61c..b9bdeff 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -924,7 +924,8 @@ typedef struct CustomPathMethods
 												RelOptInfo *rel,
 												struct CustomPath *best_path,
 												List *tlist,
-												List *clauses);
+												List *clauses,
+												List *custom_children);
 	/* Optional: print additional fields besides "private" */
 	void		(*TextOutCustomPath) (StringInfo str,
 											  const struct CustomPath *node);
@@ -934,6 +935,7 @@ typedef struct CustomPath
 {
 	Path		path;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see above */
+	List	   *custom_children;/* list of child Path nodes, if any */
 	List	   *custom_private;
 	const CustomPathMethods *methods;
 } CustomPath;
custom-join-problem-option-1.v1.patch (application/octet-stream)
 src/backend/optimizer/plan/createplan.c | 3 +--
 src/include/optimizer/planmain.h        | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c809237..91066cb 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,7 +44,6 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
 static List *build_path_tlist(PlannerInfo *root, Path *path);
 static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -220,7 +219,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index da15fca..539b45e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,7 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
custom-join-problem-option-3.v1.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 62a8a33..eb5988d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -103,6 +103,15 @@ extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
    responsibility of the hook to minimize duplicated work.
   </para>
 
+  <para>
+   Note that we have no public interface for extensions to construct
+   <structname>Plan</> nodes based on the <structname>Path</> nodes
+   chosen, at this moment. So, extension need to copy and paste
+   <filename>createplan.c</> into its source tree, then adjust definition
+   of <function>create_plan_recurse</> as non-static function, to allow
+   to construct underlying <structname>Plan</> nodes.
+  </para>
+
   <sect2 id="custom-scan-path-callbacks">
   <title>Custom Scan Path Callbacks</title>
 
#98Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Kouhei Kaigai (#96)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-12 10:24 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

option-2)
Tom's suggestion. Add a new list field of Path nodes to CustomPath;
create_customscan_plan() will then call the static create_plan_recurse()
function to construct the child plan nodes.
The attached patch is a rough image of this enhancement, though it is
not tested yet, of course. Once we adopt this approach, I'll adjust my
PG-Strom code to the new interface within two weeks and report any
problems.

+1. This design achieves the functionality required for custom joins
in Kaigai-san's use case, instantiating sub-plans of a CustomScan plan
recursively, without exposing create_plan_recurse. A CSP can use the
list index to distinguish the information, as FDWs do with
fdw_private.

IMO it isn't necessary to apply the change to ForeignPath/ForeignScan.
An FDW would have its own way to keep and use sub-paths as private
information. In fact, I wrote postgres_fdw to use create_plan_recurse
to generate SQL statements for the inner/outer relations to be joined,
but as of now SQL construction for join push-down is accomplished by
calling the private function deparseSelectSql recursively at the top
of a join tree.

Some FDWs might want to use sub-plan generation, but we don't have any
concrete use case as of now.

--
Shigeru HANADA


#99Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#85)
1 attachment(s)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children. I find that
uglier than a separate join node type, but it would be tolerable I guess.

The attached patch implements what you suggested, as is.
It allows custom-scan providers to have child Paths without exporting
create_plan_recurse(), and makes it possible to represent an N-way join
naturally. Please adopt one of the solutions, even if we don't reach a
consensus on how create_plan_recurse() (and other useful static
functions) should be made visible to extensions.

Patch details:
It adds a List field (List *custom_children) to the CustomPath structure,
which informs the planner of its child Path nodes; these are transformed
into Plan nodes as part of the planner's work.
CustomScan also gets a new List field for its child Plan nodes, which
are then processed by setrefs.c and subselect.c.
The PlanCustomPath callback was extended to receive the list of Plan
nodes constructed in create_customscan_plan() in core; it is the job of
the custom-scan provider to attach these Plan nodes to lefttree,
righttree, or the custom_children list of the CustomScan node.
CustomScanState also gets an array holding the PlanState nodes of the
children; it is used by the EXPLAIN command to find the child nodes.
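
For reference, the executor side could look roughly like the following
untested sketch (the provider name is hypothetical): the BeginCustomScan
callback initializes the child Plans into the new num_children /
custom_children fields of CustomScanState, which is also the array the
EXPLAIN code in the patch walks.

#include "postgres.h"
#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/plannodes.h"

static void
my_join_begin_custom_scan(CustomScanState *css, EState *estate, int eflags)
{
	CustomScan *cscan = (CustomScan *) css->ss.ps.plan;
	ListCell   *lc;
	int			i = 0;

	css->num_children = list_length(cscan->custom_children);
	css->custom_children = palloc0(sizeof(PlanState *) * css->num_children);

	/* instantiate each child Plan; EXPLAIN later walks this array */
	foreach(lc, cscan->custom_children)
	{
		css->custom_children[i++] = ExecInitNode((Plan *) lfirst(lc),
												 estate, eflags);
	}
}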

Regarding FDWs, as Hanada-san mentioned, I'm uncertain whether a
similar feature is needed there, because the join-pushdown feature
scans the result set of the remotely joined relations and thus has no
need for local child Path nodes.
So, I put the custom_children list on the CustomXXXX structures only.

It may need an additional section in the documentation.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

custom-join-problem-option-2.v1.patch (application/octet-stream)
 doc/src/sgml/custom-scan.sgml           | 12 +++++++++++-
 src/backend/commands/explain.c          | 21 +++++++++++++++++++++
 src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++++-
 src/backend/optimizer/plan/setrefs.c    |  8 ++++++++
 src/backend/optimizer/plan/subselect.c  | 25 +++++++++++++++++++++----
 src/include/nodes/execnodes.h           |  2 ++
 src/include/nodes/plannodes.h           |  1 +
 src/include/nodes/relation.h            |  4 +++-
 8 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 62a8a33..c7187c7 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -60,6 +60,7 @@ typedef struct CustomPath
 {
     Path      path;
     uint32    flags;
+    List     *custom_children;
     List     *custom_private;
     const CustomPathMethods *methods;
 } CustomPath;
@@ -73,6 +74,10 @@ typedef struct CustomPath
     <literal>CUSTOMPATH_SUPPORT_BACKWARD_SCAN</> if the custom path can support
     a backward scan and <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> if it
     can support mark and restore.  Both capabilities are optional.
+    An optional <structfield>custom_children</> is a list of underlying
+    <structname>Path</> nodes that can be executed as input data stream of
+    this custom-path node. If valid list is given, it shall be transformed
+    to the relevant <structname>Plan</> nodes by planner.
     <structfield>custom_private</> can be used to store the custom path's
     private data.  Private data should be stored in a form that can be handled
     by <literal>nodeToString</>, so that debugging routines that attempt to
@@ -112,7 +117,8 @@ Plan *(*PlanCustomPath) (PlannerInfo *root,
                          RelOptInfo *rel,
                          CustomPath *best_path,
                          List *tlist,
-                         List *clauses);
+                         List *clauses,
+                         List *custom_children);
 </programlisting>
     Convert a custom path to a finished plan.  The return value will generally
     be a <literal>CustomScan</> object, which the callback must allocate and
@@ -145,6 +151,7 @@ typedef struct CustomScan
 {
     Scan      scan;
     uint32    flags;
+    List     *custom_children;
     List     *custom_exprs;
     List     *custom_private;
     List     *custom_scan_tlist;
@@ -159,6 +166,9 @@ typedef struct CustomScan
     estimated costs, target lists, qualifications, and so on.
     <structfield>flags</> is a bitmask with the same meaning as in
     <structname>CustomPath</>.
+    <structfield>custom_children</> can be used to store child
+    <structname>Plan</> nodes, if custom-scan provider takes multiple
+    (more than two) underlying query execution plans.
     <structfield>custom_exprs</> should be used to
     store expression trees that will need to be fixed up by
     <filename>setrefs.c</> and <filename>subselect.c</>, while
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index eeb8f19..ce42388 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -109,6 +109,8 @@ static void ExplainMemberNodes(List *plans, PlanState **planstates,
 				   List *ancestors, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 				const char *relationship, ExplainState *es);
+static void ExplainCustomChildren(CustomScanState *css,
+								  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
 				bool numeric, ExplainState *es);
 static void ExplainOpenGroup(const char *objtype, const char *labelname,
@@ -1596,6 +1598,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(planstate, CustomScanState) &&
+		 ((CustomScanState *) planstate)->num_children > 0) ||
 		planstate->subPlan;
 	if (haschildren)
 	{
@@ -1650,6 +1654,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_CustomScan:
+			ExplainCustomChildren((CustomScanState *) planstate,
+								  ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -2521,6 +2529,19 @@ ExplainSubPlans(List *plans, List *ancestors,
 }
 
 /*
+ * Explain underlying child nodes of CustomScanState, if any
+ */
+static void
+ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
+{
+	const char *label = (css->num_children > 1 ? "children" : "child");
+	int			i;
+
+	for (i=0; i < css->num_children; i++)
+		ExplainNode(css->custom_children[i], ancestors, label, NULL, es);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c809237..61d50c2 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2069,6 +2069,21 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 {
 	CustomScan *cplan;
 	RelOptInfo *rel = best_path->path.parent;
+	List	   *custom_children = NIL;
+	ListCell   *lc;
+
+	/*
+	 * If CustomPath takes underlying child nodes, we recursively transform
+	 * these Path nodes to Plan node.
+	 * Custom-scan provider will attach these plans on lefttree, righttree
+	 * or custom_children list of CustomScan node.
+	 */
+	foreach (lc, best_path->custom_children)
+	{
+		Plan   *child = create_plan_recurse(root, (Path *) lfirst(lc));
+
+		custom_children = lappend(custom_children, child);
+	}
 
 	/*
 	 * Sort clauses into the best execution order, although custom-scan
@@ -2084,7 +2099,8 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
 															  rel,
 															  best_path,
 															  tlist,
-															  scan_clauses);
+															  scan_clauses,
+															  custom_children);
 	Assert(IsA(cplan, CustomScan));
 
 	/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fac51c9..86f2dcb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1136,6 +1136,8 @@ set_customscan_references(PlannerInfo *root,
 						  CustomScan *cscan,
 						  int rtoffset)
 {
+	ListCell   *lc;
+
 	/* Adjust scanrelid if it's valid */
 	if (cscan->scan.scanrelid > 0)
 		cscan->scan.scanrelid += rtoffset;
@@ -1179,6 +1181,12 @@ set_customscan_references(PlannerInfo *root,
 			fix_scan_list(root, cscan->custom_exprs, rtoffset);
 	}
 
+	/* Adjust child plan-nodes recursively, if needed */
+	foreach (lc, cscan->custom_children)
+	{
+		lfirst(lc) = set_plan_refs(root, (Plan *) lfirst(lc), rtoffset);
+	}
+
 	/* Adjust custom_relids if needed */
 	if (rtoffset > 0)
 	{
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index afccee5..af0fb85 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2323,10 +2323,27 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_CustomScan:
-			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
-							  &context);
-			/* We assume custom_scan_tlist cannot contain Params */
-			context.paramids = bms_add_members(context.paramids, scan_params);
+			{
+				CustomScan *cscan = (CustomScan *) plan;
+				ListCell   *lc;
+
+				finalize_primnode((Node *) cscan->custom_exprs,
+								  &context);
+				/* We assume custom_scan_tlist cannot contain Params */
+				context.paramids =
+					bms_add_members(context.paramids, scan_params);
+
+				/* child nodes if any */
+				foreach (lc, cscan->custom_children)
+				{
+					context.paramids =
+						bms_add_members(context.paramids,
+										finalize_plan(root,
+													  (Plan *) lfirst(lc),
+													  valid_params,
+													  scan_params));
+				}
+			}
 			break;
 
 		case T_ModifyTable:
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9de6d14..24377a1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1561,6 +1561,8 @@ typedef struct CustomScanState
 {
 	ScanState	ss;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	uint32		num_children;	/* length of child nodes array */
+	PlanState **custom_children;/* array of child PlanState, if any */
 	const CustomExecMethods *methods;
 } CustomScanState;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 9313292..a928771 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -541,6 +541,7 @@ typedef struct CustomScan
 {
 	Scan		scan;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see relation.h */
+	List	   *custom_children;/* list of Plan nodes, if any */
 	List	   *custom_exprs;	/* expressions that custom code may evaluate */
 	List	   *custom_private; /* private data for custom code */
 	List	   *custom_scan_tlist;		/* optional tlist describing scan
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d3ee61c..b9bdeff 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -924,7 +924,8 @@ typedef struct CustomPathMethods
 												RelOptInfo *rel,
 												struct CustomPath *best_path,
 												List *tlist,
-												List *clauses);
+												List *clauses,
+												List *custom_children);
 	/* Optional: print additional fields besides "private" */
 	void		(*TextOutCustomPath) (StringInfo str,
 											  const struct CustomPath *node);
@@ -934,6 +935,7 @@ typedef struct CustomPath
 {
 	Path		path;
 	uint32		flags;			/* mask of CUSTOMPATH_* flags, see above */
+	List	   *custom_children;/* list of child Path nodes, if any */
 	List	   *custom_private;
 	const CustomPathMethods *methods;
 } CustomPath;
#100Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Kouhei Kaigai (#99)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

2015-05-15 8:43 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Regarding FDWs, as Hanada-san mentioned, I'm uncertain whether a
similar feature is needed there, because the join-pushdown feature
scans the result set of the remotely joined relations and thus has no
need for local child Path nodes.
So, I put the custom_children list on the CustomXXXX structures only.

AFAIS most FDWs won't need child paths to process their external data.

The most likely idea is that an FDW uses the output of a ForeignScan
plan node handled by the FDW itself, but such work should be done by
another CSP (or at least via the CSP interface).

--
Shigeru HANADA


#101Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#99)
Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Let me restate the problem so that it is not forgotten for v9.5.

Commit 1a8a4e5cde2b7755e11bde2ea7897bd650622d3e disallowed extensions
from calling create_plan_recurse(); however, it is required for a
custom-scan node that implements its own join logic and takes child
nodes, in order to construct Plan nodes from Path nodes. So, at this
moment, we cannot utilize the new feature well unless an extension
copies and pastes createplan.c into its own source tree (ugly!).
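
For illustration, this is roughly what a custom-join provider wants to
write in its PlanCustomPath callback, assuming create_plan_recurse()
were exported again as in option-1 of the earlier mail; the names and
the provider-specific path structure wrapping CustomPath are
hypothetical, and this is exactly the call the commit above forbids.

#include "postgres.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"

/* hypothetical provider-specific path node wrapping CustomPath */
typedef struct MyJoinPath
{
	CustomPath	cpath;
	Path	   *outer_path;
	Path	   *inner_path;
} MyJoinPath;

extern const CustomScanMethods my_join_scan_methods;	/* hypothetical */

static Plan *
my_join_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
                         CustomPath *best_path, List *tlist, List *clauses)
{
	MyJoinPath *mjpath = (MyJoinPath *) best_path;
	CustomScan *cscan = makeNode(CustomScan);

	/* the child Plans must be built from the chosen child Paths here */
	cscan->scan.plan.lefttree = create_plan_recurse(root, mjpath->outer_path);
	cscan->scan.plan.righttree = create_plan_recurse(root, mjpath->inner_path);

	cscan->scan.plan.targetlist = tlist;
	cscan->scan.plan.qual = extract_actual_clauses(clauses, false);
	cscan->scan.scanrelid = 0;
	cscan->flags = best_path->flags;
	cscan->methods = &my_join_scan_methods;

	return &cscan->scan.plan;
}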

Tom suggested an alternative: infrastructure that tells the planner
which Path nodes should be passed to create_plan_recurse() inside
createplan.c.

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children. I find that
uglier than a separate join node type, but it would be tolerable I guess.

The attached patch implements what you suggested, as is.
It allows custom-scan providers to have child Paths without exporting
create_plan_recurse(), and makes it possible to represent an N-way join
naturally. Please adopt one of the solutions, even if we don't reach a
consensus on how create_plan_recurse() (and other useful static
functions) should be made visible to extensions.

Then I made a patch (attached to the last message) according to that
suggestion. I think it is one possible option.

The other option is to revert create_plan_recurse() to a non-static
function, as the infrastructure was originally designed.

I expect people will agree it is not a graceful design to force
extensions to copy and paste a core file with small adjustments. So one
of these options, or another one if any, needs to be merged to solve
the problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
