Custom Scan APIs (Re: Custom Plan node)

Started by Kohei KaiGai, about 12 years ago (110 messages)

kaigai@kaigai.gr.jp

2 attachment(s)

The attached patches implement a custom scan node that allows an extension
to replace part of the plan tree with its own code instead of the built-in
logic.
In addition to the previous proposal, they allow a custom scan to be added
as one of the candidate paths the optimizer chooses from.
There are two patches. The first one (pgsql-v9.4-custom-scan-apis) provides
the API and a simple demonstration module that implements a regular table
scan using an inequality operator on the ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote joins.

Below is an example that shows how custom scan works.

Normally, we run a sequential scan even if a clause has an inequality
operator that references the ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that acts as a custom-scan provider suggests an alternative
path; because its cost is lower than that of the sequential scan, the
optimizer chooses it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, a more cost-effective plan wins if one exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)
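
To make the selection in the three EXPLAIN outputs above concrete, here is a minimal, self-contained sketch of "the cheapest candidate wins". It is only an illustration: SketchPath and cheapest_path() are hypothetical names, not part of the patch, and the real add_path() also weighs startup cost, sort order and parameterization, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner path: a label and a total cost. */
typedef struct SketchPath
{
    const char *label;
    double      total_cost;
} SketchPath;

/*
 * Return the candidate with the lowest total cost, or NULL if there are
 * no candidates. On a tie, the earlier candidate is kept.
 */
const SketchPath *
cheapest_path(const SketchPath *paths, size_t npaths)
{
    const SketchPath *best = NULL;
    size_t      i;

    for (i = 0; i < npaths; i++)
    {
        if (best == NULL || paths[i].total_cost < best->total_cost)
            best = &paths[i];
    }
    return best;
}
```

Fed with the total costs printed above (209.00 for the sequential scan, 100.00 for the custom scan), the function picks the custom scan; once a candidate costing 8.30 is added, that one is picked instead.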

Another worthwhile example is the remote-join enhancement of
postgres_fdw, shown below. Both ft1 and ft2 are foreign tables
managed by the same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How it works
---------------------------
This patch adds two hooks (for base and join relations) in allpaths.c
and joinpath.c. They allow extensions to add alternative paths for
scanning a base relation or joining two relations.

The callback routine can add a CustomPath using add_path() to inform
the optimizer of this alternative scan path. Every custom-scan provider
is identified by its name, which must be registered beforehand using the
following function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider consists of a name string and a set of callback function
pointers.
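
The idea of a name-keyed provider table can be sketched as below. This is a standalone illustration under stated assumptions, not the patch's implementation: SketchProvider, its single begin_scan callback, sketch_register_provider() and sketch_lookup_provider() are all hypothetical, and rejecting duplicate names is an assumption of this sketch rather than documented behavior of register_custom_provider().

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PROVIDERS 8

/*
 * Hypothetical provider: a name plus one callback. The real CustomProvider
 * in the patch has its own set of callbacks, which is not reproduced here.
 */
typedef struct SketchProvider
{
    const char *name;
    int         (*begin_scan) (void);
} SketchProvider;

static const SketchProvider *sketch_providers[SKETCH_MAX_PROVIDERS];
static size_t sketch_nproviders = 0;

/*
 * Register a provider by name. Returns 0 on success, or -1 if the name is
 * already registered or the table is full (both assumptions of this sketch).
 */
int
sketch_register_provider(const SketchProvider *provider)
{
    size_t      i;

    for (i = 0; i < sketch_nproviders; i++)
    {
        if (strcmp(sketch_providers[i]->name, provider->name) == 0)
            return -1;
    }
    if (sketch_nproviders >= SKETCH_MAX_PROVIDERS)
        return -1;
    sketch_providers[sketch_nproviders++] = provider;
    return 0;
}

/* Look a provider up by name; returns NULL if it was never registered. */
const SketchProvider *
sketch_lookup_provider(const char *name)
{
    size_t      i;

    for (i = 0; i < sketch_nproviders; i++)
    {
        if (strcmp(sketch_providers[i]->name, name) == 0)
            return sketch_providers[i];
    }
    return NULL;
}
```

Storing a pointer rather than a copy mirrors the const pointer argument of register_custom_provider(); it means the caller has to keep the structure alive, typically as a static variable in the extension.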

Once a CustomPath is chosen, create_scan_plan() constructs a custom-scan
plan and calls back the extension to initialize the node.
The rest is similar to foreign scan, although some details differ.
For example, a foreign scan is assumed to return a tuple formed according
to the table definition. A custom scan makes no such assumption, so the
extension needs to set the tuple descriptor on the scan tuple slot of the
ScanState structure by itself.

In the case of a join, a custom scan behaves like a regular scan, but it
returns tuples that are already joined from the underlying relations.
The patched postgres_fdw uses the hook in joinpath.c to run a remote
join.
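
To show how a join tree can be turned into the FROM clause seen in the "Remote SQL" line above, here is a small standalone sketch. It is not the patch's deparse code: SketchRel and sketch_deparse_from() are hypothetical, only inner joins are handled, and a plain character buffer stands in for StringInfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical join tree. A leaf has a relation name and range-table index;
 * an inner node has outer/inner children and an optional ON condition.
 */
typedef struct SketchRel
{
    const char *relname;        /* non-NULL for a leaf (foreign table) */
    int         rtindex;        /* used for the "rN" alias of a leaf */
    const struct SketchRel *outer;
    const struct SketchRel *inner;
    const char *on_cond;        /* NULL means no remote join condition */
} SketchRel;

/*
 * Append the FROM-clause text for 'rel' to the NUL-terminated string in
 * 'buf', whose total capacity is 'size'. Output is truncated if too long.
 */
void
sketch_deparse_from(const SketchRel *rel, char *buf, size_t size)
{
    size_t      len = strlen(buf);

    if (rel->relname != NULL)
    {
        /* Leaf: table name followed by an alias derived from rtindex. */
        snprintf(buf + len, size - len, "%s r%d", rel->relname, rel->rtindex);
        return;
    }

    /* Join: parenthesize and recurse into both sides. */
    snprintf(buf + len, size - len, "(");
    sketch_deparse_from(rel->outer, buf, size);
    len = strlen(buf);
    snprintf(buf + len, size - len, " JOIN ");
    sketch_deparse_from(rel->inner, buf, size);
    len = strlen(buf);
    /* Without a condition, emit "ON true" to keep the SQL syntactically valid. */
    snprintf(buf + len, size - len, " ON %s)",
             rel->on_cond != NULL ? rel->on_cond : "true");
}
```

Aliasing every leaf as "r" plus its range-table index is what lets column references stay unambiguous when the same remote table appears on both sides of the join.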

------------
Issues
------------
I'm not 100% certain whether the arguments of add_join_path_hook are
reasonable. I guess the first seven arguments are the minimum necessary.
The mergeclause_list and semifactors might be useful if someone
tries to implement their own mergejoin or semijoin. Also, I'm not
familiar with path parameterization, which the last two arguments
relate to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
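
An extension normally installs such a hook at load time and keeps the previous value so that earlier extensions keep working. The standalone sketch below shows that chaining pattern with a deliberately simplified hook type; sketch_hook_type and every sketch_* name are hypothetical, and the single counter argument stands in for the eleven planner arguments of the real hook.

```c
#include <stddef.h>

/* Simplified stand-in for add_join_path_hook_type. */
typedef void (*sketch_hook_type) (int *paths_added);

/* Stand-in for the global hook variable exported by the core. */
sketch_hook_type sketch_hook = NULL;

/* Previously installed hook, saved so that it can be chained. */
static sketch_hook_type sketch_hook_next = NULL;

/* A hook installed earlier by some other extension. */
void
sketch_other_hook(int *paths_added)
{
    *paths_added += 10;
}

/* This extension's hook: call the predecessor first, then add its own path. */
void
sketch_my_hook(int *paths_added)
{
    if (sketch_hook_next != NULL)
        sketch_hook_next(paths_added);
    *paths_added += 1;
}

/* What the extension's load-time initialization would do. */
void
sketch_install(void)
{
    sketch_hook_next = sketch_hook;
    sketch_hook = sketch_my_hook;
}
```

Calling sketch_install() more than once would make the hook chain to itself, so the installation is meant to run exactly once per load, as a module initialization function does.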

When we replace a join with a custom scan, what is the best target
for Var nodes that reference relations under the join?
Usually, Var->varno holds the rtindex of one of the tables being joined,
and it is then replaced by OUTER_VAR or INNER_VAR in set_join_references().
This eventually determines the slot to be fetched in ExecEvalScalarVar().
When we replace a join, on the other hand, we want the Var node to
reference the scan tuple slot rather than the outer or inner slot.
I tried adding a new CUSTOM_VAR that references the scan tuple slot.
It is probably a straightforward way to run a remote join like a scan,
but I'm not certain whether it is the best way.
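
The slot selection described here can be pictured with the following standalone sketch. All names and marker values are hypothetical: the real INNER_VAR and OUTER_VAR are defined in primnodes.h, the value the patch assigns to CUSTOM_VAR is not shown in this message, and how the patch dispatches on it is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical marker values, chosen only to be distinct from real rtindexes. */
enum
{
    SKETCH_INNER_VAR = -1,
    SKETCH_OUTER_VAR = -2,
    SKETCH_CUSTOM_VAR = -3
};

/* Minimal stand-in for a tuple slot. */
typedef struct SketchSlot
{
    int         dummy;
} SketchSlot;

/* Minimal stand-in for the three slots an expression context exposes. */
typedef struct SketchExprContext
{
    SketchSlot *scantuple;
    SketchSlot *innertuple;
    SketchSlot *outertuple;
} SketchExprContext;

/* Choose the slot a Var is fetched from, based on its varno. */
SketchSlot *
sketch_slot_for_varno(const SketchExprContext *econtext, int varno)
{
    switch (varno)
    {
        case SKETCH_INNER_VAR:
            return econtext->innertuple;
        case SKETCH_OUTER_VAR:
            return econtext->outertuple;
        case SKETCH_CUSTOM_VAR:     /* joined tuple produced by the custom scan */
        default:                    /* ordinary Var of a scanned relation */
            return econtext->scantuple;
    }
}
```

In this sketch the custom marker resolves to the same slot as an ordinary scan Var, which is the behavior the paragraph above asks for: the joined tuple is read from the scan tuple slot rather than from the outer or inner slot.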

I was concerned that the FDW callbacks of postgres_fdw are designed to
take a ForeignScanState argument. Because of this, the remote join code
could not call these routines, even though most of the custom-join
code is similar.
So, I'd like to rework postgres_fdw first, to provide common routines that
can be called from both the FDW and the remote join code.
However, I think doing everything in one patch makes reviewing hard
because of the size of the changeset. So, I'd like to do the code rework
first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* A wiki page would probably help people understand it.
* postgres_fdw needs reworking to share common code between the
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2013/10/3 Robert Haas <robertmhaas@gmail.com>:
>> Well, there were a lot of problems with your demonstration, which have
>> already been pointed out upthread. I'm skeptical about the idea of
>> simply replacing planner nodes wholesale, and Tom is outright opposed.
>> I think you'll do better to focus on a narrower case - I'd suggest
>> custom scan nodes - and leave the rest as a project for another time.
>>
> Thanks, that makes clear to me what we should target for v9.4 development.
> Towards the next commitfest, I'm planning to develop the following
> features:
> * A CustomScan node that can run custom code instead of the built-in
>   scan nodes.
> * Join pushdown for postgres_fdw using a hook located in
>   add_paths_to_joinrel(), for demonstration purposes.
> * Some new way to scan a relation; your suggested ctid scan with
>   less-than or greater-than qualifiers is probably a good example,
>   also for demonstration purposes.
>
> Probably, the above set of jobs will be the first chunk of this feature.
> Then, let's do other stuff like Append, Sort, Aggregate and so on
> later. That seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-remote-join.v1.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1063 ++++++++++++++++++++++--
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 4 files changed, 1264 insertions(+), 106 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..d537b81 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* columns reference needs to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The main job portion of deparseRemoteJoinSql. It deparses a relation,
+ * might be join not only regular table, to SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either List or Integer.
+	 * In case of List, it is a packed PgRemoteJoinInfo that contains
+	 * outer and inner join references, so needs to deparse recursively.
+	 * In case of Integer, it is rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+                              true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Columns being referenced in target-list and local conditions has
+		 * to be fetched from the remote server, but not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * It assumes the top-level 'relinfo' is one for remote join relation, thus
+ * it has to be a List object that packs PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In case of remote join, column reference may become bogus without
+	 * qualification to relations.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..6f95e41 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -880,6 +858,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 
 	/* Get info about foreign table. */
 	fsstate->rel = node->ss.ss_currentRelation;
+	fsstate->scan_tupdesc = RelationGetDescr(node->ss.ss_currentRelation);
 	table = GetForeignTable(RelationGetRelid(fsstate->rel));
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(userid, server->serverid);
@@ -969,7 +948,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, node->ss.ps.ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +957,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -1704,10 +1683,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1842,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1930,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1951,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +1970,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2196,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2490,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2513,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2546,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2632,951 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Checks whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw. If not, false is returned.
+ * If it is a managed relation, some related properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte;
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		if (rel->rtekind != RTE_RELATION ||
+			rel->fdwroutine == NULL ||
+			rel->fdwroutine->GetForeignRelSize != postgresGetForeignRelSize)
+			return false;
+
+		/*
+		 * Also inform the caller of its server-id and local user-id.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we might need to pay attention to a scenario that
+		 * executes a plan with a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be run with the same credentials; the actual user-id
+		 * is not required here.
+		 * So, InvalidOid is set in fdw_user_oid for comparison purposes
+		 * if the scan runs with the credentials of GetUserId().
+		 */
+		rte = planner_rt_fetch(rel->relid, root);
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when all underlying foreign tables use an identical
+				 * server and user-id.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * has_wholerow_reference
+ *
+ * Returns true if the supplied expression contains a whole-row reference.
+ */
+static bool
+has_wholerow_reference(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return has_wholerow_reference((Node *)rinfo->clause, context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node, has_wholerow_reference, context);
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join, then stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and adds a custom-scan path that launches
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference is
+	 * included in either the target list or the local conditions.
+	 *
+	 * XXX - We have no reasonable way to reconstruct a RECORD datum
+	 * from individual columns once they are extracted. On the other hand,
+	 * putting a whole-row reference into the remote-join query would
+	 * consume additional network bandwidth.
+	 */
+	if (has_wholerow_reference((Node *)joinrel->reltargetlist, NULL) ||
+		has_wholerow_reference((Node *)j_local_conds, NULL))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan from the remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of a remote join carry the varno of the
+ * underlying foreign tables. This is a problem because they would eventually
+ * be replaced by references to the outer or inner relation, whereas the result
+ * of a remote join is stored in the scan-tuple-slot, not outer or inner.
+ * So, we need to replace the varno of Var nodes that reference a relation of
+ * a remote join with CUSTOM_VAR, a pseudo varno that references a tuple in
+ * the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * We need special treatment of Var nodes that reference columns in the remote
+ * join relation, because we replace a join relation with a remote query that
+ * returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * XXX - most of the logic was ported from the existing FDW code, so it needs
+ * to be reworked first to use common routines for FDW and CustomScan.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	Oid				userid;
+	int				i, numParams;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * a Var node with CUSTOM_VAR refers to its TupleDesc to get the
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * CustomScan also uses PgFdwScanState to run remote join
+	 */
+	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
+	node->custom_state = fsstate;
+
+	/*
+	 * We need to open the underlying relations ourselves
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		fsstate->join_rels = lappend(fsstate->join_rels, rel);
+	}
+	fsstate->scan_tupdesc = tupdesc;
+
+	/* Get info about foreign server */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+	user = GetUserMapping(userid, jinfo.fdw_server_oid);
+	server = GetForeignServer(jinfo.fdw_server_oid);
+
+	/*
+	 * Get connection to the foreign server.  Connection manager will
+	 * establish new connection if necessary.
+	 */
+	fsstate->conn = GetConnection(server, user, false);
+
+	/* Assign a unique ID for my cursor */
+	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
+	fsstate->cursor_exists = false;
+
+	/* Get private info created by planner functions. */
+	fsstate->query = jinfo.select_qry;
+	fsstate->retrieved_attrs = retrieved_attrs;
+
+	/* Create contexts for batches of tuples and per-tuple temp workspace. */
+	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+											   "postgres_fdw tuple data",
+											   ALLOCSET_DEFAULT_MINSIZE,
+											   ALLOCSET_DEFAULT_INITSIZE,
+											   ALLOCSET_DEFAULT_MAXSIZE);
+	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+											  "postgres_fdw temporary data",
+											  ALLOCSET_SMALL_MINSIZE,
+											  ALLOCSET_SMALL_INITSIZE,
+											  ALLOCSET_SMALL_MAXSIZE);
+
+	/* Get info we'll need for input data conversion. */
+	fsstate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+
+	/* Prepare for output conversion of parameters used in remote query. */
+	numParams = list_length(jinfo.select_params);
+	fsstate->numParams = numParams;
+	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
+
+	i = 0;
+	foreach(lc, jinfo.select_params)
+	{
+		Node   *param_expr = (Node *) lfirst(lc);
+		Oid		typefnoid;
+		bool	isvarlena;
+
+		getTypeOutputInfo(exprType(param_expr), &typefnoid, &isvarlena);
+		fmgr_info(typefnoid, &fsstate->param_flinfo[i]);
+		i++;
+	}
+
+	/*
+	 * Prepare remote-parameter expressions for evaluation.
+	 */
+	fsstate->param_exprs = (List *)ExecInitExpr((Expr *) jinfo.select_params,
+												(PlanState *) node);
+
+	/*
+	 * Allocate buffer for text form of query parameters, if any.
+	 */
+	if (numParams > 0)
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
+	else
+		fsstate->param_values = NULL;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * XXX - most of the logic was ported from the existing FDW code, so it needs
+ * to be reworked first to use common routines for FDW and CustomScan.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	/* create a new cursor if this is the first call after Begin or ReScan */
+	if (!fsstate->cursor_exists)
+		create_cursor(fsstate, node->ss.ps.ps_ExprContext);
+
+	/*
+	 * Get some more tuples, if we've run out.
+	 */
+	if (fsstate->next_tuple >= fsstate->num_tuples)
+	{
+		/* No point in another fetch if we already detected EOF, though. */
+		if (!fsstate->eof_reached)
+			fetch_more_data(fsstate);
+		/* If we didn't get any tuples, must be end of data. */
+		if (fsstate->next_tuple >= fsstate->num_tuples)
+			return ExecClearTuple(slot);
+	}
+
+	/*
+	 * Return the next tuple.
+	 */
+	ExecStoreTuple(fsstate->tuples[fsstate->next_tuple++],
+				   slot,
+				   InvalidBuffer,
+				   false);
+
+	return slot;
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper around ExecScan()
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * XXX - most of the logic was ported from the existing FDW code, so it needs
+ * to be reworked first to use common routines for FDW and CustomScan.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	if (!fsstate)
+		return;
+
+	/* XXX - logic copied from postgresEndForeignScan */
+	/* needs reworking */
+
+	/* Close the cursor if open, to prevent accumulation of cursors */
+	if (fsstate->cursor_exists)
+		close_cursor(fsstate->conn, fsstate->cursor_number);
+
+	/* Release remote connection */
+	ReleaseConnection(fsstate->conn);
+	fsstate->conn = NULL;
+
+	/* Also, close the underlying foreign tables by ourselves */
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * XXX - most of the logic was ported from the existing FDW code, so it needs
+ * to be reworked first to use common routines for FDW and CustomScan.
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	char			sql[64];
+	PGresult	   *res;
+
+	/* XXX - logic copied from postgresReScanForeignScan */
+	/* needs reworking */
+
+	/* If we haven't created the cursor yet, nothing to do. */
+	if (!fsstate->cursor_exists)
+		return;
+
+	/*
+	 * If any internal parameters affecting this node have changed, we'd
+	 * better destroy and recreate the cursor.  Otherwise, rewinding it should
+	 * be good enough.  If we've only fetched zero or one batch, we needn't
+	 * even rewind the cursor, just rescan what we have.
+	 */
+	if (node->ss.ps.chgParam != NULL)
+	{
+		fsstate->cursor_exists = false;
+		snprintf(sql, sizeof(sql), "CLOSE c%u",
+				 fsstate->cursor_number);
+	}
+	else if (fsstate->fetch_ct_2 > 1)
+	{
+		snprintf(sql, sizeof(sql), "MOVE BACKWARD ALL IN c%u",
+				 fsstate->cursor_number);
+	}
+	else
+	{
+		/* Easy: just rescan what we already have in memory, if anything */
+		fsstate->next_tuple = 0;
+		return;
+	}
+
+	/*
+	 * We don't use a PG_TRY block here, so be careful not to throw error
+	 * without releasing the PGresult.
+	 */
+	res = PQexec(fsstate->conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, true, sql);
+	PQclear(res);
+
+	/* Now force a fresh FETCH. */
+	fsstate->tuples = NULL;
+	fsstate->num_tuples = 0;
+	fsstate->next_tuple = 0;
+	fsstate->fetch_ct_2 = 0;
+	fsstate->eof_reached = false;
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine for EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider. No special
+ * registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */

pgsql-v9.4-custom-scan-apis.v1.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 761 +++++++++++++++++++++++++++++
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 src/backend/commands/explain.c             |  78 +++
 src/backend/executor/Makefile              |   2 +-
 src/backend/executor/execAmi.c             |  34 +-
 src/backend/executor/execProcnode.c        |  14 +
 src/backend/executor/execQual.c            |  10 +-
 src/backend/executor/execUtils.c           |   4 +-
 src/backend/executor/nodeCustom.c          | 252 ++++++++++
 src/backend/nodes/bitmapset.c              |  62 +++
 src/backend/nodes/copyfuncs.c              |  30 ++
 src/backend/nodes/outfuncs.c               |  19 +
 src/backend/nodes/print.c                  |   4 +
 src/backend/optimizer/path/allpaths.c      |  23 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/path/joinpath.c      |  18 +
 src/backend/optimizer/plan/createplan.c    | 103 ++++
 src/backend/optimizer/plan/setrefs.c       |  27 +-
 src/backend/optimizer/plan/subselect.c     |  10 +
 src/backend/optimizer/util/pathnode.c      |  40 ++
 src/backend/utils/adt/ruleutils.c          |  44 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/executor/executor.h            |   3 +-
 src/include/executor/nodeCustom.h          |  94 ++++
 src/include/nodes/bitmapset.h              |   4 +
 src/include/nodes/execnodes.h              |  17 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/plannodes.h              |  16 +
 src/include/nodes/primnodes.h              |   1 +
 src/include/nodes/relation.h               |  16 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |  10 +
 src/include/optimizer/paths.h              |  25 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 41 files changed, 2194 insertions(+), 24 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
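
The ctidscan.c module in this patch installs itself by saving the previous value of add_scan_path_hook and calling it from its own callback. The following is a minimal standalone sketch of that chaining pattern, not code from the patch: the hook signature is simplified, and the names scan_path_hook, first_init, second_init and count_paths are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for add_scan_path_hook_type. The hook
 * in the patch takes (PlannerInfo *, RelOptInfo *, RangeTblEntry *).
 */
typedef void (*scan_path_hook_type) (int relid, int *npaths);

/* Stand-in for the global hook variable exported by the core. */
static scan_path_hook_type scan_path_hook = NULL;

/* Each module remembers whichever hook was installed before it. */
static scan_path_hook_type first_prev = NULL;
static scan_path_hook_type second_prev = NULL;

static void
first_add_path(int relid, int *npaths)
{
	/* Give earlier modules a chance first, as CTidAddScanPath does. */
	if (first_prev)
		first_prev(relid, npaths);
	assert(npaths != NULL);
	*npaths += 1;				/* this module always offers one path */
}

static void
second_add_path(int relid, int *npaths)
{
	if (second_prev)
		second_prev(relid, npaths);
	if (relid > 0)				/* this module only handles base relations */
		*npaths += 1;
}

/* Equivalent of each module's _PG_init(): save the old hook, then install. */
static void
first_init(void)
{
	first_prev = scan_path_hook;
	scan_path_hook = first_add_path;
}

static void
second_init(void)
{
	second_prev = scan_path_hook;
	scan_path_hook = second_add_path;
}

/* What the planner side would do: call the hook if one is installed. */
static int
count_paths(int relid)
{
	int			npaths = 0;

	if (scan_path_hook)
		scan_path_hook(relid, &npaths);
	return npaths;
}
```

With both modules initialized, count_paths(1) sees one path from each module, because the second callback forwards to the first before adding its own.
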
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..c392051
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,761 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * This module demonstrates the Custom Scan APIs, which allow an extension
+ * to override part of the executor. It targets workloads that fetch
+ * records whose tid is larger or smaller than a particular value.
+ * When inequality operators on ctid are given, this module constructs
+ * a custom scan path that skips records that need not be read. If that
+ * path is the cheapest one, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches the underlying records; it uses the existing heap_getnext() but
+ * seeks to the first record to be read before fetching.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+    ((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause can determine the zone
+ * to be scanned. If one or more usable inequality clauses are found, it
+ * returns a list of them; otherwise it returns NIL.
+ * The caller can assume that all the returned conditions are chained
+ * with boolean AND, so every operator narrows down the scope of the
+ * custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates cost to scan the target relation according to the given
+ * restriction clauses. Its logic to scan relations are almost same as
+ * SeqScan doing, because it uses regular heap_getnext(), except for
+ * the number of tuples to be scanned if restriction clauses work well.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, treat the operator as if the Var were its 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * This is just a rough estimate; we don't distinguish strict
+			 * inequality from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = (bnum_max >= bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause; the real range is unknown
+		 * until the executor actually runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because 
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators are given on the
+ * relation to be scanned and makes sense to reduce number of tuples.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, some of the restriction clauses can reduce the number of tuples
+	 * to be scanned, so we add the custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	Index		scan_relid = cscan_path->path.parent->relid;
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(scan_relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization is done in nodeCustom.c. All we need
+	 * to do here is set the slightly adjusted clauses and our private
+	 * data, which is the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * See ItemPointerCompare(); its result is always 1, 0 or -1. A forward
+	 * scan ends when the comparison of the fetched tid with ip_max exceeds
+	 * ip_max_comp, and a backward scan ends when the comparison with ip_min
+	 * falls below ip_min_comp. With ip_max_comp = 1 or ip_min_comp = -1
+	 * the respective test can never succeed, so these initial values mean
+	 * "open ended", as if no termination condition was set on that side.
+	 * The operators below replace them with tighter bounds.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we could calculate a particular TID that should be
+			 * larger than, less than or equal with fetched record, thus,
+			 * it allows to determine upper or lower bounds of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are chained with boolean AND, so the
+			 * whole condition becomes false if one of the clauses has a NULL
+			 * result. We can stop evaluating immediately and inform the
+			 * caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; the position is sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the point we specified, so the
+ * next heap_getnext() fetches a record from that point.
+ * It returns true if the caller should fetch with NoMovementScanDirection,
+ * and false if a backward scan should simply start from the end.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, a forward scan obviously
+	 * cannot fetch any tuples. A backward scan, on the other hand, needs
+	 * no special handling.
+	 * Note that heap_getnext() returns a NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of
+	 * this function overrides the scan direction if 'true' is returned,
+	 * which terminates the scan immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, run the backward scan from its starting point */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position shall be loaded on
+	 * heap_restrpos(). Because heap_restrpos() internally calls
+	 * heapgettup() or heapgettup_pagemode() that kicks heapgetpage()
+	 * when rs_cblock is different from the block number being pointed
+	 * by rs_mctid, it makes sense to put invalid block number not to
+	 * match previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() save a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if min-tid was obvious */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if max-tid was obvious */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound
+	 * in a forward scan, or the lower bound in a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the manner of
+ * ExecScan().
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval
+ * makes the next ExecScan call recalculate the starting point and rewind
+ * the underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..60081f7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on the
+  custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It provides an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators on the
+   <literal>ctid</> system column to reduce the number of rows to be processed.
+   There is obviously no point in reading tuples located on page 99 or
+   earlier. So it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it otherwise
+   uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   This works, but it wastes I/O on the pages that contain only records
+   to be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> is more efficient: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples need to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path than the above custom-scan
+   path exists. For instance, an index scan based on an equality condition
+   is usually cheaper than this custom scan, so the optimizer adopts it
+   instead of a sequential scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Usage is simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, whichever is
+   convenient.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
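
The documentation above argues that a condition such as ctid > '(100,0)' lets the scan skip pages 0 through 99, and the cost estimate scales the tuple count by the fraction of pages that remain. A minimal standalone sketch of that arithmetic follows. It is an illustration under assumptions, not the patch's CTidEstimateCosts: estimate_pages and estimate_tuples are hypothetical helpers, pages are numbered from zero, and the "half of the relation" fallback mirrors the rough guess used when no constant bound is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNo;		/* simplified stand-in for BlockNumber */

/*
 * Estimate how many pages a scan restricted to blocks
 * [min_block, max_block] has to read, clamped to [1, rel_pages].
 */
static BlockNo
estimate_pages(BlockNo rel_pages,
			   bool has_min, BlockNo min_block,
			   bool has_max, BlockNo max_block)
{
	BlockNo		first;
	BlockNo		last;

	assert(rel_pages > 0);

	/* No usable bound: guess that half of the relation is read. */
	if (!has_min && !has_max)
		return (rel_pages + 1) / 2;

	first = has_min ? min_block : 0;
	last = has_max ? max_block : rel_pages - 1;
	if (last > rel_pages - 1)
		last = rel_pages - 1;

	/* Empty or inverted range: still charge for at least one page. */
	if (first > last)
		return 1;
	return last - first + 1;
}

/* Scale the relation's tuple count by the fraction of pages read. */
static double
estimate_tuples(double rel_tuples, BlockNo rel_pages, BlockNo num_pages)
{
	assert(rel_pages > 0);
	return rel_tuples * ((double) num_pages / (double) rel_pages);
}
```

For a 200-page table holding 10000 tuples, a lower bound at block 100 leaves 100 pages and about 5000 tuples to process, which is why such a path can be costed below a full sequential scan.
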
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 4e93df2..39d2c12 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +905,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			operation = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1013,6 +1028,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1291,6 +1310,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *)plan)->funcexpr != NULL && es->verbose)
+				show_expression(((CustomScan *)plan)->funcexpr,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1858,6 +1888,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2025,6 +2068,41 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				Node	   *funcexpr = ((CustomScan *) plan)->funcexpr;
+
+				if (funcexpr && IsA(funcexpr, FuncExpr))
+				{
+					Oid		funcid = ((FuncExpr *) funcexpr)->funcid;
+
+					objectname = get_func_name(funcid);
+					if (es->verbose)
+						namespace =
+							get_namespace_name(get_func_namespace(funcid));
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomPath:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
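The two capability checks in the execAmi.c hunks above both reduce to testing a bit in custom_flags. A minimal standalone sketch of that pattern follows; the DEMO_* bit values and the demo_* function names are hypothetical (the real CUSTOM__SUPPORT_* definitions are not part of these hunks), and tlist_ok merely stands in for the result of TargetListSupportsBackwardScan().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real definitions are not in these hunks. */
#define DEMO_SUPPORT_MARK_RESTORE	0x0001
#define DEMO_SUPPORT_BACKWARD_SCAN	0x0002

/* Planner-side test, analogous to the T_CustomPath case above. */
static bool
demo_supports_mark_restore(int flags)
{
	return (flags & DEMO_SUPPORT_MARK_RESTORE) != 0;
}

/*
 * Backward scan needs both the provider's flag and a target list that
 * can be evaluated backward; tlist_ok stands in for the result of
 * TargetListSupportsBackwardScan().
 */
static bool
demo_supports_backward_scan(int flags, bool tlist_ok)
{
	if (flags & DEMO_SUPPORT_BACKWARD_SCAN)
		return tlist_ok;
	return false;
}

/* Executor-side guard, analogous to the Assert() in ExecCustomMarkPos. */
static void
demo_mark_pos(int flags, int *marked)
{
	assert((flags & DEMO_SUPPORT_MARK_RESTORE) != 0);
	*marked = 1;
}
```

The point of the sketch is that the planner decides from the path's flags whether a mark/restore-capable plan can be promised, and the executor only asserts that the promise was kept.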
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..61bca22
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * Finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs the common initialization.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range table).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up here only when scanrelid refers
+	 * to a particular relation. Otherwise, the custom-scan provider has to
+	 * initialize it itself if it uses ss_ScanTupleSlot internally.
+	 * If the provider replaces the varno of Var nodes with CUSTOM_VAR, the
+	 * slot must be set up so that EXPLAIN can look up attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is associated with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization by the provider's BeginCustomScan callback.
+	 * The extension may override the initialization done above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entrypoint for the provider's ExecCustomScan method. Fetching
+ * a tuple is entirely the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entrypoint, for the provider's MultiExecCustomScan method.
+ * Fetching multiple tuples (as the upper node expects) is entirely the
+ * job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->ExecMarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->ExecRestorePosCustom(csstate);
+}
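The registration and lookup logic in nodeCustom.c above boils down to a name-keyed table with duplicate detection. Below is a minimal standalone sketch of that idea, assuming a plain array in place of PostgreSQL's dynahash and return codes in place of ereport(); DemoProvider, demo_register_provider, demo_get_provider, and demo_exec are hypothetical names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAMELEN	64		/* stand-in for NAMEDATALEN */
#define DEMO_MAXPROV	32		/* fixed capacity, for the sketch only */

/* Hypothetical, cut-down provider: a name plus a single callback. */
typedef struct DemoProvider
{
	char		name[DEMO_NAMELEN];
	int			(*exec_scan) (void *state);
} DemoProvider;

static DemoProvider provider_table[DEMO_MAXPROV];
static int	n_providers = 0;

/* Returns the registered entry, or NULL if the name is unknown. */
static DemoProvider *
demo_get_provider(const char *name)
{
	int			i;

	assert(name != NULL);
	for (i = 0; i < n_providers; i++)
	{
		if (strncmp(provider_table[i].name, name, DEMO_NAMELEN) == 0)
			return &provider_table[i];
	}
	return NULL;
}

/* Copies *provider into the table; 0 on success, -1 if duplicate or full. */
static int
demo_register_provider(const DemoProvider *provider)
{
	if (demo_get_provider(provider->name) != NULL)
		return -1;
	if (n_providers >= DEMO_MAXPROV)
		return -1;
	provider_table[n_providers++] = *provider;
	return 0;
}

/* Trivial callback used to show dispatch through the table. */
static int
demo_exec(void *state)
{
	(void) state;
	return 42;
}
```

As in the patch, the table stores a copy of the caller's struct, so the caller's original need not outlive the registration call. The executor wrappers then dispatch through the stored function pointers.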
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
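The text format used by bms_to_string / bms_from_string above is eight hexadecimal digits per 32-bit word, most significant word first. The following standalone sketch shows that round trip on a plain uint32_t array rather than a Bitmapset; words_to_hex, hex_to_words, and hex_value are hypothetical helper names, malloc stands in for palloc, and malformed input is reported with -1 instead of elog().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD	8		/* 32 bits / 4 bits per hex digit */

/* Decode one hex digit; returns -1 if c is not a hex digit. */
static int
hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Encode nwords words, most significant word first; caller frees. */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
	char	   *result = malloc((size_t) nwords * HEX_PER_WORD + 1);
	char	   *pos = result;
	int			i;

	assert(result != NULL);
	result[0] = '\0';
	for (i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
	return result;
}

/* Decode into words[]; returns the word count, or -1 on malformed input. */
static int
hex_to_words(const char *str, uint32_t *words, int maxwords)
{
	size_t		len = strlen(str);
	int			nwords = (int) (len / HEX_PER_WORD);
	int			i,
				j;

	if (len == 0 || len % HEX_PER_WORD != 0 || nwords > maxwords)
		return -1;
	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (j = 0; j < HEX_PER_WORD; j++)
		{
			int			v = hex_value(*str++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Note that the letter digits need the "+ 10" offset; without it, "a" through "f" would decode as 0 through 5 and the round trip would silently lose bits.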
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65f3b98..b56af14 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -602,6 +602,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(funcexpr);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3929,6 +3956,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 817b149..031112b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -568,6 +568,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(funcexpr);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2814,6 +2830,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bfd3809..9d0cbf5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -46,6 +46,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -399,6 +401,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -427,6 +432,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1246,6 +1254,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1269,6 +1280,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1292,6 +1306,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1361,6 +1378,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1414,6 +1434,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e7f8cec..c6e1634 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -2312,7 +2309,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9b9eb2f..9626d08 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -235,6 +239,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -411,6 +416,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2016,6 +2028,97 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider may use it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			Node   *funcexpr = rte->funcexpr;
+
+			if (best_path->path.param_info)
+				funcexpr = replace_nestloop_params(root, funcexpr);
+			scan_plan->funcexpr = funcexpr;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b78d727..30cf7e5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -578,6 +579,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1059,7 +1084,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 0df70c4..644a532 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2194,6 +2194,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b17051..46e814d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. Typically it is
+ * invoked from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for
+ * supplying adequate cost estimates here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 04b1c4f..eda53d6 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2362,14 +2363,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3627,6 +3633,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5294,6 +5308,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5564,6 +5590,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 0350ef6..0c7a233 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..8ea5693
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*ExecMarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*ExecRestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	ExecMarkPosCustomScan_function	ExecMarkPosCustomScan;
+	ExecRestorePosCustom_function	ExecRestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *cscan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3b430e0..db4176c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1494,6 +1494,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 78368c6..8f00a6b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..936591b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -483,6 +483,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	Node	   *funcexpr;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2853fb..55fa8aa 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -867,6 +867,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the identifier under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9686229..1225970 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ef93c7..882baf6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan paths, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join paths, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..1ad0e7a
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..09c1bda
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1c1491c..fe81929 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index c4d451a..a2287d8 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Kohei KaiGai (#1)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments are welcome.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches implement a custom scan node that allows an
extension to replace part of the plan tree with its own code instead of
the built-in logic.
In addition to the previous proposal, they make it possible to add a custom
scan as one of the candidate paths the optimizer chooses from.
There are two patches. The first one (pgsql-v9.4-custom-scan-apis) provides
the API and a simple demonstration module that implements a
regular table scan using an inequality operator on the ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote joins.

Below is an example that shows how custom-scan works.

Normally, a sequential scan is chosen even if the clause has an inequality
operator that references the ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension acting as a custom-scan provider proposes an alternative
path; because its cost is lower than that of the sequential scan,
the optimizer chooses it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                              QUERY PLAN
----------------------------------------------------------------------
 Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, a more cost-effective plan wins if one exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
   Index Cond: (a = 200)
   Filter: (ctid > '(10,0)'::tid)
(3 rows)
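
The three plans above come down to a cost comparison. As a much-simplified
sketch (the real add_path() also weighs startup cost, sort order and
parameterization), the choice can be modeled as picking the candidate with
the lowest total cost. CandidatePath and cheapest_path() are hypothetical
stand-ins for illustration, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a planner path: a label plus its total cost. */
typedef struct CandidatePath
{
    const char *label;
    double      total_cost;
} CandidatePath;

/*
 * Return the candidate with the lowest total cost, or NULL if there are
 * no candidates.  On a tie the earlier candidate is kept.
 */
const CandidatePath *
cheapest_path(const CandidatePath *paths, size_t npaths)
{
    const CandidatePath *best = NULL;
    size_t      i;

    assert(paths != NULL || npaths == 0);

    for (i = 0; i < npaths; i++)
    {
        if (best == NULL || paths[i].total_cost < best->total_cost)
            best = &paths[i];
    }
    return best;
}
```

Fed the total costs shown in the EXPLAIN outputs above, the 100.00 custom
scan beats the 209.00 sequential scan, and the 8.30 index scan beats both.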

Another worthwhile example is the remote-join enhancement to
postgres_fdw, shown below. Both ft1 and ft2 are foreign tables
managed by the same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
             WHERE f_leak(b) AND y like '%aaa%';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
   Output: a, b, x, y
   Filter: f_leak(b)
   Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How it works
---------------------------
This patch adds two hooks (for base and join relations) in allpaths.c
and joinpath.c. They allow extensions to add alternative paths for
scanning a base relation or joining two relations.

The callback routine can add a CustomPath using add_path() to inform the
optimizer of this alternative scan path. Every custom-scan provider is
identified by its name, which must be registered beforehand using the
following function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider consists of a name string and a set of callback function pointers.
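
As a rough illustration of registration by name, here is a minimal,
self-contained sketch. DemoProvider, demo_register_provider() and
demo_lookup_provider() are hypothetical stand-ins; the fields of the real
CustomProvider and the behavior of register_custom_provider() on duplicate
names are not shown in this mail, so rejecting duplicates is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROVIDERS 8

/* Hypothetical, reduced provider: a name plus a single callback. */
typedef struct DemoProvider
{
    const char *name;
    void        (*begin_scan) (void *scan_state);
} DemoProvider;

static const DemoProvider *demo_providers[DEMO_MAX_PROVIDERS];
static size_t demo_nproviders = 0;

/* Look a provider up by name; returns NULL if it is not registered. */
const DemoProvider *
demo_lookup_provider(const char *name)
{
    size_t      i;

    assert(name != NULL);
    for (i = 0; i < demo_nproviders; i++)
    {
        if (strcmp(demo_providers[i]->name, name) == 0)
            return demo_providers[i];
    }
    return NULL;
}

/*
 * Register a provider under its name.  Returns 0 on success, or -1 if the
 * name is already taken or the table is full (assumed policy).
 */
int
demo_register_provider(const DemoProvider *provider)
{
    assert(provider != NULL && provider->name != NULL);

    if (demo_lookup_provider(provider->name) != NULL)
        return -1;
    if (demo_nproviders >= DEMO_MAX_PROVIDERS)
        return -1;
    demo_providers[demo_nproviders++] = provider;
    return 0;
}
```

The plan node then only needs to carry the provider name; the callbacks can
be found again through the lookup at executor startup.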

Once a CustomPath is chosen, create_scan_plan() constructs a custom-scan
plan and calls back the extension to initialize the node.
The rest is similar to foreign scan, although some details
differ. For example, a foreign scan is assumed to return
a tuple formed according to the table definition. A custom scan,
on the other hand, makes no such assumption, so the extension needs to
set the tuple descriptor on the scan tuple slot of the ScanState structure
by itself.

In the case of a join, custom-scan behaves like a regular scan, but it
returns tuples that are already joined from the underlying relations.
The patched postgres_fdw uses the hook in joinpath.c to run
a remote join.

------------
Issues
------------
I'm not 100% certain whether the arguments of add_join_path_hook are
reasonable. I guess the first 7 arguments are the minimum necessary.
The mergeclause_list and semifactors might be useful if someone
tries to implement their own mergejoin or semijoin. Also, I'm not
familiar with path parameterization, which the last two arguments
relate to. Where is the best code to learn about its usage?
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
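
For readers unfamiliar with planner hooks, the usual way an extension
installs one is the save-and-chain idiom: remember the previous hook value,
install your own function, and call the previous one from inside it. The
sketch below shows only that idiom with a hypothetical two-argument hook
(demo_join_path_hook); the real hook takes the eleven arguments listed
above, and every demo_ name here is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced hook signature standing in for the real one. */
typedef void (*demo_join_path_hook_type) (int outer_relid, int inner_relid);

/* The hook variable the core code would export. */
demo_join_path_hook_type demo_join_path_hook = NULL;

/* Extension-side state. */
static demo_join_path_hook_type prev_join_path_hook = NULL;
int         demo_paths_added = 0;

static void
my_join_path_hook(int outer_relid, int inner_relid)
{
    /* Give any previously installed hook its turn first. */
    if (prev_join_path_hook)
        prev_join_path_hook(outer_relid, inner_relid);

    /* An extension would build a CustomPath and call add_path() here. */
    demo_paths_added++;
}

/* What an extension's _PG_init() would do: save the old hook, install ours. */
void
demo_install_hook(void)
{
    /* Installing twice would make the hook call itself forever. */
    assert(demo_join_path_hook != my_join_path_hook);

    prev_join_path_hook = demo_join_path_hook;
    demo_join_path_hook = my_join_path_hook;
}

/* What the core planner would do at the hook site. */
void
demo_call_hook(int outer_relid, int inner_relid)
{
    if (demo_join_path_hook)
        demo_join_path_hook(outer_relid, inner_relid);
}
```

Chaining matters because several loaded modules may want to offer paths for
the same join; each one should pass control on rather than replace the others.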
When we replace a join with a custom scan, what should a Var node that
referenced relations under the join point to?
Usually, Var->varno is the rtindex of one of the joined tables, and
set_join_references() replaces it with OUTER_VAR or INNER_VAR.
That eventually determines which slot ExecEvalScalarVar() fetches from.
If we replace a join, however, we want the Var node to reference the
scan tuple slot rather than the outer or inner slot.
I tried adding a new CUSTOM_VAR that references the scan tuple slot.
It is probably a straightforward way to run a remote join like a scan,
but I'm not certain whether it is the best way.

Another concern is that the FDW callbacks of postgres_fdw are designed to
take a ForeignState argument. Because of this, the remote join code
cannot call these routines, even though most of the custom-join
code is similar.
So I'd like to rework postgres_fdw to provide common routines that
can be called from both the FDW and remote join portions.
However, I think doing so in this patch would make reviewing hard because
of the size of the changeset, so I'd like to do the code rework first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* A wiki page would probably help people understand it.
* postgres_fdw needs reworking to share common code between the
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, that makes clear to me what we should target for v9.4 development.
For the next commitfest, I'm planning to develop the following
features:
* A CustomScan node that can run custom code instead of the built-in
scan nodes.
* Join pushdown in postgres_fdw, using a hook located in
add_paths_to_joinrel(), for demonstration purposes.
* A new way to scan a relation; your suggested ctid scan with
less-than or greater-than qualifiers is probably a good example, also for
demonstration purposes.

Probably, the above set of jobs will be the first chunk of this feature.
Then let's do other stuff like Append, Sort, Aggregate and so on
later. That seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Kohei KaiGai (#2)

2 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

The attached patches are the revised custom-scan APIs.
- Custom-scan.sgml was added to describe how to write a custom-scan
provider in the official documentation.
- Much of the code duplication in postgres_fdw.c was eliminated. I split some
FDW handlers into two parts: a common portion and an FDW-specific one.
The executor callbacks of the custom-scan code use the common portion
because most of their implementation is equivalent.

I'd like to hear comments on how to handle Var references to
a custom scan that replaces a join of relations.
The varno of a Var that references a join relation is the rtindex of either
the right or the left relation, and setrefs.c then adjusts it so that
INNER_VAR or OUTER_VAR is set instead.
However, this does not work if a custom scan that just references the result
of a remote join query is chosen instead of a local join, because its result
is usually stored in the ps_ResultTupleSlot of PlanState, so
ExecEvalScalarVar should reference neither the inner nor the outer slot.
Instead of the existing mechanism, I added one more special varno, CUSTOM_VARNO,
that just references the result tuple slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN (verbose) generates column names from
the TupleDesc of the underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works well.
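
To make the idea concrete, here is a small self-contained sketch of how a
special varno could select the tuple slot a Var is read from. It is not the
patch's code: all types are reduced stand-ins, the numeric values of the
special varnos are illustrative, and the CUSTOM_VARNO value in particular is
made up.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative special varno values; DEMO_CUSTOM_VARNO is hypothetical. */
#define DEMO_INNER_VAR      65000
#define DEMO_OUTER_VAR      65001
#define DEMO_CUSTOM_VARNO   65003

/* Reduced stand-in for a tuple slot. */
typedef struct DemoSlot
{
    const char *name;
} DemoSlot;

/* Reduced stand-in for the slots an expression can read from. */
typedef struct DemoExprContext
{
    DemoSlot   *scan_slot;
    DemoSlot   *inner_slot;
    DemoSlot   *outer_slot;
    DemoSlot   *result_slot;    /* where a remote join result would live */
} DemoExprContext;

/*
 * Pick the slot for a Var by its varno.  Ordinary varnos (range table
 * indexes) fall through to the scan slot; the special values redirect to
 * the inner, outer or result slot.
 */
DemoSlot *
demo_slot_for_varno(const DemoExprContext *econtext, unsigned int varno)
{
    assert(econtext != NULL);

    switch (varno)
    {
        case DEMO_INNER_VAR:
            return econtext->inner_slot;
        case DEMO_OUTER_VAR:
            return econtext->outer_slot;
        case DEMO_CUSTOM_VARNO:
            return econtext->result_slot;
        default:
            return econtext->scan_slot;
    }
}
```

The extra case is the whole change on the executor side of this sketch; the
planner side still has to rewrite the affected Vars to the new varno.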

Also, I'm uncertain about my usage of param_info in the Path structure, even
though I followed the way it is used elsewhere. So please point out if my usage
is not appropriate.

Thanks,

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-part2.v3.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1101 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/include/nodes/bitmapset.h                  |    4 +
 6 files changed, 1303 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..d537b81 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* columns reference needs to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * Workhorse of deparseRemoteJoinSql. It deparses a relation, which may be
+ * either a join or a regular foreign table, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either a List or an Integer.
+	 * A List is a packed PgRemoteJoinInfo that contains outer and inner
+	 * join references, so it needs to be deparsed recursively.
+	 * An Integer is the rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only columns referenced in the target list or local conditions
+		 * need to be fetched from the remote server, not all columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * The top-level 'relinfo' is assumed to describe a remote join relation, so
+ * it has to be a List object that packs PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, an unqualified column reference may be ambiguous,
+	 * so prefix it with the alias of its relation.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..7488d1e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,820 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * It checks whether the supplied relation is either a foreign table or remote
+ * join managed by postgres_fdw. If not, false shall be returned.
+ * If it is a managed relation, some related properties shall be returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte;
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		if (rel->rtekind != RTE_RELATION ||
+			rel->fdwroutine == NULL ||
+			rel->fdwroutine->GetForeignRelSize != postgresGetForeignRelSize)
+			return false;
+
+		/*
+		 * Also report the server-id and local user-id to the caller.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we might need to pay attention to a scenario in which
+		 * a plan is executed under a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be accessed with the same credential. The actual user-id
+		 * is not required here.
+		 * So, fdw_user_oid is set to InvalidOid for comparison purposes
+		 * when the scan runs with the credential of GetUserId().
+		 */
+		rte = planner_rt_fetch(rel->relid, root);
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when underlying foreign tables use identical server
+				 * and user-id for each.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * has_wholerow_reference
+ *
+ * It returns true, if supplied expression contains whole-row reference.
+ */
+static bool
+has_wholerow_reference(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return has_wholerow_reference((Node *)rinfo->clause, context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node, has_wholerow_reference, context);
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * It calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and adds a custom-scan path that launches
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference is
+	 * included in either the target list or the local conditions.
+	 *
+	 * XXX - We have no reasonable way to reconstruct a RECORD datum
+	 * from individual columns once they are extracted. On the other
+	 * hand, putting a whole-row reference in the remote-join query
+	 * consumes additional network bandwidth.
+	 */
+	if (has_wholerow_reference((Node *)joinrel->reltargetlist, NULL) ||
+		has_wholerow_reference((Node *)j_local_conds, NULL))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * construction of CustomScan according to remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of a remote join carry the varno of
+ * the underlying foreign tables. This is a problem because such Vars would
+ * eventually be replaced by references to the outer or inner relation, but
+ * the result of a remote join is stored in the scan-tuple-slot, not in the
+ * outer or inner slot. So, we need to replace the varno of Var nodes that
+ * reference a relation of a remote join with CUSTOM_VAR, a pseudo varno
+ * that references a tuple in the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * We need special treatment of Var nodes that reference columns in a remote
+ * join relation, because we replace a join relation with a remote query that
+ * returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan; however,
+ * it needs some adjustments because of the difference in nature.
+ * The biggest one is that it has to open the underlying relations by itself
+ * and construct a tuple descriptor from the var-list to be fetched,
+ * because a custom scan (in this case, a scan on a remote join instead of
+ * a local join) does not have a particular relation behind it, thus
+ * it needs to manage them correctly by itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is used unless a specific
+	 * user-id was recorded for the reference at planning time.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations also have to be saved */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the
+ * underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Same as what postgresReScanForeignScan() does.
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine on EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; it registers the custom-scan provider.
+ * No special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
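
The bms_to_string / bms_from_string pair added to bitmapset.c above writes each bitmap word as fixed-width hexadecimal, most significant word first. Below is a minimal standalone sketch of that round trip, assuming 32-bit words. The names words_to_hex, words_from_hex and hex_digit_value are hypothetical helpers for illustration only and are not part of the patch or of PostgreSQL. The point to note is that the letter digits a-f have to decode to 10..15.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD 8          /* a 32-bit word prints as 8 hex digits */

/* Decode one hexadecimal digit; returns -1 if c is not a hex digit. */
static int
hex_digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Encode nwords words, most significant word first. Caller frees. */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
    char   *result = malloc((size_t) nwords * HEX_PER_WORD + 1);
    char   *pos = result;
    int     i;

    if (result == NULL)
        return NULL;
    *pos = '\0';
    for (i = nwords; i > 0; i--)
        pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
    return result;
}

/*
 * Decode text produced by words_to_hex into words[].
 * Returns the number of words, or -1 on malformed input or lack of room.
 */
static int
words_from_hex(const char *text, uint32_t *words, int maxwords)
{
    size_t  len = strlen(text);
    int     nwords;
    int     i;
    int     j;

    if (len % HEX_PER_WORD != 0)
        return -1;
    nwords = (int) (len / HEX_PER_WORD);
    if (nwords > maxwords)
        return -1;

    for (i = nwords; i > 0; i--)
    {
        uint32_t    word = 0;

        for (j = 0; j < HEX_PER_WORD; j++)
        {
            int     v = hex_digit_value((unsigned char) *text++);

            if (v < 0)
                return -1;
            word = (word << 4) | (uint32_t) v;
        }
        words[i - 1] = word;
    }
    return nwords;
}
```

Unlike the patch, this sketch rejects a string whose length is not a multiple of the word width instead of warning and continuing; that is a simplification, not a claim about how the patch should behave.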

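The arithmetic in estimate_remote_join_cost (remote-join patch above) can be tried outside the planner. Below is a minimal sketch that mirrors the formula as posted, assuming the selectivities and qual costs are passed in as plain numbers rather than computed by clauselist_selectivity() and cost_qual_eval(). The struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Inputs that the patch obtains from the planner and server options. */
typedef struct
{
    double  fdw_startup_cost;   /* server option, or its default */
    double  fdw_tuple_cost;     /* server option, or its default */
    double  cpu_tuple_cost;     /* planner GUC */
    double  qual_startup;       /* startup cost of the local quals */
    double  qual_per_tuple;     /* per-tuple cost of the local quals */
    double  join_rows;          /* joinrel->rows */
    double  remote_sel;         /* selectivity of remote conditions */
    double  local_sel;          /* selectivity of local conditions */
} RemoteJoinCostInput;

typedef struct
{
    double  rows;
    double  startup_cost;
    double  total_cost;
} RemoteJoinCostResult;

/* Same steps, in the same order, as the function in the patch. */
static RemoteJoinCostResult
sketch_remote_join_cost(const RemoteJoinCostInput *in)
{
    RemoteJoinCostResult res;
    double  retrieved_rows = in->remote_sel * in->join_rows;
    double  startup_cost = in->fdw_startup_cost;
    double  total_cost;

    startup_cost += in->qual_startup * retrieved_rows;
    total_cost = startup_cost;
    total_cost += in->fdw_tuple_cost * retrieved_rows;
    total_cost += in->qual_per_tuple * retrieved_rows;
    total_cost += in->cpu_tuple_cost * in->local_sel * retrieved_rows;

    res.rows = in->local_sel * retrieved_rows;
    res.startup_cost = startup_cost;
    res.total_cost = total_cost;
    return res;
}
```

With illustrative inputs only (startup 100, tuple cost 0.01, cpu_tuple_cost 0.01, per-tuple qual cost 0.0025, 1000 join rows, remote selectivity 0.5, local selectivity 0.2), 500 rows are retrieved and 100 are returned, and the transfer term dominates the run cost.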
pgsql-v9.4-custom-scan-part1.v3.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 761 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 doc/src/sgml/custom-scan.sgml              | 298 +++++++++++
 doc/src/sgml/filelist.sgml                 |   2 +
 doc/src/sgml/postgres.sgml                 |   1 +
 src/backend/commands/explain.c             |  78 +++
 src/backend/executor/Makefile              |   2 +-
 src/backend/executor/execAmi.c             |  34 +-
 src/backend/executor/execProcnode.c        |  14 +
 src/backend/executor/execQual.c            |  10 +-
 src/backend/executor/execUtils.c           |   4 +-
 src/backend/executor/nodeCustom.c          | 252 ++++++++++
 src/backend/nodes/copyfuncs.c              |  30 ++
 src/backend/nodes/outfuncs.c               |  19 +
 src/backend/nodes/print.c                  |   4 +
 src/backend/optimizer/path/allpaths.c      |  23 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/path/joinpath.c      |  18 +
 src/backend/optimizer/plan/createplan.c    | 103 ++++
 src/backend/optimizer/plan/setrefs.c       |  27 +-
 src/backend/optimizer/plan/subselect.c     |  10 +
 src/backend/optimizer/util/pathnode.c      |  40 ++
 src/backend/utils/adt/ruleutils.c          |  44 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/executor/executor.h            |   3 +-
 src/include/executor/nodeCustom.h          |  94 ++++
 src/include/nodes/execnodes.h              |  17 +
 src/include/nodes/nodes.h                  |   3 +
 src/include/nodes/plannodes.h              |  16 +
 src/include/nodes/primnodes.h              |   1 +
 src/include/nodes/relation.h               |  16 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/pathnode.h           |  10 +
 src/include/optimizer/paths.h              |  25 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 43 files changed, 2430 insertions(+), 24 deletions(-)

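The ctidscan module in this patch collects inequality quals on ctid and, while estimating cost, treats a qual with ctid on the right-hand side as its commuted form. Below is a rough standalone sketch of how such quals could narrow a scan range. It is not the patch's code: the types, the function names and the two SKETCH_MAX_* constants are stand-ins assumed for illustration, and strict versus non-strict bounds are not distinguished, which is good enough for an estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_BLOCK  0xFFFFFFFEu   /* stand-in for MaxBlockNumber */
#define SKETCH_MAX_OFFSET 2048u         /* stand-in for MaxOffsetNumber */

typedef enum { TID_LT, TID_LE, TID_GT, TID_GE } TidOp;

typedef struct
{
    uint32_t    block;
    uint16_t    offset;
} Tid;

typedef struct
{
    TidOp   op;
    bool    ctid_on_left;   /* false means "const op ctid" */
    Tid     bound;
} TidQual;

typedef struct
{
    Tid     min;
    Tid     max;
    bool    has_min;
    bool    has_max;
} TidRange;

/* "const op ctid" is equivalent to "ctid commute(op) const". */
static TidOp
commute_tid_op(TidOp op)
{
    switch (op)
    {
        case TID_LT: return TID_GT;
        case TID_LE: return TID_GE;
        case TID_GT: return TID_LT;
        case TID_GE: return TID_LE;
    }
    return op;
}

static int
tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* All quals are ANDed, so each one can only shrink the range. */
static TidRange
narrow_tid_range(const TidQual *quals, int nquals)
{
    TidRange    r = {{0, 0}, {SKETCH_MAX_BLOCK, SKETCH_MAX_OFFSET},
                     false, false};
    int         i;

    for (i = 0; i < nquals; i++)
    {
        TidOp   op = quals[i].ctid_on_left
            ? quals[i].op : commute_tid_op(quals[i].op);

        if (op == TID_GT || op == TID_GE)
        {
            if (tid_cmp(quals[i].bound, r.min) > 0)
                r.min = quals[i].bound;
            r.has_min = true;
        }
        else
        {
            if (tid_cmp(quals[i].bound, r.max) < 0)
                r.max = quals[i].bound;
            r.has_max = true;
        }
    }
    return r;
}
```

For example, ctid > '(10,0)' AND '(20,5)' >= ctid AND ctid >= '(12,3)' leaves the range (12,3) to (20,5), so only the blocks in between would have to be counted.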
diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..c392051
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,761 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor nodes. This extension focuses
+ * on a workload that fetches records with a tid larger or smaller than a
+ * particular value. When inequality operators are given, this module
+ * constructs a custom scan path that can skip records that need not be read.
+ * Then, if it is the cheapest one, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor tries
+ * to fetch underlying records; it then uses the existing heap_getnext(), but
+ * seeks to the records to be read before fetching the first record.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+    ((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clauses make it possible to
+ * determine the zone to be scanned. If one or more such restriction clauses
+ * are available, it returns a list of them, or NIL otherwise.
+ * The caller can assume that all the conditions are chained with the
+ * boolean AND operator, so every operator helps narrow down the
+ * scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the given
+ * restriction clauses. Its scan logic is almost the same as SeqScan,
+ * because it uses regular heap_getnext(); the difference is the smaller
+ * number of tuples to be scanned when the restriction clauses work well.
+ */
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, we treat the Var node as if it were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimation; we don't distinguish strict
+			 * inequality from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause, but the actual range is not
+		 * known until the executor evaluates it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because 
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators are given on the
+ * relation to be scanned and makes sense to reduce number of tuples.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least a part of the restriction clauses can reduce the
+	 * number of tuples to be scanned, so we add a custom scan path
+	 * provided by this module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	Index		scan_relid = cscan_path->path.parent->relid;
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(scan_relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was done in nodeCustomScan.c. So, all
+	 * we need to do here is to attach the slightly adjusted clauses and
+	 * the private data; the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns only -1, 0 or 1. A forward scan ends
+	 * when ItemPointerCompare(tuple, ip_max) > ip_max_comp, and a backward
+	 * scan ends when ItemPointerCompare(tuple, ip_min) < ip_min_comp.
+	 * So ip_max_comp = 1 and ip_min_comp = -1 can never satisfy the
+	 * termination condition; that is the same as "open ended", as if no
+	 * bound was set on that side. The operators evaluated below tighten
+	 * these values when they supply an actual bound.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we could calculate a particular TID that should be
+			 * larger than, less than or equal with fetched record, thus,
+			 * it allows to determine upper or lower bounds of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop if tuple <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop if tuple < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The whole restriction, chained with AND boolean operators,
+			 * becomes false if one of the clauses has a NULL result.
+			 * So, we can immediately stop the evaluation to inform the
+			 * caller it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing anymore in EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin sequential scan; the scan position is set later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the scan position so that the next heap_getnext() fetches a
+ * record from the specified point. It returns false only when a backward
+ * scan is given a position beyond the end of the relation; otherwise true,
+ * and the caller then fetches with NoMovementScanDirection.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, it is obvious that no tuples
+	 * will be fetched by a forward scan. On the other hand, a backward
+	 * scan needs nothing special.
+	 * Note that heap_getnext() returns a NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of
+	 * this function overrides the scan direction if 'true' was returned,
+	 * so the scan terminates immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, let the backward scan start over */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position is loaded by
+	 * heap_restrpos(). heap_restrpos() internally calls heapgettup()
+	 * or heapgettup_pagemode(), which invokes heapgetpage() when
+	 * rs_cblock differs from the block number pointed to by rs_mctid,
+	 * so we set an invalid block number that never matches the
+	 * previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() had saved a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if min-tid was obvious */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if max-tid was obvious */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound
+	 * on a forward scan, or the lower bound on a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the
+ * ExecScan() manner.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan invocation calculate the starting point again and
+ * rewind the underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
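
The end-of-scan test in ctidscan.c relies on comparing the result of ItemPointerCompare() against a small threshold (ip_max_comp / ip_min_comp). The following is only a standalone sketch of that idea, not part of the patch: Tid, BoundKind and every function name here are hypothetical stand-ins, and the thresholds are derived from the stop conditions "cmp > ip_max_comp" (forward) and "cmp < ip_min_comp" (backward).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Kind of bound supplied by an inequality operator, if any. */
typedef enum { BOUND_NONE, BOUND_EXCLUSIVE, BOUND_INCLUSIVE } BoundKind;

Tid
tid_make(uint32_t block, uint16_t offset)
{
	Tid			t = {block, offset};

	return t;
}

/* Three-way comparison returning only -1, 0 or 1. */
int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/* Forward scan stops when tid_compare(tuple, max) > threshold. */
int
upper_threshold(BoundKind kind)
{
	switch (kind)
	{
		case BOUND_EXCLUSIVE:
			return -1;			/* ctid < max: stop once tuple >= max */
		case BOUND_INCLUSIVE:
			return 0;			/* ctid <= max: stop once tuple > max */
		default:
			return 1;			/* result never exceeds 1: open ended */
	}
}

/* Backward scan stops when tid_compare(tuple, min) < threshold. */
int
lower_threshold(BoundKind kind)
{
	switch (kind)
	{
		case BOUND_EXCLUSIVE:
			return 1;			/* ctid > min: stop once tuple <= min */
		case BOUND_INCLUSIVE:
			return 0;			/* ctid >= min: stop once tuple < min */
		default:
			return -1;			/* result never goes below -1: open ended */
	}
}

bool
forward_scan_done(Tid tuple, Tid max, int max_comp)
{
	return tid_compare(tuple, max) > max_comp;
}

bool
backward_scan_done(Tid tuple, Tid min, int min_comp)
{
	return tid_compare(tuple, min) < min_comp;
}
```

Encoding "no bound" as a threshold the comparison can never cross keeps the per-tuple check to a single comparison, without a separate "is bounded" flag.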
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..60081f7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides an additional way to scan
+  regular relations if the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on
+  the custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It provides an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan utilizing inequality operators that involve the
+   <literal>ctid</> system column, to reduce the number of rows to be processed.
+   Obviously, it does not make sense to read tuples located on the 99th
+   page or earlier. So, it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it internally
+   uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan, as
+   follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well except for the wasted I/O on the pages that contain
+   only the records to be skipped.
+  </para>
+  <para>
+   On the other hand, an alternative scan path implemented with
+   <filename>ctidscan</> provides a more efficient way; it skips the first
+   100 pages prior to the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because of
+   the smaller number of tuples to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path than the above custom-scan
+   path exists. For instance, an index scan based on an equality operator is
+   usually cheaper than this custom scan, so the optimizer adopts it instead
+   of the sequential scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, at
+   your convenience.
+  </para>
+  <para>
+   This module has no configurable parameters right now.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
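
The page skipping described in ctidscan.sgml can be illustrated with a small standalone sketch of the page-range arithmetic. This is not the patch's CTidEstimateCosts(): pages_to_scan() and estimate_tuples() are hypothetical helpers, and the sketch assumes block numbers run from 0 to rel_pages - 1 and that tuples are spread evenly over the pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of heap pages that must be visited when ctid
 * is restricted to blocks [min_block, max_block]. has_min / has_max tell
 * which of the two bounds were actually supplied by the query.
 */
uint32_t
pages_to_scan(uint32_t rel_pages,
			  bool has_min, uint32_t min_block,
			  bool has_max, uint32_t max_block)
{
	uint32_t	first;
	uint32_t	last;

	assert(rel_pages > 0);

	first = has_min ? min_block : 0;
	last = rel_pages - 1;
	if (has_max && max_block < last)
		last = max_block;		/* clamp the upper bound to the relation */

	/* Empty range: keep the estimate at one page rather than zero. */
	if (first > last)
		return 1;
	return last - first + 1;
}

/* Scale the relation's tuple count by the fraction of pages visited. */
double
estimate_tuples(double rel_tuples, uint32_t rel_pages, uint32_t npages)
{
	assert(rel_pages > 0);
	return rel_tuples * ((double) npages / (double) rel_pages);
}
```

For a 200-page relation and a qualifier such as ctid > '(100,0)', pages_to_scan(200, true, 100, false, 0) covers blocks 100 through 199, so the first 100 pages are never read. Comparing first and last before subtracting avoids unsigned wrap-around when the lower bound lies beyond the end of the relation.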
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..bb00078
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,298 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables extensions to provide alternative ways to scan
+  or join relations, fully integrated with the cost-based optimizer,
+  in addition to the built-in implementations.
+  It consists of a set of callbacks, with a unique name, to be invoked during
+  query planning and execution. A custom-scan provider should implement these
+  callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major jobs that a custom-scan provider should
+  implement. The first one is registration of the custom-scan provider itself.
+  Usually, it is done once at the <literal>_PG_init()</literal> entrypoint on
+  module loading.
+  The other three jobs are done for each query planning and execution.
+  The second one is submission of candidate paths to scan or join relations,
+  with an adequate cost, to the core planner.
+  Then, the planner chooses the cheapest path from all the candidates.
+  If the custom path survives, the planner kicks off the third job:
+  construction of a <literal>CustomScan</literal> plan node, located within
+  the query plan tree instead of a built-in plan node.
+  The last one is execution of its implementation in answer to invocations
+  by the core executor.
+ </para>
+ <para>
+  Some contrib modules utilize the custom-scan API. They may serve as
+  good examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      Its logic skips earlier pages or terminates the scan prior to the
+      end of the relation, if inequality operators on the <literal>ctid</literal>
+      system column can narrow down the scope to be scanned, instead of
+      a sequential scan that reads a relation from the head to the end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      Its logic replaces a local join of foreign tables managed by
+      <literal>postgres_fdw</literal> with a custom scan that fetches
+      remotely joined relations.
+      It shows the way to implement a custom scan node that runs
+      instead of join nodes.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Right now, only scan and join are supported with fully integrated
+  cost-based query optimization on the custom scan API.
+  You might be able to implement other things, like sort or aggregation, by
+  manipulating the planned tree; however, the extension is responsible
+  for handling this replacement correctly. The core provides no support.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first job for a custom scan provider is registration of a set of
+    callbacks with a unique name. Usually, it is done once at the
+    <literal>_PG_init()</literal> entrypoint on module loading.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure any more.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from
+    the various potential paths; each is a combination of a scan algorithm
+    and target relations.
+    Prior to this selection, it lists all the potential paths towards
+    a target relation (if base relation) or a pair of relations (if join).
+    The <literal>add_scan_path_hook</> and <literal>add_join_path_hook</>
+    allow extensions to add alternative scan paths in addition to the built-in
+    ones.
+    If a custom-scan provider can submit a potential scan path towards the
+    supplied relation, it constructs a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> is the common field for all path nodes, which stores
+    the cost estimation. In addition, <literal>custom_name</> is the name of the
+    registered custom scan provider, <literal>custom_flags</> is a set of the
+    flags below, and <literal>custom_private</> can be used to store private
+    data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        The custom scan provider is then responsible for marking and restoring
+        a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> is chosen by the query planner,
+    it calls back the associated custom scan provider to complete setting
+    up the <literal>CustomScan</literal> plan node according to the path
+    information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization of the allocated
+    <literal>cscan_plan</>, then the custom scan provider applies the final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during the relation scan. Its expression portion is also
+    assigned to the <literal>Plan</> portion, but can be eliminated from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node, after construction of the plan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan, because there is nothing special to do. Otherwise, it needs to
+    be handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also invokes the associated callbacks to begin,
+    execute and end a custom scan, following the executor's usual protocol.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization needed for
+    this plan; however, it is not a good idea to start the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to hold the private state managed by the custom scan provider.
+    Also, <literal>eflags</> has flag bits of the executor's operating mode
+    for this plan node.
+    Note that the custom scan provider should not perform any externally
+    visible action if <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation (or relations, in the
+    case of a join) using the custom logic. Unlike the
+    <literal>IterateForeignScan</> method of foreign tables, it is also
+    responsible for checking whether the next tuple matches the scan qualifier.
+    A usual way to implement this method is for the callback to serve as just
+    an entrypoint to <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation (or relations, in
+    the case of a join) using the custom logic. Note that the data format
+    (and the way to return it) depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory in the per-execution memory
+    context. So, this callback is responsible only for the provider's own
+    resources, independent of those managed by the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that parameters the scan depends on may have changed value,
+    so the restarted scan need not return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in the private
+    state.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    that <literal>MarkPosCustomScan</> saved earlier.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the state of the <literal>CustomScanState</> provides
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
@@ -104,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 522316c..cce0cd8 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 4e93df2..39d2c12 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +905,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			operation = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1013,6 +1028,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1291,6 +1310,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *) plan)->funcexpr != NULL && es->verbose)
+				show_expression(((CustomScan *) plan)->funcexpr,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1858,6 +1888,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2025,6 +2068,41 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				Node	   *funcexpr = ((CustomScan *) plan)->funcexpr;
+
+				if (funcexpr && IsA(funcexpr, FuncExpr))
+				{
+					Oid		funcid = ((FuncExpr *) funcexpr)->funcid;
+
+					objectname = get_func_name(funcid);
+					if (es->verbose)
+						namespace =
+							get_namespace_name(get_func_namespace(funcid));
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomPath:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * It registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * It finds a registered custom execution provider by its name
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs various initialization steps.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set only when scanrelid is associated
+	 * with a particular relation. Otherwise, it needs to be initialized by
+	 * the custom-scan provider itself if it uses ss_ScanTupleSlot internally.
+	 * If the provider replaces varno of Var nodes by CUSTOM_VAR, the slot must
+	 * be set up so that EXPLAIN can resolve the underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the BeginCustomScan callback.
+	 * The extension may override any of the initialization above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entrypoint to the ExecCustomScan method. Fetching a tuple is
+ * entirely the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entrypoint to the MultiExecCustomScan method. Fetching
+ * multiple tuples (in the form the upper node expects) is entirely the
+ * job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-scan provider shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table slots, if they exist */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1733da6..9aaca17 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -602,6 +602,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(funcexpr);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3940,6 +3967,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b39927e..7f0297f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -568,6 +568,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(funcexpr);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2815,6 +2831,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bfd3809..9d0cbf5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -46,6 +46,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -399,6 +401,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -427,6 +432,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1246,6 +1254,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1269,6 +1280,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1292,6 +1306,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1361,6 +1378,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1414,6 +1434,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e7f8cec..c6e1634 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -2312,7 +2309,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5947e5b..f830ab8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -235,6 +239,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -411,6 +416,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2016,6 +2028,97 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can use it, but
+		 * modifying it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			Node   *funcexpr = rte->funcexpr;
+
+			if (best_path->path.param_info)
+				funcexpr = replace_nestloop_params(root, funcexpr);
+			scan_plan->funcexpr = funcexpr;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
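create_customscan_plan() above finds its callbacks by name through
get_custom_provider(). The registry itself is not part of this excerpt,
so the following is only a sketch of one way such a name-keyed table
could work. The fixed-size array, MAX_PROVIDERS, the reduced
CustomProvider struct and the int/NULL return conventions are all
assumptions made for this illustration; the patch declares
register_custom_provider() as returning void.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN		64
#define MAX_PROVIDERS	8		/* arbitrary limit for this sketch */

/* Reduced provider: only the name and a flags word, no callbacks. */
typedef struct CustomProvider
{
	char		name[NAMEDATALEN];
	unsigned int flags;
} CustomProvider;

static CustomProvider providers[MAX_PROVIDERS];
static int	num_providers = 0;

/* Look up a provider by name; returns NULL if it is not registered. */
CustomProvider *
get_custom_provider(const char *custom_name)
{
	int			i;

	for (i = 0; i < num_providers; i++)
	{
		if (strcmp(providers[i].name, custom_name) == 0)
			return &providers[i];
	}
	return NULL;
}

/*
 * Copy the provider into the table.  Returns 0 on success, or -1 if the
 * table is full or the name is already taken.
 */
int
register_custom_provider(const CustomProvider *provider)
{
	assert(provider != NULL);

	if (num_providers >= MAX_PROVIDERS ||
		get_custom_provider(provider->name) != NULL)
		return -1;

	providers[num_providers++] = *provider;
	return 0;
}
```

Copying the struct on registration means the caller's copy does not have
to outlive the call, which is one plausible reason for the const
argument in the declared prototype.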
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b78d727..30cf7e5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -578,6 +579,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "no implementation to set plan references");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1059,7 +1084,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 0df70c4..644a532 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2194,6 +2194,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient?  Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b17051..46e814d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. It is usually
+ * invoked from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for
+ * supplying adequate cost estimates here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ffce68..bfceab8 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2363,14 +2364,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, typically one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3628,6 +3634,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5295,6 +5309,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5565,6 +5591,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
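The two ruleutils.c branches above resolve a CUSTOM_VAR reference by
treating varattno as a 1-based index into the scan slot's tuple
descriptor. A minimal sketch of that lookup follows. SketchTupleDesc and
custom_var_name() are hypothetical names used only here; the real code
reads tupdesc->attrs[...]->attname and asserts the range instead of
returning NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Special varno values as defined in primnodes.h by this patch. */
#define INNER_VAR		65000
#define OUTER_VAR		65001
#define INDEX_VAR		65002
#define CUSTOM_VAR		65003
#define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)

/* Reduced Var: only the two fields the lookup needs. */
typedef struct Var
{
	unsigned int varno;
	int			varattno;
} Var;

/* Hypothetical stand-in for TupleDesc: a count plus column names. */
typedef struct SketchTupleDesc
{
	int			natts;
	const char **attnames;
} SketchTupleDesc;

/*
 * Return the column name a CUSTOM_VAR Var refers to, or NULL if the Var
 * is not a CUSTOM_VAR, no descriptor is available, or the attribute
 * number is out of range.
 */
const char *
custom_var_name(const Var *var, const SketchTupleDesc *tupdesc)
{
	assert(var != NULL);

	if (var->varno != CUSTOM_VAR || tupdesc == NULL)
		return NULL;
	if (var->varattno <= 0 || var->varattno > tupdesc->natts)
		return NULL;

	/* varattno is 1-based, the array is 0-based */
	return tupdesc->attnames[var->varattno - 1];
}
```

Because CUSTOM_VAR is above INNER_VAR, IS_SPECIAL_VARNO() accepts it
without any change to the macro.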
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 0350ef6..0c7a233 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the MarkPosCustomScan and RestorePosCustom callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *cscan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
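The CUSTOM__* flags above are the reason ExecSupportsMarkRestore() now
takes a Path instead of a NodeTag: for a custom scan the node tag alone
cannot say whether mark/restore is supported, so the flags on the path
have to be consulted. The sketch below illustrates that decision with
reduced, hypothetical versions of the structs; supports_mark_restore()
and the handling of the built-in tags are assumptions made for this
example and are not the actual executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced node tags; only what this sketch needs. */
typedef enum NodeTag
{
	T_IndexScan,
	T_HashJoin,
	T_CustomScan
} NodeTag;

typedef struct Path
{
	NodeTag		pathtype;
} Path;

typedef struct CustomPath
{
	Path		path;			/* must be first so a Path * can be cast */
	unsigned int custom_flags;	/* a set of CUSTOM__* bits */
} CustomPath;

#define CUSTOM__SUPPORT_MARK_RESTORE	0x0001
#define CUSTOM__SUPPORT_BACKWARD_SCAN	0x0002

bool
supports_mark_restore(const Path *path)
{
	assert(path != NULL);

	switch (path->pathtype)
	{
		case T_IndexScan:
			/* in this sketch, a built-in node is decided by its tag */
			return true;

		case T_CustomScan:
			/* the tag is not enough: ask the provider's flags */
			return (((const CustomPath *) path)->custom_flags &
					CUSTOM__SUPPORT_MARK_RESTORE) != 0;

		default:
			return false;
	}
}
```

A provider that sets only CUSTOM__SUPPORT_BACKWARD_SCAN is therefore
treated as not supporting mark/restore, and the planner would have to
materialize its output under a merge join.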
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bedcf04..529930f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1494,6 +1494,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fc6b1d7..7753a09 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..936591b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -483,6 +483,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	Node	   *funcexpr;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the name under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9686229..1225970 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 96ffdb1..d7c7ef7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..1ad0e7a
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..09c1bda
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

Jim Mlodgenski

jimmy76@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#3)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

The attached patches are the revised custom-scan APIs.

My initial review on this feature:
- The patches apply and build, but they produce a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

I'd recommend that you split the part1 patch containing the ctidscan
contrib into its own patch. It is more than half of the patch and it
certainly stands on its own. IMO, ctidscan fits a very specific use
case and would be better off being an extension instead of in contrib.


- Custom-scan.sgml was added to the official documentation to describe how
to write a custom-scan provider.
- Much of the code duplication in postgres_fdw.c was eliminated. I split some
FDW handlers into two parts: a common portion and an FDW-specific one.
The executor callbacks of the custom-scan code use the common portion
because most of their implementation is equivalent.

I'd like to see comments regarding the way to handle a Var reference onto
a custom-scan that replaced a join of relations.
The varno of a Var that references a join relation is the rtindex of either
the right or the left relation, and setrefs.c then adjusts it: INNER_VAR or
OUTER_VAR is set instead.
However, this does not work well if a custom-scan that just references the
result of a remote join query is chosen instead of a local join, because its
result is usually stored in the ps_ResultTupleSlot of PlanState, so
ExecEvalScalarVar references neither the inner nor the outer slot.
Instead of the existing solution, I added one more special varno, CUSTOM_VARNO,
that just references the result-tuple-slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column names from
the TupleDesc of the underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works
well.

Also, I'm uncertain about my usage of param_info in the Path structure, even
though I followed the manner used in other places. So, please point out if my
usage is not appropriate.

Thanks,

2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments are welcome.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches provide a feature to implement a custom scan node
that allows an extension to replace a part of the plan tree with its own code
instead of the built-in logic.
In addition to the previous proposal, it enables us to integrate a custom
scan as one of the candidate paths to be chosen by the optimizer.
Here are two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of APIs and a simple demonstration module that implements a
regular table scan using inequality operators on the ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.

Below is an example to show how custom-scan works.

We usually run a sequential scan even if the clause has an inequality operator
that references the ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that acts as a custom-scan provider suggests
an alternative path, and its cost is less than that of the sequential scan,
so the optimizer chooses it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)
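
To make the lower cost in the plan above more concrete, here is a minimal, self-contained sketch of why an inequality on ctid can narrow a scan. It is not taken from the ctidscan module: the Tid struct and both function names are hypothetical stand-ins, and it assumes the provider skips every block that lies entirely before the lower bound.

```c
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, line offset). */
typedef struct Tid
{
    uint32_t block;
    uint16_t offset;
} Tid;

/* Order TIDs by block number first, then by offset within the block. */
int
tid_compare(Tid a, Tid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Number of blocks a scan has to read for "ctid > lower" on a table of
 * nblocks blocks, ASSUMING all blocks before lower.block are skipped.
 */
uint32_t
blocks_to_read(uint32_t nblocks, Tid lower)
{
    if (lower.block >= nblocks)
        return 0;               /* the bound is past the end of the table */
    return nblocks - lower.block;
}
```

Under that assumption, a qualifier such as ctid > '(10,0)' on a 100-block table would touch 90 blocks instead of 100, which is the kind of saving a provider could reflect in its cost estimate.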

Of course, a more cost-effective plan will win if one exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)

One other worthwhile example is the remote-join enhancement of
postgres_fdw, shown below. Both ft1 and ft2 are foreign tables
managed by the same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How does it work
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpath.c. They allow extensions to add alternative paths for
scanning a base relation or joining two relations.

The callback routine can add a CustomPath using add_path() to inform
the optimizer of this alternative scan path. Every custom-scan provider is
identified by its name, which is registered beforehand using the following
function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of a name string and function pointers for callbacks.

Once a CustomPath is chosen, create_scan_plan() constructs a custom-scan
plan and calls back the extension to initialize the node.
The rest is similar to foreign scan; however, some details
are different. For example, foreign scan is assumed to return
a tuple formed according to the table definition. Custom-scan, on the
other hand, makes no such assumption, so the extension needs to
set the tuple descriptor on the scan tuple slot of the ScanState structure by
itself.

In the case of a join, custom-scan behaves like a regular scan, but it
returns tuples that are already joined from the underlying relations.
The patched postgres_fdw uses a hook in joinpath.c to run
remote join.
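
The name-based registration described above could look roughly like the following self-contained sketch. Only register_custom_provider() and the idea of "a name plus callback pointers" come from the text; the init_plan field, find_custom_provider(), the fixed-size table, and the duplicate-name check are assumptions made for illustration and may not match the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROVIDERS 8         /* arbitrary limit for this sketch */

/* Simplified stand-in: a name plus one hypothetical callback. */
typedef struct CustomProvider
{
    const char *name;
    void      (*init_plan) (void *plan);    /* hypothetical callback */
} CustomProvider;

static const CustomProvider *providers[MAX_PROVIDERS];
static int  n_providers = 0;

/* Look a provider up by name; returns NULL if none is registered. */
const CustomProvider *
find_custom_provider(const char *name)
{
    int         i;

    for (i = 0; i < n_providers; i++)
    {
        if (strcmp(providers[i]->name, name) == 0)
            return providers[i];
    }
    return NULL;
}

/* Register a provider; names are assumed to be unique in this sketch. */
void
register_custom_provider(const CustomProvider *provider)
{
    assert(provider != NULL && provider->name != NULL);
    assert(n_providers < MAX_PROVIDERS);
    assert(find_custom_provider(provider->name) == NULL);
    providers[n_providers++] = provider;
}
```

An extension would typically register once at load time, and the core code would later resolve the provider by the name stored in the plan node.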

------------
Issues
------------
I'm not 100% certain whether the arguments of add_join_path_hook are
reasonable. I guess the first 7 arguments are the minimum necessary.
The mergeclause_list and semifactors might be useful if someone
tries to implement their own mergejoin or semijoin. Also, I'm not
familiar with path parameterization, which the last two arguments
relate to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

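
As a rough illustration of how an extension might use a hook variable like this one, the sketch below shows the usual "save the previous hook, then chain to it" pattern. It is self-contained and deliberately simplified: the hook takes a single argument instead of the eleven above, and JoinRelSketch, sketch_add_paths_to_joinrel(), my_add_join_path(), and module_init() are all hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo: just counts candidate paths. */
typedef struct JoinRelSketch
{
    int         n_paths;
} JoinRelSketch;

/* Simplified hook type; the real one above takes many more arguments. */
typedef void (*add_join_path_hook_type) (JoinRelSketch *joinrel);

add_join_path_hook_type add_join_path_hook = NULL;

static add_join_path_hook_type prev_add_join_path_hook = NULL;

/* Extension callback: chain to any earlier hook, then add one more path. */
void
my_add_join_path(JoinRelSketch *joinrel)
{
    if (prev_add_join_path_hook)
        prev_add_join_path_hook(joinrel);
    joinrel->n_paths++;         /* stands in for add_path(joinrel, custom_path) */
}

/* What an extension would do once at load time. */
void
module_init(void)
{
    prev_add_join_path_hook = add_join_path_hook;
    add_join_path_hook = my_add_join_path;
}

/* Stand-in for the planner side: built-in paths first, then the hook. */
void
sketch_add_paths_to_joinrel(JoinRelSketch *joinrel)
{
    joinrel->n_paths = 3;       /* pretend three built-in join paths exist */
    if (add_join_path_hook)
        add_join_path_hook(joinrel);
}
```

Chaining to the previously installed hook lets several extensions coexist, each adding its own candidate paths.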
When we replace a join by a custom scan, what is the best target
for a Var node that referenced relations under the join?
Usually, Var->varno is given as the rtindex of the tables being joined, and
it is then replaced with OUTER_VAR or INNER_VAR in set_join_references().
That eventually determines the slot to be fetched in ExecEvalScalarVar().
On the other hand, if we replaced a join, we want the Var node to reference
the scan-tuple-slot, neither the outer slot nor the inner slot.
I tried to add a new CUSTOM_VAR that references the scan-tuple-slot.
It is probably a straightforward way to run a remote join like a scan,
but I'm not certain whether it is the best way.
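
The slot selection described above can be pictured with the following self-contained sketch. It is not the executor's code: the SKETCH_* constants use placeholder values, the slots are plain int arrays, and sketch_eval_var() is a hypothetical name. It only shows the idea that a special varno picks which slot a Var reads from.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; only their distinctness matters in this sketch. */
enum
{
    SKETCH_INNER_VAR = 1,
    SKETCH_OUTER_VAR = 2,
    SKETCH_CUSTOM_VAR = 3
};

/* Hypothetical stand-in for an expression context with three slots. */
typedef struct SketchExprContext
{
    const int  *inner_slot;
    const int  *outer_slot;
    const int  *scan_slot;
} SketchExprContext;

/* Fetch attribute attno (1-based) from the slot selected by varno. */
int
sketch_eval_var(const SketchExprContext *ctx, int varno, int attno)
{
    const int  *slot;

    switch (varno)
    {
        case SKETCH_INNER_VAR:
            slot = ctx->inner_slot;
            break;
        case SKETCH_OUTER_VAR:
            slot = ctx->outer_slot;
            break;
        case SKETCH_CUSTOM_VAR:
        default:
            /* a replaced join reads its already-joined tuple from here */
            slot = ctx->scan_slot;
            break;
    }
    assert(slot != NULL && attno >= 1);
    return slot[attno - 1];
}
```

With a mapping like this, a custom scan that replaces a join never needs inner or outer children; every Var above it resolves against the single tuple the scan returns.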

I was concerned that the FDW callbacks of postgres_fdw are designed to
take a ForeignState argument. Because of this, the remote join code could
not call these routines, even though most of the custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to provide common routines that
can be called from both the FDW portion and the remote join portion.
However, I thought that would make reviewing hard due to the large
changeset. So, I'd like to do the code rework first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* postgres_fdw needs reworking to share common code between the
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, that makes clear what we should target for v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* A CustomScan node that can run custom code instead of the built-in
scan nodes.
* Join-pushdown in postgres_fdw using a hook located in
add_paths_to_joinrel(), for demonstration purposes.
* Some new way to scan a relation; your suggested ctid scan with a
less-than or greater-than qualifier is probably a good example, also for
demonstration purposes.

The above set of jobs will probably be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. That seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Jim Mlodgenski (#4)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:

My initial review on this feature:
- The patches apply and build, but they produce a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

This variable was only used in an Assert() macro, so it causes a warning if you
don't pass --enable-cassert to the configure script.
Anyway, I adjusted the code to check the relid of RelOptInfo directly.
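
The warning and the fix can be reproduced outside PostgreSQL with the standard assert() macro, which compiles to nothing under -DNDEBUG in the same way Assert() does without --enable-cassert. The sketch below is self-contained; RelSketch and init_plan_fixed() are hypothetical names, not the ones in ctidscan.c.

```c
#include <assert.h>

/* Hypothetical stand-in for RelOptInfo; only relid matters here. */
typedef struct RelSketch
{
    unsigned int relid;
} RelSketch;

/*
 * The pattern that warns: a local that is only read inside an assertion.
 *
 *     unsigned int scan_relid = rel->relid;
 *     assert(scan_relid > 0);
 *
 * When assertions are compiled out, scan_relid is never read, so the
 * compiler reports it as an unused variable.
 *
 * The fix sketched below checks the field directly, so no extra local
 * is needed.
 */
unsigned int
init_plan_fixed(const RelSketch *rel)
{
    assert(rel->relid > 0);
    return rel->relid;
}
```

PostgreSQL also provides the PG_USED_FOR_ASSERTS_ONLY attribute macro for locals that really are needed only by assertions; checking the field directly, as described above, avoids the local altogether.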

I'd recommend that you split the part1 patch containing the ctidscan contrib
into its own patch. It is more than half of the patch and it certainly
stands on its own. IMO, ctidscan fits a very specific use case and
would be better off being an extension instead of in contrib.

OK, I split them off. Part 1 is the custom-scan API itself, part 2 is
the ctidscan portion, and part 3 is remote join on postgres_fdw.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-part3.v4.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1101 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/include/nodes/bitmapset.h                  |    4 +
 6 files changed, 1303 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..d537b81 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* columns reference needs to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The main job portion of deparseRemoteJoinSql. It deparses a relation,
+ * might be join not only regular table, to SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either List or Integer.
+	 * In case of List, it is a packed PgRemoteJoinInfo that contains
+	 * outer and inner join references, so needs to deparse recursively.
+	 * In case of Integer, it is rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+                              true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only the columns referenced in the target list and local conditions
+		 * have to be fetched from the remote server, not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * Deparses a join tree to be executed on the remote server.
+ * The top-level 'relinfo' is assumed to describe a remote join relation, so
+ * it has to be a List object that packs a PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, a column reference may be ambiguous unless it is
+	 * qualified with the alias of its relation.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..7488d1e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,820 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * Pack a PgRemoteJoinInfo into a List object to save as a private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * Unpack a private datum into a PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Checks whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw, and returns false if it is not.
+ * If it is a managed relation, some of its properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte;
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		if (rel->rtekind != RTE_RELATION ||
+			rel->fdwroutine == NULL ||
+			rel->fdwroutine->GetForeignRelSize != postgresGetForeignRelSize)
+			return false;
+
+		/*
+		 * Also report the server id and the local user id to the caller.
+		 * Note that the remote user is determined from the pair of
+		 * server id and local user id at execution time, not at planning
+		 * time, so we may need to watch for a plan being executed under
+		 * a different user id.
+		 * However, all we need to know here is whether both relations
+		 * will run with the same credentials; the actual user id is not
+		 * required.
+		 * So fdw_user_oid is left as InvalidOid, for comparison purposes,
+		 * when the scan runs with the credentials of GetUserId().
+		 */
+		rte = planner_rt_fetch(rel->relid, root);
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that a CustomScan(postgres-fdw) should be constructed
+				 * only when all the underlying foreign tables use the same
+				 * server and user id.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * has_wholerow_reference
+ *
+ * Returns true if the supplied expression contains a whole-row reference.
+ */
+static bool
+has_wholerow_reference(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return has_wholerow_reference((Node *)rinfo->clause, context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node, has_wholerow_reference, context);
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and if so adds a custom-scan path that runs
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference is
+	 * included in either the target list or the local conditions.
+	 *
+	 * XXX - This is because we have no reasonable way to reconstruct
+	 * a RECORD datum from individual columns once they are extracted. On
+	 * the other hand, putting a whole-row reference into the remote join
+	 * query would consume additional network bandwidth.
+	 */
+	if (has_wholerow_reference((Node *)joinrel->reltargetlist, NULL) ||
+		has_wholerow_reference((Node *)j_local_conds, NULL))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan node from the remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of the remote join carry the varno of
+ * the underlying foreign tables. This is a problem, because they would
+ * eventually be replaced by references to the outer or inner relation,
+ * whereas the result of a remote join is stored in the scan-tuple-slot,
+ * which is neither outer nor inner.
+ * So we replace the varno of such Var nodes by CUSTOM_VAR, a pseudo varno
+ * that references a tuple in the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * We need special treatment of Var nodes that reference columns of the
+ * remote join relation, because we replace a join relation by a remote
+ * query that returns the result of a join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan; however,
+ * it needs some adjustment because of the different nature of the scan.
+ * The biggest difference is that it has to open the underlying relations
+ * by itself and construct a tuple descriptor from the list of Vars to be
+ * fetched, because a custom scan (in this case, a scan on a remote join
+ * instead of a local join) has no particular relation behind it, so it
+ * has to manage these by itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of the ScanState has to be initialized correctly
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * a Var node with CUSTOM_VAR references its TupleDesc to get the
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is used unless a specific
+	 * user-id was recorded at planning time.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations have to be saved, too */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Nothing differs from postgresEndForeignScan, except that it closes
+ * the underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Does the same as postgresReScanForeignScan().
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine on EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; it registers the custom-scan provider. No
+ * special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
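
For readers who want to try the text encoding used by bms_to_string /
bms_from_string outside the backend, here is a minimal standalone sketch.
It is not part of the patch: the names hex_value, words_to_hex and
hex_to_words are hypothetical, it uses malloc instead of palloc, and it
assumes 32-bit words (eight hex digits each, most significant word first).
Note that the letters 'a'-'f' and 'A'-'F' map to the values 10-15.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD 8			/* assumption: one 32-bit word = 8 hex digits */

/* Decode a single hex digit; returns -1 if c is not a hex digit. */
static int
hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Encode words[nwords-1] .. words[0] as fixed-width hex text.
 * The caller frees the result; returns NULL on allocation failure.
 */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
	char	   *result = malloc((size_t) nwords * HEX_PER_WORD + 1);
	char	   *pos = result;
	int			i;

	if (result == NULL)
		return NULL;
	for (i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
	return result;
}

/*
 * Decode hex text back into words[]. Returns the number of words,
 * or -1 if the text is malformed or does not fit into maxwords.
 */
static int
hex_to_words(const char *s, uint32_t *words, int maxwords)
{
	size_t		len = strlen(s);
	int			nwords;
	int			i, k;

	if (len == 0 || len % HEX_PER_WORD != 0)
		return -1;
	nwords = (int) (len / HEX_PER_WORD);
	if (nwords > maxwords)
		return -1;

	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (k = 0; k < HEX_PER_WORD; k++)
		{
			int		v = hex_value((unsigned char) *s++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Encoding a word array with words_to_hex and feeding the result to
hex_to_words should give back the same words; text whose length is not
a multiple of eight is rejected here rather than reported as a warning.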

pgsql-v9.4-custom-scan-part2.v4.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   5 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 16 files changed, 1247 insertions(+), 9 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..31db244
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor. This extension focuses on
+ * a workload that fetches records whose tid is larger or smaller than
+ * a particular value. When inequality operators on ctid are given, this
+ * module constructs a custom scan path that skips records that need not
+ * be read. Then, if it is the cheapest path, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches the underlying records; it uses the existing heap_getnext(),
+ * but seeks to the first record to be read before fetching anything.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+    ((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clauses allow us to determine
+ * the range to be scanned. If one or more such restriction clauses are
+ * available, it returns a list of them, or NIL otherwise.
+ * The caller can assume all the returned conditions are chained with the
+ * boolean AND operator, so every one of them narrows down the scope of
+ * the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the given
+ * restriction clauses. The scan logic is almost the same as that of
+ * SeqScan, because it uses the regular heap_getnext(); the difference is
+ * the number of tuples to be scanned when the restriction clauses help.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* For simplicity, treat the Var as if it were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimation; we don't distinguish strict
+			 * inequality operators from inequality-or-equal ones.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Min(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause, but the actual number is
+		 * unknown until the executor actually runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease the cost for the inequality operators, because
+	 * they are a subset of qpquals and are still evaluated there.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if the relation to be scanned has inequality
+ * operators on ctid that can reduce the number of tuples to be read.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk the list of restriction clauses */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least a part of the restriction clauses can be used to
+	 * reduce the number of tuples, so we add the custom scan path
+	 * provided by this module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was already done in nodeCustomScan.c. So,
+	 * all we need to do here is to set the slightly adjusted clauses and
+	 * the private data; the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ip_max_comp is compared with the result of ItemPointerCompare()
+	 * between the fetched tuple's tid and ip_max; a forward scan ends
+	 * once the result is larger than ip_max_comp. The result is always
+	 * 1, 0 or -1, so the initial value 1 can never be exceeded, which
+	 * means "open ended", as if no termination condition was set.
+	 * Likewise, a backward scan ends once the comparison with ip_min is
+	 * less than ip_min_comp, so the initial value -1 means no lower bound.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walk the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we got a particular TID that fetched records must be
+			 * larger than, less than or equal to; thus, it allows us to
+			 * determine the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* tid == ip_min fails ">" */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* tid == ip_min passes ">=" */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are chained with AND, so the whole
+			 * condition can never be true if one of the clauses has a NULL
+			 * result. So, we can immediately stop the evaluation to inform
+			 * the caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Nothing more to do in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; its position will be set later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns true if the
+ * caller should use NoMovementScanDirection for that fetch, or false if the
+ * scan should simply proceed in the given direction.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, it is obvious that a forward
+	 * scan fetches no tuples. On the other hand, a backward scan needs
+	 * no special handling. Note that heap_getnext() returns a NULL tuple
+	 * just after heap_rescan() if NoMovementScanDirection is given. The
+	 * caller of this function overrides the scan direction with it when
+	 * 'true' is returned, so the scan terminates immediately in that
+	 * case.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, scan backward from the end of the relation */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure that heap_restrpos() loads the block that includes the
+	 * position. heap_restrpos() internally calls heapgettup() or
+	 * heapgettup_pagemode(), which kicks heapgetpage() when rs_cblock
+	 * is different from the block number pointed to by rs_mctid, so
+	 * we set an invalid block number here so that it never matches
+	 * the previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Set a pseudo value, as if heap_markpos() had saved this position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if the min-tid is known */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if the max-tid is known */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound
+	 * in a forward scan, or the lower bound in a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the manner of
+ * ExecScan().
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call calculate the starting point again and rewind
+ * the underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..60081f7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations if the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation of the custom-scan
+  APIs, which allow extensions to extend the core executor system.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   This allows it to offer an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that utilizes inequality operators involving the
+   <literal>ctid</> system column to reduce the number of rows to be processed.
+   Obviously, it does not make sense to read tuples located on the 99th page
+   or earlier. So, it moves the internal pointer of the scan to
+   <literal>(100,0)</> at the beginning of the scan, even though it internally
+   uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan, as
+   follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well, except that it wastes I/O on the pages that contain
+   only records to be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> provides a more efficient way: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because of
+   the smaller number of tuples to be processed.
+  </para>
+  <para>
+   Of course, it will not be chosen if a cheaper path than the above
+   custom-scan path exists. For instance, an index scan on an equality
+   condition is usually cheaper, so the optimizer adopts it instead of
+   a sequential scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   the <filename>ctidscan</> module into <productname>PostgreSQL</> using
+   the <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, whichever is
+   convenient for you.
+  </para>
+  <para>
+   This module has no configurable parameters right now.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1e96829..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e5b0cd7..c6e1634 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aae5a1c..30cf7e5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1084,7 +1084,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 0350ef6..0c7a233 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..1ad0e7a
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..09c1bda
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

Attachment: pgsql-v9.4-custom-scan-part1.v4.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 298 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  78 +++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 103 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1182 insertions(+), 15 deletions(-)
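
The documentation added by this patch (doc/src/sgml/custom-scan.sgml) describes registration as handing over a structure that holds a unique name plus a set of callback pointers, some optional, which is then copied into an internal table. The following is only a minimal standalone sketch of that pattern, not the code in nodeCustom.c: every `Demo*`/`demo_*` name, the fixed-size table, and the choice to reject duplicate names are assumptions made for illustration.

```c
/*
 * Standalone mock of a "name + callbacks" provider registry.
 * None of these types or functions are PostgreSQL's; they are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX		64
#define DEMO_MAX_PROVIDERS	8

typedef struct DemoProvider
{
	char		name[DEMO_NAME_MAX];		/* unique provider name */
	int			(*begin_scan) (void *state);	/* required callback */
	void		(*explain) (void *state);	/* optional, may be NULL */
} DemoProvider;

/* Internal table: registered providers are copied here by value. */
static DemoProvider demo_table[DEMO_MAX_PROVIDERS];
static int	demo_count = 0;

/* Look up a provider by name; returns NULL if it is not registered. */
const DemoProvider *
demo_lookup_provider(const char *name)
{
	for (int i = 0; i < demo_count; i++)
	{
		if (strcmp(demo_table[i].name, name) == 0)
			return &demo_table[i];
	}
	return NULL;
}

/*
 * Copy *provider into the internal table.
 * Returns 0 on success, -1 if a required callback is missing, the table
 * is full, or the name is already taken (an assumed policy).
 */
int
demo_register_provider(const DemoProvider *provider)
{
	assert(provider != NULL);

	if (provider->begin_scan == NULL)
		return -1;
	if (demo_count >= DEMO_MAX_PROVIDERS)
		return -1;
	if (demo_lookup_provider(provider->name) != NULL)
		return -1;

	demo_table[demo_count++] = *provider;	/* struct copy */
	return 0;
}

/* A trivial required callback used by the usage example below. */
static int
demo_begin(void *state)
{
	(void) state;
	return 42;
}

/*
 * Usage: plays the role of a module's load-time initializer. The struct
 * lives on the stack and may go away after registration, because the
 * registry keeps its own copy.
 */
int
demo_module_init(void)
{
	DemoProvider p;

	memset(&p, 0, sizeof(p));
	strncpy(p.name, "ctidscan_demo", DEMO_NAME_MAX - 1);
	p.begin_scan = demo_begin;
	p.explain = NULL;			/* optional callback left unset */

	return demo_register_provider(&p);
}
```

Copying by value is what lets the caller discard its structure after registration; callers of an optional callback have to check it for NULL first, as show_customscan_info() in this patch does for ExplainCustomScan.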

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..bb00078
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,298 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to
+  scan or join relations, fully integrated with the cost-based optimizer,
+  in addition to the built-in implementations.
+  It consists of a set of callbacks, registered under a unique name, that are
+  invoked during query planning and execution. A custom-scan provider should
+  implement these callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major jobs that a custom-scan provider must implement.
+  The first is registration of the custom-scan provider itself. Usually, this
+  is done once, in the <literal>_PG_init()</literal> entrypoint at module
+  load time.
+  The other three jobs are performed for each query planning and execution.
+  The second is submission of candidate paths to scan or join relations,
+  with appropriate cost estimates, to the core planner.
+  The planner then chooses the cheapest path from all the candidates.
+  If a custom path survives, the planner starts the third job: construction
+  of a <literal>CustomScan</literal> plan node, which is placed in the query
+  plan tree instead of a built-in plan node.
+  The last is execution of the provider's implementation in response to
+  invocations by the core executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They can serve as
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      Its logic can skip earlier pages or terminate the scan before the
+      end of the relation when an inequality operator on the
+      <literal>ctid</literal> system column narrows the range to be scanned,
+      instead of a sequential scan that reads the relation from start to end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      Its logic replaces a local join of foreign tables managed by
+      <literal>postgres_fdw</literal> with a custom scan that fetches
+      remotely joined relations.
+      It shows how to implement a custom scan node that runs
+      in place of join nodes.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scans and joins are supported with fully integrated
+  cost-based query optimization through the custom scan API.
+  You might be able to implement other operations, such as sort or aggregation,
+  by manipulating the planned tree; however, the extension is then responsible
+  for handling this replacement correctly. The core provides no support for it.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first job of a custom scan provider is registration of a set of
+    callbacks under a unique name. Usually, this is done once, in the
+    <literal>_PG_init()</literal> entrypoint at module load time.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, the structure is copied into an internal table, so the
+    caller does not need to keep it around any longer.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations among
+    the various potential paths, each a combination of a scan algorithm and
+    target relations.
+    Prior to this selection, it enumerates all the potential paths for
+    a target relation (for a base relation) or a pair of relations (for a join).
+    The <literal>add_scan_path_hook</> and <literal>add_join_path_hook</> hooks
+    allow extensions to add alternative scan or join paths in addition to the
+    built-in ones.
+    If a custom-scan provider can submit a potential path for the
+    supplied relation, it should construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> field is common to all path nodes and stores the
+    cost estimates. In addition, <literal>custom_name</> is the name of the
+    registered custom scan provider, <literal>custom_flags</> is a set of the
+    flags below, and <literal>custom_private</> can be used to store private
+    data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        The custom scan provider is then responsible for marking and restoring
+        a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> is chosen by the query planner,
+    the planner calls back the associated custom scan provider to finish
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner performs basic initialization of the newly allocated
+    <literal>cscan_plan</>; the custom scan provider then applies the final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> nodes to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> nodes
+    to be checked during the relation scan. Their expression portions should
+    also be assigned to the <literal>Plan</> portion, but a clause can be
+    dropped from this list if the custom scan provider can check it by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node after construction of the plan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a plain relation
+    scan, because there is nothing special to do. Otherwise, it must be
+    implemented by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also invokes the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization needed for
+    this plan; however, it is not a good idea to start the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to hold the private state managed by the custom scan provider.
+    Also, <literal>eflags</> contains flag bits describing the executor's
+    operating mode for this plan node.
+    Note that the custom scan provider should not perform any externally
+    visible action if <literal>EXEC_FLAG_EXPLAIN_ONLY</> is set.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation (or relations, for a
+    join) according to the custom logic. Unlike the
+    <literal>IterateForeignScan</> method of foreign tables, it is also
+    responsible for checking whether the next tuple matches the qualifiers
+    of this scan. A usual way to implement this method is for the callback to
+    act as a thin entrypoint to <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation (or relations, for
+    a join) according to the custom logic. Note that the data format (and
+    the way it is returned) depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory allocated in the
+    per-execution memory context, so this callback only needs to release
+    resources that the provider acquired outside the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on might have changed value,
+    so the restarted scan does not need to return exactly the same tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in its private
+    state.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    previously saved by <literal>MarkPosCustomScan</>.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the <literal>CustomScanState</> provides
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..1e96829 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 522316c..cce0cd8 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 4e93df2..39d2c12 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +905,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			operation = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1013,6 +1028,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1291,6 +1310,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *)plan)->funcexpr != NULL && es->verbose)
+				show_expression(((CustomScan *)plan)->funcexpr,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1858,6 +1888,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2025,6 +2068,41 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				Node	   *funcexpr = ((CustomScan *) plan)->funcexpr;
+
+				if (funcexpr && IsA(funcexpr, FuncExpr))
+				{
+					Oid		funcid = ((FuncExpr *) funcexpr)->funcid;
+
+					objectname = get_func_name(funcid);
+					if (es->verbose)
+						namespace =
+							get_namespace_name(get_func_namespace(funcid));
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
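Note on the capability checks above: both ExecSupportsMarkRestore() and ExecSupportsBackwardScan() only test bits in custom_flags, so a provider opts in by setting flags on the path it creates. A minimal standalone sketch of that gating follows. The DEMO_ names and bit values are assumptions made for illustration; the real CUSTOM__SUPPORT_* values are defined in the patch's header, and tlist_ok stands in for the result of TargetListSupportsBackwardScan().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, for illustration only (not taken from the patch) */
#define DEMO_SUPPORT_MARK_RESTORE	0x0001
#define DEMO_SUPPORT_BACKWARD_SCAN	0x0002

/* Mirrors the T_CustomScan arm of ExecSupportsMarkRestore() */
static bool
demo_supports_mark_restore(uint32_t flags)
{
	return (flags & DEMO_SUPPORT_MARK_RESTORE) != 0;
}

/*
 * Mirrors the T_CustomScan arm of ExecSupportsBackwardScan(): the flag
 * must be set AND the target list must itself be safe to scan backward.
 */
static bool
demo_supports_backward_scan(uint32_t flags, bool tlist_ok)
{
	if (flags & DEMO_SUPPORT_BACKWARD_SCAN)
		return tlist_ok;
	return false;
}
```

A provider that sets neither bit is treated like a node without these capabilities, which is why the merge join costing can fall back to materializing the inner side.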
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * Finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs common initialization.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (meaning this custom scan plan is not associated with a particular
+ * relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up here only when scanrelid refers
+	 * to a particular relation. Otherwise, the custom-scan provider must
+	 * initialize it itself if it uses ss_ScanTupleSlot internally. If the
+	 * provider replaces the varno of Var nodes with CUSTOM_VAR, the slot must
+	 * be set so that EXPLAIN can look up the underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * open the base relation and acquire appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the BeginCustomScan callback.
+	 * The extension may override any of the initialization above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point for the ExecCustomScan method. Fetching a tuple
+ * is entirely the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entry point, for the MultiExecCustomScan method. Fetching
+ * multiple tuples (as expected by the upper node) is entirely the job of
+ * the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom scan provider shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table slots, if they exist */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
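Note on nodeCustom.c above: the file boils down to a name-keyed registry of callback tables plus thin wrappers that dispatch through the registered callbacks. The standalone sketch below shows only that pattern. Everything prefixed demo_ or DEMO_ is hypothetical and not part of the patch: a fixed array stands in for the dynahash table, and return codes stand in for ereport(ERROR).

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NAMELEN		64	/* stand-in for NAMEDATALEN */
#define DEMO_MAX_PROVIDERS	8	/* the patch uses a growable hash table */

/* Hypothetical, trimmed-down analogue of CustomProvider */
typedef struct DemoProvider
{
	char		name[DEMO_NAMELEN];
	int			(*exec_scan) (int cursor);	/* plays the role of ExecCustomScan */
} DemoProvider;

static DemoProvider demo_registry[DEMO_MAX_PROVIDERS];
static int	demo_nproviders = 0;

/* Look up a provider by name; NULL if absent (the patch raises ERROR) */
static DemoProvider *
demo_get_provider(const char *name)
{
	for (int i = 0; i < demo_nproviders; i++)
	{
		if (strcmp(demo_registry[i].name, name) == 0)
			return &demo_registry[i];
	}
	return NULL;
}

/* Returns 0 on success, -1 on a duplicate name or a full table */
static int
demo_register_provider(const DemoProvider *provider)
{
	if (demo_get_provider(provider->name) != NULL)
		return -1;
	if (demo_nproviders >= DEMO_MAX_PROVIDERS)
		return -1;
	/* copy by value, as the patch does with memcpy() */
	demo_registry[demo_nproviders++] = *provider;
	return 0;
}

/* Toy callback: pretend the "next tuple" is simply cursor + 1 */
static int
demo_ctid_exec(int cursor)
{
	return cursor + 1;
}
```

Because plan nodes carry only the provider name (custom_name), the lookup is repeated at plan creation, set_plan_refs, and executor startup; copying the callback table by value at registration keeps those lookups independent of the caller's storage.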
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1733da6..9aaca17 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -602,6 +602,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(funcexpr);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3940,6 +3967,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b39927e..7f0297f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -568,6 +568,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(funcexpr);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2815,6 +2831,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bfd3809..9d0cbf5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -46,6 +46,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -399,6 +401,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -427,6 +432,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1246,6 +1254,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1269,6 +1280,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1292,6 +1306,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1361,6 +1378,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1414,6 +1434,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e7f8cec..e5b0cd7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths offered by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
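Note on add_join_path_hook above: like other PostgreSQL hooks, it is a plain function pointer that core calls when it is non-NULL, so an extension conventionally saves the previous value and chains to it. The standalone sketch below shows that convention under simplifying assumptions: the demo_ names are hypothetical, and the hook takes a path counter instead of the planner structures the real hook receives.

```c
#include <stddef.h>

/* Hypothetical hook signature; the real hook takes planner arguments */
typedef void (*demo_join_path_hook_type) (int *npaths);

/* Plays the role of add_join_path_hook in core */
static demo_join_path_hook_type demo_join_path_hook = NULL;

/* The extension's saved copy of whatever hook was installed before it */
static demo_join_path_hook_type demo_prev_hook = NULL;

/* Stand-in for add_paths_to_joinrel(): built-in paths, then the hook */
static int
demo_add_paths_to_joinrel(void)
{
	int			npaths = 4;		/* pretend four built-in join paths exist */

	if (demo_join_path_hook)
		(*demo_join_path_hook) (&npaths);
	return npaths;
}

/* Extension callback: chain to the earlier hook, then add its own path */
static void
demo_extension_hook(int *npaths)
{
	if (demo_prev_hook)
		(*demo_prev_hook) (npaths);
	(*npaths)++;
}

/* What an extension would typically do at load time */
static void
demo_install_hook(void)
{
	demo_prev_hook = demo_join_path_hook;
	demo_join_path_hook = demo_extension_hook;
}
```

Chaining matters here because several providers may be loaded at once, and each should get a chance to offer its own join paths.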
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5947e5b..f830ab8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -235,6 +239,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -411,6 +416,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2016,6 +2028,97 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can use it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			Node   *funcexpr = rte->funcexpr;
+
+			if (best_path->path.param_info)
+				funcexpr = replace_nestloop_params(root, funcexpr);
+			scan_plan->funcexpr = funcexpr;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int) reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b78d727..aae5a1c 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -578,6 +579,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "no implementation to set plan references");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 0df70c4..644a532 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2194,6 +2194,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special if
+			 * a CustomScan overrides a FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b17051..46e814d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. The usual usage is
+ * a call from a callback installed on add_scan_path_hook. We make no
+ * assumptions about the custom scan logic, so the caller is responsible
+ * for supplying adequate cost estimates here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 5ffce68..bfceab8 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2363,14 +2364,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3628,6 +3634,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *) ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *) ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5295,6 +5309,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5565,6 +5591,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bedcf04..529930f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1494,6 +1494,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fc6b1d7..7753a09 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..936591b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -483,6 +483,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	Node	   *funcexpr;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the name under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9686229..1225970 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 96ffdb1..d7c7ef7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan paths, in addition to the default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join paths, in addition to the default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
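
As a rough illustration of the registration API declared in nodeCustom.h above (register_custom_provider(), get_custom_provider(), and the CUSTOM__* flag bits), here is a standalone C model of a name-keyed provider registry and a flag test. It is only a sketch under stated assumptions: every identifier beginning with mock_ or Mock, the MAX_PROVIDERS capacity, and the policy of rejecting duplicate names are hypothetical and are not taken from the patch, whose implementation of these functions is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN   64	/* PostgreSQL's default identifier buffer size */
#define MAX_PROVIDERS 8		/* hypothetical capacity, for this sketch only */

/* Flag values as defined in the nodeCustom.h hunk above */
#define CUSTOM__SUPPORT_MARK_RESTORE	0x0001
#define CUSTOM__SUPPORT_BACKWARD_SCAN	0x0002

/* Simplified stand-in for CustomProvider: a name plus a single callback */
typedef struct MockProvider
{
	char		name[NAMEDATALEN];
	int			(*exec_scan) (void *state);	/* placeholder for ExecCustomScan */
} MockProvider;

static MockProvider provider_table[MAX_PROVIDERS];
static int	provider_count = 0;

/* Trivial callback so that a provider can be constructed in examples */
int
mock_exec_noop(void *state)
{
	(void) state;
	return 0;
}

/* Look up a provider by name; returns NULL if none is registered */
const MockProvider *
mock_get_provider(const char *name)
{
	for (int i = 0; i < provider_count; i++)
	{
		if (strcmp(provider_table[i].name, name) == 0)
			return &provider_table[i];
	}
	return NULL;
}

/*
 * Copy the provider into the table.  Returns 0 on success, or -1 if the
 * table is full or the name is already taken (an assumed policy).
 */
int
mock_register_provider(const MockProvider *provider)
{
	assert(provider != NULL);

	if (provider_count >= MAX_PROVIDERS ||
		mock_get_provider(provider->name) != NULL)
		return -1;

	provider_table[provider_count++] = *provider;
	return 0;
}

/* Test the mark/restore bit in a custom_flags value */
int
mock_supports_mark_restore(int custom_flags)
{
	return (custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0;
}
```

In a real extension the registration would normally happen once from _PG_init(), as the documentation patch later in this thread describes, and the planner would consult custom_flags on the CustomPath rather than call a helper like the one above.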

Jim Mlodgenski

jimmy76@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#5)

1 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai

On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:

My initial review on this feature:
- The patches apply and build, but they produce a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

This variable was only used in an Assert() macro, so it causes a warning if
you don't pass --enable-cassert to the configure script.
Anyway, I adjusted the code to check the relid of RelOptInfo directly.

The warning is now gone.

I'd recommend that you split the part1 patch containing the ctidscan contrib
into its own patch. It is more than half of the patch and it certainly
stands on its own. IMO, ctidscan fits a very specific use case and
would be better off being an extension instead of in contrib.

OK, I split them off. Part 1 is the custom-scan API itself, part 2 is the
ctidscan portion, and part 3 is the remote join on postgres_fdw.

Attached is a patch for the documentation. I think the documentation still
needs a little more work, but it is pretty close. I can add some more
detail to it once I finish adapting the hadoop_fdw to use the custom scan
API and have a better understanding of all of the calls.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

custom-scan.sgml.patch (text/x-patch; charset=US-ASCII)

*** a/doc/src/sgml/custom-scan.sgml	2013-11-18 17:50:02.652039003 -0500
--- b/doc/src/sgml/custom-scan.sgml	2013-11-22 09:09:13.624254649 -0500
***************
*** 8,47 ****
    <secondary>handler for</secondary>
   </indexterm>
   <para>
!   Custom-scan API enables extension to provide alternative ways to scan or
!   join relations, being fully integrated with cost based optimizer,
!   in addition to the built-in implementation.
!   It consists of a set of callbacks, with a unique name, to be invoked during
!   query planning and execution. Custom-scan provider should implement these
!   callback functions according to the expectation of API.
   </para>
   <para>
!   Overall, here is four major jobs that custom-scan provider should implement.
!   The first one is registration of custom-scan provider itself. Usually, it
!   shall be done once at <literal>_PG_init()</literal> entrypoint on module
!   loading.
!   The other three jobs shall be done for each query planning and execution.
!   The second one is submission of candidate paths to scan or join relations,
!   with an adequate cost, for the core planner.
!   Then, planner shall chooses a cheapest path from all the candidates.
!   If custom path survived, the planner kicks the third job; construction of
!   <literal>CustomScan</literal> plan node, being located within query plan
!   tree instead of the built-in plan node.
!   The last one is execution of its implementation in answer to invocations
!   by the core executor.
   </para>
   <para>
!   Some of contrib module utilize the custom-scan API. It may be able to
!   provide a good example for new development.
    <variablelist>
     <varlistentry>
      <term><xref linkend="ctidscan"></term>
      <listitem>
       <para>
!       Its logic enables to skip earlier pages or terminate scan prior to
!       end of the relation, if inequality operator on <literal>ctid</literal>
!       system column can narrow down the scope to be scanned, instead of
!       the sequential scan that reads a relation from the head to the end.
       </para>
      </listitem>
     </varlistentry>
--- 8,46 ----
    <secondary>handler for</secondary>
   </indexterm>
   <para>
!   The custom-scan API enables an extension to provide alternative ways to scan
!   or join relations, leveraging the cost-based optimizer. The API consists of a
!   set of callbacks, registered under a unique name, to be invoked during query
!   planning and execution. A custom-scan provider should implement these
!   callback functions according to the expectations of the API.
   </para>
   <para>
!   Overall, there are four major tasks that a custom-scan provider should
!   implement. The first task is the registration of the custom-scan provider
!   itself. Usually, this is done once at the <literal>_PG_init()</literal>
!   entrypoint when the module is loaded. The remaining three tasks are done
!   for each query during planning and execution. The second task is the
!   submission of candidate paths to either scan or join relations, with an
!   adequate cost, to the core planner. The planner then chooses the cheapest
!   path from all of the candidates. If the custom path survives, the planner
!   starts the third task: construction of a <literal>CustomScan</literal> plan
!   node, located within the query plan tree instead of the built-in plan node.
!   The last task is the execution of its implementation in answer to
!   invocations by the core executor.
   </para>
   <para>
!   Some contrib modules utilize the custom-scan API. They may provide a good
!   example for new development.
    <variablelist>
     <varlistentry>
      <term><xref linkend="ctidscan"></term>
      <listitem>
       <para>
!       The custom scan in this module enables a scan to skip earlier pages or
!       terminate prior to the end of the relation, if an inequality operator on
!       the <literal>ctid</literal> system column can narrow down the scope to be
!       scanned, instead of a sequential scan which reads a relation from the
!       head to the end.
       </para>
      </listitem>
     </varlistentry>
***************
*** 49,70 ****
      <term><xref linkend="postgres-fdw"></term>
      <listitem>
       <para>
!       Its logic replaces a local join of foreign tables managed by
!       <literal>postgres_fdw</literal> with a custom scan that fetches
!       remotely joined relations.
!       It shows the way to implement a custom scan node that performs
!       instead join nodes.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>
   <para>
!   Right now, only scan and join are supported to have fully integrated cost
!   based query optimization performing on custom scan API.
!   You might be able to implement other stuff, like sort or aggregation, with
!   manipulation of the planned tree, however, extension has to be responsible
!   to handle this replacement correctly. Here is no support by the core.
   </para>
  
   <sect1 id="custom-scan-spec">
--- 48,68 ----
      <term><xref linkend="postgres-fdw"></term>
      <listitem>
       <para>
!       The custom scan in this module replaces a local join of foreign tables
!       managed by <literal>postgres_fdw</literal> with a scan that fetches
!       remotely joined relations. It demonstrates the way to implement a custom
!       scan node that takes the place of join nodes.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>
   <para>
!   Currently, only scans and joins are fully supported with integrated
!   cost-based query optimization using the custom scan API. You might be able
!   to implement other operations, like sort or aggregation, by manipulating
!   the planned tree; however, the extension is then responsible for handling
!   this replacement correctly. There is no support for this in the core.
   </para>
  
   <sect1 id="custom-scan-spec">
***************
*** 72,80 ****
    <sect2 id="custom-scan-register">
     <title>Registration of custom scan provider</title>
     <para>
!     The first job for custom scan provider is registration of a set of
!     callbacks with a unique name. Usually, it shall be done once on
!     <literal>_PG_init()</literal> entrypoint of module loading.
  <programlisting>
  void
  register_custom_provider(const CustomProvider *provider);
--- 70,78 ----
    <sect2 id="custom-scan-register">
     <title>Registration of custom scan provider</title>
     <para>
!     The first task for a custom scan provider is the registration of a set of
!     callbacks with a unique name. Usually, this is done once upon module
!     loading in the <literal>_PG_init()</literal> entrypoint.
  <programlisting>
  void
  register_custom_provider(const CustomProvider *provider);
***************
*** 90,105 ****
    <sect2 id="custom-scan-path">
     <title>Submission of custom paths</title>
     <para>
!     The query planner finds out the best way to scan or join relations from
!     the various potential paths; combination of a scan algorithm and target
!     relations.
!     Prior to this selection, we list up all the potential paths towards
!     a target relation (if base relation) or a pair of relations (if join).
!     The <literal>add_scan_path_hook</> and <literal>add_join_path_hook</>
!     allows extensions to add alternative scan paths in addition to built-in
!     ones.
      If custom-scan provider can submit a potential scan path towards the
!     supplied relation, it shall construct <literal>CustomPath</> object
      with appropriate parameters.
  <programlisting>
  typedef struct CustomPath
--- 88,102 ----
    <sect2 id="custom-scan-path">
     <title>Submission of custom paths</title>
     <para>
!     The query planner finds the best way to scan or join relations from the
!     various potential paths, each a combination of a scan algorithm and target
!     relations. Prior to this selection, we list all of the potential paths
!     towards a target relation (if it is a base relation) or a pair of relations
!     (if it is a join). The <literal>add_scan_path_hook</> and
!     <literal>add_join_path_hook</> allow extensions to add alternative scan
!     paths in addition to the built-in paths.
      If custom-scan provider can submit a potential scan path towards the
!     supplied relation, it shall construct a <literal>CustomPath</> object
      with appropriate parameters.
  <programlisting>
  typedef struct CustomPath
***************
*** 110,118 ****
      List       *custom_private;     /* can be used for private data */
  } CustomPath;
  </programlisting>
!     Its <literal>path</> is common field for all the path nodes to store
!     cost estimation. In addition, <literal>custom_name</> is the name of
!     registered custom scan provider, <literal>custom_flags</> is a set of
      flags below, and <literal>custom_private</> can be used to store private
      data of the custom scan provider.
     </para>
--- 107,115 ----
      List       *custom_private;     /* can be used for private data */
  } CustomPath;
  </programlisting>
!     Its <literal>path</> is a common field for all the path nodes to store
!     a cost estimation. In addition, <literal>custom_name</> is the name of
!     the registered custom scan provider, <literal>custom_flags</> is a set of
      flags below, and <literal>custom_private</> can be used to store private
      data of the custom scan provider.
     </para>
***************
*** 125,132 ****
          It informs the query planner this custom scan node supports
          <literal>ExecMarkPosCustomScan</> and
          <literal>ExecRestorePosCustomScan</> methods.
!         Also, custom scan provider has to be responsible to mark and restore
!         a particular position.
         </para>
        </listitem>
       </varlistentry>
--- 122,129 ----
          It informs the query planner this custom scan node supports
          <literal>ExecMarkPosCustomScan</> and
          <literal>ExecRestorePosCustomScan</> methods.
!         Also, the custom scan provider is responsible for marking and
!         restoring a particular position.
         </para>
        </listitem>
       </varlistentry>
***************
*** 135,141 ****
        <listitem>
         <para>
          It informs the query planner this custom scan node supports
!         backward scan.
          Also, custom scan provider has to be responsible to scan with
          backward direction.
         </para>
--- 132,138 ----
        <listitem>
         <para>
          It informs the query planner this custom scan node supports
!         backward scans.
          Also, custom scan provider has to be responsible to scan with
          backward direction.
         </para>
***************
*** 148,157 ****
    <sect2 id="custom-scan-plan">
     <title>Construction of custom plan node</title>
     <para>
!     Once <literal>CustomPath</literal> got choosen by query planner,
!     it calls back its associated custom scan provider to complete setting
!     up <literal>CustomScan</literal> plan node according to the path
!     information.
  <programlisting>
  void
  InitCustomScanPlan(PlannerInfo *root,
--- 145,154 ----
    <sect2 id="custom-scan-plan">
     <title>Construction of custom plan node</title>
     <para>
!     Once a <literal>CustomPath</literal> has been chosen by the query planner,
!     it calls back to the associated custom scan provider to complete
!     setting up the <literal>CustomScan</literal> plan node according to the
!     path information.
  <programlisting>
  void
  InitCustomScanPlan(PlannerInfo *root,
***************
*** 160,180 ****
                     List *tlist,
                     List *scan_clauses);
  </programlisting>
!     Query planner does basic initialization on the <literal>cscan_plan</>
!     being allocated, then custom scan provider can apply final initialization.
!     <literal>cscan_path</> is the path node that was constructed on the
!     previous stage then got choosen.
      <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
      on the <literal>Plan</> portion in the <literal>cscan_plan</>.
      Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
!     be checked during relation scan. Its expression portion shall be also
      assigned on the <literal>Plan</> portion, but can be eliminated from
      this list if custom scan provider can handle these checks by itself.
     </para>
     <para>
      It often needs to adjust <literal>varno</> of <literal>Var</> node that
!     references a particular scan node, after conscruction of plan node.
!     For example, Var node in the target list of join node originally
      references a particular relation underlying a join, however, it has to
      be adjusted to either inner or outer reference.
  <programlisting>
--- 157,177 ----
                     List *tlist,
                     List *scan_clauses);
  </programlisting>
!     The query planner does basic initialization on the <literal>cscan_plan</>
!     being allocated, then the custom scan provider can apply final
!     initialization. <literal>cscan_path</> is the path node that was
!     constructed in the previous stage and then chosen.
      <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
      on the <literal>Plan</> portion in the <literal>cscan_plan</>.
      Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
!     be checked during a relation scan. Its expression portion will also be
      assigned on the <literal>Plan</> portion, but can be eliminated from
      this list if custom scan provider can handle these checks by itself.
     </para>
     <para>
      It often needs to adjust <literal>varno</> of <literal>Var</> node that
!     references a particular scan node, after construction of the plan node.
!     For example, a Var node in the target list of the join node originally
      references a particular relation underlying a join, however, it has to
      be adjusted to either inner or outer reference.
  <programlisting>
***************
*** 183,191 ****
                       CustomScan *cscan_plan,
                       int rtoffset);
  </programlisting>
!     This callback is optional if custom scan node is a vanilla relation
!     scan because here is nothing special to do. Elsewhere, it needs to
!     be handled by custom scan provider in case when a custom scan replaced
      a join with two or more relations for example.
     </para>
    </sect2>
--- 180,188 ----
                       CustomScan *cscan_plan,
                       int rtoffset);
  </programlisting>
!     This callback is optional if the custom scan node is a vanilla relation
!     scan because there is nothing special to do. Otherwise, it needs to be
!     handled by the custom scan provider, in the case where a custom scan replaces
      a join with two or more relations for example.
     </para>
    </sect2>
***************
*** 193,200 ****
    <sect2 id="custom-scan-exec">
     <title>Execution of custom scan node</title>
     <para>
!     Query execuror also launches associated callbacks to begin, execute and
!     end custom scan according to the executor's manner.
     </para>
     <para>
  <programlisting>
--- 190,197 ----
    <sect2 id="custom-scan-exec">
     <title>Execution of custom scan node</title>
     <para>
!     The query executor also launches the associated callbacks to begin, execute
!     and end the custom scan in the executor's usual manner.
     </para>
     <para>
  <programlisting>
***************
*** 202,217 ****
  BeginCustomScan(CustomScanState *csstate, int eflags);
  </programlisting>
      It begins execution of the custom scan on starting up executor.
!     It allows custom scan provider to do any initialization job around this
!     plan, however, it is not a good idea to launch actual scanning jobs.
      (It shall be done on the first invocation of <literal>ExecCustomScan</>
      instead.)
      The <literal>custom_state</> field of <literal>CustomScanState</> is
!     intended to save the private state being managed by custom scan provider.
!     Also, <literal>eflags</> has flag bits of the executor's operating mode
!     for this plan node.
!     Note that custom scan provider should not perform anything visible
!     externally if <literal>EXEC_FLAG_EXPLAIN_ONLY</> would be given,
     </para>
  
     <para>
--- 199,214 ----
  BeginCustomScan(CustomScanState *csstate, int eflags);
  </programlisting>
      It begins execution of the custom scan on starting up executor.
!     It allows the custom scan provider to do any initialization job around this
!     plan; however, it is not a good idea to launch the actual scanning jobs.
      (It shall be done on the first invocation of <literal>ExecCustomScan</>
      instead.)
      The <literal>custom_state</> field of <literal>CustomScanState</> is
!     intended to save the private state being managed by the custom scan
!     provider. Also, <literal>eflags</> has flag bits of the executor's
!     operating mode for this plan node. Note that the custom scan provider
!     should not perform anything visible externally if
!     <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
     </para>
  
     <para>
***************
*** 219,229 ****
  TupleTableSlot *
  ExecCustomScan(CustomScanState *csstate);
  </programlisting>
!     It fetches one tuple from the underlying relation or relations if join
      according to the custom logic. Unlike <literal>IterateForeignScan</>
!     method in foreign table, it is also responsible to check whether next
      tuple matches the qualifier of this scan, or not.
!     A usual way to implement this method is the callback performs just an
      entrypoint of <literal>ExecQual</> with its own access method.
     </para>
  
--- 216,226 ----
  TupleTableSlot *
  ExecCustomScan(CustomScanState *csstate);
  </programlisting>
!     It fetches one tuple from the underlying relation or relations, if joining,
      according to the custom logic. Unlike <literal>IterateForeignScan</>
!     method of a foreign table, it is also responsible for checking whether the next
      tuple matches the qualifier of this scan, or not.
!     The usual way to implement this method is for the callback to act as just an
      entrypoint of <literal>ExecQual</> with its own access method.
     </para>
  
***************
*** 232,240 ****
  Node *
  MultiExecCustomScan(CustomScanState *csstate);
  </programlisting>
!     It fetches multiple tuples from the underlying relation or relations if
!     join according to the custom logic. Pay attention the data format (and
!     the way to return also) depends on the type of upper node.
     </para>
  
     <para>
--- 229,237 ----
  Node *
  MultiExecCustomScan(CustomScanState *csstate);
  </programlisting>
!     It fetches multiple tuples from the underlying relation or relations, if
!     joining, according to the custom logic. Pay attention to the data format
!     (and the way to return it), since it depends on the type of upper node.
     </para>
  
     <para>
***************
*** 242,248 ****
  void
  EndCustomScan(CustomScanState *csstate);
  </programlisting>
!     It ends the scan and release resources privately allocated.
      It is usually not important to release memory in per-execution memory
      context. So, all this callback should be responsible is its own
      resources regardless from the framework.
--- 239,245 ----
  void
  EndCustomScan(CustomScanState *csstate);
  </programlisting>
!     It ends the scan and releases resources privately allocated.
      It is usually not important to release memory in per-execution memory
      context. So, all this callback should be responsible is its own
      resources regardless from the framework.
***************
*** 257,263 ****
  ReScanCustomScan(CustomScanState *csstate);
  </programlisting>
      It restarts the current scan from the beginning.
!     Note that parameters of the scan depends on might change values,
      so rewinded scan does not need to return exactly identical tuples.
     </para>
     <para>
--- 254,260 ----
  ReScanCustomScan(CustomScanState *csstate);
  </programlisting>
      It restarts the current scan from the beginning.
!     Note that parameters of the scan depends on may change values,
      so rewinded scan does not need to return exactly identical tuples.
     </para>
     <para>
***************
*** 276,282 ****
  RestorePosCustom(CustomScanState *csstate);
  </programlisting>
      It rewinds the current position of the custom scan to the position
!     where <literal>MarkPosCustomScan</> saved before.
      Note that it is optional to implement, only when
      <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
     </para>
--- 273,279 ----
  RestorePosCustom(CustomScanState *csstate);
  </programlisting>
      It rewinds the current position of the custom scan to the position
!     where <literal>MarkPosCustomScan</> was saved before.
      Note that it is optional to implement, only when
      <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
     </para>

Peter Eisentraut

peter_e@gmx.net

about 12 years ago

In reply to: Kohei KaiGai (#5)

Re: Custom Scan APIs (Re: Custom Plan node)

contrib/ctidscan/ctidscan.c:44: indent with spaces.
contrib/ctidscan/ctidscan.c:250: indent with spaces.
contrib/ctidscan/ctidscan.c:266: trailing whitespace.
contrib/postgres_fdw/deparse.c:1044: indent with spaces.
contrib/postgres_fdw/postgres_fdw.c:940: indent with spaces.
src/backend/commands/explain.c:2097: indent with spaces.
src/backend/optimizer/plan/createplan.c:2097: trailing whitespace.
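
Messages of this shape look like git's whitespace warnings. As a rough illustration of the two checks being reported, here is a minimal sketch in C. The function name check_line_whitespace and the tabwidth parameter are assumptions made for this example only; they are not part of git or of the PostgreSQL tree, and the real rules are configurable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git or PostgreSQL).
 * Returns a warning string for the given source line, or NULL if the
 * line passes both checks.  "tabwidth" is the number of consecutive
 * leading spaces that should have been written as a tab.
 */
const char *
check_line_whitespace(const char *line, int tabwidth)
{
	size_t		len;
	size_t		i;
	int			spaces = 0;

	assert(line != NULL && tabwidth > 0);

	/* Ignore the line terminator, if any. */
	len = strlen(line);
	while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
		len--;

	/* A blank or tab just before the end of the line. */
	if (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		return "trailing whitespace.";

	/* A run of "tabwidth" spaces inside the leading indentation. */
	for (i = 0; i < len && (line[i] == ' ' || line[i] == '\t'); i++)
	{
		if (line[i] == ' ')
		{
			if (++spaces >= tabwidth)
				return "indent with spaces.";
		}
		else
			spaces = 0;
	}

	return NULL;
}
```

Calling check_line_whitespace("    int x;\n", 4) reports an indentation problem, while the same statement indented with a single tab passes.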

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#5)

Re: Custom Scan APIs (Re: Custom Plan node)

2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:

OK, I split them off. Part-1 is the custom-scan API itself, part-2 is the
ctidscan portion, and part-3 is remote join on postgres_fdw.

These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows a FunctionScan node to have multiple function expressions, so
Node *funcexpr in CustomScan should be List *functions now.

I'll continue to review by applying patches onto 2013-11-19 HEAD.

--
Shigeru HANADA


Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Shigeru Hanada (#8)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for the series of checks.

The attached ones are the revised patches.

I merged all of Jim's proposals. Thanks, they improved the quality of the
documentation. I also fixed cosmetic issues around whitespace <-> tab.

The actual code changes follow the changes in FunctionScan, for the case
where CustomScan replaces a FunctionScan. It now holds a List * object
instead of a single expression tree, so that it can carry multiple functions.
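
To make the shape of that change concrete, here is a minimal, self-contained sketch. All of the types below (FuncExprStub, FuncCell, CustomScanStub) and the function count_functions are hypothetical stand-ins invented for this illustration; they are not the PostgreSQL Node or List types and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function expression tree. */
typedef struct FuncExprStub
{
	const char *funcname;
} FuncExprStub;

/* Hypothetical singly linked list cell; NOT PostgreSQL's List. */
typedef struct FuncCell
{
	FuncExprStub *expr;
	struct FuncCell *next;
} FuncCell;

/*
 * Hypothetical scan node.  The field used to be a single expression
 * pointer; it is now the head of a list so that the node can carry
 * several function expressions.
 */
typedef struct CustomScanStub
{
	FuncCell   *functions;
} CustomScanStub;

/* Count the function expressions attached to the scan node. */
int
count_functions(const CustomScanStub *cscan)
{
	int			n = 0;
	const FuncCell *cell;

	assert(cscan != NULL);
	for (cell = cscan->functions; cell != NULL; cell = cell->next)
		n++;
	return n;
}
```

Code that previously read one expression now has to walk the list, which is why the change touches every place that consumed the old field.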

Nothing else was changed from the previous version.

Best regards,

2013/11/27 Shigeru Hanada <shigeru.hanada@gmail.com>:

2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:

OK, I split them off. Part-1 is the custom-scan API itself, part-2 is the
ctidscan portion, and part-3 is remote join on postgres_fdw.

These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows a FunctionScan node to have multiple function expressions, so
Node *funcexpr in CustomScan should be List *functions now.

I'll continue to review by applying patches onto 2013-11-19 HEAD.

--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1101 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/include/nodes/bitmapset.h                  |    4 +
 6 files changed, 1303 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..5af3dd7 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* column references need to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The workhorse of deparseRemoteJoinSql. It deparses a relation, which
+ * may be a join rather than a regular table, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either List or Integer.
+	 * In case of List, it is a packed PgRemoteJoinInfo that contains
+	 * outer and inner join references, so needs to deparse recursively.
+	 * In case of Integer, it is rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only the columns referenced in the target list and local conditions
+		 * have to be fetched from the remote server, not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * It assumes the top-level 'relinfo' is one for remote join relation, thus
+ * it has to be a List object that packs PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In case of remote join, a column reference may become ambiguous
+	 * unless it is qualified with its relation alias.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..c654295 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,820 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Checks whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw. If not, false is returned.
+ * If it is a managed relation, some related properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte;
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		if (rel->rtekind != RTE_RELATION ||
+			rel->fdwroutine == NULL ||
+			rel->fdwroutine->GetForeignRelSize != postgresGetForeignRelSize)
+			return false;
+
+		/*
+		 * Also report the server-id and local user-id to the caller.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we may need to pay attention to a scenario where a plan
+		 * is executed with a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be run with the same credential; the actual user-id is
+		 * not required.
+		 * So, fdw_user_oid is set to InvalidOid for comparison purposes
+		 * if the scan runs with the credential of GetUserId().
+		 */
+		rte = planner_rt_fetch(rel->relid, root);
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when underlying foreign tables use identical server
+				 * and user-id for each.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * has_wholerow_reference
+ *
+ * Returns true if the supplied expression contains a whole-row reference.
+ */
+static bool
+has_wholerow_reference(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return has_wholerow_reference((Node *)rinfo->clause, context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node, has_wholerow_reference, context);
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join, then stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and adds a custom-scan path that launches
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference is
+	 * included in either the target-list or the local conditions.
+	 *
+	 * XXX - This is because we have no reasonable way to reconstruct a
+	 * RECORD datum from individual columns once extracted. On the other
+	 * hand, putting a whole-row reference in the remote-join query would
+	 * consume additional network bandwidth.
+	 */
+	if (has_wholerow_reference((Node *)joinrel->reltargetlist, NULL) ||
+		has_wholerow_reference((Node *)j_local_conds, NULL))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan according to the remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of a remote join carry the varno of the
+ * underlying foreign tables. This is a problem because they would eventually
+ * be replaced by references to the outer or inner relation, whereas the result
+ * of a remote join is stored in the scan-tuple-slot, not in the outer or inner
+ * slot. So, we need to replace the varno of Var nodes that reference a
+ * relation of a remote join with CUSTOM_VAR, which is a pseudo varno that
+ * references a tuple in the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * Var nodes that reference columns of a remote join relation need special
+ * treatment, because we replace a join relation with a remote query that
+ * returns the result of a join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan; however, it
+ * needs some adjustments because of the difference in nature.
+ * The biggest one is that it has to open the underlying relations by itself
+ * and construct a tuple descriptor from the var-list to be fetched,
+ * because a custom-scan (in this case, a scan on a remote join instead of
+ * a local join) does not have a particular relation behind it, and thus
+ * has to manage these correctly on its own.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * We need to open the underlying relations by ourselves
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is applied unless the
+	 * reference carries some special configuration.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations also have to be saved */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the
+ * underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Does the same as postgresReScanForeignScan().
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine for EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider, but
+ * no special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - convert a bitmapset to/from its text
+ * (hexadecimal) representation, for portability.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
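
For reference, here is a minimal standalone sketch of the hexadecimal round trip that bms_to_string / bms_from_string perform on the bitmap words. It is not the patch code: it assumes 32-bit words (eight hex digits each, most significant word first), and the names hex_value, words_to_hex and hex_to_words are hypothetical helpers written only for this illustration. The point to notice is that the letters 'a'..'f' and 'A'..'F' decode to 10..15, so the offset of 10 is required.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HEX_PER_WORD 8			/* assumption: 32-bit words, 4 bits per digit */

/* Decode one hexadecimal digit; returns -1 for any other character. */
static int
hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Print nwords words, most significant word first.
 * buf must have room for nwords * HEX_PER_WORD + 1 bytes.
 */
static void
words_to_hex(const uint32_t *words, int nwords, char *buf)
{
	int		i;

	for (i = nwords; i > 0; i--)
		buf += sprintf(buf, "%08x", (unsigned int) words[i - 1]);
}

/* Returns the number of decoded words, or -1 on malformed input. */
static int
hex_to_words(const char *s, uint32_t *words, int maxwords)
{
	size_t	len = strlen(s);
	int		nwords;
	int		i, j;

	if (len == 0 || len % HEX_PER_WORD != 0)
		return -1;
	nwords = (int) (len / HEX_PER_WORD);
	if (nwords > maxwords)
		return -1;

	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (j = 0; j < HEX_PER_WORD; j++)
		{
			int		v = hex_value((unsigned char) *s++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Unlike the patch, the sketch rejects a string whose length is not a multiple of eight instead of warning and continuing; that is only a simplification for the example.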

Attachment: pgsql-v9.4-custom-scan.part-2.patch (application/octet-stream)
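
The ctidscan module in this attachment detects the end of a scan by comparing the result of ItemPointerCompare() against a per-bound threshold (ip_min_comp / ip_max_comp). The sketch below is a standalone illustration of that idea, not the module code: Tid, ScanZone, tid_compare, past_upper_bound and past_lower_bound are hypothetical stand-ins, and the encoding in the struct comments is the one the termination tests imply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: (block, offset). */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Same contract as ItemPointerCompare(): returns -1, 0 or 1. */
static int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return a.block < b.block ? -1 : 1;
	if (a.offset != b.offset)
		return a.offset < b.offset ? -1 : 1;
	return 0;
}

typedef struct
{
	Tid		min;
	Tid		max;
	int		min_comp;	/* -1: no lower bound, 0: tid >= min, 1: tid > min */
	int		max_comp;	/*  1: no upper bound, 0: tid <= max, -1: tid < max */
} ScanZone;

/* A forward scan stops at the first tuple for which this is true. */
static bool
past_upper_bound(const ScanZone *z, Tid t)
{
	return tid_compare(t, z->max) > z->max_comp;
}

/* A backward scan stops at the first tuple for which this is true. */
static bool
past_lower_bound(const ScanZone *z, Tid t)
{
	return tid_compare(t, z->min) < z->min_comp;
}
```

Because the comparison result never leaves the range -1..1, a threshold of 1 on the upper side (or -1 on the lower side) can never be crossed, which is how an open-ended bound is expressed without a separate flag.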

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   5 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 16 files changed, 1247 insertions(+), 9 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..0c6e6c0
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor. This extension focuses on
+ * queries that fetch records whose tid is larger or smaller than a
+ * particular value. When inequality operators on ctid are given, this
+ * module constructs a custom scan path that skips records that need not
+ * be read. If that path is the cheapest one, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches the underlying records; it uses the existing heap_getnext(), but
+ * seeks to the first record to be read before fetching it.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause can be used to determine
+ * the zone to be scanned. If one or more usable clauses are found, it
+ * returns a list of them, or NIL otherwise.
+ * The caller can assume that all the returned conditions are chained with
+ * the boolean AND operator, so every operator narrows down the scope of
+ * the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the given
+ * restriction clauses. The scan logic is almost the same as SeqScan,
+ * because it uses the regular heap_getnext(), except for the number of
+ * tuples to be scanned if the restriction clauses work well.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, we treat it as if the Var node was 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * This is just a rough estimate; we don't distinguish strict
+			 * inequality from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause, but the actual number is
+		 * unknown until the executor runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost = cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators on ctid are given for
+ * the relation to be scanned, so that fewer tuples have to be read.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least a part of the restriction clauses can reduce the number
+	 * of tuples to read, so we add the custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was already done in nodeCustomScan.c, so
+	 * all we need to do here is to set the slightly adjusted clauses and the
+	 * private data; the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ip_max_comp and ip_min_comp are thresholds for the result of
+	 * ItemPointerCompare(), which is always -1, 0 or 1. A forward scan
+	 * ends once ItemPointerCompare(tid, &ip_max) is greater than
+	 * ip_max_comp; a backward scan ends once ItemPointerCompare(tid,
+	 * &ip_min) is less than ip_min_comp. With ip_max_comp = 1 and
+	 * ip_min_comp = -1 the termination test can never succeed, so these
+	 * initial values mean "open ended", as if no bound was given on
+	 * that side.
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we could calculate a particular TID that fetched records
+			 * must be larger than, less than or equal to; thus, it allows
+			 * us to determine the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop at tid <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop at tid < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The whole set of restriction clauses, chained with boolean AND
+			 * operators, can never be true if one of the clauses has a NULL
+			 * result. So, we can stop the evaluation immediately and inform
+			 * the caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin sequential scan; the position will be sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns true if the
+ * caller should fetch with NoMovementScanDirection; false means the scan
+ * was just reset and the caller should keep its original direction.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * When the block number is out of range, it is obvious that no tuples
+	 * will be fetched by a forward scan. On the other hand, we have
+	 * nothing special to do for a backward scan.
+	 * Note that heap_getnext() returns a NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of this
+	 * function overrides the scan direction with it if 'true' was returned,
+	 * which terminates the scan immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, run the backward scan from its beginning */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position shall be loaded on
+	 * heap_restrpos(). Because heap_restrpos() internally calls
+	 * heapgettup() or heapgettup_pagemode() that kicks heapgetpage()
+	 * when rs_cblock is different from the block number being pointed
+	 * by rs_mctid, it makes sense to put invalid block number not to
+	 * match previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() had saved a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if min-tid was obvious */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if max-tid was obvious */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound
+	 * on a forward scan, or the lower bound on a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the manner of
+ * ExecScan().
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan call calculate the starting point again and rewind the
+ * underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..60081f7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation of the custom-scan
+  APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then provide an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators on the
+   <literal>ctid</> system column to reduce the number of rows to be processed.
+   Obviously, it does not make sense to read tuples located on the 99th page
+   or earlier. So, it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it internally
+   uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan, as
+   follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well, except that I/O is wasted on the pages that contain
+   the records to be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> is more efficient: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because of
+   the smaller number of tuples to be processed.
+  </para>
+  <para>
+   Of course, it will not be chosen if a cheaper path than the above
+   custom-scan path exists. For instance, an index scan based on an equality
+   operator is usually cheaper than this custom scan, so the optimizer adopts
+   it instead of a sequential scan or the <filename>ctidscan</> custom scan.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, whichever is
+   convenient for you.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1e96829..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c6010d9..33bab08 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1af5469..630c8e7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1081,7 +1081,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 78efaa5..b040334 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..1ad0e7a
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..09c1bda
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test creanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

Attachment: pgsql-v9.4-custom-scan.part-1.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  96 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1198 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations, leveraging the cost-based optimizer. The API consists of a
+  set of callbacks, registered under a unique name, that are invoked during
+  query planning and execution. A custom-scan provider should implement these
+  callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this needs to be done once at the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is being planned and executed. The second task is the submission
+  of candidate paths to either scan or join relations, with an adequate cost, to
+  the core planner. Then, the planner will choose the cheapest path from all of
+  the candidates. If the custom path survives, the planner starts the third
+  task: construction of a <literal>CustomScan</literal> plan node, located
+  within the query plan tree instead of the built-in plan node. The last task
+  is the execution of its implementation in answer to invocations by the core
+  executor.
+ </para>
+ <para>
+  Some of the contrib modules utilize the custom-scan API. They may provide
+  good examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan to skip earlier pages or
+      terminate before the end of the relation, if an inequality operator on
+      the <literal>ctid</literal> system column can narrow down the range to be
+      scanned, instead of a sequential scan which reads a relation from the
+      head to the end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates the way to implement a custom
+      scan node that performs a join.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scan and join are fully supported with integrated cost-based
+  query optimization using the custom scan API. You might be able to
+  implement other operations, like sort or aggregation, by manipulating the
+  planned tree; however, the extension is then responsible for handling this
+  replacement correctly. There is no support for this in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure any longer.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target 
+    relations. Prior to this selection, we list all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to built-in paths.
+    If a custom-scan provider can offer a potential scan path for the
+    supplied relation, it should construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> field is common to all path nodes and stores
+    the cost estimate. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    the flags described below, and <literal>custom_private</> can be used to
+    store private data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        Informs the query planner that this custom scan node supports
+        the <literal>MarkPosCustomScan</> and
+        <literal>RestorePosCustom</> methods.
+        The custom scan provider is then responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        Informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back into the associated custom scan provider to finish
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner performs basic initialization of the newly allocated
+    <literal>cscan_plan</>; the custom scan provider can then apply its final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned to the <literal>Plan</> portion, but clauses can be removed from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node after the plan node has been
+    constructed. For example, a Var node in the target list of a join node
+    originally references a particular relation underlying the join; however,
+    it has to be adjusted to reference either the inner or the outer side.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a plain relation
+    scan, because there is nothing special to do. Otherwise, for example
+    when a custom scan replaces a join of two or more relations, the
+    adjustment has to be handled by the custom scan provider.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor invokes the associated callbacks to begin, execute,
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    Begins execution of the custom scan during executor startup.
+    It allows the custom scan provider to do any initialization needed for this
+    plan; however, it is not a good idea to launch the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to hold the private state managed by the custom scan
+    provider. Also, <literal>eflags</> contains flag bits describing the
+    executor's operating mode for this plan node. Note that the custom scan
+    provider should not perform any externally visible action if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    Fetches one tuple from the underlying relation (or relations, if joining)
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method of foreign tables, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is for the callback to be a thin
+    entry point that combines <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    Fetches multiple tuples from the underlying relation (or relations, if
+    joining) according to the custom logic. Pay attention to the data format
+    (and to how it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    Ends the scan and releases privately allocated resources.
+    It is usually not important to release memory in the per-execution memory
+    context. So, this callback is responsible only for the provider's own
+    resources, independent of those managed by the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    Restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed value,
+    so the rewound scan does not need to return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    Saves the current position of the custom scan somewhere in the private
+    state.
+    This callback is optional; it is required only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    Rewinds the current position of the custom scan to the position
+    previously saved by <literal>MarkPosCustomScan</>.
+    This callback is optional; it is required only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    Prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the state in <literal>CustomScanState</> provides
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
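As a reading aid for the executor callbacks documented above (this is not part of the patch): a minimal, self-contained C sketch of the begin / execute / end lifecycle. `DemoScanState`, `DemoPrivate` and the `demo_*` functions are hypothetical stand-ins, not the patch's `CustomScanState` or its callback signatures, and the toy "relation" is just an int array. It only illustrates three points from the text: the begin step sets up private state without scanning, the first execute call launches the scan and checks the qualifier itself, and the end step releases only what the provider allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CustomScanState; not the patch's struct. */
typedef struct DemoScanState
{
    void *custom_state;     /* private state owned by the provider */
} DemoScanState;

/* Private state of a toy provider that scans an int array. */
typedef struct DemoPrivate
{
    const int *rows;        /* the toy "relation" */
    size_t     nrows;
    size_t     next;        /* next row to examine */
    bool       started;     /* has the scan actually been launched? */
    int        threshold;   /* qualifier: row > threshold */
} DemoPrivate;

/* Begin: allocate private state only; do not start scanning here. */
static void
demo_begin(DemoScanState *css, const int *rows, size_t nrows, int threshold)
{
    DemoPrivate *priv = calloc(1, sizeof(DemoPrivate));

    assert(priv != NULL);
    priv->rows = rows;
    priv->nrows = nrows;
    priv->threshold = threshold;
    css->custom_state = priv;
}

/* Exec: launch the scan lazily, then return the next qualifying row. */
static const int *
demo_exec(DemoScanState *css)
{
    DemoPrivate *priv = css->custom_state;

    if (!priv->started)
    {
        priv->next = 0;
        priv->started = true;
    }
    while (priv->next < priv->nrows)
    {
        const int *row = &priv->rows[priv->next++];

        if (*row > priv->threshold)     /* the provider checks the qual */
            return row;
    }
    return NULL;                        /* end of scan */
}

/* End: release only what the provider allocated itself. */
static void
demo_end(DemoScanState *css)
{
    free(css->custom_state);
    css->custom_state = NULL;
}
```

A caller would invoke `demo_begin` once, call `demo_exec` repeatedly until it returns NULL, and finish with `demo_end`; a real provider would return a `TupleTableSlot` instead of a pointer into an array.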
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..1e96829 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 522316c..cce0cd8 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index bd5428d..ac7fc68 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +905,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			operation = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1013,6 +1028,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1303,6 +1322,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *)plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1870,6 +1912,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2042,6 +2097,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomPath:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
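A small illustration of the flag tests added to execAmi.c above (again, not part of the patch). The bit values below are assumptions made only for this sketch, since the real `CUSTOM__*` definitions live in nodeCustom.h and are not shown here; the `DEMO_*` macros and `demo_*` functions are hypothetical names. The second helper mirrors the shape of the `T_CustomScan` case in `ExecSupportsBackwardScan`, where the flag alone is not enough and the target list must also permit a backward scan.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values for illustration; see nodeCustom.h for the real ones. */
#define DEMO_SUPPORT_MARK_RESTORE   0x0001
#define DEMO_SUPPORT_BACKWARD_SCAN  0x0002

/* True if the provider advertised mark/restore support in its flags. */
static bool
demo_supports_mark_restore(int custom_flags)
{
    return (custom_flags & DEMO_SUPPORT_MARK_RESTORE) != 0;
}

/*
 * True only if the provider advertised backward-scan support AND the
 * target list (summarized here by a boolean) allows scanning backwards.
 */
static bool
demo_supports_backward_scan(int custom_flags, bool tlist_ok)
{
    if ((custom_flags & DEMO_SUPPORT_BACKWARD_SCAN) == 0)
        return false;
    return tlist_ok;
}
```

A provider that sets neither flag is treated like any other node without these capabilities, so for example a merge join would materialize its inner side instead of relying on mark/restore.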
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * It registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * It finds a registered custom execution provider by its name
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocation of CustomScanState and various initialization steps.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range-table list.)
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set only when scanrelid is associated
+	 * with a particular relation. Otherwise, it needs to be initialized by
+	 * the custom-scan provider itself if it uses ss_ScanTupleSlot internally.
+	 * If the provider replaces varno of Var nodes by CUSTOM_VAR, the slot has
+	 * to be set up so that EXPLAIN can look up underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization via the provider's BeginCustomScan callback.
+	 * The extension may override the initialization done above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point for the ExecCustomScan method. All the work to fetch
+ * a tuple is the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entry point, for the MultiExecCustomScan method. All the work
+ * to fetch multiple tuples (according to the expectation of the upper node)
+ * is the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if it exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
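To make the registration semantics of nodeCustom.c easier to follow (this is not part of the patch): a self-contained C sketch of a name-keyed provider table. It deliberately swaps the patch's dynahash table for a fixed-size array with a linear search, and it returns `false` or `NULL` where the patch raises an error with `ereport`. `DemoProvider`, `demo_table`, `demo_exec_stub` and the other `demo_*` names are hypothetical. What it keeps from the patch is the behavior described in the documentation: the provider structure is copied on registration, so the caller's copy can be discarded, and a second registration under the same name is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_NAMELEN   64   /* stand-in for NAMEDATALEN */
#define DEMO_MAX       32   /* capacity of the toy table */

/* Hypothetical, much reduced stand-in for CustomProvider. */
typedef struct DemoProvider
{
    char name[DEMO_NAMELEN];    /* lookup key, NUL-terminated */
    int  (*exec)(void);         /* a single stand-in callback */
} DemoProvider;

static DemoProvider demo_table[DEMO_MAX];
static int          demo_count = 0;

/* A sample callback so the sketch can be exercised. */
static int
demo_exec_stub(void)
{
    return 42;
}

/* Look up a registered provider by name; NULL if there is none. */
static DemoProvider *
demo_get_provider(const char *name)
{
    for (int i = 0; i < demo_count; i++)
    {
        if (strcmp(demo_table[i].name, name) == 0)
            return &demo_table[i];
    }
    return NULL;
}

/* Register a provider; fails on a duplicate name or a full table. */
static bool
demo_register_provider(const DemoProvider *provider)
{
    if (demo_count >= DEMO_MAX)
        return false;
    if (demo_get_provider(provider->name) != NULL)
        return false;

    /* Copy the whole struct: the caller need not keep its own copy. */
    demo_table[demo_count++] = *provider;
    return true;
}
```

In an extension, the equivalent of `demo_register_provider` would be called once from `_PG_init()`, and the executor side would resolve the provider by the name stored in the plan node.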
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e3edcf6..e21982f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3951,6 +3978,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4c7505e..00c7466 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2824,6 +2840,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96fe50f..ebc0b28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -400,6 +402,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -428,6 +433,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1247,6 +1255,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1318,6 +1329,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1341,6 +1355,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1410,6 +1427,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1463,6 +1483,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 50f0852..c6010d9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2c122d..a545af0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2009,6 +2021,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can use it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *)functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5c9f3d6..1af5469 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -575,6 +576,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d8cabbd..3a19aac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2222,6 +2222,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a7169ef..32e8b59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on custom
+ *    logic provided by an extension.
+ *
+ * This function is never called from core PostgreSQL. The usual usage is
+ * to call it from an add_scan_path_hook callback. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for supplying
+ * an adequate cost estimate here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 29a1027..722268d 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2363,14 +2364,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3628,6 +3634,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5295,6 +5309,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5565,6 +5591,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs optimizer this custom scan provider
+ * is designed to support backward scan.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5a40347..f315b8f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1496,6 +1496,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ff9af76..adc5123 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 101e22c..58575b9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the identifier of custom scan provider when it was
+ * registered. custom_flags is a set of CUSTOM__* bits to control its
+ * behavior. custom_private allows extension to store its private data
+ * but has to be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0033a3c..8fbdb66 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..09406f4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,

#10

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#9)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi KaiGai-san,

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached ones are the revised patches.

I merged all of Jim's suggestions. Thanks, they improved the quality of the
documentation. I also fixed some cosmetic whitespace vs. tab issues.

The actual code changes follow the changes in FunctionScan, for the case
where CustomScan replaces a FunctionScan. It now holds a List * object
instead of a single expression tree, so that it can carry multiple functions.

Nothing else was changed from the previous version.

I first reviewed the postgres_fdw portion of the patches to learn the
outline of Custom Plan. The wiki page is also a good introduction to the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC, add_join_path_hook and add_scan_path_hook are added to allow
extensions to plug their code into the planner.

(2) The FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate the two into one, or to wrap one
with the other? I'm not sure that we should have two similar frameworks
side by side.
# I'm sorry if I've missed past discussion about this issue.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it possible
to move them into core?

(4) postgres_fdw estimates the cost of a join by calculating it from local
numbers. How about supporting remote estimation by issuing an EXPLAIN query
when use_remote_estimate = true?
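
On point (1), the usual way an extension plugs into such a hook can be
sketched as follows. This is a minimal, self-contained sketch rather than code
from the patch: the planner types are opaque stand-ins, and
demo_add_scan_path, demo_install_hook, demo_consider_custom_scans and
demo_paths_considered are hypothetical names. Only the hook type and the
add_scan_path_hook variable follow the patch.

```c
#include <stddef.h>

/* Opaque stand-ins; the real definitions live in the PostgreSQL headers. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct RangeTblEntry RangeTblEntry;

/* Same shape as the hook type the patch adds to optimizer/paths.h. */
typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *baserel,
                                        RangeTblEntry *rte);

/* In the patch this global is defined in allpaths.c. */
add_scan_path_hook_type add_scan_path_hook = NULL;

/* Hypothetical counter, only here so the effect is observable. */
int demo_paths_considered = 0;

/* Whatever hook was installed before ours, if any. */
static add_scan_path_hook_type prev_add_scan_path_hook = NULL;

/* Hypothetical provider callback. */
static void
demo_add_scan_path(PlannerInfo *root, RelOptInfo *baserel, RangeTblEntry *rte)
{
    /* Give a previously loaded provider its turn first. */
    if (prev_add_scan_path_hook)
        (*prev_add_scan_path_hook)(root, baserel, rte);

    /* A real provider would estimate costs and add a CustomPath here. */
    demo_paths_considered++;
}

/* Hypothetical: the part an extension would run once at load time. */
void
demo_install_hook(void)
{
    prev_add_scan_path_hook = add_scan_path_hook;
    add_scan_path_hook = demo_add_scan_path;
}

/* Mirrors what the add_custom_scan_paths() macro in the patch expands to. */
void
demo_consider_custom_scans(PlannerInfo *root, RelOptInfo *rel,
                           RangeTblEntry *rte)
{
    if (add_scan_path_hook)
        (*add_scan_path_hook)(root, rel, rte);
}
```

In a real extension the body of demo_install_hook would sit in _PG_init(),
and the callback would build a path with create_customscan_path() and hand it
to add_path(). Chaining to the previous hook keeps several providers usable
at the same time.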

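On point (4), the parsing side of remote estimation could look roughly like
this. It is a minimal sketch, assuming the remote server returns the usual
first line of EXPLAIN output, as in the examples earlier in this thread.
RemoteEstimate and parse_explain_estimate are hypothetical names, and sending
the EXPLAIN query over the connection is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the numbers reported by the remote planner. */
typedef struct RemoteEstimate
{
    double startup_cost;
    double total_cost;
    double rows;
    int    width;
} RemoteEstimate;

/*
 * Parse the top line of EXPLAIN output, for example
 *   "Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=43)".
 * Returns 1 and fills *est on success, 0 if the line has no cost block.
 */
int
parse_explain_estimate(const char *line, RemoteEstimate *est)
{
    const char *p;
    int         n;

    /* The cost block is the last parenthesized group on the line. */
    p = strrchr(line, '(');
    if (p == NULL)
        return 0;

    n = sscanf(p, "(cost=%lf..%lf rows=%lf width=%d)",
               &est->startup_cost, &est->total_cost,
               &est->rows, &est->width);

    /* All four fields must be present for the estimate to be usable. */
    return n == 4;
}
```

Searching from the last opening parenthesis matters for node names that
contain parentheses themselves, such as "Custom Scan (ctidscan) on t1". The
parsed numbers would then replace the locally calculated join cost before the
path is submitted to the planner.
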
--
Shigeru HANADA

#11

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#9)

1 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

I merged all of Jim's suggestions. Thanks, they improved the quality of the
documentation. I also fixed some cosmetic whitespace vs. tab issues.

I found some typos in the documentation and comments. Please see the
attached patch for details.

--
Shigeru HANADA

Attachments:

fix_typo.patch (application/octet-stream)

diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
index 0c6e6c0..72bbf17 100644
--- a/contrib/ctidscan/ctidscan.c
+++ b/contrib/ctidscan/ctidscan.c
@@ -8,7 +8,7 @@
  * to fetch records with tid larger or less than a particular value.
  * In case when inequality operators were given, this module construct
  * a custom scan path that enables to skip records not to be read. Then,
- * if it was the chepest one, it shall be used to run the query.
+ * if it was the cheapest one, it shall be used to run the query.
  * Custom Scan APIs callbacks this extension when executor tries to fetch
  * underlying records, then it utilizes existing heap_getnext() but seek
  * the records to be read prior to fetching the first record.
@@ -53,7 +53,7 @@ static add_scan_path_hook_type	add_scan_path_next;
  * It checks whether the given restriction clauses enables to determine
  * the zone to be scanned, or not. If one or more restriction clauses are
  * available, it returns a list of them, or NIL elsewhere.
- * The caller can consider all the conditions are chainned with AND-
+ * The caller can consider all the conditions are chained with AND-
  * boolean operator, so all the operator works for narrowing down the
  * scope of custom tid scan.
  */
@@ -245,7 +245,7 @@ CTidEstimateCosts(PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
 
@@ -350,7 +350,7 @@ CTidAddScanPath(PlannerInfo *root,
  * CTidInitCustomScanPlan
  *
  * It initializes the given CustomScan plan object according to the CustomPath
- * being choosen by the optimizer.
+ * being chosen by the optimizer.
  */
 static void
 CTidInitCustomScanPlan(PlannerInfo *root,
@@ -491,7 +491,7 @@ CTidEvalScanZone(CustomScanState *node)
 		else
 		{
 			/*
-			 * Whole of the restriction clauses chainned with AND- boolean
+			 * Whole of the restriction clauses chained with AND- boolean
 			 * operators because false, if one of the clauses has NULL result.
 			 * So, we can immediately break the evaluation to inform caller
 			 * it does not make sense to scan any more.
@@ -519,7 +519,7 @@ CTidBeginCustomScan(CustomScanState *node, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return;
 
-	/* Begin sequential scan, but pointer shall be seeked later */
+	/* Begin sequential scan, but pointer shall be sought later */
 	node->ss.ss_currentScanDesc
 		= heap_beginscan(node->ss.ss_currentRelation,
 						 estate->es_snapshot, 0, NULL);
@@ -538,7 +538,7 @@ CTidBeginCustomScan(CustomScanState *node, int eflags)
  * CTidSeekPosition
  *
  * It seeks current scan position into a particular point we specified.
- * Next heap_getnext() will fetch a record from the point we seeked.
+ * Next heap_getnext() will fetch a record from the point we sought.
  * It returns false, if specified position was out of range thus does not
  * make sense to scan any mode. Elsewhere, true shall be return.
  */
@@ -635,7 +635,7 @@ CTidAccessCustomScan(CustomScanState *node)
 		}
 		else if (direction == BackwardScanDirection)
 		{
-			/* seel to the point if max-tid was obvious */
+			/* seek to the point if max-tid was obvious */
 			if (ctss->ip_max_comp != 1)
 			{
 				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
index 60081f7..e4afaa7 100644
--- a/doc/src/sgml/ctidscan.sgml
+++ b/doc/src/sgml/ctidscan.sgml
@@ -1,7 +1,7 @@
 <!-- doc/src/sgml/ctidscan.sgml -->
 
 <sect1 id="ctidscan" xreflabel="ctidscan">
- <title>lo</title>
+ <title>ctidscan</title>
 
  <indexterm zone="ctidscan">
   <primary>ctidscan</primary>
@@ -54,7 +54,7 @@ postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
   <para>
    On the other hands, an alternative scan path implemented with
    <filename>ctidscan</> provides more efficient way; that skips the first
-   100 pages prior to sequencial scan, as follows.
+   100 pages prior to sequential scan, as follows.
 <programlisting>
 postgres=# load 'ctidscan';
 LOAD
@@ -71,7 +71,7 @@ postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
    smaller number of tuples to be processed.
   </para>
   <para>
-   Of course, it shall not be choosen if we have more cheaper path than the
+   Of course, it shall not be chosen if we have more cheaper path than the
    above custom-scan path. Index-scan based on equality operation is usually
    cheaper than this custom-scan, so optimizer adopts it instead of sequential
    scan or custom scan provided by <filename>ctidscan</> for instance.
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The reaming three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 33bab08..e55b16e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -974,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
index 1ad0e7a..a5a205d 100644
--- a/src/test/regress/input/custom_scan.source
+++ b/src/test/regress/input/custom_scan.source
@@ -45,5 +45,5 @@ SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
 SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
 SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
 
--- Test creanup
+-- Test cleanup
 DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
index 09c1bda..fc13e9f 100644
--- a/src/test/regress/output/custom_scan.source
+++ b/src/test/regress/output/custom_scan.source
@@ -283,7 +283,7 @@ SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::
  (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
 (15 rows)
 
--- Test creanup
+-- Test cleanup
 DROP SCHEMA regtest_custom_scan CASCADE;
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to table t1

#12

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Shigeru Hanada (#10)

Re: Custom Scan APIs (Re: Custom Plan node)

Hanada-san,

Thanks for your review.

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.

Almost. More precisely, these hooks allow extensions to add the paths
they can provide for a particular join or scan. The planner then
chooses the cheapest one according to its cost value.
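
As a rough illustration of that mechanism, here is a toy model in plain C.
It is not the patch's code: every toy_* name is made up for this sketch, and
the two costs (209.00 and 100.00) are simply taken from the ctidscan EXPLAIN
example at the top of this thread. It only shows a provider chaining itself
onto a hook and the cheapest candidate winning.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a planner path: a label and an estimated total cost. */
typedef struct ToyPath
{
	const char *provider;
	double		total_cost;
} ToyPath;

/* Toy stand-in for a relation: remembers the cheapest path seen so far. */
typedef struct ToyRel
{
	const ToyPath *cheapest;
} ToyRel;

/* Hook type (hypothetical): a provider may offer paths for the relation. */
typedef void (*toy_add_scan_path_hook_type) (ToyRel *rel);

static toy_add_scan_path_hook_type toy_add_scan_path_hook = NULL;
static toy_add_scan_path_hook_type toy_add_scan_path_next = NULL;

/* Costs copied from the EXPLAIN output shown earlier in the thread. */
static const ToyPath toy_seqscan = {"Seq Scan", 209.00};
static const ToyPath toy_ctidscan = {"Custom Scan (ctidscan)", 100.00};

/* Keep a candidate only if it is cheaper than the current best. */
static void
toy_add_path(ToyRel *rel, const ToyPath *path)
{
	assert(path != NULL);
	if (rel->cheapest == NULL ||
		path->total_cost < rel->cheapest->total_cost)
		rel->cheapest = path;
}

/* Provider callback: call the previous hook first, then offer our path. */
static void
toy_ctidscan_add_path(ToyRel *rel)
{
	if (toy_add_scan_path_next)
		toy_add_scan_path_next(rel);
	toy_add_path(rel, &toy_ctidscan);
}

/* What a provider would do once at load time: chain onto the hook. */
static void
toy_load_ctidscan(void)
{
	if (toy_add_scan_path_hook == toy_ctidscan_add_path)
		return;					/* already installed */
	toy_add_scan_path_next = toy_add_scan_path_hook;
	toy_add_scan_path_hook = toy_ctidscan_add_path;
}

/* Toy planner: built-in path first, then whatever the providers offer. */
static const ToyPath *
toy_plan_scan(void)
{
	ToyRel		rel = {NULL};

	toy_add_path(&rel, &toy_seqscan);
	if (toy_add_scan_path_hook)
		toy_add_scan_path_hook(&rel);
	return rel.cheapest;
}
```

Before toy_load_ctidscan() is called, toy_plan_scan() returns the sequential
scan; afterwards it returns the custom scan, because its cost is lower. The
real add_path() keeps more than one candidate per relation (it also looks at
startup cost and sort order), which this sketch deliberately ignores.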

(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one, or to wrap one
with the other? I'm not sure that we should have two similar frameworks
side by side.
# I'm sorry if I've missed the past discussion about this issue.

Probably, FDW has a different role from the CustomScan API.
Literally, an FDW acts as a bridge between a relation and an opaque
external data source, mediating between two different worlds on behalf
of a foreign table.
CustomScan, on the other hand, allows an extension to provide
alternative logic to scan or join particular relations, in addition to
the built-in ones, but it does not act on behalf of foreign tables.

The existing FDW is designed to implement a scan on an intangible
relation, so it can make some assumptions: for example, that a tuple
returned from the FDW has a TupleDesc equivalent to the table
definition, or that it can always use ExecScan().
So, I don't think these two frameworks should be consolidated,
because that would confuse existing extensions that assume FDW
callbacks always have a particular foreign table definition.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it possible
to move them into core?

Probably, src/backend/foreign/foreign.c is a good host for them.

(4) postgres_fdw estimates costs of join by calculating local numbers.
How about supporting remote estimation by issuing an EXPLAIN query
when use_remote_estimate = true?

I'm uncertain whether the cost value from a remote EXPLAIN represents
the right cost on the local side. It does represent the cost of
joining the two relations on the remote side, but it does not
represent the local work, which is just fetching tuples from the
result set of the remote join query.
What is your opinion? Is the remote cost estimate comparable with
local values?
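
To make the question concrete, here is a minimal sketch of one possible way
to turn a remote estimate into a local one. It is an assumption for
illustration only, not what the patch does: the remote numbers are taken as
given, and a connection overhead plus a per-tuple transfer charge are added
on the local side. fdw_startup_cost and fdw_tuple_cost are named after the
fields in PgFdwRelationInfo; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical result type: local startup and total cost of a remote join. */
typedef struct ToyRemoteJoinCost
{
	double		startup_cost;
	double		total_cost;
} ToyRemoteJoinCost;

/*
 * Combine a remote estimate with local overheads.
 *
 * remote_startup, remote_total: costs reported by the remote EXPLAIN
 * rows:             number of tuples expected in the remote result set
 * fdw_startup_cost: assumed fixed overhead of talking to the remote server
 * fdw_tuple_cost:   assumed charge for transferring one tuple
 */
static ToyRemoteJoinCost
toy_remote_join_cost(double remote_startup, double remote_total,
					 double rows,
					 double fdw_startup_cost, double fdw_tuple_cost)
{
	ToyRemoteJoinCost cost;

	assert(rows >= 0.0);
	assert(remote_total >= remote_startup);

	/* Nothing can be returned before the remote side has started up. */
	cost.startup_cost = remote_startup + fdw_startup_cost;

	/* Remote work, plus overhead, plus fetching each result tuple. */
	cost.total_cost = remote_total + fdw_startup_cost
		+ fdw_tuple_cost * rows;

	return cost;
}
```

Whether the result is comparable with the cost of a local join path still
depends on the remote and local servers using similar cost units, which is
exactly the open question here.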

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Shigeru Hanada (#11)

Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for fixing my many careless mistakes.
I didn't know "seek" was an irregular verb...

Best regards,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.

I found some typos in documents and comments. Please see attached
patch for detail.

--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#14

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Kohei KaiGai (#12)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

The attached patches include the documentation fixes by Hanada-san,
and the relocation of is_managed_relation (the portion that checks
whether the relation is a foreign table managed by a particular FDW)
and has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of a remote join is comparable with the cost of the alternative
local join paths.

Please check it. Thanks,

2013/12/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Hanada-san,

Thanks for your review.

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.

Almost. More precisely, these hooks allow extensions to add the paths
they can provide for a particular join or scan. The planner then
chooses the cheapest one according to its cost value.

(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one, or to wrap one
with the other? I'm not sure that we should have two similar frameworks
side by side.
# I'm sorry if I've missed the past discussion about this issue.

Probably, FDW has a different role from the CustomScan API.
Literally, an FDW acts as a bridge between a relation and an opaque
external data source, mediating between two different worlds on behalf
of a foreign table.
CustomScan, on the other hand, allows an extension to provide
alternative logic to scan or join particular relations, in addition to
the built-in ones, but it does not act on behalf of foreign tables.

The existing FDW is designed to implement a scan on an intangible
relation, so it can make some assumptions: for example, that a tuple
returned from the FDW has a TupleDesc equivalent to the table
definition, or that it can always use ExecScan().
So, I don't think these two frameworks should be consolidated,
because that would confuse existing extensions that assume FDW
callbacks always have a particular foreign table definition.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it possible
to move them into core?

Probably, src/backend/foreign/foreign.c is a good host for them.

(4) postgres_fdw estimates costs of join by calculating local numbers.
How about supporting remote estimation by issuing an EXPLAIN query
when use_remote_estimate = true?

I'm uncertain whether the cost value from a remote EXPLAIN represents
the right cost on the local side. It does represent the cost of
joining the two relations on the remote side, but it does not
represent the local work, which is just fetching tuples from the
result set of the remote join query.
What is your opinion? Is the remote cost estimate comparable with
local values?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.v2.patch (application/octet-stream)

 contrib/ctidscan/ctidscan.c                    |   16 +-
 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1075 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 doc/src/sgml/ctidscan.sgml                     |    6 +-
 doc/src/sgml/custom-scan.sgml                  |    8 +-
 src/backend/foreign/foreign.c                  |   28 +
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/backend/optimizer/path/costsize.c          |    2 +-
 src/backend/optimizer/util/var.c               |   39 +
 src/include/foreign/foreign.h                  |    4 +
 src/include/nodes/bitmapset.h                  |    4 +
 src/include/optimizer/var.h                    |    1 +
 src/test/regress/input/custom_scan.source      |    2 +-
 src/test/regress/output/custom_scan.source     |    2 +-
 16 files changed, 1367 insertions(+), 189 deletions(-)

diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
index 72bbf17..0c6e6c0 100644
--- a/contrib/ctidscan/ctidscan.c
+++ b/contrib/ctidscan/ctidscan.c
@@ -8,7 +8,7 @@
  * to fetch records with tid larger or less than a particular value.
  * In case when inequality operators were given, this module construct
  * a custom scan path that enables to skip records not to be read. Then,
- * if it was the cheapest one, it shall be used to run the query.
+ * if it was the chepest one, it shall be used to run the query.
  * Custom Scan APIs callbacks this extension when executor tries to fetch
  * underlying records, then it utilizes existing heap_getnext() but seek
  * the records to be read prior to fetching the first record.
@@ -53,7 +53,7 @@ static add_scan_path_hook_type	add_scan_path_next;
  * It checks whether the given restriction clauses enables to determine
  * the zone to be scanned, or not. If one or more restriction clauses are
  * available, it returns a list of them, or NIL elsewhere.
- * The caller can consider all the conditions are chained with AND-
+ * The caller can consider all the conditions are chainned with AND-
  * boolean operator, so all the operator works for narrowing down the
  * scope of custom tid scan.
  */
@@ -245,7 +245,7 @@ CTidEstimateCosts(PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrieved tuple.
+	 * quals once per retrived tuple.
 	 */
 	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
 
@@ -350,7 +350,7 @@ CTidAddScanPath(PlannerInfo *root,
  * CTidInitCustomScanPlan
  *
  * It initializes the given CustomScan plan object according to the CustomPath
- * being chosen by the optimizer.
+ * being choosen by the optimizer.
  */
 static void
 CTidInitCustomScanPlan(PlannerInfo *root,
@@ -491,7 +491,7 @@ CTidEvalScanZone(CustomScanState *node)
 		else
 		{
 			/*
-			 * Whole of the restriction clauses chained with AND- boolean
+			 * Whole of the restriction clauses chainned with AND- boolean
 			 * operators because false, if one of the clauses has NULL result.
 			 * So, we can immediately break the evaluation to inform caller
 			 * it does not make sense to scan any more.
@@ -519,7 +519,7 @@ CTidBeginCustomScan(CustomScanState *node, int eflags)
 	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
 		return;
 
-	/* Begin sequential scan, but pointer shall be sought later */
+	/* Begin sequential scan, but pointer shall be seeked later */
 	node->ss.ss_currentScanDesc
 		= heap_beginscan(node->ss.ss_currentRelation,
 						 estate->es_snapshot, 0, NULL);
@@ -538,7 +538,7 @@ CTidBeginCustomScan(CustomScanState *node, int eflags)
  * CTidSeekPosition
  *
  * It seeks current scan position into a particular point we specified.
- * Next heap_getnext() will fetch a record from the point we sought.
+ * Next heap_getnext() will fetch a record from the point we seeked.
  * It returns false, if specified position was out of range thus does not
  * make sense to scan any mode. Elsewhere, true shall be return.
  */
@@ -635,7 +635,7 @@ CTidAccessCustomScan(CustomScanState *node)
 		}
 		else if (direction == BackwardScanDirection)
 		{
-			/* seek to the point if max-tid was obvious */
+			/* seel to the point if max-tid was obvious */
 			if (ctss->ip_max_comp != 1)
 			{
 				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..5af3dd7 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* columns reference needs to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The main job portion of deparseRemoteJoinSql. It deparses a relation,
+ * might be join not only regular table, to SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either List or Integer.
+	 * In case of List, it is a packed PgRemoteJoinInfo that contains
+	 * outer and inner join references, so needs to deparse recursively.
+	 * In case of Integer, it is rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Columns being referenced in target-list and local conditions has
+		 * to be fetched from the remote server, but not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * It assumes the top-level 'relinfo' is one for remote join relation, thus
+ * it has to be a List object that packs PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In case of remote join, column reference may become bogus without
+	 * qualification to relations.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..6786b89 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,794 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Checks whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw. If not, false is returned.
+ * If it is a managed relation, some of its properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		FdwRoutine			pgroutine;
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte = planner_rt_fetch(rel->relid, root);
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		memset(&pgroutine, 0, sizeof(FdwRoutine));
+		pgroutine.GetForeignRelSize = postgresGetForeignRelSize;
+
+		if (!is_fdw_managed_relation(rte->relid, &pgroutine))
+			return false;
+
+		/*
+		 * Also report the server OID and the local user OID to the caller.
+		 * Note that the remote user is determined from the pair of server
+		 * OID and local user OID at execution time, not at planning time,
+		 * so we may need to pay attention to a scenario in which a plan is
+		 * executed under a different user ID.
+		 * However, all we need to know here is whether both relations
+		 * will be accessed with the same credentials; the actual user ID
+		 * is not required here.
+		 * So fdw_user_oid is set to InvalidOid, for comparison purposes,
+		 * when the scan runs with the credentials of GetUserId().
+		 */
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that a CustomScan (postgres-fdw) path is constructed
+				 * only when all underlying foreign tables use the same
+				 * server and user ID.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine for add_join_path_hook. It checks whether this join can
+ * be run on the remote server and, if so, adds a custom-scan path that runs
+ * a remote join instead of a pair of remote scans plus a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* Is outerrel managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* Is innerrel managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Do both sides use the same server and the same credentials? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference appears
+	 * in either the target list or the local conditions.
+	 *
+	 * XXX - This is because we have no reasonable way to reconstruct a
+	 * RECORD datum from individual columns once they are extracted. On the
+	 * other hand, putting a whole-row reference into the remote join query
+	 * would consume additional network bandwidth.
+	 */
+	if (contain_wholerow_reference((Node *)joinrel->reltargetlist) ||
+		contain_wholerow_reference((Node *)j_local_conds))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomPath to run the remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan from the remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pull the bare expressions out of the RestrictInfo nodes */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a remote join relation carry the varno of the
+ * underlying foreign tables. This is a problem because such a varno would
+ * eventually be replaced by a reference to the outer or inner relation,
+ * whereas the result of a remote join is stored in the scan tuple slot,
+ * which is neither outer nor inner.
+ * So we replace the varno of those Var nodes with CUSTOM_VAR, a pseudo
+ * varno that references a tuple in the scan tuple slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * We need special treatment of Var nodes that reference columns of the
+ * remote join relation, because we replace a join relation with a remote
+ * query that returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan, but it
+ * needs some adjustment because of the different nature of a custom scan.
+ * The biggest difference is that it has to open the underlying relations
+ * by itself and construct a tuple descriptor from the list of Vars to be
+ * fetched, because a custom scan (in this case, a scan on a remote join
+ * instead of a local join) has no particular relation behind it, so this
+ * routine has to manage them itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user ID. The current user ID is used unless a specific
+	 * user is configured for this reference.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations also have to be saved */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * No different from postgresEndForeignScan, except that it closes the
+ * underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Does the same as postgresReScanForeignScan().
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine for EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider. No
+ * special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
index e4afaa7..60081f7 100644
--- a/doc/src/sgml/ctidscan.sgml
+++ b/doc/src/sgml/ctidscan.sgml
@@ -1,7 +1,7 @@
 <!-- doc/src/sgml/ctidscan.sgml -->
 
 <sect1 id="ctidscan" xreflabel="ctidscan">
- <title>ctidscan</title>
+ <title>lo</title>
 
  <indexterm zone="ctidscan">
   <primary>ctidscan</primary>
@@ -54,7 +54,7 @@ postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
   <para>
    On the other hands, an alternative scan path implemented with
    <filename>ctidscan</> provides more efficient way; that skips the first
-   100 pages prior to sequential scan, as follows.
+   100 pages prior to sequencial scan, as follows.
 <programlisting>
 postgres=# load 'ctidscan';
 LOAD
@@ -71,7 +71,7 @@ postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
    smaller number of tuples to be processed.
   </para>
   <para>
-   Of course, it shall not be chosen if we have more cheaper path than the
+   Of course, it shall not be choosen if we have more cheaper path than the
    above custom-scan path. Index-scan based on equality operation is usually
    cheaper than this custom-scan, so optimizer adopts it instead of sequential
    scan or custom scan provided by <filename>ctidscan</> for instance.
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index f53902d..b57d82f 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The reaming three tasks are all done
+  entrypoint when the module is loading. The remaing three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demonstrates the way to implement a custom
+      remotely joined relations. It demostrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was chosen by the query planner,
+    Once <literal>CustomPath</literal> was choosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was chosen.
+    constructed on the previous stage then was choosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 2b75f73..99819e9 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -621,3 +621,31 @@ get_foreign_server_oid(const char *servername, bool missing_ok)
 				 errmsg("server \"%s\" does not exist", servername)));
 	return oid;
 }
+
+/*
+ * is_fdw_managed_relation
+ *
+ * It checks whether the supplied relation is a foreign table managed
+ * by the module that has FdwRoutine, or not.
+ */
+bool
+is_fdw_managed_relation(Oid tableoid, const FdwRoutine *routines_self)
+{
+	FdwRoutine *routines;
+	char		relkind = get_rel_relkind(tableoid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		routines = GetFdwRoutineByRelId(tableoid);
+
+		/*
+		 * We assume that a particular callback implemented by a particular
+		 * extension is never shared with other extensions.
+		 * So, we don't need to compare all the function pointers in the
+		 * FdwRoutine, but only one member.
+		 */
+		if (routines->GetForeignRelSize == routines_self->GetForeignRelSize)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e55b16e..33bab08 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -974,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrieved tuple.
+	 * quals once per retrived tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 4a3d5c8..6e899e8 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -73,6 +73,7 @@ static bool pull_varattnos_walker(Node *node, pull_varattnos_context *context);
 static bool pull_vars_walker(Node *node, pull_vars_context *context);
 static bool contain_var_clause_walker(Node *node, void *context);
 static bool contain_vars_of_level_walker(Node *node, int *sublevels_up);
+static bool contain_wholerow_reference_walker(Node *node, void *context);
 static bool locate_var_of_level_walker(Node *node,
 						   locate_var_of_level_context *context);
 static bool pull_var_clause_walker(Node *node,
@@ -418,6 +419,44 @@ contain_vars_of_level_walker(Node *node, int *sublevels_up)
 								  (void *) sublevels_up);
 }
 
+/*
+ * contain_wholerow_reference
+ *
+ *    Recursively scan a clause to discover whether it contains any Var nodes
+ *    of whole-row reference in the current query level.
+ *
+ *    Returns true if any such Var found.
+ */
+bool
+contain_wholerow_reference(Node *node)
+{
+	return contain_wholerow_reference_walker(node, NULL);
+}
+
+static bool
+contain_wholerow_reference_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return contain_wholerow_reference_walker((Node *)rinfo->clause,
+												 context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node,
+								  contain_wholerow_reference_walker,
+								  context);
+}
 
 /*
  * locate_var_of_level
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5bd6ae6..9514f5f 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -13,6 +13,7 @@
 #ifndef FOREIGN_H
 #define FOREIGN_H
 
+#include "foreign/fdwapi.h"
 #include "nodes/parsenodes.h"
 
 
@@ -81,4 +82,7 @@ extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
 extern Oid	get_foreign_data_wrapper_oid(const char *fdwname, bool missing_ok);
 extern Oid	get_foreign_server_oid(const char *servername, bool missing_ok);
 
+extern bool	is_fdw_managed_relation(Oid tableoid,
+									const FdwRoutine *routines_self);
+
 #endif   /* FOREIGN_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index 808bf67..6355b4d 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -36,6 +36,7 @@ extern void pull_varattnos(Node *node, Index varno, Bitmapset **varattnos);
 extern List *pull_vars_of_level(Node *node, int levelsup);
 extern bool contain_var_clause(Node *node);
 extern bool contain_vars_of_level(Node *node, int levelsup);
+extern bool contain_wholerow_reference(Node *node);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, PVCAggregateBehavior aggbehavior,
 				PVCPlaceHolderBehavior phbehavior);
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
index a5a205d..1ad0e7a 100644
--- a/src/test/regress/input/custom_scan.source
+++ b/src/test/regress/input/custom_scan.source
@@ -45,5 +45,5 @@ SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
 SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
 SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
 
--- Test cleanup
+-- Test creanup
 DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
index fc13e9f..09c1bda 100644
--- a/src/test/regress/output/custom_scan.source
+++ b/src/test/regress/output/custom_scan.source
@@ -283,7 +283,7 @@ SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::
  (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
 (15 rows)
 
--- Test cleanup
+-- Test creanup
 DROP SCHEMA regtest_custom_scan CASCADE;
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to table t1

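As a side note on the bms_to_string / bms_from_string pair in the patch above: the text form is a run of fixed-width hex words, most significant word first, so decoding has to map 'a'..'f' to 10..15 rather than 0..5. Below is a minimal standalone sketch of that round trip. It does not use the PostgreSQL Bitmapset type; the names hex_digit_value, words_to_hex and hex_to_words are hypothetical, and it assumes 32-bit words and malloc instead of palloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one hex digit; returns -1 when c is not a hex digit. */
static int
hex_digit_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;	/* 'a' stands for 10, not 0 */
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Encode nwords 32-bit words, most significant word first; caller frees. */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
	char	   *result;
	char	   *pos;
	int			i;

	assert(words != NULL && nwords > 0);
	result = malloc((size_t) nwords * 8 + 1);
	if (result == NULL)
		return NULL;
	pos = result;
	for (i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
	return result;
}

/* Decode exactly nwords * 8 hex digits into words; returns 0 on success. */
static int
hex_to_words(const char *hex, uint32_t *words, int nwords)
{
	int			i;
	int			j;

	if (strlen(hex) != (size_t) nwords * 8)
		return -1;
	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (j = 0; j < 8; j++)
		{
			int			v = hex_digit_value((unsigned char) *hex++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return 0;
}
```

Unlike the patch, this sketch rejects a string whose length is not a whole number of words instead of warning and continuing; that is a design choice of the sketch, not something the patch does.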
Attachment: pgsql-v9.4-custom-scan.part-2.v2.patch (application/octet-stream)

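The cost estimate in the ctidscan module of this patch turns the ctid bounds into a number of pages and then scales the tuple count by the fraction of pages scanned. The following is only a rough standalone sketch of that arithmetic, not the module's code: BlockNo is a stand-in for BlockNumber, pages_in_range and tuples_in_range are hypothetical names, and the clamp to the last valid block (rel_pages - 1) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNo;		/* stand-in for PostgreSQL's BlockNumber */

/*
 * Number of pages in the inclusive block range [bmin, bmax] of a relation
 * with rel_pages pages, clamped to at least one page.
 */
static BlockNo
pages_in_range(BlockNo bmin, BlockNo bmax, BlockNo rel_pages)
{
	BlockNo		last;

	assert(rel_pages > 0);
	last = rel_pages - 1;
	if (bmax > last)
		bmax = last;
	/* compare before subtracting: the type is unsigned and would wrap */
	if (bmin > bmax)
		return 1;
	return bmax - bmin + 1;
}

/* Scale the relation's tuple count by the fraction of pages scanned. */
static double
tuples_in_range(double rel_tuples, BlockNo npages, BlockNo rel_pages)
{
	assert(rel_pages > 0);
	return rel_tuples * ((double) npages / (double) rel_pages);
}
```

For instance, a lower bound at block 10 on a 100-page relation leaves 90 pages to read, while an inverted range such as [50, 20] is still charged one page.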
 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 17 files changed, 1252 insertions(+), 14 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor. This extension focuses on
+ * workloads that fetch records whose tid is larger or smaller than a
+ * particular value. When inequality operators are given, this module builds
+ * a custom scan path that can skip records that need not be read. Then,
+ * if it is the cheapest one, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches underlying records; it uses the existing heap_getnext() but seeks
+ * to the first record to be read before fetching.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clauses make it possible to
+ * determine the zone to be scanned. If one or more such restriction
+ * clauses are available, it returns a list of them, or NIL otherwise.
+ * The caller can assume that all the conditions are chained with the
+ * boolean AND operator, so every operator works to narrow down the
+ * scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost of scanning the target relation according to the
+ * given restriction clauses. Its scan logic is almost the same as that of
+ * SeqScan, because it uses the regular heap_getnext(), except for the
+ * number of tuples to be scanned when the restriction clauses work well.
+ */
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, treat the Var node as if it were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimate; we don't distinguish between strict
+			 * and non-strict inequality operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max > bnum_min) ? bnum_max - bnum_min + 1 : 1;
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimate. We assume half of the records will be
+		 * read under this restriction clause, but the actual number is
+		 * unknown until the executor runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't discount the cost of the inequality operators, because
+	 * they are a subset of the qpquals and are still evaluated there.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators on ctid are given for
+ * the relation to be scanned and they can reduce the number of tuples.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk through the restriction clauses */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, part of the restriction clauses can reduce the number of tuples
+	 * to be scanned, so we add the custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * being chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was already done in nodeCustomScan.c. So,
+	 * all we need to do is attach the slightly adjusted clauses and the
+	 * private data; the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns 1, 0 or -1. A forward scan ends once
+	 * ItemPointerCompare(tuple, ip_max) is greater than ip_max_comp, so
+	 * ip_max_comp is 0 for an inclusive upper bound and -1 for an
+	 * exclusive one. With the initial value of 1 the test can never
+	 * succeed, which makes the scan "open ended" as if no termination
+	 * condition was set. ip_min_comp plays the same role for backward
+	 * scans, with -1 as the open-ended value.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walk through the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we got a particular TID that fetched records must be
+			 * larger than, less than, or equal to; thus it lets us
+			 * determine the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are chained with AND, so the whole
+			 * condition cannot be true if any one of them yields NULL.
+			 * So, we can stop the evaluation right away and tell the
+			 * caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Nothing more to do in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; the position is sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns false only if
+ * the position is out of range on a backward scan; otherwise it returns
+ * true, and the caller switches to NoMovementScanDirection for that fetch.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, it is obvious that a forward
+	 * scan fetches no tuples. On the other hand, nothing special is needed
+	 * for a backward scan.
+	 * Note that heap_getnext() returns a NULL tuple right after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of
+	 * this function overrides the scan direction when 'true' is returned,
+	 * which terminates the scan immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, restart the backward scan */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure that heap_restrpos() loads the block containing the position.
+	 * heap_restrpos() internally calls heapgettup() or
+	 * heapgettup_pagemode(), which calls heapgetpage() only when rs_cblock
+	 * differs from the block number pointed to by rs_mctid. So, we set an
+	 * invalid block number here to make sure it cannot match the previous
+	 * value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Set a pseudo mark, as if heap_markpos() had saved this position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate the scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if a min-tid is known */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if a max-tid is known */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound on a
+	 * forward scan, or the lower bound on a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the
+ * ExecScan() manner.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call recalculate the starting point and rewind the
+ * underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
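
The termination test in CTidAccessCustomScan() works because ItemPointerCompare() only ever returns -1, 0 or 1. The standalone sketch below models that comparison policy outside PostgreSQL. The Tid struct and the functions tid_compare() and past_upper_bound() are hypothetical names introduced for illustration; they are not part of the patch or of the backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Three-way comparison returning -1, 0 or 1, like ItemPointerCompare(). */
int
tid_compare(const Tid *a, const Tid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/*
 * Forward-scan termination test, modeled on the check against ip_max.
 * max_comp =  1: open ended, the test can never succeed
 * max_comp =  0: bound is "<= max", stop once tuple > max
 * max_comp = -1: bound is "<  max", stop once tuple >= max
 */
bool
past_upper_bound(const Tid *tuple, const Tid *max, int max_comp)
{
	return tid_compare(tuple, max) > max_comp;
}
```

With an upper bound of (3,10), a tuple at (3,10) is still inside an inclusive bound (max_comp = 0) but ends the scan under an exclusive one (max_comp = -1), and nothing ends the scan while max_comp is left at 1.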
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..e4afaa7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides an additional way to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on the
+  custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then offer an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators involving the
+   <literal>ctid</> system column to reduce the number of rows to be
+   processed. Obviously, it makes no sense to read tuples located on the
+   99th page or earlier. So, it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it
+   internally uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well, except that I/O is wasted on the pages that contain only
+   records to be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> provides a more efficient way: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples have to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path exists. For instance, an
+   index scan on an equality condition is usually cheaper than this custom
+   scan, so the optimizer adopts it instead of the sequential scan or the
+   custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, whichever is
+   convenient.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
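
The documentation above says the custom path is cheaper because fewer pages and tuples are processed. As a rough illustration of that arithmetic, here is a minimal standalone sketch that turns ctid block bounds into a page count and a scaled tuple estimate. The functions pages_to_scan() and tuples_to_scan() and their parameters are hypothetical; this is not the code of CTidEstimateCosts(), only a model of the idea under the assumption that tuples are spread evenly across pages.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of pages a ctid-bounded scan has to visit.
 * rel_pages is the relation size in pages; min_blk and max_blk are the
 * block numbers of the lower and upper ctid bounds, and each is ignored
 * when the matching has_min / has_max flag is 0.
 */
uint32_t
pages_to_scan(uint32_t rel_pages, int has_min, uint32_t min_blk,
			  int has_max, uint32_t max_blk)
{
	uint32_t	first = has_min ? min_blk : 0;
	uint32_t	last;

	if (rel_pages == 0)
		return 1;				/* keep a floor of one page */

	/* Clamp the upper bound to the last existing block. */
	last = rel_pages - 1;
	if (has_max && max_blk < last)
		last = max_blk;

	/* Compare before subtracting: unsigned arithmetic would wrap around. */
	if (first > last)
		return 1;				/* keep a floor of one page */
	return last - first + 1;
}

/* Scale the relation's tuple estimate by the fraction of pages visited. */
double
tuples_to_scan(double rel_tuples, uint32_t rel_pages, uint32_t num_pages)
{
	if (rel_pages == 0)
		return 0.0;
	return rel_tuples * ((double) num_pages / (double) rel_pages);
}
```

For a hypothetical 200-page table holding 20000 tuples, a lower bound in block 100 leaves 100 pages to visit, so the estimate drops to 10000 tuples.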
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1e96829..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c6010d9..e55b16e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -977,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1af5469..630c8e7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1081,7 +1081,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 78efaa5..b040334 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
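
The new operator macros let ctidscan recognize a clause even when the ctid Var is the second argument: it asks get_commutator() for the mirrored operator, so that '(10,0)' < ctid is handled like ctid > '(10,0)'. The sketch below hard-codes that mapping using the OIDs and commutator columns from the catalog entries in this hunk. The enum names and tid_commutator() are hypothetical; the module itself relies on get_commutator(), which looks the answer up in the catalog.

```c
/* Operator OIDs as listed in the pg_operator.h hunk above. */
enum
{
	TID_LT = 2799,				/* "<"  */
	TID_GT = 2800,				/* ">"  */
	TID_LE = 2801,				/* "<=" */
	TID_GE = 2802				/* ">=" */
};

/*
 * Hypothetical lookup: return the commutator of a tid inequality
 * operator, or 0 (standing in for InvalidOid) for any other operator.
 */
unsigned int
tid_commutator(unsigned int opno)
{
	switch (opno)
	{
		case TID_LT:
			return TID_GT;
		case TID_GT:
			return TID_LT;
		case TID_LE:
			return TID_GE;
		case TID_GE:
			return TID_LE;
		default:
			return 0;
	}
}
```

Applying the mapping twice returns the original operator, which is the property a commutator pair is expected to have.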
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges
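For readers who want to see why the ctid inequality tests above can avoid a full sequential scan, here is a minimal, self-contained sketch of how a TID range could be narrowed to a block range. It is not the code of the ctidscan module; `DemoTid`, `demo_tid_cmp` and `demo_block_range` are hypothetical names introduced only for illustration, and the bounds are treated as inclusive for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, line offset). */
typedef struct DemoTid
{
    uint32_t block;     /* page number within the relation, 0-based */
    uint16_t offset;    /* line pointer within the page, 1-based */
} DemoTid;

/* Three-way comparison: order by block number first, then by offset. */
int
demo_tid_cmp(DemoTid a, DemoTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Narrow "ctid >= lower AND ctid <= upper" to the blocks that can hold
 * matching tuples in a relation of nblocks pages.  Returns false when no
 * block needs to be read at all; otherwise stores the first and last block
 * to scan.  Tuples inside those blocks still need the filter re-checked.
 */
bool
demo_block_range(DemoTid lower, DemoTid upper, uint32_t nblocks,
                 uint32_t *first, uint32_t *last)
{
    assert(first != NULL && last != NULL);

    if (nblocks == 0 || demo_tid_cmp(lower, upper) > 0 ||
        lower.block >= nblocks)
        return false;

    *first = lower.block;
    *last = (upper.block < nblocks) ? upper.block : nblocks - 1;
    return true;
}
```

With the four pages that the expected output suggests for t1, the BETWEEN '(2,115)' AND '(3,10)' case would touch only blocks 2 and 3, and a lower bound in block 4 would read nothing, which matches the empty result of the ctid > '(4,0)' query.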

pgsql-v9.4-custom-scan.part-1.v2.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  96 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1198 insertions(+), 15 deletions(-)
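The documentation added by this patch says that the provider structure is copied into an internal table on registration and that some callbacks are optional. A rough, self-contained sketch of that pattern follows. It does not use the real `CustomProvider` or `register_custom_provider()`; every `Demo*`/`demo_*` name, the fixed table size, and the rejection of duplicate names are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX       64
#define DEMO_MAX_PROVIDERS  8

/* Hypothetical, simplified stand-in for a provider: a name plus callbacks. */
typedef struct DemoProvider
{
    char  name[DEMO_NAME_MAX];
    void (*begin_scan)(void *state, int eflags);  /* required in this sketch */
    void (*explain_scan)(void *state);            /* optional, may be NULL */
} DemoProvider;

/* Internal table that owns a private copy of each registered provider. */
static DemoProvider demo_table[DEMO_MAX_PROVIDERS];
static int          demo_count = 0;

/* A do-nothing callback, handy for trying the registry out. */
void
demo_noop_begin(void *state, int eflags)
{
    (void) state;
    (void) eflags;
}

/* Look up a provider by name; returns NULL if it was never registered. */
const DemoProvider *
demo_lookup(const char *name)
{
    for (int i = 0; i < demo_count; i++)
    {
        if (strcmp(demo_table[i].name, name) == 0)
            return &demo_table[i];
    }
    return NULL;
}

/* Copy the provider into the table; returns 0 on success, -1 on rejection. */
int
demo_register(const DemoProvider *provider)
{
    assert(provider != NULL);

    if (provider->begin_scan == NULL || provider->name[0] == '\0')
        return -1;              /* missing required callback or name */
    if (memchr(provider->name, '\0', DEMO_NAME_MAX) == NULL)
        return -1;              /* name is not NUL-terminated */
    if (demo_count >= DEMO_MAX_PROVIDERS ||
        demo_lookup(provider->name) != NULL)
        return -1;              /* table full or duplicate name (assumed) */

    demo_table[demo_count++] = *provider;   /* caller's copy may go away */
    return 0;
}

/* Optional callbacks are simply skipped when they are NULL. */
void
demo_explain(const DemoProvider *provider, void *state)
{
    if (provider->explain_scan != NULL)
        provider->explain_scan(state);
}
```

Because the table holds its own copy, a caller can build the structure on the stack inside its module initialization function and let it go out of scope afterwards, and the NULL check in `demo_explain` mirrors the way show_customscan_info() in this patch tests ExplainCustomScan before calling it.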

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations leveraging the cost based optimizer. The API consists of a
+  set of callbacks, registered under a unique name, to be invoked during query planning
+  and execution. A custom-scan provider should implement these callback 
+  functions according to the expectation of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this is done once in the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is being planned and executed. The second task is the
+  submission of candidate paths to either scan or join relations, each with an
+  adequate cost for the core planner. The planner then chooses the cheapest
+  path from all of the candidates. If the custom path survives, the planner
+  starts the third task: construction of a <literal>CustomScan</literal> plan
+  node, placed within the query plan tree instead of a built-in plan node. The
+  last task is the execution of its implementation in answer to invocations by
+  the core executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They may serve as good
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module lets a scan skip earlier pages or
+      terminate before the end of the relation when an inequality operator on
+      the <literal>ctid</literal> system column can narrow down the range to
+      be scanned, unlike a sequential scan, which reads a relation from
+      beginning to end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates how to implement a custom
+      scan node that takes the place of a join node.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scans and joins are fully supported with integrated
+  cost-based query optimization using the custom scan API. You might be able
+  to implement other operations, such as sort or aggregation, by manipulating
+  the planned tree; however, the extension is then responsible for handling
+  this replacement correctly. There is no support for that in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, the structure is copied into an internal table, so the
+    caller does not need to keep it any longer.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target 
+    relations. Prior to this selection, we list all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to built-in paths.
+    If a custom-scan provider can submit a potential scan path towards the
+    supplied relation, it shall construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> is a common field for all the path nodes to store
+    a cost estimation. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    flags below, and <literal>custom_private</> can be used to store private
+    data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        The custom scan provider is then responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back to the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization of the <literal>cscan_plan</>
+    being allocated, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned to the <literal>Plan</> portion, but can be eliminated from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node after construction of the plan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan because there is nothing special to do. Otherwise, it needs to
+    be handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also invokes the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization job around this
+    plan; however, it is not a good idea to launch the actual scanning jobs.
+    (That shall be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to save the private state being managed by the custom scan
+    provider. Also, <literal>eflags</> has flag bits of the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform anything visible externally if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation or relations, if joining,
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method of foreign tables, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is for the callback to act as just
+    an entrypoint to <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation or relations, if
+    joining, according to the custom logic. Pay attention to the data format
+    (and how it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases resources privately allocated.
+    It is usually not important to release memory in the per-execution memory
+    context. So the only thing this callback is responsible for is the
+    resources the provider manages on its own, outside of the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed values,
+    so the rewound scan does not need to return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in the private
+    state.
+    Note that this callback needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    that <literal>MarkPosCustomScan</> saved before.
+    Note that this callback needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what shall be
+    printed, and the state of the <literal>CustomScanState</> will provide
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..1e96829 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index bd5428d..ac7fc68 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +905,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			operation = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1013,6 +1028,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1303,6 +1322,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *)plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1870,6 +1912,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2042,6 +2097,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * Finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs the common initialization.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set only when scanrelid is associated
+	 * with a particular relation. Otherwise, it needs to be initialized by
+	 * the custom-scan provider itself if it internally uses ss_ScanTupleSlot.
+	 * If it replaces varno of Var nodes with CUSTOM_VAR, the slot must be set
+	 * so that EXPLAIN output can reference the underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization via the provider's BeginCustomScan callback.
+	 * The extension may override the initialization done above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point for the ExecCustomScan method. Fetching a tuple is
+ * entirely the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also, just an entry point for the MultiExecCustomScan method. Fetching
+ * multiple tuples (according to the expectation of the upper node) is
+ * the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if it exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e3edcf6..e21982f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3951,6 +3978,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4c7505e..00c7466 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2824,6 +2840,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96fe50f..ebc0b28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -400,6 +402,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -428,6 +433,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1247,6 +1255,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1318,6 +1329,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1341,6 +1355,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1410,6 +1427,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1463,6 +1483,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 50f0852..c6010d9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2c122d..a545af0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2009,6 +2021,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can use it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *)functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5c9f3d6..1af5469 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -575,6 +576,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d8cabbd..3a19aac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2222,6 +2222,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a7169ef..32e8b59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. It is typically
+ * invoked from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for
+ * supplying an adequate cost estimate here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 348f620..48bd672 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2447,14 +2448,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3712,6 +3718,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5379,6 +5393,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5649,6 +5675,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs optimizer this custom scan provider
+ * is designed to support backward scan.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5a40347..f315b8f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1496,6 +1496,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ff9af76..adc5123 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 101e22c..58575b9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the identifier of custom scan provider when it was
+ * registered. custom_flags is a set of CUSTOM__* bits to control its
+ * behavior. custom_private allows extension to store its private data
+ * but has to be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0033a3c..8fbdb66 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..09406f4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
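
For readers following the header above: nodeCustom.h declares register_custom_provider() and get_custom_provider(), which suggests a by-name registry of providers. The sketch below is only a toy model of such a registry, written as standalone C so it can be compiled outside PostgreSQL. ToyProvider, toy_register_provider, toy_get_provider, the fixed-size table, and the choice to reject duplicate names are all assumptions made for illustration; the body of the real functions is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAMEDATALEN   64	/* stand-in for NAMEDATALEN */
#define TOY_MAX_PROVIDERS 16	/* hypothetical fixed table size */

/* Hypothetical, simplified stand-in for CustomProvider */
typedef struct ToyProvider
{
	char		name[TOY_NAMEDATALEN];
	int			flags;			/* stand-in for the callback pointers */
} ToyProvider;

static ToyProvider toy_providers[TOY_MAX_PROVIDERS];
static int	toy_num_providers = 0;

/* Look up a provider by name; returns NULL if none is registered. */
const ToyProvider *
toy_get_provider(const char *name)
{
	int			i;

	assert(name != NULL);
	for (i = 0; i < toy_num_providers; i++)
	{
		if (strcmp(toy_providers[i].name, name) == 0)
			return &toy_providers[i];
	}
	return NULL;
}

/*
 * Copy the provider into the table.  Returns 0 on success, or -1 if the
 * name is not NUL-terminated, the table is full, or the name is already
 * taken (rejecting duplicates is an assumption of this sketch).
 */
int
toy_register_provider(const ToyProvider *provider)
{
	assert(provider != NULL);
	if (memchr(provider->name, '\0', TOY_NAMEDATALEN) == NULL)
		return -1;
	if (toy_num_providers >= TOY_MAX_PROVIDERS)
		return -1;
	if (toy_get_provider(provider->name) != NULL)
		return -1;
	toy_providers[toy_num_providers++] = *provider;
	return 0;
}
```

Copying the struct on registration mirrors the fact that the declared API takes a const pointer, so the caller's struct need not outlive the call; whether the real implementation copies is likewise an assumption here.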

#15

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#14)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.

Please check it. Thanks,

The patches applied cleanly, but I saw a compiler warning about
get_rel_relkind() in foreign.c. It's a minor issue; please just add
an #include of utils/lsyscache.h there.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
formats, but it looks to me more like a "provider name". What is the
string shown there?

2) It would be nice if we could see what the Custom Scan node
replaced in the EXPLAIN output (even if only in verbose mode). I know
that we can't show the plan tree below custom scan nodes, because the
CS provider can't obtain the other candidates. But even just the names
of the relations used in the join or the scan would help users
understand what is going on in the Custom Scan.

Regards,
--
Shigeru HANADA


#16

Kohei KaiGai

kaigai@kaigai.gr.jp

about 12 years ago

In reply to: Shigeru Hanada (#15)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hanada-san,

Thanks for your review.

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:

Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.

Please check it. Thanks,

The patches applied cleanly, but I saw a compiler warning about
get_rel_relkind() in foreign.c. It's a minor issue; please just add
an #include of utils/lsyscache.h there.

Fixed,

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
formats, but it looks to me more like a "provider name". What is the
string shown there?

I tried to piggyback on an existing property, but as you pointed out,
that does not make sense.
I adjusted explain.c to show a "Custom-Provider" property for Custom
Scan nodes, as follows.

postgres=# explain(format xml) select * from t1 where ctid > '(4,0)'::tid;
QUERY PLAN
----------------------------------------------------------
<explain xmlns="http://www.postgresql.org/2009/explain">+
<Query> +
<Plan> +
<Node-Type>Custom Scan</Node-Type> +
<Custom-Provider>ctidscan</Custom-Provider> +
<Relation-Name>t1</Relation-Name> +
<Alias>t1</Alias> +
<Startup-Cost>0.00</Startup-Cost> +
<Total-Cost>12.30</Total-Cost> +
<Plan-Rows>410</Plan-Rows> +
<Plan-Width>36</Plan-Width> +
<Filter>(ctid > '(4,0)'::tid)</Filter> +
</Plan> +
</Query> +
</explain>
(1 row)
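
As a rough illustration of where the <Custom-Provider> tag above comes from: in the XML format, EXPLAIN derives the element name from the property label, with spaces turned into hyphens. The standalone sketch below models only that step. toy_xml_property is a hypothetical helper and not a function in explain.c, the label "Custom Provider" is assumed from the output shown, and the value is copied verbatim without XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<Label-With-Hyphens>value</Label-With-Hyphens>" into buf.
 * Returns the number of characters written, or -1 if the label or the
 * output does not fit.  Toy model: no XML escaping of 'value'.
 */
int
toy_xml_property(char *buf, size_t buflen,
				 const char *label, const char *value)
{
	char		tag[64];
	size_t		i;
	size_t		n;
	int			written;

	assert(buf != NULL && label != NULL && value != NULL);

	n = strlen(label);
	if (n >= sizeof(tag))
		return -1;

	/* Spaces in the label become hyphens in the element name. */
	for (i = 0; i < n; i++)
		tag[i] = (label[i] == ' ') ? '-' : label[i];
	tag[n] = '\0';

	written = snprintf(buf, buflen, "<%s>%s</%s>", tag, value, tag);
	if (written < 0 || (size_t) written >= buflen)
		return -1;
	return written;
}
```

Calling toy_xml_property(buf, sizeof(buf), "Custom Provider", "ctidscan") fills buf with the same element that appears in the plan above.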

2) It would be nice if we could see what the Custom Scan node
replaced in the EXPLAIN output (even if only in verbose mode). I know
that we can't show the plan tree below custom scan nodes, because the
CS provider can't obtain the other candidates. But even just the names
of the relations used in the join or the scan would help users
understand what is going on in the Custom Scan.

I agree that it would help users understand the plan, but it is
awkward to implement because the CustomScan node (and its superclass)
has no information about which relations underlie it. This
functionality would probably need to show the underlying relations in
ExplainTargetRel() if the CustomScan node represents a scan rather
than a join. What data source could produce the list of underlying
relations here?
So, if it is not a significant restriction for users, I'd like to work
on this feature later.
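
To make the open question concrete, one candidate data source (an assumption, not something the patch implements) is the set of range-table indexes the planner already tracks for a join relation; the remote-join patch itself looks up a join rel via jinfo.relids. The standalone sketch below models that idea with a plain bitmask in place of a Bitmapset and an array of names in place of the range table. toy_list_relations and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a ", "-separated list of the relations whose range-table index
 * is set in 'relids'.  Bit i set means rtable[i - 1] is used; bit 0 is
 * unused because range-table indexes start at 1.
 * Returns the number of relations listed, or -1 if buf is too small.
 */
int
toy_list_relations(char *buf, size_t buflen,
				   unsigned int relids,
				   const char *const *rtable, int rtable_len)
{
	int			count = 0;
	size_t		used = 0;
	int			rti;

	assert(buf != NULL && buflen > 0 && rtable != NULL);
	buf[0] = '\0';

	for (rti = 1; rti <= rtable_len && rti < 32; rti++)
	{
		int			n;

		if ((relids & (1u << rti)) == 0)
			continue;			/* this relation is not part of the scan */

		n = snprintf(buf + used, buflen - used, "%s%s",
					 count > 0 ? ", " : "", rtable[rti - 1]);
		if (n < 0 || (size_t) n >= buflen - used)
			return -1;
		used += (size_t) n;
		count++;
	}
	return count;
}
```

With a three-entry range table of ft1, t1 and ft2 and bits 1 and 3 set, the buffer ends up holding "ft1, ft2", which is the kind of string a verbose EXPLAIN could attach to a Custom Scan node that replaced a join.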

The attached patch fixes the minor warning around get_rel_relkind
and renames the property for the custom provider. Please check it.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.v3.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1075 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 src/backend/foreign/foreign.c                  |   29 +
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/backend/optimizer/util/var.c               |   39 +
 src/include/foreign/foreign.h                  |    4 +
 src/include/nodes/bitmapset.h                  |    4 +
 src/include/optimizer/var.h                    |    1 +
 10 files changed, 1350 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..5af3dd7 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* column references need to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The workhorse of deparseRemoteJoinSql. It deparses a relation, which
+ * may be a join rather than a plain table, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either List or Integer.
+	 * In case of List, it is a packed PgRemoteJoinInfo that contains
+	 * outer and inner join references, so needs to deparse recursively.
+	 * In case of Integer, it is rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only the columns referenced in the target list and local
+		 * conditions have to be fetched from the remote server, not all.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * It assumes the top-level 'relinfo' is one for remote join relation, thus
+ * it has to be a List object that packs PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, a column reference may become ambiguous unless
+	 * it is qualified with its relation alias.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..6786b89 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,794 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Check whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw, and return false if it is neither.
+ * If it is a managed relation, its related properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		FdwRoutine			pgroutine;
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte = planner_rt_fetch(rel->relid, root);
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		memset(&pgroutine, 0, sizeof(FdwRoutine));
+		pgroutine.GetForeignRelSize = postgresGetForeignRelSize;
+
+		if (!is_fdw_managed_relation(rte->relid, &pgroutine))
+			return false;
+
+		/*
+		 * Also report the server-id and local user-id to the caller.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we may need to watch for a plan being executed under
+		 * a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be accessed with the same credentials; the actual user-id
+		 * is not required.
+		 * So fdw_user_oid is set to InvalidOid, for comparison purposes,
+		 * when the access runs with the credentials of GetUserId().
+		 */
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when all underlying foreign tables use the same
+				 * server and user-id.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Estimate the cost of a remote join and store it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine for add_join_path_hook. It checks whether this join can
+ * be run on the remote server and, if so, adds a custom-scan path that runs
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* Is outerrel managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* Is innerrel managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference appears
+	 * in either the target list or the local conditions.
+	 *
+	 * XXX - We have no reasonable way to reconstruct a RECORD datum from
+	 * individual columns once they are extracted. On the other hand,
+	 * putting a whole-row reference into the remote join query would
+	 * consume additional network bandwidth.
+	 */
+	if (contain_wholerow_reference((Node *)joinrel->reltargetlist) ||
+		contain_wholerow_reference((Node *)j_local_conds))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan node from the remote join path built above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of a remote join carry the varno of
+ * the underlying foreign tables. That is a problem, because they would
+ * eventually be replaced by references to the outer or inner relation,
+ * whereas the result of a remote join is stored in the scan tuple slot,
+ * which is neither outer nor inner.
+ * So, we replace the varno of such Var nodes by CUSTOM_VAR, a pseudo varno
+ * that references a tuple in the scan tuple slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * Var nodes that reference columns of a remote join relation need special
+ * treatment, because we replace the join relation by a remote query that
+ * returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan, but some
+ * adjustments are needed because of the different nature of this scan.
+ * The biggest one is that it has to open the underlying relations by itself
+ * and construct the tuple descriptor from the list of Vars to be fetched,
+ * because a custom scan (in this case, a scan on a remote join instead of
+ * a local join) has no particular relation behind it, so it has to manage
+ * them by itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is used unless a specific
+	 * one was configured on the reference.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations have to be saved as well */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job as postgresIterateForeignScan() does for queries
+ * on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the underlying
+ * relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Same as what postgresReScanForeignScan() does.
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine on EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider. No special
+ * registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 2b75f73..2efa17b 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -23,6 +23,7 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -621,3 +622,31 @@ get_foreign_server_oid(const char *servername, bool missing_ok)
 				 errmsg("server \"%s\" does not exist", servername)));
 	return oid;
 }
+
+/*
+ * is_fdw_managed_relation
+ *
+ * It checks whether the supplied relation is a foreign table managed
+ * by the module that has FdwRoutine, or not.
+ */
+bool
+is_fdw_managed_relation(Oid tableoid, const FdwRoutine *routines_self)
+{
+	FdwRoutine *routines;
+	char		relkind = get_rel_relkind(tableoid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		routines = GetFdwRoutineByRelId(tableoid);
+
+		/*
+		 * We assume that a callback implemented by one extension is never
+		 * shared with another extension. So, we don't need to compare all
+		 * the function pointers in the FdwRoutine; comparing a single
+		 * member is enough.
+		 */
+		if (routines->GetForeignRelSize == routines_self->GetForeignRelSize)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 4a3d5c8..6e899e8 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -73,6 +73,7 @@ static bool pull_varattnos_walker(Node *node, pull_varattnos_context *context);
 static bool pull_vars_walker(Node *node, pull_vars_context *context);
 static bool contain_var_clause_walker(Node *node, void *context);
 static bool contain_vars_of_level_walker(Node *node, int *sublevels_up);
+static bool contain_wholerow_reference_walker(Node *node, void *context);
 static bool locate_var_of_level_walker(Node *node,
 						   locate_var_of_level_context *context);
 static bool pull_var_clause_walker(Node *node,
@@ -418,6 +419,44 @@ contain_vars_of_level_walker(Node *node, int *sublevels_up)
 								  (void *) sublevels_up);
 }
 
+/*
+ * contain_wholerow_reference
+ *
+ *    Recursively scan a clause to discover whether it contains any Var nodes
+ *    of whole-row reference in the current query level.
+ *
+ *    Returns true if any such Var found.
+ */
+bool
+contain_wholerow_reference(Node *node)
+{
+	return contain_wholerow_reference_walker(node, NULL);
+}
+
+static bool
+contain_wholerow_reference_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return contain_wholerow_reference_walker((Node *)rinfo->clause,
+												 context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node,
+								  contain_wholerow_reference_walker,
+								  context);
+}
 
 /*
  * locate_var_of_level
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5bd6ae6..9514f5f 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -13,6 +13,7 @@
 #ifndef FOREIGN_H
 #define FOREIGN_H
 
+#include "foreign/fdwapi.h"
 #include "nodes/parsenodes.h"
 
 
@@ -81,4 +82,7 @@ extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
 extern Oid	get_foreign_data_wrapper_oid(const char *fdwname, bool missing_ok);
 extern Oid	get_foreign_server_oid(const char *servername, bool missing_ok);
 
+extern bool	is_fdw_managed_relation(Oid tableoid,
+									const FdwRoutine *routines_self);
+
 #endif   /* FOREIGN_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index 808bf67..6355b4d 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -36,6 +36,7 @@ extern void pull_varattnos(Node *node, Index varno, Bitmapset **varattnos);
 extern List *pull_vars_of_level(Node *node, int levelsup);
 extern bool contain_var_clause(Node *node);
 extern bool contain_vars_of_level(Node *node, int levelsup);
+extern bool contain_wholerow_reference(Node *node);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, PVCAggregateBehavior aggbehavior,
 				PVCPlaceHolderBehavior phbehavior);
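
A note on the hexadecimal text form that bms_to_string / bms_from_string
above are meant to produce and parse: each bitmap word becomes a fixed
number of hex digits, most significant word first, and decoding has to map
the letters 'a'..'f' and 'A'..'F' to the values 10..15. Below is a minimal
standalone sketch of that round trip. It is not the patch's code: it works
on plain 32-bit words rather than a Bitmapset, uses malloc instead of
palloc, and hex_digit_value, words_to_hex and hex_to_words are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD 8			/* hex digits per 32-bit word (assumption) */

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int
hex_digit_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Encode nwords words, most significant word first; caller frees. */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
	char	   *result = malloc((size_t) nwords * HEX_PER_WORD + 1);
	char	   *pos = result;
	int			i;

	if (result == NULL)
		return NULL;
	result[0] = '\0';
	for (i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
	return result;
}

/* Decode into words[]; returns the number of words, or -1 on bad input. */
static int
hex_to_words(const char *hex, uint32_t *words, int maxwords)
{
	size_t		len = strlen(hex);
	int			nwords;
	int			i;
	int			j;

	if (len == 0 || len % HEX_PER_WORD != 0)
		return -1;
	nwords = (int) (len / HEX_PER_WORD);
	if (nwords > maxwords)
		return -1;

	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (j = 0; j < HEX_PER_WORD; j++)
		{
			int			v = hex_digit_value((unsigned char) *hex++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Encoding the two words {0x0000abcd, 0x80000001} this way gives
"800000010000abcd", and decoding that string returns the same two words.
The sketch rejects input whose length is not a multiple of the word width
instead of only warning about it; that is a design choice of the sketch,
not something the patch prescribes.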

Attachment: pgsql-v9.4-custom-scan.part-2.v3.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 107 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 17 files changed, 1252 insertions(+), 14 deletions(-)
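
As a rough illustration of the page-range arithmetic that CTidEstimateCosts
in contrib/ctidscan/ctidscan.c is concerned with, here is a standalone
sketch. It assumes block numbers are unsigned 32-bit values, as PostgreSQL's
BlockNumber is. BlockNo, estimate_pages_in_range and
estimate_tuples_in_range are hypothetical names that do not appear in the
patch. Because the type is unsigned, the bounds are clamped before
subtracting, so an inverted range cannot wrap around to a huge page count.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNo;		/* stand-in for PostgreSQL's BlockNumber */

/*
 * Number of pages to visit for the block bounds [lo, hi] on a relation of
 * rel_pages pages.  The upper bound is clamped to the last page, and at
 * least one page is always charged, even for an empty or inverted range.
 */
static BlockNo
estimate_pages_in_range(BlockNo lo, BlockNo hi, BlockNo rel_pages)
{
	BlockNo		last;

	if (rel_pages == 0)
		return 1;
	last = rel_pages - 1;
	if (hi > last)
		hi = last;
	if (lo > hi)
		return 1;				/* nothing in range; charge one page */
	return hi - lo + 1;
}

/*
 * Scale the relation's tuple count by the fraction of pages visited,
 * assuming tuples are spread evenly across pages.
 */
static double
estimate_tuples_in_range(double rel_tuples, BlockNo npages, BlockNo rel_pages)
{
	assert(rel_tuples >= 0.0);
	if (rel_pages == 0)
		return 0.0;
	return rel_tuples * ((double) npages / (double) rel_pages);
}
```

For a 100-page relation, a lower bound at block 10 with no upper bound
covers blocks 10..99, and a lower bound past the end of the relation still
costs one page rather than wrapping around.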

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor nodes. This extension
+ * focuses on a workload that fetches records with a tid larger or smaller
+ * than a particular value. When inequality operators are given, this
+ * module constructs a custom scan path that skips records that need not
+ * be read. Then, if it is the cheapest path, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches underlying records; it uses the existing heap_getnext(), but
+ * seeks to the records to be read before fetching the first one.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clauses allow us to determine
+ * the zone to be scanned. If one or more such clauses are available, it
+ * returns a list of them, or NIL otherwise.
+ * The caller can assume that all the returned conditions are chained with
+ * the boolean AND operator, so every operator narrows down the scope of
+ * the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidates */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the given
+ * restriction clauses. The scan logic is almost the same as that of SeqScan,
+ * because it uses the regular heap_getnext(), except for the number of
+ * tuples to be scanned when the restriction clauses work well.
+ */
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* For simplicity, treat the Var node as the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimation; we don't distinguish between strict
+			 * and non-strict inequality operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Min(bnum_min, bnum_max);	/* BlockNumber is unsigned */
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Min(bnum_min, bnum_max);	/* BlockNumber is unsigned */
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause, but that is indeterminate
+		 * until the executor actually runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators are given on the
+ * relation to be scanned and makes sense to reduce number of tuples.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least one restriction clause helps to reduce the number of
+	 * tuples to be scanned, so we add the custom scan path provided by
+	 * this module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * being chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was already done in nodeCustomScan.c, so
+	 * all we need to do here is attach the slightly adjusted clauses and
+	 * the private data (the list of restriction clauses, in this case).
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns only -1, 0 or 1. A forward scan ends
+	 * when comparing the fetched tid with ip_max yields a value greater
+	 * than ip_max_comp; a backward scan ends when comparing it with
+	 * ip_min yields a value less than ip_min_comp. Neither condition can
+	 * ever be satisfied while ip_max_comp is 1 and ip_min_comp is -1, so
+	 * these initial values mean "open ended", as if no termination
+	 * condition was set.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we could calculate a particular TID that should be
+			 * larger than, less than or equal with fetched record, thus,
+			 * it allows to determine upper or lower bounds of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop at tid <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop at tid < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are chained with AND, so the whole
+			 * condition cannot be true if one of the clauses has a NULL
+			 * result. So, we can stop the evaluation here and inform the
+			 * caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Nothing more to do in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin sequential scan, but pointer shall be sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns true if the
+ * caller should fetch with NoMovementScanDirection, or false if the
+ * position is out of range on a backward scan, which then runs normally.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * In case when block-number is out of the range, it is obvious that
+	 * no tuples shall be fetched if forward scan direction. On the other
+	 * hand, we have nothing special for backward scan direction.
+	 * Note that heap_getnext() shall return NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. Caller of this
+	 * function override scan direction if 'true' was returned, so it makes
+	 * this scan terminated immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, run the backward scan from the end of the relation */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position shall be loaded on
+	 * heap_restrpos(). Because heap_restrpos() internally calls
+	 * heapgettup() or heapgettup_pagemode() that kicks heapgetpage()
+	 * when rs_cblock is different from the block number being pointed
+	 * by rs_mctid, it makes sense to put invalid block number not to
+	 * match previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() save a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if min-tid was obvious */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if max-tid was obvious */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * Check whether the fetched tuple went past the upper bound on a
+	 * forward scan, or the lower bound on a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, following the
+ * usual ExecScan() convention.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call recalculate the starting point and rewind the
+ * underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
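
For readers following the executor part of ctidscan.c above, here is a minimal standalone sketch of the "comparison policy" convention that CTidEvalScanZone() sets up and CTidAccessCustomScan() tests on a forward scan. It is only an illustration and not part of the patch: DemoTid, demo_tid_compare() and demo_past_upper_bound() are hypothetical stand-ins for ItemPointerData, ItemPointerCompare() and the end-of-scan check, and only the upper bound is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: (block number, offset number). */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

/* Three-way comparison that returns only -1, 0 or 1. */
int
demo_tid_compare(const DemoTid *a, const DemoTid *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/*
 * Forward scan: the scan is over once the comparison against the upper
 * bound exceeds max_comp.  max_comp = -1 encodes "ctid < max", 0 encodes
 * "ctid <= max", and 1 can never be exceeded, so it means "no upper bound".
 */
bool
demo_past_upper_bound(const DemoTid *tid, const DemoTid *max, int max_comp)
{
	assert(max_comp >= -1 && max_comp <= 1);
	return demo_tid_compare(tid, max) > max_comp;
}
```

With a bound of (3,10), a tuple at (3,10) ends the scan under the "<" policy but not under "<=", and nothing ends it under the open-ended policy. The lower bound of a backward scan follows the same idea with the comparison reversed.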
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..e4afaa7
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,107 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides an additional way to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare the <literal>ctid</> system column with a value.
+  It also serves as a proof-of-concept implementation of the custom-scan
+  APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then provide an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators on the
+   <literal>ctid</> system column to reduce the number of rows to be
+   processed. There is obviously no need to read tuples located on the
+   99th page or earlier, so it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it
+   otherwise uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well, except that it wastes I/O on the pages that contain only
+   the records to be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> provides a more efficient way: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer compares all the candidate scan paths internally, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples need to be processed.
+  </para>
+  <para>
+   Of course, it will not be chosen if a cheaper path than the above
+   custom-scan path exists. For instance, an index scan based on an equality
+   condition is usually cheaper than this custom scan, so the optimizer
+   adopts it instead of a sequential scan or the <filename>ctidscan</> path.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries"> or
+   <xref linkend="guc-local-preload-libraries"> parameter, whichever is
+   most convenient.
+  </para>
+  <para>
+   This module has no configurable parameters at the moment.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
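
The page-skipping idea described in the documentation above can be illustrated with a small standalone sketch of the estimate: clamp the block range implied by the ctid bounds to the relation, then scale the tuple count by the fraction of pages visited. This is an assumption-laden illustration rather than the code in CTidEstimateCosts(); demo_pages_to_scan() and demo_tuples_to_scan() are hypothetical names, and plain unsigned long / double stand in for BlockNumber and the planner's row estimates.

```c
#include <assert.h>

/*
 * Estimate how many pages a scan must visit when ctid is restricted to
 * blocks [min_block, max_block] in a relation of total_pages pages.
 * Always reports at least one page.
 */
unsigned long
demo_pages_to_scan(unsigned long min_block, unsigned long max_block,
				   unsigned long total_pages)
{
	unsigned long last_block;

	assert(total_pages > 0);
	last_block = total_pages - 1;	/* block numbers start at zero */

	if (max_block > last_block)
		max_block = last_block;
	if (min_block > max_block)
		return 1;				/* empty range: keep the estimate non-zero */
	return max_block - min_block + 1;
}

/* Scale the relation's tuple count by the fraction of pages visited. */
double
demo_tuples_to_scan(double total_tuples, unsigned long pages_to_scan,
					unsigned long total_pages)
{
	assert(total_pages > 0);
	return total_tuples * ((double) pages_to_scan / (double) total_pages);
}
```

For a 200-page table holding 10000 rows, a lower bound at block 100 leaves 100 pages and therefore about half of the rows. Checking the range before subtracting avoids wrap-around when the lower bound lies beyond the end of the relation.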
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
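
As a side note on the registration step that custom-scan.sgml describes for _PG_init(), the usual hook-chaining pattern can be sketched standalone as below. The names are hypothetical (demo_hook stands in for add_scan_path_hook, and the callback takes a plain counter instead of planner structures); it only shows how a module saves the previously installed hook and calls it from its own callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type; the real hook receives planner structures. */
typedef void (*demo_hook_type) (int *counter);

demo_hook_type demo_hook = NULL;	/* stand-in for add_scan_path_hook */
static demo_hook_type demo_hook_next = NULL;	/* previously installed hook */

/* A callback installed by some other module that was loaded earlier. */
void
demo_earlier_hook(int *counter)
{
	*counter += 1;
}

/* Our callback: let the earlier hook run, then do our own work. */
void
demo_my_hook(int *counter)
{
	assert(counter != NULL);
	if (demo_hook_next)
		(*demo_hook_next) (counter);
	*counter += 10;
}

/* What a module's _PG_init() would do: save the old hook, install ours. */
void
demo_module_init(void)
{
	demo_hook_next = demo_hook;
	demo_hook = demo_my_hook;
}
```

If demo_earlier_hook is already installed when demo_module_init() runs, one call through demo_hook executes both callbacks, so modules loaded in any order keep working together.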
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1e96829..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c6010d9..e55b16e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -977,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1af5469..630c8e7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1081,7 +1081,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 78efaa5..b040334 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
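
The commutator columns in the pg_operator.h entries above are what let ctidscan treat "value op ctid" the same way as "ctid op value". A minimal standalone sketch of that normalization follows. The OIDs are copied from the hunk above, while demo_tid_commutator() and demo_normalize_tid_op() are hypothetical stand-ins for get_commutator() and the logic in CTidEvalScanZone(), limited to the four inequality operators.

```c
#include <assert.h>

/* Operator OIDs as defined in the pg_operator.h hunk above */
#define TIDLessOperator			2799
#define TIDGreaterOperator		2800
#define TIDLessEqualOperator	2801
#define TIDGreaterEqualOperator	2802

/*
 * Hypothetical stand-in for get_commutator(), covering only the four TID
 * inequality operators.  Returns 0 (an invalid OID) for anything else.
 */
unsigned int
demo_tid_commutator(unsigned int opno)
{
	switch (opno)
	{
		case TIDLessOperator:
			return TIDGreaterOperator;
		case TIDGreaterOperator:
			return TIDLessOperator;
		case TIDLessEqualOperator:
			return TIDGreaterEqualOperator;
		case TIDGreaterEqualOperator:
			return TIDLessEqualOperator;
		default:
			return 0;
	}
}

/* Rewrite "value op ctid" into the equivalent "ctid op' value" form. */
unsigned int
demo_normalize_tid_op(unsigned int opno, int ctid_on_left)
{
	unsigned int result = ctid_on_left ? opno : demo_tid_commutator(opno);

	assert(result != 0);		/* only the four operators above are handled */
	return result;
}
```

For example, '(10,0)' < ctid normalizes to ctid > '(10,0)', so the switch on the operator only has to handle the ctid-on-the-left form.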
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges
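
The regression output above shows ctid range conditions being served by the ctidscan provider while the exact condition stays in the plan as a Filter. The following is a minimal sketch of how such a condition could narrow the set of blocks to read. It is not the module's actual code: `TidBound`, `BlockRange` and `ctid_block_range` are hypothetical names, and the bounds are treated as inclusive on the assumption that the per-tuple filter rechecks the precise operator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, line offset). */
typedef struct TidBound
{
    uint32_t block;
    uint16_t offset;
} TidBound;

/* Inclusive range of blocks that may contain matching tuples. */
typedef struct BlockRange
{
    uint32_t first;
    uint32_t last;
    bool     empty;     /* true when no block can contain a match */
} BlockRange;

/*
 * Narrow the full range [0, nblocks - 1] using optional ctid bounds.
 * A NULL pointer means "no bound on this side". Only the block part of
 * each bound is used here; the offset is left to the per-tuple filter.
 */
BlockRange
ctid_block_range(uint32_t nblocks, const TidBound *lower, const TidBound *upper)
{
    BlockRange range = {0, 0, true};
    uint32_t   first = 0;
    uint32_t   last;

    if (nblocks == 0)
        return range;               /* empty relation, nothing to scan */
    last = nblocks - 1;

    if (lower != NULL && lower->block > first)
        first = lower->block;       /* skip earlier pages */
    if (upper != NULL && upper->block < last)
        last = upper->block;        /* stop before the end of the relation */

    if (first > last)
        return range;               /* bounds exclude every block */

    range.first = first;
    range.last = last;
    range.empty = false;
    return range;
}
```

With the four-block table t1 from the test, a lower bound of (2,115) and an upper bound of (3,10) would leave blocks 2 through 3 to read, and a lower bound of (4,0) would leave nothing, which is consistent with the zero-row result shown above.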

pgsql-v9.4-custom-scan.part-1.v3.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  99 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1201 insertions(+), 15 deletions(-)
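
The documentation added by this patch states that `register_custom_provider()` takes a structure holding a name and a set of callbacks, some of them optional, and copies it into an internal table. The following is a rough, self-contained sketch of that register-by-name pattern. It is an illustration only, not the patch's implementation: `DemoProvider`, `demo_register_provider`, `demo_lookup_provider`, the fixed-size table and the choice of which callback is mandatory are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROVIDERS 8
#define DEMO_NAME_LEN 64

/* Hypothetical, heavily reduced stand-in for the patch's CustomProvider. */
typedef struct DemoProvider
{
    char  name[DEMO_NAME_LEN];          /* unique provider name */
    void (*begin_scan)(void *state);    /* required in this sketch */
    void (*explain_scan)(void *state);  /* optional, may be NULL */
} DemoProvider;

/* Internal table that owns a private copy of every registered provider. */
static DemoProvider demo_table[DEMO_MAX_PROVIDERS];
static int demo_count = 0;

/* Find a provider by name; returns NULL when the name is unknown. */
const DemoProvider *
demo_lookup_provider(const char *name)
{
    for (int i = 0; i < demo_count; i++)
    {
        if (strncmp(demo_table[i].name, name, DEMO_NAME_LEN) == 0)
            return &demo_table[i];
    }
    return NULL;
}

/* Copy *provider into the table; returns 0 on success, -1 if rejected. */
int
demo_register_provider(const DemoProvider *provider)
{
    assert(provider != NULL);

    if (provider->name[0] == '\0' || provider->begin_scan == NULL)
        return -1;      /* missing name or required callback */
    if (memchr(provider->name, '\0', DEMO_NAME_LEN) == NULL)
        return -1;      /* name is not NUL-terminated */
    if (demo_count >= DEMO_MAX_PROVIDERS ||
        demo_lookup_provider(provider->name) != NULL)
        return -1;      /* table full or duplicate name */

    /* Struct copy: the caller's structure is not needed afterwards. */
    demo_table[demo_count++] = *provider;
    return 0;
}

/* Trivial callback used to fill the required slot in examples. */
void
demo_noop_begin(void *state)
{
    (void) state;
}
```

Because the table keeps its own copy, a module could build the structure on the stack inside its initialization function and let it go out of scope after registering, which matches the documented statement that the caller does not need to keep the structure.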

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations leveraging the cost-based optimizer. The API consists of a
+  set of callbacks, registered under a unique name, to be invoked during query
+  planning and execution. A custom-scan provider should implement these
+  callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this is done once at the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is planned and executed. The second task is the submission of
+  candidate paths to either scan or join relations, with an adequate cost, to
+  the core planner. The planner then chooses the cheapest path from all of
+  the candidates. If the custom path survives, the planner starts the third
+  task: construction of a <literal>CustomScan</literal> plan node, located
+  within the query plan tree instead of the built-in plan node. The last task
+  is the execution of its implementation in answer to invocations by the core
+  executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They may provide a good
+  example for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan to skip earlier pages or
+      terminate before the end of the relation, if an inequality operator on
+      the <literal>ctid</literal> system column can narrow down the range to
+      be scanned, instead of a sequential scan which reads a relation from the
+      head to the end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates how to implement a custom
+      scan node that performs a join.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scan and join are fully supported with integrated
+  cost-based query optimization using the custom scan API. You might be able
+  to implement other operations, like sort or aggregation, by manipulating the
+  planned tree; however, the extension is then responsible for handling this
+  replacement correctly. There is no support for it in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure any more.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target
+    relations. Prior to this selection, it lists all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    and join paths in addition to the built-in paths.
+    If a custom-scan provider can submit a potential scan path towards the
+    supplied relation, it shall construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> is a common field for all the path nodes to store
+    a cost estimation. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    the flags listed below, and <literal>custom_private</> can be used to
+    store private data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        The custom scan provider is then responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back to the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization on the <literal>cscan_plan</>
+    being allocated, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned to the <literal>Plan</> portion, but can be eliminated from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node after construction of the plan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan because there is nothing special to do. Otherwise, it needs to
+    be handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also launches the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan when the executor starts up.
+    It allows the custom scan provider to do any initialization job around this
+    plan; however, it is not a good idea to launch the actual scanning jobs.
+    (That shall be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to save the private state being managed by the custom scan
+    provider. Also, <literal>eflags</> has flag bits of the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform anything visible externally if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation or relations, if joining,
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method of foreign tables, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is for the callback to be just an
+    entrypoint to <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation or relations, if
+    joining, according to the custom logic. Pay attention to the data format
+    (and how it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases resources privately allocated.
+    It is usually not important to release memory in the per-execution memory
+    context. So this callback only needs to take care of resources the
+    provider acquired on its own, outside the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed values,
+    so the rewound scan does not need to return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in its private
+    state.
+    Note that this callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    previously saved by <literal>MarkPosCustomScan</>.
+    Note that this callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what shall be
+    printed, and the state of the <literal>CustomScanState</> will provide
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..1e96829 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index bd5428d..0532197 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +906,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			custom_name = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -998,6 +1014,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom Provider", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1013,6 +1031,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1303,6 +1325,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *)plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1870,6 +1915,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2042,6 +2100,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * Finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs the common initialization.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range table).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up here only when scanrelid refers
+	 * to a particular relation. Otherwise, the custom-scan provider must
+	 * initialize it itself if it uses ss_ScanTupleSlot internally. If the
+	 * provider replaces the varno of Var nodes with CUSTOM_VAR, the slot must
+	 * be set so that EXPLAIN can look up the underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the provider's BeginCustomScan
+	 * callback. The extension may override any of the initialization
+	 * above, if needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entrypoint for the provider's ExecCustomScan method. Fetching
+ * a tuple is entirely the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entrypoint, for the provider's MultiExecCustomScan method.
+ * Fetching multiple tuples (as expected by the upper node) is entirely
+ * the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom scan provider shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table slots, if they exist */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e3edcf6..e21982f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3951,6 +3978,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4c7505e..00c7466 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2824,6 +2840,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96fe50f..ebc0b28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -400,6 +402,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -428,6 +433,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1247,6 +1255,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1318,6 +1329,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1341,6 +1355,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1410,6 +1427,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1463,6 +1483,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 50f0852..c6010d9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by a custom execution provider.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2c122d..a545af0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2009,6 +2021,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can use it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *)functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5c9f3d6..1af5469 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -575,6 +576,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d8cabbd..3a19aac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2222,6 +2222,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Do we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a7169ef..32e8b59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. The usual usage is
+ * invocation from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for supplying
+ * an adequate cost estimate here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 348f620..48bd672 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2447,14 +2448,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom scan
+ * provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3712,6 +3718,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5379,6 +5393,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5649,6 +5675,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the MarkPosCustomScan and RestorePosCustom callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
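Editorial sketch, not part of the patch: the header above declares register_custom_provider() and get_custom_provider(), and the accompanying documentation says the registered structure is copied into an internal table. The following is a minimal, standalone model of that behavior. Every name prefixed with sketch_ or Sketch, the fixed table size, and the choice to reject duplicate names by returning -1 are assumptions made for illustration; the patch's actual nodeCustom.c may behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAMEDATALEN   64	/* stand-in for NAMEDATALEN */
#define SKETCH_MAX_PROVIDERS 16	/* arbitrary capacity for this sketch */

/* Reduced stand-in for CustomProvider: a name plus a single callback. */
typedef struct SketchProvider
{
	char		name[SKETCH_NAMEDATALEN];
	int			(*exec_scan) (void *state);
} SketchProvider;

/* The "internal table" that holds copies of registered providers. */
static SketchProvider sketch_table[SKETCH_MAX_PROVIDERS];
static int	sketch_nproviders = 0;

/* Look a provider up by name; returns NULL when nothing matches. */
static SketchProvider *
sketch_get_provider(const char *name)
{
	for (int i = 0; i < sketch_nproviders; i++)
	{
		if (strcmp(sketch_table[i].name, name) == 0)
			return &sketch_table[i];
	}
	return NULL;
}

/*
 * Copy *provider into the internal table.  Returns 0 on success, or -1 if
 * the name is already registered or the table is full (an assumed policy).
 */
static int
sketch_register_provider(const SketchProvider *provider)
{
	assert(provider != NULL);

	if (sketch_get_provider(provider->name) != NULL)
		return -1;
	if (sketch_nproviders >= SKETCH_MAX_PROVIDERS)
		return -1;

	/* Struct assignment copies the name and the callback pointer. */
	sketch_table[sketch_nproviders++] = *provider;
	return 0;
}
```

Because the table holds a copy, the caller may build the structure on the stack in its module initialization function and discard or modify it after registering.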
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5a40347..f315b8f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1496,6 +1496,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ff9af76..adc5123 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 101e22c..58575b9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the name under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * has to be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0033a3c..8fbdb66 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
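Editorial sketch, not part of the patch: create_customscan_path() above takes a custom_flags argument, and the patch changes ExecSupportsMarkRestore() to receive a Path so that the planner can consult those flags. The standalone fragment below shows the kind of bit test this implies. The two flag values are taken from nodeCustom.h in the patch; SketchCustomPath and the sketch_* helpers are hypothetical stand-ins, not the real planner code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in the patch's nodeCustom.h. */
#define CUSTOM__SUPPORT_MARK_RESTORE	0x0001
#define CUSTOM__SUPPORT_BACKWARD_SCAN	0x0002

/* Reduced stand-in for CustomPath: only the fields this sketch needs. */
typedef struct SketchCustomPath
{
	const char *custom_name;
	uint32_t	custom_flags;
} SketchCustomPath;

/* True if the provider declared mark/restore support for this path. */
static bool
sketch_supports_mark_restore(const SketchCustomPath *path)
{
	assert(path != NULL);
	return (path->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0;
}

/* True if the provider declared backward-scan support for this path. */
static bool
sketch_supports_backward_scan(const SketchCustomPath *path)
{
	assert(path != NULL);
	return (path->custom_flags & CUSTOM__SUPPORT_BACKWARD_SCAN) != 0;
}
```

A provider that implements both capabilities would pass the bitwise OR of the two flags when it builds its path.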
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..09406f4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
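Editorial sketch, not part of the patch: paths.h above adds add_scan_path_hook and the add_custom_scan_paths() macro that invokes it when set. The standalone model below shows how an extension might install such a hook while still calling whatever hook was installed before it, which is the usual convention for PostgreSQL hooks but is not spelled out in the patch. SketchRel, the single-argument hook signature, and every sketch_* name are simplifications assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for RelOptInfo: just counts the paths added to it. */
typedef struct SketchRel
{
	int			npaths;
} SketchRel;

/* Simplified hook type; the real one also takes root and rte. */
typedef void (*sketch_scan_path_hook_type) (SketchRel *baserel);

static sketch_scan_path_hook_type sketch_scan_path_hook = NULL;

/* Mirrors the shape of add_custom_scan_paths(): call the hook if set. */
#define sketch_add_custom_scan_paths(baserel)			\
	do {												\
		if (sketch_scan_path_hook)						\
			(*sketch_scan_path_hook) ((baserel));		\
	} while (0)

/* Hook value that was installed before this module loaded, if any. */
static sketch_scan_path_hook_type sketch_prev_hook = NULL;

/* The module's hook: chain to the previous hook, then add one path. */
static void
sketch_my_add_scan_path(SketchRel *baserel)
{
	assert(baserel != NULL);

	if (sketch_prev_hook)
		(*sketch_prev_hook) (baserel);

	baserel->npaths++;			/* stands in for adding a CustomPath */
}

/* What the module would do once at load time (compare _PG_init). */
static void
sketch_module_init(void)
{
	sketch_prev_hook = sketch_scan_path_hook;
	sketch_scan_path_hook = sketch_my_add_scan_path;
}
```

Before the module is initialized the macro does nothing; afterwards each call reaches the module's hook, and any earlier hook still runs first.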

#17

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: Kohei KaiGai (#16)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Kaigai-san,

2013/12/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:

The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c. It's a minor issue; please
just add #include of utils/lsyscache.h there.

Fixed,

Check.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?

I tried free-riding on the existing properties, but it does not make sense
indeed, as you pointed out.
I adjusted the explain.c to show "Custom-Provider" property for Custom-
Scan node, as follows.

The new name seems better; it is what the node expresses.

2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.

Even though I agree that it helps users to understand the plan,
it is also a headache to implement because the CustomScan node
(and its superclass) does not have information about which relations
are underlying. Probably, this functionality needs to show
the underlying relations on ExplainTargetRel() if CustomScan node
represents a scan instead of join. What data source can produce
the list of underlying relations here?
So, if it is not a significant restriction for users, I'd like to work on this
feature later.

Agreed. It would be enough that Custom Scan Providers can add
arbitrary information, such as "Remote SQL" of postgres_fdw, to
EXPLAIN result via core API. Some kind of framework to help
authors of Custom Scan Providers would be nice, but it should be
considered after the first cut.

The attached patch fixes up a minor warning around get_rel_relkind
and name of the property for custom-provider. Please check it.

The patch can be applied onto 2013-12-16 HEAD cleanly, and gives no
unexpected error/warning.

I'm sorry to post separately, but I have some comments on document.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

(2) postgres_fdw
JOIN push-down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

The code for core and contrib seems fine, so I'll mark the patches "Ready
for committer" after the document enhancement.

Regards,
--
Shigeru HANADA


#18

KaiGai Kohei

kaigai@ak.jp.nec.com

about 12 years ago

In reply to: Shigeru Hanada (#17)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Hanada-san,

(2013/12/16 14:15), Shigeru Hanada wrote:

I'm sorry to post separately, but I have some comments on document.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

It should be available; it is no different from the two parameters that
we have supported for a long time. Sorry, I forgot to mention the new
feature.

(2) postgres_fdw
JOIN push-down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation about remote join execution to the section.
It will probably help users understand why a Custom Scan node is here
instead of a Join node. Thanks for your suggestion.

Best regards,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-1.v4.patch (text/plain; charset=Shift_JIS)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  99 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1201 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations while leveraging the cost-based optimizer. The API consists
+  of a set of callbacks, registered under a unique name, that are invoked during
+  query planning and execution. A custom-scan provider should implement these
+  callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this is done once in the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is being planned and executed. The second task is the
+  submission of candidate paths, each with an adequate cost, to scan or join
+  relations. The planner then chooses the cheapest path among all of the
+  candidates. If the custom path survives, the planner starts the third
+  task: construction of a <literal>CustomScan</literal> plan node, placed
+  within the query plan tree instead of the built-in plan node. The last task
+  is the execution of its implementation in response to invocations by the
+  core executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They may serve as good
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan to skip earlier pages or
+      terminate before the end of the relation, if an inequality operator on
+      the <literal>ctid</literal> system column can narrow down the range to be
+      scanned, instead of a sequential scan that reads the relation from
+      beginning to end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates how to implement a custom
+      scan node that takes the place of a join node.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scans and joins are fully supported with integrated
+  cost-based query optimization using the custom scan API. You might be able
+  to implement other operations, like sort or aggregation, by manipulating the
+  planned tree; however, the extension is then responsible for handling this
+  replacement correctly. There is no support for it in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure afterwards.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target 
+    relations. Prior to this selection, we list all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to built-in paths.
+    If a custom-scan provider can offer a potential scan path for the
+    supplied relation, it should construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> is a common field for all the path nodes to store
+    a cost estimate. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    the flags listed below, and <literal>custom_private</> can be used to store
+    private data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>MarkPosCustomScan</> and
+        <literal>RestorePosCustom</> callbacks.
+        The custom scan provider is then responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization of the <literal>cscan_plan</>
+    it has allocated, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned to the <literal>Plan</> portion, but can be eliminated from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    It is often necessary to adjust the <literal>varno</> of <literal>Var</>
+    nodes that reference a particular scan node after the plan node has been
+    constructed. For example, a Var node in the target list of a join node
+    originally references a particular relation underlying the join; however,
+    it has to be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan because there is nothing special to do. Otherwise, it needs to
+    be handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also launches the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization needed for this
+    plan; however, it is not a good idea to launch the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to save the private state being managed by the custom scan
+    provider. Also, <literal>eflags</> has flag bits of the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform anything visible externally if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation or relations, if joining,
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method for foreign tables, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is for the callback to act as just an
+    entrypoint to <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation or relations, if
+    joining, according to the custom logic. Pay attention to the data format
+    (and the way it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory in the per-execution memory
+    context. So, all this callback is responsible for is its own
+    resources, independent of the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed value,
+    so the rewound scan need not return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in the private
+    state.
+    Note that it is optional to implement; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    previously saved by <literal>MarkPosCustomScan</>.
+    Note that it is optional to implement; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the <literal>CustomScanState</> provides the
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d1b7dc6..1e96829 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index bd5428d..0532197 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -683,6 +685,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -809,6 +816,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -897,6 +906,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			custom_name = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -998,6 +1014,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom Provider", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1013,6 +1031,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1303,6 +1325,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *) plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1870,6 +1915,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2042,6 +2100,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f80e6c4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..b1110b9 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..e60ac67 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..df0d295 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * Finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs the various initialization steps.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up only when scanrelid refers to
+	 * a particular relation. Otherwise, the custom-scan provider must
+	 * initialize it itself if it uses ss_ScanTupleSlot internally.
+	 * If the provider replaces the varno of Var nodes with CUSTOM_VAR, the
+	 * slot must be set so EXPLAIN can look up underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is associated with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the provider's BeginCustomScan
+	 * callback. The extension may override any of the initialization
+	 * above, if needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point for the provider's ExecCustomScan callback. All the
+ * work of fetching a tuple is the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entry point, for the provider's MultiExecCustomScan callback.
+ * All the work of fetching multiple tuples (in the form the upper node
+ * expects) is the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e3edcf6..e21982f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3951,6 +3978,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4c7505e..00c7466 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2824,6 +2840,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 48ef325..29fcba9 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96fe50f..ebc0b28 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -400,6 +402,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -428,6 +433,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1247,6 +1255,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1318,6 +1329,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1341,6 +1355,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1410,6 +1427,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1463,6 +1483,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 50f0852..c6010d9 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..9483614 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2c122d..a545af0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2009,6 +2021,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider may use this information,
+		 * but modifying it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *) functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int) reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NIL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NIL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan according
+	 * to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5c9f3d6..1af5469 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -575,6 +576,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d8cabbd..3a19aac 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2222,6 +2222,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a7169ef..32e8b59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. Typically it is
+ * invoked from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for supplying
+ * adequate cost estimates here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 348f620..48bd672 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -143,6 +143,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2447,14 +2448,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node. (Note that the custom scan
+ * provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3712,6 +3718,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5379,6 +5393,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5649,6 +5675,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 75841c8..51537d2 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5a40347..f315b8f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1496,6 +1496,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ff9af76..adc5123 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 101e22c..58575b9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 7918537..b71c7ca 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..50194f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -876,6 +876,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the identifier under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0033a3c..8fbdb66 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..09406f4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,

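As a reading aid for the hook added in part 1, the chaining convention behind add_scan_path_hook can be modelled outside the backend. The sketch below is a standalone toy, not PostgreSQL code: toy_rel, toy_path_hook, load_ext_a and the other toy_*/ext_* names are hypothetical and exist only for illustration. It shows the pattern CTidAddScanPath follows with add_scan_path_next in part 2: each extension saves the previous hook value when it is loaded and calls it before adding its own candidate path.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for RelOptInfo: it only counts the candidate paths added. */
typedef struct toy_rel
{
	int			npaths;
} toy_rel;

/* Same shape as a planner hook: a global function pointer, NULL by default. */
typedef void (*toy_path_hook_type) (toy_rel *rel);
static toy_path_hook_type toy_path_hook = NULL;

/* Each toy extension remembers whatever hook was installed before it. */
static toy_path_hook_type prev_hook_a = NULL;
static toy_path_hook_type prev_hook_b = NULL;

static void
ext_a_add_path(toy_rel *rel)
{
	if (prev_hook_a)
		(*prev_hook_a) (rel);	/* give earlier extensions their chance */
	rel->npaths += 1;			/* then add our own candidate */
}

static void
ext_b_add_path(toy_rel *rel)
{
	if (prev_hook_b)
		(*prev_hook_b) (rel);
	rel->npaths += 1;
}

/* What each extension's _PG_init() would do, in load order. */
static void
load_ext_a(void)
{
	prev_hook_a = toy_path_hook;
	toy_path_hook = ext_a_add_path;
}

static void
load_ext_b(void)
{
	prev_hook_b = toy_path_hook;
	toy_path_hook = ext_b_add_path;
}

/* Core side: mirrors what the add_custom_scan_paths() macro does. */
static int
toy_add_custom_paths(toy_rel *rel)
{
	assert(rel != NULL);
	if (toy_path_hook)
		(*toy_path_hook) (rel);
	return rel->npaths;
}
```

With no extension loaded, toy_add_custom_paths() leaves the relation untouched; after load_ext_a() and load_ext_b(), a single call reaches both extensions because the later one forwards to the earlier one.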
pgsql-v9.4-custom-scan.part-2.v4.patch (text/plain; charset=Shift_JIS)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 108 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 17 files changed, 1253 insertions(+), 14 deletions(-)

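The cost estimation in contrib/ctidscan narrows the scan to a block range derived from the ctid bounds and then scales the tuple count by the fraction of pages visited. The following is a standalone sketch of that arithmetic only, under stated assumptions: toy_blkno, toy_estimate_pages and toy_scale_tuples are hypothetical names, block numbers are taken as zero-based, the upper bound is clamped to the last existing block, and an empty range is reported as one page so the estimate never drops to zero. It does not reproduce the module's exact clamping.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_blkno;		/* stand-in for BlockNumber */

/*
 * Estimate how many pages a ctid-range scan has to visit.
 * has_min/has_max say whether a lower/upper block bound is known.
 */
static toy_blkno
toy_estimate_pages(toy_blkno rel_pages,
				   int has_min, toy_blkno blk_min,
				   int has_max, toy_blkno blk_max)
{
	toy_blkno	first;
	toy_blkno	last;

	if (rel_pages == 0)
		return 1;				/* never report zero pages */

	/* No usable bound: assume about half of the relation is read. */
	if (!has_min && !has_max)
		return (rel_pages + 1) / 2;

	first = has_min ? blk_min : 0;
	last = has_max ? blk_max : rel_pages - 1;
	if (last > rel_pages - 1)
		last = rel_pages - 1;	/* clamp to the last existing block */

	/* Compare before subtracting: toy_blkno is unsigned and would wrap. */
	if (first > last)
		return 1;
	return last - first + 1;
}

/* Scale the relation's tuple count by the fraction of pages visited. */
static double
toy_scale_tuples(double rel_tuples, toy_blkno rel_pages, toy_blkno num_pages)
{
	assert(rel_pages > 0);
	return rel_tuples * ((double) num_pages / (double) rel_pages);
}
```

For a 100-page relation, a lower bound at block 10 gives 90 pages, and both bounds at blocks 10 and 19 give 10 pages. The comparison before the subtraction matters because block numbers are unsigned.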
diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..703e5a5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override part of the executor. This extension focuses on
+ * workloads that fetch records whose tid is larger or smaller than a
+ * particular value. When inequality operators on ctid are given, this
+ * module constructs a custom scan path that skips records that need not
+ * be read. If that path is the cheapest one, it is used to run the query.
+ * The Custom Scan APIs call back into this extension when the executor
+ * fetches the underlying records; it then uses the existing heap_getnext()
+ * but seeks to the first record to be read before fetching anything.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause can be used to determine
+ * the zone to be scanned. If one or more usable clauses are found, it
+ * returns a list of them; otherwise it returns NIL.
+ * The caller can assume that all the returned conditions are chained
+ * with the boolean AND operator, so every operator narrows down the
+ * scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidates */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NIL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NIL;
+		if (exprType(other) != TIDOID)
+			return NIL;		/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NIL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost of scanning the target relation according to the
+ * given restriction clauses. The scan logic is almost the same as SeqScan,
+ * because it uses the regular heap_getnext(), except that fewer tuples
+ * are scanned when the restriction clauses are effective.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, treat the Var node as if it were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * This is just a rough estimate, so we don't distinguish
+			 * strict inequality from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max > bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max > bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = (bnum_max > bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimate. We assume half of the records will be
+		 * read using this restriction clause, but the actual number is
+		 * not known until the executor runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators on ctid are given for
+ * the relation to be scanned and they can reduce the number of tuples read.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least one restriction clause can reduce the number of tuples
+	 * to be scanned, so we add a custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was done at nodeCustomScan.c. So, all
+	 * we need to do is to set the slightly adjusted clauses and the
+	 * private data; the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns only 1, 0 or -1, so the comparison
+	 * policies double as "open ended" markers. A forward scan ends once
+	 * ItemPointerCompare(tid, ip_max) is greater than ip_max_comp, which
+	 * can never happen while ip_max_comp is 1. Likewise, a backward scan
+	 * ends once ItemPointerCompare(tid, ip_min) is less than ip_min_comp,
+	 * which can never happen while ip_min_comp is -1. So these initial
+	 * values behave as if no termination condition was set.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we have a concrete TID that fetched records must be
+			 * greater than, less than, or equal to, so it can serve as
+			 * the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop at tid <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop at tid < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are implicitly ANDed, so the whole
+			 * condition cannot be true if any one of them yields NULL.
+			 * We can stop evaluating right away and tell the caller that
+			 * there is no point in scanning at all.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more for EXPLAIN without ANALYZE. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; the start position is sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * Seeks the current scan position to the specified point, so that the next
+ * heap_getnext() fetches a record from there. Returns true if the caller
+ * should override the scan direction with NoMovementScanDirection for the
+ * next fetch; returns false only for a backward scan that starts past the end.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, a forward scan obviously
+	 * cannot fetch any tuples, while a backward scan needs no special
+	 * handling and simply restarts the scan.
+	 * Note that heap_getnext() returns a NULL tuple right after
+	 * heap_rescan() when NoMovementScanDirection is given. The caller
+	 * overrides the scan direction that way when we return true, which
+	 * terminates a forward scan immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate a forward scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, run the backward scan from its starting point */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure that heap_restrpos() loads the block containing the
+	 * position. heap_restrpos() internally calls heapgettup() or
+	 * heapgettup_pagemode(), which invokes heapgetpage() only when
+	 * rs_cblock differs from the block number that rs_mctid points to,
+	 * so we set an invalid block number here to guarantee that it never
+	 * matches the previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Set a pseudo value as if heap_markpos() had saved this position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate the scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the lower bound if one was given */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the upper bound if one was given */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound in a
+	 * forward scan, or the lower bound in a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan in the usual
+ * ExecScan() manner.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call recalculate the starting point and rewind the
+ * underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
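
A note on the upper-bound check in CTidAccessCustomScan above, since the comparison-policy trick is easy to misread. What follows is only a minimal standalone sketch, not part of the patch: Tid, tid_compare() and past_upper_bound() are hypothetical stand-ins for ItemPointerData, ItemPointerCompare() and the inline test against ip_max / ip_max_comp. It assumes the three-way comparison returns exactly -1, 0 or 1, which is what makes the policy value 1 behave as "no upper bound".

```c
#include <stdbool.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct
{
	unsigned	block;
	unsigned	offset;
} Tid;

/* Three-way comparison returning -1, 0 or 1, like ItemPointerCompare(). */
static int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Upper-bound policy, mirroring ip_max_comp:
 *   -1 : "ctid <  max"  (stop once the tuple reaches max)
 *    0 : "ctid <= max"  (stop once the tuple is beyond max)
 *    1 : no upper bound (the comparison result can never exceed 1)
 */
static bool
past_upper_bound(Tid tid, Tid max, int max_comp)
{
	return tid_compare(tid, max) > max_comp;
}
```

With max = (3,10), a tuple at (3,10) ends the scan under policy -1 but not under policy 0, and a tuple at (3,11) ends it under policy 0 but never under policy 1.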
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..4f23b74 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..d010d5c
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,108 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides an additional way to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare the <literal>ctid</> system column with a value.
+  It also serves as a proof-of-concept implementation of the custom-scan
+  APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then offer an additional scan path on regular relations, using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below normally falls back to a sequential scan
+   when this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators on the
+   <literal>ctid</> system column to reduce the number of rows processed.
+   There is obviously no point in reading tuples located in block 99 or
+   earlier, so it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it
+   otherwise uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   This works, but it wastes I/O on pages that contain only records to be
+   skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> is more efficient: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the cheapest one. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples have to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path than the above custom-scan
+   path exists. For instance, an index scan on an equality condition is
+   usually cheaper, so the optimizer adopts it instead of the sequential
+   scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Usage is simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries">,
+   <xref linkend="guc-local-preload-libraries"> or
+   <xref linkend="guc-session-preload-libraries"> parameter, whichever
+   suits you.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 1e96829..0dfbdcc 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c6010d9..e55b16e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -977,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1af5469..630c8e7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1081,7 +1081,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 78efaa5..b040334 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -159,15 +159,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 444ab740..a2873ec 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ba7ae7c..13cfba8 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..9645025 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
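
As a cross-check of the row counts in the expected output above, here is a minimal sketch of how tid values order: block number first, then offset within the block. The `Tid` struct and both function names are hypothetical (they are not part of the patch), and the figure of 120 tuples per block is only read off the t1 output above, where (0,120) is followed by (1,1).

```c
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, offset). */
typedef struct Tid
{
	uint32_t	block;
	uint16_t	offset;
} Tid;

/* Returns <0, 0 or >0. The block number is compared first, then the offset. */
int
tid_compare(Tid a, Tid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Number of tuples strictly below 'bound', assuming every block holds
 * exactly 'per_block' tuples with offsets 1..per_block (an assumption
 * that matches the t1 test table only).
 */
uint32_t
count_below(Tid bound, uint16_t per_block)
{
	uint32_t	in_last = (bound.offset > 0) ? (uint32_t) (bound.offset - 1) : 0;

	if (in_last > per_block)
		in_last = per_block;
	return bound.block * per_block + in_last;
}
```

Under these assumptions, ctid < '(1,20)' covers 120 tuples of block 0 plus 19 of block 1, which agrees with the 139 rows shown, and the BETWEEN '(2,115)' AND '(3,10)' range works out to 370 - 354 = 16 rows.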
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

pgsql-v9.4-custom-scan.part-3.v4.patch (text/plain; charset=Shift_JIS)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1075 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 doc/src/sgml/postgres-fdw.sgml                 |   10 +
 src/backend/foreign/foreign.c                  |   29 +
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/backend/optimizer/util/var.c               |   39 +
 src/include/foreign/foreign.h                  |    4 +
 src/include/nodes/bitmapset.h                  |    4 +
 src/include/optimizer/var.h                    |    1 +
 11 files changed, 1360 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a2675eb..5af3dd7 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* column references must be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * Does the main work of deparseRemoteJoinSql. It deparses a relation,
+ * which may be a join rather than a plain table, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either a List or an Integer.
+	 * A List is a packed PgRemoteJoinInfo that contains outer and inner
+	 * join references, so it has to be deparsed recursively.
+	 * An Integer is the rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* prevent syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only columns referenced by the target list and local conditions
+		 * need to be fetched from the remote server, not every column.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected node type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * The top-level 'relinfo' is assumed to describe a remote join relation,
+ * so it has to be a List object that packs a PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, a column reference may be ambiguous unless it is
+	 * qualified with its relation alias.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
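
The var_qualified handling above prefixes each column with an "r<rtindex>." alias before the quoted column name. A minimal standalone sketch of that output format follows. The function names are hypothetical, and the quoting rule is a simplified assumption: it quotes anything that is not a plain lower-case identifier and ignores reserved keywords, which the real quote_identifier() also checks.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified rule (assumption): plain [a-z_][a-z0-9_]* needs no quotes. */
static int
needs_quotes(const char *ident)
{
	const unsigned char *p = (const unsigned char *) ident;

	if (!(islower(*p) || *p == '_'))
		return 1;
	for (p++; *p; p++)
	{
		if (!(islower(*p) || isdigit(*p) || *p == '_'))
			return 1;
	}
	return 0;
}

/* Write "r<rtindex>.<column>" into buf, truncating if buf is too small. */
void
qualified_colref(char *buf, size_t size, int rtindex, const char *colname)
{
	size_t		pos;
	int			n;

	if (size == 0)
		return;
	n = snprintf(buf, size, "r%d.", rtindex);
	if (n < 0 || (size_t) n >= size)
		return;					/* the alias alone did not fit */
	pos = (size_t) n;

	if (!needs_quotes(colname))
	{
		snprintf(buf + pos, size - pos, "%s", colname);
		return;
	}

	if (pos + 1 < size)
		buf[pos++] = '"';
	for (; *colname && pos + 2 < size; colname++)
	{
		if (*colname == '"')
			buf[pos++] = '"';	/* double any embedded quote */
		buf[pos++] = *colname;
	}
	if (pos + 1 < size)
		buf[pos++] = '"';
	buf[pos] = '\0';
}
```

With rtindex 1 and column name C 1, the sketch yields r1."C 1", the same shape as the qualified references in the Remote SQL of the updated postgres_fdw expected output.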
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
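
The Remote SQL lines in the expected output above wrap each join in parentheses, alias every table as r<rtindex>, and fall back to "ON true" when there is no remote join condition. The recursion that produces this shape can be sketched as below. This is a simplified illustration, not the patch's code: `JoinNode`, `append` and `deparse_from` are hypothetical names, the ON clause is passed in as pre-rendered text, and a fixed-size buffer stands in for StringInfo.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the packed join tree used by the patch. */
typedef struct JoinNode
{
	int			rtindex;		/* > 0 for a base table (leaf), 0 for a join */
	const char *relname;		/* leaf only: remote table name, already quoted */
	const char *jointype;		/* join only: " JOIN ", " LEFT JOIN ", ... */
	const char *on_clause;		/* join only: NULL means no remote condition */
	const struct JoinNode *outer;
	const struct JoinNode *inner;
} JoinNode;

/* Append s to a NUL-terminated buffer of 'size' bytes, truncating if full. */
static void
append(char *buf, size_t size, const char *s)
{
	size_t		len = strlen(buf);

	if (len + 1 < size)
		snprintf(buf + len, size - len, "%s", s);
}

/* Deparse the FROM item for 'node'; joins recurse into both sides. */
void
deparse_from(const JoinNode *node, char *buf, size_t size)
{
	if (node->rtindex > 0)
	{
		char		alias[32];

		append(buf, size, node->relname);
		snprintf(alias, sizeof(alias), " r%d", node->rtindex);
		append(buf, size, alias);
		return;
	}

	append(buf, size, "(");
	deparse_from(node->outer, buf, size);
	append(buf, size, node->jointype);
	deparse_from(node->inner, buf, size);
	append(buf, size, " ON ");
	/* "ON true" keeps the statement valid when nothing can be pushed down */
	append(buf, size, node->on_clause ? node->on_clause : "true");
	append(buf, size, ")");
}
```

The caller is expected to start with an empty string in buf. A join of two leaves with rtindex 1 and 2 over "S 1"."T 1" and no ON clause gives ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true), matching the FROM item of the st1 plan above.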
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 246a3a9..6786b89 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,794 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * It checks whether the supplied relation is either a foreign table or remote
+ * join managed by postgres_fdw. If not, false shall be returned.
+ * If it is a managed relation, some related properties shall be returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		FdwRoutine			pgroutine;
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte = planner_rt_fetch(rel->relid, root);
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		memset(&pgroutine, 0, sizeof(FdwRoutine));
+		pgroutine.GetForeignRelSize = postgresGetForeignRelSize;
+
+		if (!is_fdw_managed_relation(rte->relid, &pgroutine))
+			return false;
+
+		/*
+		 * Also report the server-id and the local user-id to the caller.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we might need to pay attention to a scenario in which
+		 * a plan is executed under a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be accessed with the same credential or not; the actual
+		 * user-id is not required.
+		 * So, fdw_user_oid is set to InvalidOid for comparison purposes
+		 * if the access runs with the credential of GetUserId().
+		 */
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when underlying foreign tables use identical server
+				 * and user-id for each.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and if so adds a custom-scan path that runs
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* Is outerrel managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* Is innerrel managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * A remote join is not supported if a whole-row reference is included
+	 * in either the target list or the local conditions.
+	 *
+	 * XXX - We have no reasonable way to reconstruct a RECORD datum from
+	 * individual columns once they are extracted. On the other hand, putting
+	 * a whole-row reference into the remote join query would consume
+	 * additional network bandwidth.
+	 */
+	if (contain_wholerow_reference((Node *)joinrel->reltargetlist) ||
+		contain_wholerow_reference((Node *)j_local_conds))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan from the remote join path built above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a remote join relation carry the varno of the
+ * underlying foreign tables. That is a problem because such Vars would
+ * eventually be replaced by references to the outer or inner relation,
+ * whereas the result of a remote join is stored in the scan tuple slot,
+ * which is neither outer nor inner.
+ * So, we replace the varno of these Var nodes by CUSTOM_VAR, a pseudo varno
+ * that references a tuple in the scan tuple slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * Var nodes that reference columns of a remote join relation need special
+ * treatment, because we replace a join relation by a remote query that
+ * returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan, but some
+ * adjustment is needed because of the different nature of this node.
+ * The biggest difference is that it has to open the underlying relations
+ * by itself and to construct a tuple descriptor from the list of Vars to
+ * be fetched, because a custom scan (here, a scan on a remote join in place
+ * of a local join) has no particular relation behind it and therefore has
+ * to manage these resources itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * We need to open the underlying relations by ourselves.
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is applied unless a
+	 * specific user is configured on the reference.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations also have to be saved */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the
+ * underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Same as postgresReScanForeignScan().
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine on EXPLAIN. It just adds remote query, if verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider.
+ * No special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index c782d4f..27486b9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 35924f1..7926d54 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -350,6 +350,16 @@
   </para>
 
   <para>
+   In addition, <productname>PostgreSQL</> 9.4 or later adaptively tries
+   to join relations managed by the same foreign server on the remote
+   node, if the supplied join condition is safe to run on the remote side.
+   It behaves as if a local custom scan node walks over a virtual relation
+   that consists of multiple relations joined remotely, so it is
+   usually cheaper than transferring both relations and joining them
+   locally.
+  </para>
+
+  <para>
    The query that is actually sent to the remote server for execution can
    be examined using <command>EXPLAIN VERBOSE</>.
   </para>
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 2b75f73..2efa17b 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -23,6 +23,7 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -621,3 +622,31 @@ get_foreign_server_oid(const char *servername, bool missing_ok)
 				 errmsg("server \"%s\" does not exist", servername)));
 	return oid;
 }
+
+/*
+ * is_fdw_managed_relation
+ *
+ * It checks whether the supplied relation is a foreign table managed
+ * by the module that has FdwRoutine, or not.
+ */
+bool
+is_fdw_managed_relation(Oid tableoid, const FdwRoutine *routines_self)
+{
+	FdwRoutine *routines;
+	char		relkind = get_rel_relkind(tableoid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		routines = GetFdwRoutineByRelId(tableoid);
+
+		/*
+		 * We assume that a callback implemented by one extension is
+		 * never shared with another extension.
+		 * So, we don't need to compare all the function pointers in the
+		 * FdwRoutine, but only one member.
+		 */
+		if (routines->GetForeignRelSize == routines_self->GetForeignRelSize)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 540db16..44f2236 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 4a3d5c8..6e899e8 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -73,6 +73,7 @@ static bool pull_varattnos_walker(Node *node, pull_varattnos_context *context);
 static bool pull_vars_walker(Node *node, pull_vars_context *context);
 static bool contain_var_clause_walker(Node *node, void *context);
 static bool contain_vars_of_level_walker(Node *node, int *sublevels_up);
+static bool contain_wholerow_reference_walker(Node *node, void *context);
 static bool locate_var_of_level_walker(Node *node,
 						   locate_var_of_level_context *context);
 static bool pull_var_clause_walker(Node *node,
@@ -418,6 +419,44 @@ contain_vars_of_level_walker(Node *node, int *sublevels_up)
 								  (void *) sublevels_up);
 }
 
+/*
+ * contain_wholerow_reference
+ *
+ *    Recursively scan a clause to discover whether it contains any Var nodes
+ *    of whole-row reference in the current query level.
+ *
+ *    Returns true if any such Var found.
+ */
+bool
+contain_wholerow_reference(Node *node)
+{
+	return contain_wholerow_reference_walker(node, NULL);
+}
+
+static bool
+contain_wholerow_reference_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return contain_wholerow_reference_walker((Node *)rinfo->clause,
+												 context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node,
+								  contain_wholerow_reference_walker,
+								  context);
+}
 
 /*
  * locate_var_of_level
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5bd6ae6..9514f5f 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -13,6 +13,7 @@
 #ifndef FOREIGN_H
 #define FOREIGN_H
 
+#include "foreign/fdwapi.h"
 #include "nodes/parsenodes.h"
 
 
@@ -81,4 +82,7 @@ extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
 extern Oid	get_foreign_data_wrapper_oid(const char *fdwname, bool missing_ok);
 extern Oid	get_foreign_server_oid(const char *servername, bool missing_ok);
 
+extern bool	is_fdw_managed_relation(Oid tableoid,
+									const FdwRoutine *routines_self);
+
 #endif   /* FOREIGN_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2a4b41d..73424f5 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index 808bf67..6355b4d 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -36,6 +36,7 @@ extern void pull_varattnos(Node *node, Index varno, Bitmapset **varattnos);
 extern List *pull_vars_of_level(Node *node, int levelsup);
 extern bool contain_var_clause(Node *node);
 extern bool contain_vars_of_level(Node *node, int levelsup);
+extern bool contain_wholerow_reference(Node *node);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, PVCAggregateBehavior aggbehavior,
 				PVCPlaceHolderBehavior phbehavior);
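
The bms_to_string / bms_from_string helpers in the patch above serialize bitmap words as fixed-width hexadecimal text, most significant word first. A minimal standalone sketch of that round trip follows. It assumes 32-bit words and uses malloc instead of palloc; all demo_* names are hypothetical and not part of the patch. Note that the letter digits 'a'..'f' and 'A'..'F' decode to 10..15, so the decoder adds 10 after subtracting the base character.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEX_PER_WORD 8		/* 32 bits / 4 bits per hex digit */

/* Return the value (0..15) of one hex digit, or -1 if it is not a digit. */
int
demo_hex_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Encode words, most significant word first. The caller frees the result. */
char *
demo_words_to_hex(const uint32_t *words, int nwords)
{
	char   *result = malloc((size_t) nwords * DEMO_HEX_PER_WORD + 1);
	char   *pos = result;

	if (result == NULL)
		return NULL;
	result[0] = '\0';			/* covers the nwords == 0 case */
	for (int i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08" PRIx32, words[i - 1]);
	return result;
}

/* Decode; returns the number of words written, or -1 on malformed input. */
int
demo_hex_to_words(const char *hex, uint32_t *words, int max_words)
{
	size_t	len = strlen(hex);
	int		nwords;

	if (len % DEMO_HEX_PER_WORD != 0)
		return -1;
	nwords = (int) (len / DEMO_HEX_PER_WORD);
	if (nwords > max_words)
		return -1;

	for (int i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (int j = 0; j < DEMO_HEX_PER_WORD; j++)
		{
			int		v = demo_hex_value((unsigned char) *hex++);

			if (v < 0)
				return -1;
			word = (word << 4) | (uint32_t) v;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Unlike the patch, this sketch rejects a string whose length is not a multiple of the word width instead of warning and continuing; that is a design choice of the sketch only.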

#19

Shigeru Hanada

shigeru.hanada@gmail.com

about 12 years ago

In reply to: KaiGai Kohei (#18)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2013/12/16 14:15), Shigeru Hanada wrote:

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

It should be available; it is no different from the two parameters that
we have supported for a long time. Sorry, I forgot to mention the new
feature.

Check.

(2) postgres_fdw
JOIN push-down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation about remote join execution to that section.
It will probably help users understand why a Custom Scan node appears here
instead of a Join node. Thanks for your suggestion.

Check.

I think these patches have been considered enough to be marked as "Ready for
Committer".

Regards,
--
Shigeru HANADA


#20

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Shigeru Hanada (#19)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

The attached patches are rebased to the latest git tree, with no functional
changes from the previous revision submitted to the November commit fest.
Hanada-san volunteered to review the series of patches, including the portion
for postgres_fdw, and marked it as "Ready for Committer" in the last commit fest.
So, I hope one of the committers will also volunteer to review the patches for
a final check.

* Part-1 - CustomScan APIs
This patch provides a set of interfaces that let extensions interact with the
query optimizer and executor. The new add_scan_path_hook and add_join_path_hook
allow an extension to offer alternative ways to scan a particular relation or
to join particular relations.
Once the optimizer chooses one of these alternatives, the executor invokes the
associated callbacks. In this case, the extension is responsible for returning
a slot that holds a tuple scanned from the underlying relation (or an empty
slot at the end of the scan).
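
The registration side of this design can be sketched without any PostgreSQL headers. The following is a minimal standalone model, assuming a fixed-size internal table into which the provider struct is copied and a lookup by unique name. DemoProvider and every demo_* function are hypothetical stand-ins, not the CustomProvider or register_custom_provider from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_PROVIDERS	8
#define DEMO_NAME_LEN		64

/* Hypothetical, simplified stand-in for the patch's CustomProvider. */
typedef struct DemoProvider
{
	char	name[DEMO_NAME_LEN];		/* unique provider name */
	int	  (*exec_scan) (int input);		/* stand-in for ExecCustomScan */
} DemoProvider;

static DemoProvider demo_table[DEMO_MAX_PROVIDERS];
static int	demo_count = 0;

/* Look up a provider by name; returns NULL if none is registered. */
const DemoProvider *
demo_lookup_provider(const char *name)
{
	for (int i = 0; i < demo_count; i++)
	{
		if (strcmp(demo_table[i].name, name) == 0)
			return &demo_table[i];
	}
	return NULL;
}

/*
 * Copy the caller's struct into the internal table, so the caller does not
 * need to keep it. Returns 0 on success, -1 on a duplicate name or full table.
 */
int
demo_register_provider(const DemoProvider *provider)
{
	assert(provider != NULL && provider->exec_scan != NULL);
	if (demo_count >= DEMO_MAX_PROVIDERS ||
		demo_lookup_provider(provider->name) != NULL)
		return -1;
	demo_table[demo_count++] = *provider;
	return 0;
}

/* Trivial callback used only for demonstration. */
int
demo_double(int input)
{
	return input * 2;
}

/* Mirrors the shape of _PG_init(): fill a stack-local struct and register. */
void
demo_init(void)
{
	DemoProvider provider;

	memset(&provider, 0, sizeof(provider));
	snprintf(provider.name, sizeof(provider.name), "demo-scan");
	provider.exec_scan = demo_double;
	(void) demo_register_provider(&provider);
}
```

Because the table holds a copy, the struct filled in demo_init() can live on the stack, which is the same pattern _PG_init() uses in the postgres_fdw part of the patch.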

* Part-2 - contrib/ctidscan
This patch provides a simple example implementation of the CustomScan API.
It can skip pages when inequality operators are given on the ctid system
column. That is at least better than a full sequential scan, so it usually
wins over SeqScan, though an index scan is much better.
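
The page-skipping idea can be illustrated with a small standalone sketch. It assumes a ctid is a (block, offset) pair and derives a conservative block range from a single inequality; the original qualifier is still expected to be rechecked as a filter on each tuple. The DemoTid, DemoCmp and demo_ctid_block_range names are hypothetical and do not come from contrib/ctidscan.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, offset). */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

typedef enum DemoCmp
{
	DEMO_LT, DEMO_LE, DEMO_GT, DEMO_GE
} DemoCmp;

/* Half-open block range [first, end) that the scan has to visit. */
typedef struct DemoBlockRange
{
	uint32_t	first;
	uint32_t	end;
} DemoBlockRange;

/*
 * Narrow the block range for "ctid <op> bound" on a relation of nblocks
 * blocks. The range is conservative: it may include the boundary block even
 * if no tuple in it qualifies, so the filter still has to be applied.
 */
DemoBlockRange
demo_ctid_block_range(uint32_t nblocks, DemoCmp op, DemoTid bound)
{
	DemoBlockRange r = {0, nblocks};

	switch (op)
	{
		case DEMO_GT:
		case DEMO_GE:
			/* skip every block before the boundary block */
			r.first = (bound.block < nblocks) ? bound.block : nblocks;
			break;
		case DEMO_LT:
		case DEMO_LE:
			/* stop right after the boundary block */
			if (bound.block < nblocks)
				r.end = bound.block + 1;
			break;
	}
	return r;
}
```

For a 100-block relation and ctid > '(10,0)', this yields the range [10, 100), so the first ten blocks are never read.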

* Part-3 - remote join implementation
This patch provides an example that replaces a join with a custom scan node
running on the result set of a remote join query, on top of the existing
postgres_fdw extension. The idea is that the result set of a remote query looks
like a relation, though an intangible one, so a local join can be replaced by a
scan on the result set of a query executed on the remote host, if both
relations to be joined belong to the same foreign server.
This patch gives postgres_fdw the capability to run a join on the remote host.
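
The eligibility condition above can be written down as a small standalone check. This sketch assumes that each side of the join is described by a server OID and a user OID, loosely modeled on the fdw_server_oid and fdw_user_oid fields of PgRemoteJoinInfo; requiring the same user on both sides is an assumption of the sketch, and DemoForeignRel and demo_can_join_remotely are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;			/* stand-in for PostgreSQL's Oid */
#define DEMO_INVALID_OID	((DemoOid) 0)

/* Hypothetical description of one side of a join. */
typedef struct DemoForeignRel
{
	bool		is_foreign;		/* foreign table managed by our wrapper? */
	DemoOid		server_oid;		/* foreign server the table lives on */
	DemoOid		user_oid;		/* user whose mapping is used to connect */
} DemoForeignRel;

/*
 * A local join is a candidate for replacement by a remote join only if both
 * sides are foreign tables on the same server, accessed as the same user.
 */
bool
demo_can_join_remotely(const DemoForeignRel *outer,
					   const DemoForeignRel *inner)
{
	assert(outer != NULL && inner != NULL);

	if (!outer->is_foreign || !inner->is_foreign)
		return false;
	if (outer->server_oid == DEMO_INVALID_OID)
		return false;
	return outer->server_oid == inner->server_oid &&
		   outer->user_oid == inner->user_oid;
}
```

Whether the join condition itself is safe to send to the remote side is a separate question that this sketch does not cover.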

Thanks,

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-1.v5.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  99 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1201 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations leveraging the cost based optimizer. The API consists of a
+  set of callbacks, registered under a unique name, to be invoked during query planning
+  and execution. A custom-scan provider should implement these callback 
+  functions according to the expectation of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this is done once at the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is planned and executed. The second task is the submission of
+  candidate paths to either scan or join relations, with an adequate cost, to
+  the core planner. Then, the planner chooses the cheapest path from all of
+  the candidates. If the custom path survives, the planner starts the third
+  task: construction of a <literal>CustomScan</literal> plan node, located
+  within the query plan tree instead of the built-in plan node. The last task
+  is the execution of its implementation in answer to invocations by the core
+  executor.
+ </para>
+ <para>
+  Some contrib modules utilize the custom-scan API. They may provide good
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan to skip earlier pages or
+      terminate before the end of the relation, if an inequality operator on the
+      <literal>ctid</literal> system column can narrow down the scope to be
+      scanned, instead of a sequential scan which reads a relation from the
+      head to the end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates how to implement a custom
+      scan node that takes the place of a join node.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scan and join are fully supported with integrated cost
+  based query optimization using the custom scan API. You might be able to
+  implement other operations, like sort or aggregation, by manipulating the
+  planned tree; however, the extension is then responsible for handling this
+  replacement correctly. There is no support in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure any longer.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target 
+    relations. Prior to this selection, we list all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to built-in paths.
+    If a custom-scan provider can submit a potential scan path towards the
+    supplied relation, it shall construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> is a common field for all the path nodes to store
+    a cost estimation. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    flags below, and <literal>custom_private</> can be used to store private
+    data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner this custom scan node supports
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        Also, the custom scan provider is responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner this custom scan node supports
+        backward scans.
+        Also, the custom scan provider is responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> is chosen by the query planner,
+    the planner calls back to the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization on the <literal>cscan_plan</>
+    being allocated, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    on the <literal>Plan</> portion in the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned on the <literal>Plan</> portion, but can be eliminated from
+    this list if custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    The <literal>varno</> of a <literal>Var</> node that references a
+    particular scan node often needs to be adjusted after construction of the
+    plan node. For example, a Var node in the target list of a join node
+    originally references a particular relation underlying the join; however,
+    it has to be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan because there is nothing special to do. Otherwise, it needs to
+    be handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also launches the associated callbacks to begin, execute
+    and end the custom scan according to the executor's manner.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan when the executor starts up.
+    It allows the custom scan provider to do any initialization job around this
+    plan; however, it is not a good idea to launch the actual scanning jobs.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to save the private state being managed by the custom scan
+    provider. Also, <literal>eflags</> has flag bits of the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform anything visible externally if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation or relations, if joining,
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method of a foreign table, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is as a thin wrapper around
+    <literal>ExecScan</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation or relations, if
+    joining, according to the custom logic. Pay attention to the data format
+    (and the way it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory in the per-execution memory
+    context. So, all this callback is responsible for is its own
+    resources that the framework does not manage.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed values,
+    so the rewound scan does not need to return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in its
+    private state.
+    This callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the custom scan to the position previously saved by
+    <literal>MarkPosCustomScan</>.
+    This callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the <literal>CustomScanState</> provides run-time
+    statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 552c3aa..656fd2e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e604be3..4528058 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -90,6 +91,7 @@ static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -689,6 +691,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -815,6 +822,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -903,6 +912,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			custom_name = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1004,6 +1020,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom Provider", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1019,6 +1037,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1315,6 +1337,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *) plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1954,6 +1999,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2126,6 +2184,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..2443e24 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..b4a7411 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0eba025..e71ce9b 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 46895b2..58d7190 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * Registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * It finds a registered custom execution provider by its name
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates a CustomScanState and performs the various initialization steps.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range table).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up here only when scanrelid refers
+	 * to a particular relation. Otherwise, the custom-scan provider must
+	 * initialize it itself if it uses ss_ScanTupleSlot internally. If the
+	 * provider replaces the varno of Var nodes with CUSTOM_VAR, the slot must
+	 * be set up so that EXPLAIN can look up the underlying attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is associated with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the BeginCustomScan callback.
+	 * The extension may override the initialization performed above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point for the ExecCustomScan callback. All the work of
+ * fetching a tuple is the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entry point, for the MultiExecCustomScan callback. All the
+ * work of fetching multiple tuples (in the form the upper node expects) is
+ * the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if it exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fb4ce2c..1a774f6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3964,6 +3991,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 568c3b8..28ea915 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2827,6 +2843,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 9f7f322..9f2b6bb 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..c7fcb80 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -389,6 +391,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -417,6 +422,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1236,6 +1244,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1307,6 +1318,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1330,6 +1344,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1399,6 +1416,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1452,6 +1472,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8492eed..e139316 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2312,7 +2312,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..48f5ad4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..c07e000 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2006,6 +2018,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider can make use of it, but
+		 * modifying it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *) functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan according
+	 * to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..b10a2c9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -576,6 +577,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..74ff415 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2236,6 +2236,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b79af7a..17827e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. The usual usage is
+ * invocation from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for setting
+ * an adequate cost estimate here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index add5cd1..d099d16 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -145,6 +145,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2482,14 +2483,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3747,6 +3753,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5414,6 +5428,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5684,6 +5710,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index eb78776..7fe0998 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *cscan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..621830a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ae12c0d..21fd0b4 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..85d088d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 4992bc0..7d9b0c0 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8aa40d0..527a060 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -877,6 +877,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the name under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private allows the extension to store private data,
+ * which has to be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a0bcc82..b99d841 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..a561387 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,

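The registration API declared in nodeCustom.h above (register_custom_provider and get_custom_provider) ties a provider name to a table of callbacks, and the planner and executor look the provider up by the name stored in the path or plan node. The following is a minimal standalone sketch of that pattern, not the code from the patch itself. All Toy* names, the fixed-size table, and the choice to return -1 or NULL on failure are assumptions made for illustration only.

```c
/* Standalone toy model of a name-keyed callback registry; not PostgreSQL code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAMEDATALEN   64	/* hypothetical stand-in for NAMEDATALEN */
#define TOY_MAX_PROVIDERS 8		/* arbitrary limit chosen for this sketch */

typedef struct ToyProvider
{
	char		name[TOY_NAMEDATALEN];
	int			(*exec_scan) (int input);	/* stand-in for a scan callback */
} ToyProvider;

static ToyProvider toy_providers[TOY_MAX_PROVIDERS];
static int	toy_num_providers = 0;

/* Look a provider up by name; returns NULL if it was never registered. */
const ToyProvider *
toy_get_provider(const char *name)
{
	int			i;

	for (i = 0; i < toy_num_providers; i++)
	{
		if (strcmp(toy_providers[i].name, name) == 0)
			return &toy_providers[i];
	}
	return NULL;
}

/*
 * Register a provider. The struct is copied, so the caller's copy may be
 * a local variable. Returns 0 on success, -1 on an unterminated name,
 * a full table, or a duplicate name.
 */
int
toy_register_provider(const ToyProvider *provider)
{
	assert(provider != NULL);

	if (memchr(provider->name, '\0', TOY_NAMEDATALEN) == NULL)
		return -1;
	if (toy_num_providers >= TOY_MAX_PROVIDERS)
		return -1;
	if (toy_get_provider(provider->name) != NULL)
		return -1;

	toy_providers[toy_num_providers++] = *provider;
	return 0;
}

/* Example callback used to show dispatch through the registry. */
int
toy_double_scan(int input)
{
	return input * 2;
}
```

In this toy model, an extension would call toy_register_provider() once at load time (ctidscan declares _PG_init() for that kind of setup), and later code would dispatch with toy_get_provider("ctidscan")->exec_scan(...) after checking the lookup result for NULL.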
pgsql-v9.4-custom-scan.part-2.v5.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 108 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 17 files changed, 1253 insertions(+), 14 deletions(-)
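
CTidEstimateCosts() in ctidscan.c below derives the number of pages to scan from the ctid bounds and scales the tuple estimate by that fraction of the relation. A minimal standalone sketch of that arithmetic follows. It is an interpretation, not the patch's code: ToyBlockNumber and both function names are hypothetical, block numbers are assumed to run from 0 to rel_pages - 1, and an empty or inverted range is assumed to be floored at one page.

```c
/* Standalone sketch of the page-range estimate; not PostgreSQL code. */
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyBlockNumber;	/* hypothetical stand-in for BlockNumber */

/*
 * Number of pages covered by the optional bounds [min_block, max_block],
 * clamped to the relation size and never less than one page.
 */
ToyBlockNumber
toy_estimate_pages(ToyBlockNumber rel_pages,
				   int has_min, ToyBlockNumber min_block,
				   int has_max, ToyBlockNumber max_block)
{
	ToyBlockNumber first = has_min ? min_block : 0;
	ToyBlockNumber last;

	if (rel_pages == 0)
		return 1;				/* empty relation: keep the one-page floor */

	last = rel_pages - 1;
	if (has_max && max_block < last)
		last = max_block;

	/* Compare before subtracting: the type is unsigned and would wrap. */
	if (first > last)
		return 1;

	return last - first + 1;
}

/* Scale the relation's tuple count by the fraction of pages scanned. */
double
toy_estimate_tuples(double rel_tuples,
					ToyBlockNumber rel_pages,
					ToyBlockNumber num_pages)
{
	assert(rel_tuples >= 0.0);

	if (rel_pages == 0)
		return 0.0;
	return rel_tuples * ((double) num_pages / (double) rel_pages);
}
```

For a 128-page relation holding 10000 tuples, a qual such as ctid >= '(96,0)' gives toy_estimate_pages(128, 1, 96, 0, 0) = 32 pages, and toy_estimate_tuples(10000.0, 128, 32) = 2500 tuples.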

diff --git a/contrib/Makefile b/contrib/Makefile
index dd2683b..4a62710 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor nodes. This extension
+ * focuses on workloads that fetch records whose tid is larger or smaller
+ * than a particular value. When inequality operators are given, this
+ * module constructs a custom scan path that can skip records that need
+ * not be read. If that path is the cheapest one, it is used to run the
+ * query. The Custom Scan APIs call back into this extension when the
+ * executor fetches the underlying records; it uses the existing
+ * heap_getnext(), but seeks to the first record to be read beforehand.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause makes it possible to
+ * determine the zone to be scanned. If one or more usable clauses are
+ * found, it returns a list of them; otherwise it returns NIL.
+ * The caller can assume that all the returned conditions are chained
+ * with the boolean AND operator, so every one of them narrows down the
+ * scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost of scanning the target relation according to the
+ * given restriction clauses. The scan logic is almost the same as that of
+ * SeqScan, because it uses the regular heap_getnext(); the difference is
+ * the number of tuples to be scanned when the restriction clauses apply.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* For simplicity, handle it as if the Var were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimate; we don't distinguish strict
+			 * inequality from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimate. We assume half of the records will be
+		 * read using this restriction clause, but the actual number is
+		 * unknown until the executor runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease the cost for the inequality operators, because
+	 * they are a subset of qpquals and are still evaluated there.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators on ctid are given for
+ * the relation to be scanned and can reduce the number of tuples to read.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, some of the restriction clauses can reduce the number of tuples
+	 * to be read, so we add the custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * being chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was already done in nodeCustomScan.c. So,
+	 * all we need to do here is attach the adjusted clauses and the private
+	 * data; the list of ctid restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns either 1, 0 or -1. The scan terminates
+	 * when the comparison between the fetched tid and ip_max is greater
+	 * than ip_max_comp (forward scan), or when the comparison with ip_min
+	 * is less than ip_min_comp (backward scan). Thus, ip_max_comp = 1 and
+	 * ip_min_comp = -1 can never trigger the termination, which makes the
+	 * scan "open ended", as if no termination condition was set. They are
+	 * tightened below once an inequality operator supplies a bound.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walk through the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we got a particular TID that the fetched records must
+			 * be larger than, less than, or equal to; thus, it allows us
+			 * to determine the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop on tid <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop on tid < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The restriction clauses are chained with AND, so the whole
+			 * condition can never be true if one of the clauses has a NULL
+			 * result. So, we can immediately stop the evaluation and tell
+			 * the caller that it makes no sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; the start position is sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It seeks the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns true if the
+ * caller shall override the scan direction with NoMovementScanDirection
+ * (see below); otherwise, it returns false to keep the given direction.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, a forward scan obviously cannot
+	 * fetch any tuples. On the other hand, nothing special is needed for
+	 * a backward scan.
+	 * Note that heap_getnext() returns a NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of
+	 * this function overrides the scan direction if 'true' was returned,
+	 * so the scan terminates immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, run the backward scan from its starting point */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure that heap_restrpos() loads the block that includes the
+	 * position. heap_restrpos() internally calls heapgettup() or
+	 * heapgettup_pagemode(), which kicks heapgetpage() only when
+	 * rs_cblock is different from the block number pointed to by
+	 * rs_mctid. So, we put an invalid block number here so that it
+	 * never matches the previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() had saved a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the lower bound if a min-tid was determined */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the upper bound if a max-tid was determined */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound on a
+	 * forward scan, or the lower bound on a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the manner of
+ * ExecScan().
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call calculate the starting point again and rewind
+ * the underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
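
The comparison-policy trick shared by CTidEvalScanZone() and CTidAccessCustomScan() is easier to follow in isolation. Below is a minimal standalone sketch of the upper-bound half only, assuming a simplified DemoTid type in place of ItemPointerData. All Demo*/demo_* names are hypothetical; they are not part of the patch nor of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ItemPointerData: a (block, offset) pair. */
typedef struct
{
    uint32_t    block;
    uint16_t    offset;
} DemoTid;

typedef enum { DEMO_LT, DEMO_LE } DemoUpperOp;

typedef struct
{
    DemoTid     max;
    int         max_comp;   /* 1: open ended, 0: "<=" bound, -1: "<" bound */
} DemoUpperBound;

/* Same contract as ItemPointerCompare(): returns -1, 0 or 1. */
static int
demo_tid_compare(const DemoTid *a, const DemoTid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/* Start "open ended": a comparison result can never exceed 1. */
static void
demo_init_upper(DemoUpperBound *ub)
{
    ub->max.block = 0;
    ub->max.offset = 0;
    ub->max_comp = 1;
}

/* Tighten the bound with "ctid < tid" (DEMO_LT) or "ctid <= tid" (DEMO_LE). */
static void
demo_add_upper(DemoUpperBound *ub, DemoUpperOp op, DemoTid tid)
{
    int         cmp;

    assert(op == DEMO_LT || op == DEMO_LE);

    /* An unset bound is always replaced. */
    cmp = (ub->max_comp > 0) ? -1 : demo_tid_compare(&tid, &ub->max);

    /* On a tie, the strict operator wins because it is the tighter one. */
    if (cmp < 0 || (cmp == 0 && op == DEMO_LT))
    {
        ub->max = tid;
        ub->max_comp = (op == DEMO_LT) ? -1 : 0;
    }
}

/* True once a forward scan has fetched a tid beyond the upper bound. */
static bool
demo_past_upper(const DemoUpperBound *ub, DemoTid fetched)
{
    return demo_tid_compare(&fetched, &ub->max) > ub->max_comp;
}
```

A caller would run demo_init_upper() once, feed every upper-bound clause to demo_add_upper(), and then stop the forward scan as soon as demo_past_upper() returns true. The lower bound works as a mirror image, with "less than ip_min_comp" as the stop test.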
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 2892fa1..d9e1997 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..d010d5c
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,108 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on the
+  custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then offer an additional scan path on regular relations, using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators on the
+   <literal>ctid</> system column to reduce the number of rows to be processed.
+   Obviously, there is no point in reading tuples located on the 99th page
+   or earlier. So, it seeks the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it otherwise
+   uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan, as
+   follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   This works, but it wastes I/O on the pages that contain only records to
+   be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> is more efficient; it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because of
+   the smaller number of tuples to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path than the above custom-scan
+   path exists. For instance, an index scan based on an equality operator is
+   usually cheaper than this custom scan, so the optimizer adopts it instead
+   of a sequential scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries">,
+   <xref linkend="guc-local-preload-libraries"> or
+   <xref linkend="guc-session-preload-libraries"> parameter, whichever
+   is convenient.
+  </para>
+  <para>
+   This module has no configurable parameters at present.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
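
The I/O saving claimed in the ctidscan documentation above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: every block holds the same number of live tuples with offsets starting at 1, and "reading a block" is just a counter. It does not touch the buffer manager or any PostgreSQL API, and demo_forward_scan() and DemoScanStats are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t    blocks_read;    /* number of blocks the scan touched */
    uint32_t    rows_matched;   /* rows satisfying ctid > (min_block,min_offset) */
} DemoScanStats;

/*
 * Forward scan over a toy heap of nblocks blocks. If seek_first is true,
 * the scan starts at min_block, as ctidscan does; otherwise it starts at
 * block 0 and relies on the filter alone, like a plain sequential scan.
 */
static DemoScanStats
demo_forward_scan(uint32_t nblocks, uint32_t tuples_per_block,
                  uint32_t min_block, uint32_t min_offset, bool seek_first)
{
    DemoScanStats stats = {0, 0};
    uint32_t    blk = 0;
    uint32_t    off;

    /* An out-of-range lower bound means there is nothing to read at all. */
    if (seek_first)
        blk = (min_block < nblocks) ? min_block : nblocks;

    for (; blk < nblocks; blk++)
    {
        stats.blocks_read++;
        for (off = 1; off <= tuples_per_block; off++)
        {
            /* The filter is still applied to every fetched tuple. */
            if (blk > min_block || (blk == min_block && off > min_offset))
                stats.rows_matched++;
        }
    }
    return stats;
}
```

With 200 blocks of 50 tuples and a lower bound of (100,0), both variants match the same 5000 rows, but the seeking variant touches 100 blocks instead of 200.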
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 656fd2e..a318495 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e139316..3cdcea4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -130,9 +130,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -977,7 +974,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
@@ -3201,7 +3198,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b10a2c9..ee3fbab 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1082,7 +1082,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 6aa4890..5a3cfca 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -161,15 +161,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index e1b7a0b..655af19 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -145,6 +145,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..064640c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index 94762d5..b3e1e9a 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5758b07..bd6fc3f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 78348f5..0e191a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -91,6 +91,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges
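
The row counts in the expected output above follow from how tid values are ordered: block number first, then line offset within the block. Below is a minimal standalone sketch of that ordering, not the ctidscan code itself. The `Tid` struct, `tid_cmp` and `count_in_range` are hypothetical names for illustration, and the layout of 4 full blocks of 120 tuples for t1 is an assumption inferred from the output (where (0,120) is followed by (1,1), and nothing exists beyond block 3).

```c
#include <assert.h>

/* Simplified stand-in for a tuple identifier: (block number, line offset). */
typedef struct
{
	unsigned	block;
	unsigned	offset;
} Tid;

/* Order by block number first, then by offset within the block. */
static int
tid_cmp(Tid a, Tid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * Count tuples with lo <= ctid <= hi in a table of 'nblocks' full blocks,
 * each holding offsets 1..per_block (an assumption of this model).
 */
static int
count_in_range(unsigned nblocks, unsigned per_block, Tid lo, Tid hi)
{
	int			count = 0;

	assert(per_block > 0);
	for (unsigned b = 0; b < nblocks; b++)
	{
		for (unsigned o = 1; o <= per_block; o++)
		{
			Tid			t = {b, o};

			if (tid_cmp(t, lo) >= 0 && tid_cmp(t, hi) <= 0)
				count++;
		}
	}
	return count;
}
```

Under those assumptions, the range (0,1)..(1,19) covers 120 + 19 = 139 tuples, matching the `ctid < '(1,20)'` result, and (2,115)..(3,10) covers 6 + 10 = 16 tuples, matching the BETWEEN query.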

pgsql-v9.4-custom-scan.part-3.v5.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1075 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 doc/src/sgml/postgres-fdw.sgml                 |   10 +
 src/backend/foreign/foreign.c                  |   29 +
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/backend/optimizer/util/var.c               |   39 +
 src/include/foreign/foreign.h                  |    4 +
 src/include/nodes/bitmapset.h                  |    4 +
 src/include/optimizer/var.h                    |    1 +
 11 files changed, 1360 insertions(+), 171 deletions(-)
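
To make the deparse logic in this part easier to follow, here is a standalone sketch of the recursion that deparseRemoteJoinRelation performs: one pass over the join tree fills three buffers (target list, FROM, WHERE), which are glued together at the end. It is only a model under stated assumptions. `RelNode`, `Buf`, `buf_appendf`, `deparse_rel` and `deparse_join_sql` are hypothetical names, fixed-size buffers stand in for StringInfo, and conditions are passed in as pre-deparsed strings rather than expression trees.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { REL_BASE, REL_JOIN } RelKind;

/* Hypothetical stand-in for the packed join info / rtindex pair. */
typedef struct RelNode
{
	RelKind		kind;
	int			rtindex;		/* base: range table index, used as alias rN */
	const char *relname;		/* base: remote relation name */
	const char *cols[4];		/* base: columns to fetch */
	int			ncols;
	const char *where;			/* base: remote condition, or NULL */
	const char *jointype;		/* join: "JOIN", "LEFT JOIN", ... */
	const struct RelNode *outer;
	const struct RelNode *inner;
	const char *on;				/* join: remote join condition, or NULL */
} RelNode;

/* Fixed-size buffer standing in for StringInfo. */
typedef struct { char data[512]; size_t len; } Buf;

static void
buf_appendf(Buf *b, const char *fmt, ...)
{
	va_list		ap;
	int			n;

	va_start(ap, fmt);
	n = vsnprintf(b->data + b->len, sizeof(b->data) - b->len, fmt, ap);
	va_end(ap);
	assert(n >= 0 && (size_t) n < sizeof(b->data) - b->len);
	b->len += (size_t) n;
}

static void
deparse_rel(Buf *tlist, Buf *from, Buf *where, const RelNode *rel)
{
	if (rel->kind == REL_JOIN)
	{
		/* Recurse into both sides; the join goes into FROM in parentheses. */
		buf_appendf(from, "(");
		deparse_rel(tlist, from, where, rel->outer);
		buf_appendf(from, " %s ", rel->jointype);
		deparse_rel(tlist, from, where, rel->inner);
		/* JOIN always needs an ON clause, so fall back to "ON true". */
		buf_appendf(from, " ON %s)", rel->on ? rel->on : "true");
	}
	else
	{
		buf_appendf(from, "%s r%d", rel->relname, rel->rtindex);
		/* Columns are qualified with the alias to keep them unambiguous. */
		for (int i = 0; i < rel->ncols; i++)
			buf_appendf(tlist, "%sr%d.%s", tlist->len > 0 ? ", " : "",
						rel->rtindex, rel->cols[i]);
		/* Per-relation conditions are collected and appended at the end. */
		if (rel->where)
			buf_appendf(where, "%s(%s)",
						where->len == 0 ? " WHERE " : " AND ", rel->where);
	}
}

static void
deparse_join_sql(char *out, size_t outlen, const RelNode *top)
{
	Buf			tlist = {{0}, 0};
	Buf			from = {{0}, 0};
	Buf			where = {{0}, 0};

	assert(top->kind == REL_JOIN);
	deparse_rel(&tlist, &from, &where, top);
	snprintf(out, outlen, "SELECT %s FROM %s%s",
			 tlist.len > 0 ? tlist.data : "NULL", from.data, where.data);
}
```

Fed with two base relations r1 and r2 over "S 1"."T 1", each fetching c3 and carrying one condition on "C 1", and joined without a join condition, this model produces the same text as the Remote SQL shown for the st1 prepared statement in the expected output of this patch.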

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index e5e9c2d..85e98b5 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* column references must be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * Workhorse of deparseRemoteJoinSql. It deparses a relation, which may be
+ * a join rather than a regular table, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either a List or an Integer.
+	 * A List is a packed PgRemoteJoinInfo that references the outer and
+	 * inner relations of a join, so it must be deparsed recursively.
+	 * An Integer is the rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* JOIN needs an ON clause; avoid a syntax error */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only the columns referenced in the target list and local conditions
+		 * have to be fetched from the remote server, not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected path type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * Deparses a join tree to be executed on the remote server.
+ * The top-level 'relinfo' is assumed to describe a remote join relation,
+ * so it has to be a List object that packs a PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, a column reference may be ambiguous unless it is
+	 * qualified with its relation alias.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index ae3ab00..5e0c421 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if *
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocation of private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,794 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * Checks whether the supplied relation is either a foreign table or a remote
+ * join managed by postgres_fdw. If it is not, false is returned.
+ * If it is a managed relation, some related properties are also returned to
+ * the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		FdwRoutine			pgroutine;
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte = planner_rt_fetch(rel->relid, root);
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		memset(&pgroutine, 0, sizeof(FdwRoutine));
+		pgroutine.GetForeignRelSize = postgresGetForeignRelSize;
+
+		if (!is_fdw_managed_relation(rte->relid, &pgroutine))
+			return false;
+
+		/*
+		 * Also report the server-id and local user-id to the caller.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we may need to pay attention to a scenario in which a
+		 * plan is executed with a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be run with the same credential; the actual user-id is
+		 * not required.
+		 * So, fdw_user_oid is set to InvalidOid for comparison purposes
+		 * if the relation runs with the credential of GetUserId().
+		 */
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when underlying foreign tables use identical server
+				 * and user-id for each.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and adds a custom-scan path that launches
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * Remote join is not supported if a whole-row reference appears in
+	 * either the target list or the local conditions.
+	 *
+	 * XXX - We have no reasonable way to reconstruct a RECORD datum from
+	 * individual columns once they are extracted. On the other hand,
+	 * putting a whole-row reference in the remote-join query would cost
+	 * additional network bandwidth.
+	 */
+	if (contain_wholerow_reference((Node *)joinrel->reltargetlist) ||
+		contain_wholerow_reference((Node *)j_local_conds))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * Constructs a CustomScan plan from the remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a remote join relation carry the varno of the
+ * underlying foreign tables. This is a problem because they would eventually
+ * be replaced by references to the outer or inner relation, whereas the
+ * result of a remote join is stored in the scan-tuple-slot, not in the outer
+ * or inner slot.
+ * So, we replace the varno of such Var nodes with CUSTOM_VAR, a pseudo varno
+ * that references a tuple in the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * Var nodes that reference columns of a remote join relation need special
+ * treatment, because we replace the join relation with a remote query that
+ * returns the result of the join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan, but some
+ * adjustment is needed because of the difference in nature.
+ * The biggest difference is that it has to open the underlying relations
+ * by itself and construct a tuple descriptor from the list of Vars to be
+ * fetched, because a custom scan (in this case, a scan on a remote join
+ * instead of a local join) has no particular relation behind it and thus
+ * has to manage these by itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * a Var node with CUSTOM_VAR refers to its TupleDesc to get the
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->custom_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user ID.  The current user ID is used unless jinfo
+	 * carries a valid fdw_user_oid.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations also have to be saved */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the
+ * underlying relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Does the same as postgresReScanForeignScan().
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine for EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entry point of this module; registers the custom-scan provider.
+ * No special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 85fc25a..5688d8e 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 35924f1..7926d54 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -350,6 +350,16 @@
   </para>
 
   <para>
+   In addition, <productname>PostgreSQL</> 9.4 or later adaptively tries
+   to join relations managed by the same foreign server on the remote
+   node if the supplied join condition can be executed on the remote side.
+   It behaves as if a local custom scan node scans a virtual relation
+   that consists of multiple relations joined remotely, so it usually
+   costs less than transferring both relations and joining them
+   locally.
+  </para>
+
+  <para>
    The query that is actually sent to the remote server for execution can
    be examined using <command>EXPLAIN VERBOSE</>.
   </para>
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 6d548b7..c33d958 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -23,6 +23,7 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -621,3 +622,31 @@ get_foreign_server_oid(const char *servername, bool missing_ok)
 				 errmsg("server \"%s\" does not exist", servername)));
 	return oid;
 }
+
+/*
+ * is_fdw_managed_relation
+ *
+ * Checks whether the supplied relation is a foreign table managed
+ * by the module that owns the given FdwRoutine.
+ */
+bool
+is_fdw_managed_relation(Oid tableoid, const FdwRoutine *routines_self)
+{
+	FdwRoutine *routines;
+	char		relkind = get_rel_relkind(tableoid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		routines = GetFdwRoutineByRelId(tableoid);
+
+		/*
+		 * We assume that a particular callback implemented by a
+		 * particular extension is never shared with another extension.
+		 * So we don't need to compare all the function pointers in the
+		 * FdwRoutine; comparing a single member is enough.
+		 */
+		if (routines->GetForeignRelSize == routines_self->GetForeignRelSize)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 3a6d0fb..c619d5d 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index d629fcd..21ca783 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -73,6 +73,7 @@ static bool pull_varattnos_walker(Node *node, pull_varattnos_context *context);
 static bool pull_vars_walker(Node *node, pull_vars_context *context);
 static bool contain_var_clause_walker(Node *node, void *context);
 static bool contain_vars_of_level_walker(Node *node, int *sublevels_up);
+static bool contain_wholerow_reference_walker(Node *node, void *context);
 static bool locate_var_of_level_walker(Node *node,
 						   locate_var_of_level_context *context);
 static bool pull_var_clause_walker(Node *node,
@@ -418,6 +419,44 @@ contain_vars_of_level_walker(Node *node, int *sublevels_up)
 								  (void *) sublevels_up);
 }
 
+/*
+ * contain_wholerow_reference
+ *
+ *    Recursively scan a clause to discover whether it contains any Var nodes
+ *    of whole-row reference in the current query level.
+ *
+ *    Returns true if any such Var found.
+ */
+bool
+contain_wholerow_reference(Node *node)
+{
+	return contain_wholerow_reference_walker(node, NULL);
+}
+
+static bool
+contain_wholerow_reference_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return contain_wholerow_reference_walker((Node *)rinfo->clause,
+												 context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node,
+								  contain_wholerow_reference_walker,
+								  context);
+}
 
 /*
  * locate_var_of_level
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index ac080d7..2340a23 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -13,6 +13,7 @@
 #ifndef FOREIGN_H
 #define FOREIGN_H
 
+#include "foreign/fdwapi.h"
 #include "nodes/parsenodes.h"
 
 
@@ -81,4 +82,7 @@ extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
 extern Oid	get_foreign_data_wrapper_oid(const char *fdwname, bool missing_ok);
 extern Oid	get_foreign_server_oid(const char *servername, bool missing_ok);
 
+extern bool	is_fdw_managed_relation(Oid tableoid,
+									const FdwRoutine *routines_self);
+
 #endif   /* FOREIGN_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index f770608..fa8005d 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index fb99a12..f677ff5 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -36,6 +36,7 @@ extern void pull_varattnos(Node *node, Index varno, Bitmapset **varattnos);
 extern List *pull_vars_of_level(Node *node, int levelsup);
 extern bool contain_var_clause(Node *node);
 extern bool contain_vars_of_level(Node *node, int levelsup);
+extern bool contain_wholerow_reference(Node *node);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, PVCAggregateBehavior aggbehavior,
 				PVCPlaceHolderBehavior phbehavior);

#21

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Kohei KaiGai (#20)

Re: Custom Scan APIs (Re: Custom Plan node)

Hackers,

Is somebody available to volunteer to review the custom-scan patch?

Even though Hanada-san approved it earlier, it seems to me that this
patch contains potentially arguable implementation choices. Even if you
do not have enough time to review the whole of the code, it would help me
if you could comment on the following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds "add_join_path_hook" to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. A custom-scan path added this way is assumed to scan a
(virtual) relation that is the result set of joining relations.
My concern is which arguments should be passed. The hook is declared as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments up to and including restrictlist should probably be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be passed.
It probably makes sense to pass param_source_rels and extra_lateral_rels,
so the extension can check whether a parameterized path is sensible.
On the other hand, I doubt whether mergeclause_list is useful to pass.
(It may make sense if someone tries to implement their own merge-join
implementation?)

I'd like to hear ideas to improve the current interface specification.
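
The hook-chaining convention used for add_join_path_hook can be sketched as
follows. This is a minimal, self-contained model rather than the patch's code:
JoinRelSketch, join_path_hook_sketch, my_add_join_paths and the other names
are hypothetical stand-ins, and the real hook takes the full argument list
shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the planner state; the real hook receives PlannerInfo etc. */
typedef struct JoinRelSketch
{
	int			npaths;			/* number of candidate paths added so far */
} JoinRelSketch;

typedef void (*join_path_hook_sketch_type) (JoinRelSketch *joinrel);

/* Global hook pointer; NULL while no extension is loaded (hypothetical). */
static join_path_hook_sketch_type join_path_hook_sketch = NULL;

/* Previous hook value, saved by the extension when it installs itself. */
static join_path_hook_sketch_type prev_join_path_hook = NULL;

/* Extension callback: call the previous hook first, then add its own path. */
static void
my_add_join_paths(JoinRelSketch *joinrel)
{
	assert(joinrel != NULL);
	if (prev_join_path_hook)
		prev_join_path_hook(joinrel);
	joinrel->npaths++;			/* stands in for adding a custom path */
}

/* What an extension's _PG_init would do: save the old hook, install ours. */
static void
install_hook(void)
{
	prev_join_path_hook = join_path_hook_sketch;
	join_path_hook_sketch = my_add_join_paths;
}

/* Planner side: add the built-in join paths, then give the hook a chance. */
static void
add_paths_to_joinrel_sketch(JoinRelSketch *joinrel)
{
	joinrel->npaths += 3;		/* pretend nestloop, merge and hash paths */
	if (join_path_hook_sketch)
		join_path_hook_sketch(joinrel);
}
```

Saving the previous hook value lets several extensions coexist; every
argument the hook receives becomes part of a contract that extensions rely
on, which is why the choice of arguments matters.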

(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000       /* reference to inner subplan */
 #define    OUTER_VAR       65001       /* reference to outer subplan */
 #define    INDEX_VAR       65002       /* reference to index column */
+#define    CUSTOM_VAR      65003       /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan overrides
a join of relations.
Var nodes within a join plan are adjusted to refer to either ecxt_innertuple
or ecxt_outertuple of ExprContext. This causes trouble if a custom scan runs
instead of a built-in join, because the tuples it fetches are usually
stored in ecxt_scantuple; thus the Var nodes need a varno that is
neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within a custom-scan node so that they point
to ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
For example, a Var node with varno=CUSTOM_VAR and varattno=3 means that
the node references the third attribute of the tuple in ecxt_scantuple.
I think it is a reasonable solution; however, I'm not 100% certain
whether people have a more graceful idea to implement it.
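
The varno=CUSTOM_VAR, varattno=3 example can be modelled with a small
self-contained sketch. It assumes a simplified executor in which any varno
other than INNER_VAR or OUTER_VAR is read from the scan tuple; TupleSketch,
ExprContextSketch, VarSketch and eval_var_sketch are hypothetical stand-ins,
not PostgreSQL's actual structures or functions.

```c
#include <assert.h>
#include <stddef.h>

#define INNER_VAR	65000		/* reference to inner subplan */
#define OUTER_VAR	65001		/* reference to outer subplan */
#define CUSTOM_VAR	65003		/* value proposed in the patch */

/* Simplified tuple: an array of integer attribute values. */
typedef struct TupleSketch
{
	const int  *values;
	int			natts;
} TupleSketch;

/* Simplified ExprContext holding the three tuple sources. */
typedef struct ExprContextSketch
{
	const TupleSketch *scantuple;
	const TupleSketch *innertuple;
	const TupleSketch *outertuple;
} ExprContextSketch;

typedef struct VarSketch
{
	int			varno;
	int			varattno;		/* 1-based attribute number */
} VarSketch;

/* Pick the source tuple from varno, then fetch attribute varattno. */
static int
eval_var_sketch(const ExprContextSketch *econtext, const VarSketch *var)
{
	const TupleSketch *tuple;

	switch (var->varno)
	{
		case INNER_VAR:
			tuple = econtext->innertuple;
			break;
		case OUTER_VAR:
			tuple = econtext->outertuple;
			break;
		default:				/* CUSTOM_VAR and ordinary scan varnos */
			tuple = econtext->scantuple;
			break;
	}
	assert(tuple != NULL);
	assert(var->varattno >= 1 && var->varattno <= tuple->natts);
	return tuple->values[var->varattno - 1];
}
```

In this model, a Var left with OUTER_VAR after a join is replaced by a custom
scan would read from a tuple source the node never fills, which is the
problem the special varno is meant to avoid.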

If you have comments on the two topics above, or on anything else, please
share your ideas.

Thanks,

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Tuesday, January 14, 2014 11:20 PM
To: Shigeru Hanada
Cc: Kaigai, Kouhei(海外, 浩平); Jim Mlodgenski; Robert Haas; Tom Lane;
PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

The attached patches are rebased to the latest git tree, with no
functional changes from the previous revision in the November commit fest.
Hanada-san volunteered to review the series of patches, including the
portion for postgres_fdw, and then marked it as "ready for committer" in the
last commit fest.
So, I hope one of the committers will also volunteer to review the patches
for final checking.

* Part-1 - CustomScan APIs
This patch provides a set of interfaces that let extensions interact with
the query optimizer and executor. The new add_scan_path_hook and
add_join_path_hook allow extensions to offer alternative ways to scan a
particular relation or to join particular relations.
Then, once the optimizer chooses one of the alternative ways, the associated
callbacks are invoked from the executor. In this case, the extension is
responsible for returning a slot that holds a tuple (or an empty slot at end
of scan) scanned from the underlying relation.

* Part-2 - contrib/ctidscan
This patch provides a simple example implementation of the CustomScan API.
It makes it possible to skip pages when inequality operators are given on
the ctid system column. That is at least better than a full sequential scan,
so it usually wins against SeqScan, but an index scan is much better.

* Part-3 - remote join implementation
This patch provides an example that replaces a join by a custom scan node
that runs on the result set of a remote join query, on top of the existing
postgres_fdw extension. The idea is that the result set of a remote query
looks like a relation, though an intangible one; thus, it is feasible to
replace a local join by a scan on the result set of a query executed on the
remote host, if both of the relations to be joined belong to the same
foreign server.
This patch gives postgres_fdw the capability to run a join on the remote host.

Thanks,

2013/12/16 Shigeru Hanada <shigeru.hanada@gmail.com>:

KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2013/12/16 14:15), Shigeru Hanada wrote:

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? In my trial it works fine, like the
two similar GUCs.

It should be available; it is no different from the two parameters that
we have supported for a long time. Sorry, I forgot to mention the
new feature.

Check.

(2) postgres_fdw
JOIN push-down is a killer application of the Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation of remote join execution to that section.
It will probably help users understand why a Custom Scan node appears
here instead of a Join node. Thanks for your suggestion.

Check.

I think these patches have been considered enough to be marked as
"Ready for Committer".

Regards,
--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#21)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai Kohei,

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the
patches to actually use the new hooks were also making more changes to
the backend code, leaving me with the impression that the proposed
interface isn't terribly stable. Perhaps those changes should have just
been in the first patch, but they weren't and that certainly gave me
pause.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal
that I think we all share, this seems to be intending to add a
completely different feature which happens to be able to be used for
that. For FDWs, wouldn't we only present the FDW with the paths where
the foreign tables for that FDW, or perhaps just a given foreign server,
are being joined?

Thanks,

Stephen

#23

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#22)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Stephen,

Thanks for your comments.

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the patches
to actually use the new hooks were also making more changes to the backend
code, leaving me with the impression that the proposed interface isn't
terribly stable. Perhaps those changes should have just been in the first
patch, but they weren't and that certainly gave me pause.

Yes, the part-1 patch provides the set of interfaces between the backend
code and extension code. The part-2 and part-3 patches are contrib modules
that implement their features on top of the custom-scan API.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal that
I think we all share, this seems to be intending to add a completely different
feature which happens to be able to be used for that. For FDWs, wouldn't
we only present the FDW with the paths where the foreign tables for that
FDW, or perhaps just a given foreign server, are being joined?

FDW join push-down is one of the valuable use cases of this interface,
but not the only one. As you might know, my motivation is to implement a
GPU acceleration feature on top of this interface, one that offers
alternative ways to scan or join relations, or potentially to sort or
aggregate.
Implementing radix sort on top of FDW would probably be too much of a
stretch. I'd like you to understand that the part-3 patch (FDW join
push-down) is a demonstration of an application of the custom-scan
interface, not something the interface was specially designed for.

Right now, I put all the logic for interaction between CSI and the FDW
driver on the postgres_fdw side. It might be an idea to have common code
(such as the logic to check whether both relations to be joined belong to
the same foreign server) on the backend side, as a kind of gateway between
them.
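
The kind of common check mentioned above might look like the following
sketch. It is only an illustration under simplifying assumptions: RelSketch,
ServerIdSketch and same_foreign_server_sketch are hypothetical names, and a
real implementation would look up the foreign table and server in the
catalogs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ServerIdSketch;

/* Minimal description of a relation taking part in a join. */
typedef struct RelSketch
{
	bool		is_foreign_table;	/* true for a foreign table */
	ServerIdSketch server_id;		/* owning foreign server, if foreign */
} RelSketch;

/*
 * A join is a candidate for remote execution only if both sides are
 * foreign tables that belong to the same foreign server.
 */
static bool
same_foreign_server_sketch(const RelSketch *outer, const RelSketch *inner)
{
	assert(outer != NULL && inner != NULL);
	if (!outer->is_foreign_table || !inner->is_foreign_table)
		return false;
	return outer->server_id == inner->server_id;
}
```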

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on
behalf of a particular data structure declared with CREATE FOREIGN TABLE.
In other words, the extension's responsibility is to present a view of
"something" that conforms to PostgreSQL's internal data structures, rather
than the object itself.
The custom-scan interface, on the other hand, allows extensions to implement
alternative methods to scan or join particular relations, but it does not
act as a target referenced in queries. In other words, it provides methods
to access objects.
It is natural that the two features look similar, because both let
extensions hook into the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#24

Christian Convey

christian.convey@gmail.com

almost 12 years ago

In reply to: Kouhei Kaigai (#23)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

FDW join push-down is one of the valuable use cases of this interface,
but not the only one. As you might know, my motivation is to implement a
GPU acceleration feature on top of this interface, one that offers
alternative ways to scan or join relations, or potentially to sort or
aggregate.

I'm curious how this relates to the pluggable storage idea discussed here
https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here
http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html

I haven't seen a specific proposal about how much functionality should be
encapsulated by a pluggable storage system. But I wonder if that would be
the best place for specialized table-scan code to end up?

#25

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Christian Convey (#24)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-01-29 Christian Convey <christian.convey@gmail.com>:

On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

FDW join push-down is one of the valuable use cases of this interface,
but not the only one. As you might know, my motivation is to implement a
GPU acceleration feature on top of this interface, one that offers
alternative ways to scan or join relations, or potentially to sort or
aggregate.

I'm curious how this relates to the pluggable storage idea discussed here
https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here
http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html

I haven't seen a specific proposal about how much functionality should be
encapsulated by a pluggable storage system. But I wonder if that would be
the best place for specialized table-scan code to end up?

If you are interested in designing your own storage layer (which therefore
needs its own scan/writer implementation), FDW is the option currently
available. It defines a set of interfaces that allow extensions to generate
"things to be there" on the fly. It does not force the extension to act as a
client of a remote database, even though it was originally designed for
dblink-like purposes.
In other words, FDW is a feature that translates a particular data source
into something visible according to the table definition. As long as the
driver can mediate between the table definition and the data format of your
own storage layer, it should work.

The custom-scan interface, on the other hand, basically allows extensions to
implement an "alternative way to access" the data. If we have a smarter way
to scan or join relations than the built-in logic (yes, scanning the result
set of a remote join is smarter than a local join on a couple of remote scan
results), this interface lets the extension tell the backend "I have such a
smart strategy"; the planner then chooses one of them, either built-in or
additional, according to cost.
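
The cost-based choice can be illustrated with a deliberately simplified
sketch. It compares total cost only, whereas the real planner also weighs
startup cost, sort order and parameterization; PathSketch and
cheapest_path_sketch are hypothetical names. The costs in the usage note are
the ones from the EXPLAIN examples at the start of this thread.

```c
#include <assert.h>
#include <stddef.h>

/* A candidate path reduced to a label and its estimated total cost. */
typedef struct PathSketch
{
	const char *label;
	double		total_cost;
} PathSketch;

/* Return the candidate with the lowest total cost, or NULL if none. */
static const PathSketch *
cheapest_path_sketch(const PathSketch *paths, size_t npaths)
{
	const PathSketch *best = NULL;
	size_t		i;

	assert(paths != NULL || npaths == 0);
	for (i = 0; i < npaths; i++)
	{
		if (best == NULL || paths[i].total_cost < best->total_cost)
			best = &paths[i];
	}
	return best;
}
```

With a sequential scan at 209.00 and the ctidscan custom scan at 100.00, the
custom scan is picked; once an index scan at 8.30 is also a candidate, the
index scan wins instead.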

Let's get back to your question. This interface is, right now, not designed
for implementing a pluggable storage layer. FDW is an option now, and
pluggable regular tables may be a development item in the v9.5 cycle if we
want them.
Because I'm motivated to make my GPU acceleration feature work on regular
relations, I put higher priority on the interface that allows extensions to
suggest "how to scan" them.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#26

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Kouhei Kaigai (#23)

3 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Folks,

Let me remind you of the custom-scan patches; they are the basis for
remote join in postgres_fdw, cache-only scan, the (upcoming) GPU
acceleration feature, and various other alternative ways to scan or join
relations.
Unfortunately, we have had little discussion in this commit fest, even
though Hanada-san volunteered to move the patches into the
"ready for committer" state at the November CF.

Before time runs out, I'd like to ask for hackers' opinions about the
potentially arguable points (from my standpoint), in case they need to be
fixed. One is the hook definition for adding alternative join paths, and the
other is the special varno used when a custom scan replaces a join node.
I'd like to hear your opinions about them while we can still change
the design if needed.

(1) Interface to add alternative paths in addition to built-in join paths

I'd like to hear ideas to improve the current interface specification.

(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000       /* reference to inner subplan */
 #define    OUTER_VAR       65001       /* reference to outer subplan */
 #define    INDEX_VAR       65002       /* reference to index column */
+#define    CUSTOM_VAR      65003       /* reference to custom column */

If you have comments on the two topics above, or on anything else, please
share your ideas.

Thanks,

2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hi Stephen,

Thanks for your comments.

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the patches
to actually use the new hooks were also making more changes to the backend
code, leaving me with the impression that the proposed interface isn't
terribly stable. Perhaps those changes should have just been in the first
patch, but they weren't and that certainly gave me pause.

Yes, the part-1 patch provides a set of interface portion to interact
between the backend code and extension code. Rest of part-2 and part-3
portions are contrib modules that implements its feature on top of
custom-scan API.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal that
I think we all share, this seems to be intending to add a completely different
feature which happens to be able to be used for that. For FDWs, wouldn't
we only present the FDW with the paths where the foreign tables for that
FDW, or perhaps just a given foreign server, are being joined?

Join push-down for FDWs is one valuable use case of this interface, but not
the only one. As you might know, my motivation is to implement a GPU
acceleration feature on top of this interface, which offers an alternative way
to scan or join relations, or potentially to sort or aggregate.
It would probably be too much of a stretch to implement radix sort on top
of FDW. I'd like you to understand that the part-3 patch (FDW join push-down)
is a demonstration of the custom-scan interface in an application, and that
the interface itself is not designed for one special purpose.

Right now, I put all the logic for the interaction between CSI and the FDW
driver on the postgres_fdw side. It might be an idea to have common code (such
as the logic to check whether both relations to be joined belong to the same
foreign server) on the backend side, as a kind of gateway between them.
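
As a rough sketch of the kind of common check mentioned above (whether both relations to be joined belong to the same foreign server), assuming a simplified model: RelModel, its fields, and can_push_down_join() are hypothetical names for illustration, not backend structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a relation taking part in a join. */
typedef struct RelModel
{
	bool			is_foreign;		/* is this a foreign table? */
	unsigned int	server_id;		/* foreign server identifier, if so */
} RelModel;

/*
 * A remote join is only a candidate when both sides are foreign tables
 * and both are managed by the same foreign server.
 */
bool
can_push_down_join(const RelModel *outer, const RelModel *inner)
{
	if (!outer->is_foreign || !inner->is_foreign)
		return false;
	return outer->server_id == inner->server_id;
}
```

If a check like this lived on the backend side, each FDW driver would only be asked about joins it has a chance of handling.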

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on behalf
of a particular data structure declared with CREATE FOREIGN TABLE.
In other words, the extension's responsibility is to generate a view of
"something" according to PostgreSQL's internal data structures, instead of the
object itself.
On the other hand, the custom-scan interface allows extensions to implement
alternative methods to scan or join particular relations, but its role is not
to act as a target referenced in queries. In other words, it provides methods
to access objects.
It is natural that the two features are similar, because both let extensions
hook into the planner and executor; however, their purposes are different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-2.v7.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 108 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/backend/optimizer/path/costsize.c      |   7 +-
 src/backend/optimizer/plan/setrefs.c       |   2 +-
 src/include/catalog/pg_operator.h          |   4 +
 src/include/optimizer/cost.h               |   3 +
 src/include/optimizer/planmain.h           |   1 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 17 files changed, 1253 insertions(+), 14 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index c90fe29..3c1987d 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override a part of the executor. This extension focuses
+ * on workloads that fetch records with a tid larger or smaller than a
+ * particular value. When inequality operators are given, this module
+ * constructs a custom scan path that skips records that need not be
+ * read. Then, if that path is the cheapest one, it is used to run the
+ * query. The Custom Scan APIs call back into this extension when the
+ * executor fetches underlying records; it uses the existing
+ * heap_getnext(), but seeks to the first record to be read beforehand.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause can be used to determine
+ * the zone to be scanned. If one or more usable clauses are found, it
+ * returns a list of them; otherwise it returns NIL.
+ * The caller can assume that all the returned conditions are chained
+ * with the boolean AND operator, so every operator works to narrow down
+ * the scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost to scan the target relation according to the given
+ * restriction clauses. Its logic is almost the same as that of SeqScan,
+ * because it uses the regular heap_getnext(), except that fewer tuples
+ * are scanned when the restriction clauses work well.
+ */
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* For simplicity, treat the Var node as the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * This is just a rough estimation; we don't distinguish strict
+			 * inequality operators from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min ? bnum_max - bnum_min + 1 : 1);
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = Max(bnum_max - bnum_min + 1, 1);
+	}
+	else
+	{
+		/*
+		 * Just a rough estimation. We assume half of the records will be
+		 * read using this restriction clause, but this is indeterminate
+		 * until the executor actually runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators are given on the
+ * relation to be scanned and makes sense to reduce number of tuples.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, this is the case where a part of the restriction clauses can
+	 * reduce the number of tuples to be read, so we add the custom scan
+	 * path provided by this module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * being chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was done in nodeCustomScan.c. So, all
+	 * we need to do is set the slightly adjusted clauses and our private
+	 * data, which is the list of restriction clauses in this case.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns 1, 0 or -1. A forward scan ends when
+	 * the comparison of a fetched tid with ip_max is greater than
+	 * ip_max_comp, and a backward scan ends when the comparison with
+	 * ip_min is less than ip_min_comp. Neither test can ever succeed
+	 * while ip_max_comp is 1 and ip_min_comp is -1, so these initial
+	 * values mean "open ended", as if no termination condition was
+	 * set.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we could calculate a particular TID that fetched records
+			 * must be larger than, less than or equal to; thus, it
+			 * determines the upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The whole set of restriction clauses chained with boolean AND
+			 * operators becomes false if one of the clauses has a NULL result.
+			 * So, we can stop the evaluation immediately and inform the caller
+			 * that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin sequential scan, but pointer shall be sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It seeks the current scan position to the specified point, so that the
+ * next heap_getnext() fetches a record from there. It returns true when
+ * the caller should fetch with NoMovementScanDirection (after a seek, or
+ * to end an out-of-range forward scan at once); otherwise it returns false.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * In case when block-number is out of the range, it is obvious that
+	 * no tuples shall be fetched if forward scan direction. On the other
+	 * hand, we have nothing special for backward scan direction.
+	 * Note that heap_getnext() shall return NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. Caller of this
+	 * function override scan direction if 'true' was returned, so it makes
+	 * this scan terminated immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Termination of this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Elsewhere, backward scan from the beginning */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position shall be loaded on
+	 * heap_restrpos(). Because heap_restrpos() internally calls
+	 * heapgettup() or heapgettup_pagemode() that kicks heapgetpage()
+	 * when rs_cblock is different from the block number being pointed
+	 * by rs_mctid, it makes sense to put invalid block number not to
+	 * match previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() save a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if min-tid was obvious */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if max-tid was obvious */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * check whether the fetched tuple has passed the upper bound in a
+	 * forward scan, or the lower bound in a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, in the
+ * ExecScan() manner.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan call calculate the starting point again and rewind
+ * the underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 336ba0c..7042d76 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..d010d5c
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,108 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on the
+  custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then provide an additional scan path on regular relations using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators involving the
+   <literal>ctid</> system column to reduce the number of rows processed.
+   Obviously, it makes no sense to read tuples located on the 99th page or
+   earlier. So, it seeks the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it
+   internally uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   It works well, except for the wasted I/O on the pages that contain
+   the records to be skipped.
+  </para>
+  <para>
+   On the other hand, an alternative scan path implemented by
+   <filename>ctidscan</> provides a more efficient way: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples need to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a cheaper path than the above custom-scan
+   path exists. For instance, an index scan based on an equality condition is
+   usually cheaper than this custom scan, so the optimizer adopts it instead
+   of a sequential scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries">,
+   <xref linkend="guc-local-preload-libraries"> or
+   <xref linkend="guc-session-preload-libraries"> parameter, whichever
+   is convenient.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d63b1a8..aa2be4b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a7ebe7d..581f584 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -132,9 +132,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -933,7 +930,7 @@ cost_tidscan(Path *path, PlannerInfo *root,
 
 	/*
 	 * The TID qual expressions will be computed once, any other baserestrict
-	 * quals once per retrived tuple.
+	 * quals once per retrieved tuple.
 	 */
 	cost_qual_eval(&tid_qual_cost, tidquals, root);
 
@@ -3157,7 +3154,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b10a2c9..ee3fbab 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1082,7 +1082,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index e07d6d9..0f4ba9f 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -161,15 +161,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ec1605d..6d77264 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -143,6 +143,9 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
+						  ParamPathInfo *param_info,
+						  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..064640c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index b084e0a..3030a3e 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2e3eba8..827acc4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 4f1dede..df391b8 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -92,6 +92,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges

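For readers following the regression output above, the page-narrowing idea
behind ctidscan can be sketched in plain C. This is only an illustration
under stated assumptions, not the code in contrib/ctidscan: the Tid, TidOp,
TidQual and BlockRange types and the narrow_block_range() function are
hypothetical stand-ins and do not use PostgreSQL's headers. The narrowing is
deliberately conservative at page granularity, which is consistent with the
EXPLAIN output still listing the ctid quals as a Filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not PostgreSQL's ItemPointerData or operator OIDs. */
typedef struct { uint32_t block; uint16_t offset; } Tid;
typedef enum { TID_LT, TID_LE, TID_GT, TID_GE } TidOp;
typedef struct { TidOp op; Tid bound; } TidQual;
typedef struct { uint32_t first; uint32_t last; bool empty; } BlockRange;

/*
 * Narrow the block range [first, last] that must be read for a relation of
 * nblocks pages, given AND-ed quals of the form "ctid OP constant".
 * Only the block number of each bound is used, so the result may include
 * pages with no matching tuple; every qual must still be rechecked per tuple.
 */
static BlockRange
narrow_block_range(uint32_t nblocks, const TidQual *quals, size_t nquals)
{
    BlockRange range = { 0, 0, nblocks == 0 };

    assert(quals != NULL || nquals == 0);
    if (range.empty)
        return range;
    range.last = nblocks - 1;

    for (size_t i = 0; i < nquals; i++)
    {
        uint32_t b = quals[i].bound.block;

        switch (quals[i].op)
        {
            case TID_GT:
            case TID_GE:
                /* tuples on pages before block b cannot match */
                if (b > range.first)
                    range.first = b;
                break;
            case TID_LT:
            case TID_LE:
                /* tuples on pages after block b cannot match */
                if (b < range.last)
                    range.last = b;
                break;
        }
    }

    /* contradictory quals, or a lower bound past the end of the relation */
    if (range.first > range.last)
        range.empty = true;
    return range;
}
```

Assuming the test table t1 occupies four pages (the output above suggests
blocks 0 to 3), ctid BETWEEN '(2,115)' AND '(3,10)' narrows to blocks 2..3,
and ctid > '(4,0)' gives an empty range, in line with the zero-row result.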
Attachment: pgsql-v9.4-custom-scan.part-1.v7.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  99 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   2 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  25 +++
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/optimizer/util/pathnode.c   |  40 +++++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/pathnode.h        |  10 ++
 src/include/optimizer/paths.h           |  25 +++
 30 files changed, 1201 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations leveraging the cost-based optimizer. The API consists of a
+  set of callbacks, with a unique name, to be invoked during query planning
+  and execution. A custom-scan provider should implement these callback
+  functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should 
+  implement. The first task is the registration of custom-scan provider itself.
+  Usually, this needs to be done once at the <literal>_PG_init()</literal> 
+  entrypoint when the module is loading. The remaing three tasks are all done
+  when a query is planning and executing. The second task is the submission of
+  candidate paths to either scan or join relations with an adequate cost for
+  the core planner. Then, the planner will choose the cheapest path from all of
+  the candidates. If the custom path survived, the planner starts the third 
+  task; construction of a <literal>CustomScan</literal> plan node, located
+  within the query plan tree instead of the built-in plan node. The last task
+  is the execution of its implementation in answer to invocations by the core
+  executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They may serve as good
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module lets a scan skip earlier pages or
+      terminate before the end of the relation when an inequality operator on
+      the <literal>ctid</literal> system column narrows down the range to be
+      scanned, unlike a sequential scan, which reads a relation from the
+      head to the end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      This custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demostrates the way to implement a custom
+      scan node that performs join nodes.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scans and joins are fully supported with integrated
+  cost-based query optimization through the custom scan API. You might be
+  able to implement other operations, such as sort or aggregation, by
+  manipulating the planned tree; however, the extension is then responsible
+  for handling this replacement correctly. There is no support in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks with a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure afterwards.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target 
+    relations. Prior to this selection, we list all of the potential paths
+    towards a target relation (if it is a base relation) or a pair of relations
+    (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to built-in paths.
+    If a custom-scan provider can submit a potential scan path towards the
+    supplied relation, it shall construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    The <literal>path</> field is common to all path nodes and stores
+    the cost estimate. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    the flags below, and <literal>custom_private</> can be used to store private
+    data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        the <literal>MarkPosCustomScan</> and
+        <literal>RestorePosCustom</> callbacks.
+        The custom scan provider is then responsible for marking and
+        restoring a particular scan position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back into the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization of the allocated
+    <literal>cscan_plan</>, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion is also
+    assigned to the <literal>Plan</> portion, but clauses can be removed from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    After construction of the plan node, it is often necessary to adjust the
+    <literal>varno</> of <literal>Var</> nodes that reference a particular scan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan, because there is nothing special to do. Otherwise, the adjustment
+    has to be handled by the custom scan provider, for example when a custom
+    scan replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also invokes the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual conventions.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization needed for this
+    plan; however, it is not a good idea to launch the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to hold the private state managed by the custom scan
+    provider. Also, <literal>eflags</> has flag bits of the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform any externally visible action if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation or relations, if joining,
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method for foreign tables, it is also responsible for checking whether the
+    next tuple matches the qualifier of this scan.
+    The usual way to implement this method is for the callback to serve merely
+    as an entrypoint that runs <literal>ExecQual</> with its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation or relations, if
+    joining, according to the custom logic. Pay attention to the data format
+    (and to how it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory allocated in the
+    per-execution memory context. So this callback only needs to release the
+    provider's own resources, which the framework does not manage.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed values,
+    so the rewound scan does not need to return exactly identical tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in the private
+    state.
+    Note that this callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the current position of the custom scan to the position
+    previously saved by <literal>MarkPosCustomScan</>.
+    Note that this callback is optional; it needs to be implemented only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the state in <literal>CustomScanState</> provides
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
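
[Illustration, not part of the patch] The begin/execute/end callback sequence described in the documentation above can be sketched as a small standalone C model. It does not use any PostgreSQL headers: DemoProvider, DemoScanState, EvenScan and demo_run are hypothetical names invented for this sketch, and the "relation" is just the integers 0..6. It only shows the ideas from the text, assuming they are read correctly: private state hangs off the scan state, the actual scan starts lazily on the first execute call, the execute callback checks the qualifier itself, and the end callback releases only what the provider allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct DemoScanState DemoScanState;

/* Hypothetical callback table, loosely mirroring Begin/Exec/End. */
typedef struct DemoProvider
{
    void (*begin)(DemoScanState *state, int eflags);
    bool (*exec)(DemoScanState *state, int *out);
    void (*end)(DemoScanState *state);
} DemoProvider;

struct DemoScanState
{
    const DemoProvider *provider;
    void       *custom_state;   /* private to the provider */
};

/* Private state of the demo provider: scans 0..limit-1. */
typedef struct EvenScan
{
    bool        started;
    int         next;
    int         limit;
} EvenScan;

/* Begin: allocate private state only; do not start scanning yet. */
void
even_begin(DemoScanState *state, int eflags)
{
    EvenScan   *scan = malloc(sizeof(EvenScan));

    (void) eflags;              /* unused in this sketch */
    assert(scan != NULL);
    scan->started = false;
    scan->next = 0;
    scan->limit = 7;
    state->custom_state = scan;
}

/* Exec: return the next value that passes the qualifier (even numbers). */
bool
even_exec(DemoScanState *state, int *out)
{
    EvenScan   *scan = state->custom_state;

    if (!scan->started)
    {
        /* the actual scan is launched on the first call */
        scan->next = 0;
        scan->started = true;
    }
    while (scan->next < scan->limit)
    {
        int         candidate = scan->next++;

        if (candidate % 2 == 0) /* qualifier check done by the provider */
        {
            *out = candidate;
            return true;
        }
    }
    return false;               /* end of scan */
}

/* End: release only what the provider itself allocated. */
void
even_end(DemoScanState *state)
{
    free(state->custom_state);
    state->custom_state = NULL;
}

const DemoProvider even_provider = {even_begin, even_exec, even_end};

/* Drive one scan the way an executor loop would; returns rows fetched. */
int
demo_run(const DemoProvider *provider, int *results, int max)
{
    DemoScanState state = {provider, NULL};
    int         n = 0;
    int         value;

    provider->begin(&state, 0);
    while (n < max && provider->exec(&state, &value))
        results[n++] = value;
    provider->end(&state);
    return n;
}
```

Calling demo_run(&even_provider, buf, 8) fills buf with the even values below 7. A real provider would of course work on CustomScanState and TupleTableSlot rather than on these stand-ins.
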
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 09de4bd..d63b1a8 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 08f3167..2a6136d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -90,6 +91,7 @@ static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -721,6 +723,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -847,6 +854,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -935,6 +944,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			custom_name = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1036,6 +1052,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom Provider", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1051,6 +1069,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1347,6 +1369,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *) plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1986,6 +2031,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2158,6 +2216,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..2443e24 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomPath:
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
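
[Illustration, not part of the patch] The flag test added to ExecSupportsMarkRestore above, together with the mark/restore callbacks it gates, can be modelled in a few lines of standalone C. Everything here is an assumption made for the sketch: the MOCK_* flag values, the MockScan struct and the mock_* functions are hypothetical and do not appear in the patch; the real flag definitions live in nodeCustom.h. The point is only that a provider advertising mark/restore must be able to save a position and later resume from it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real ones are defined in nodeCustom.h. */
#define MOCK_SUPPORT_MARK_RESTORE   0x0001
#define MOCK_SUPPORT_BACKWARD_SCAN  0x0002

/* Hypothetical scan over an in-memory array of integers. */
typedef struct MockScan
{
    int         flags;
    const int  *items;
    int         nitems;
    int         pos;            /* index of the next item to return */
    int         marked;         /* position saved by mock_mark_pos */
} MockScan;

/* Same shape as the planner-side check: test one bit in the flags. */
bool
mock_supports_mark_restore(const MockScan *scan)
{
    return (scan->flags & MOCK_SUPPORT_MARK_RESTORE) != 0;
}

/* Fetch the next item; returns false once the scan is exhausted. */
bool
mock_next(MockScan *scan, int *out)
{
    if (scan->pos >= scan->nitems)
        return false;
    *out = scan->items[scan->pos++];
    return true;
}

/* Remember the current position; only valid if the flag is set. */
void
mock_mark_pos(MockScan *scan)
{
    assert(mock_supports_mark_restore(scan));
    scan->marked = scan->pos;
}

/* Go back to the remembered position; only valid if the flag is set. */
void
mock_restore_pos(MockScan *scan)
{
    assert(mock_supports_mark_restore(scan));
    scan->pos = scan->marked;
}
```

After a restore, the scan hands out again the same items it returned after the mark, which is the property a merge join relies on when it rescans the inner side for duplicate outer keys.
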
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..b4a7411 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0eba025..e71ce9b 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 46895b2..58d7190 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * It registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * It finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates the CustomScanState and performs various initialization.
+ * Note that some of the initialization steps are skipped if scanrelid is
+ * zero (which means this custom scan plan is not associated with a
+ * particular relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set only when scanrelid is associated
+	 * with a particular relation. Otherwise, it needs to be initialized by
+	 * the custom-scan provider itself if it internally uses ss_ScanTupleSlot.
+	 * If the provider replaces the varno of Var nodes with CUSTOM_VAR, the
+	 * slot has to be set up so EXPLAIN can look up the attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * Open the base relation and acquire an appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the BeginCustomScan callback.
+	 * The extension may override the initialization done above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entrypoint to the ExecCustomScan callback. All the work of
+ * fetching a tuple is the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entrypoint, to the MultiExecCustomScan callback. All the
+ * work of fetching multiple tuples (as expected by the upper node) is
+ * the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-exec shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
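
[Illustration, not part of the patch] The by-name registry implemented by register_custom_provider and get_custom_provider above can be modelled without dynahash as follows. This is a standalone sketch under stated assumptions: MockProvider, mock_register_provider, mock_get_provider and the MOCK_* limits are hypothetical names for this example only, a fixed array stands in for the hash table, and errors are reported through return values instead of ereport. It shows the two properties the documentation relies on: the structure is copied on registration, and a duplicate name is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NAMELEN        64  /* stands in for NAMEDATALEN */
#define MOCK_MAX_PROVIDERS  32

/* Hypothetical, cut-down provider: a name plus one optional callback. */
typedef struct MockProvider
{
    char        name[MOCK_NAMELEN];
    void        (*explain)(void);   /* optional; may be NULL */
} MockProvider;

static MockProvider mock_table[MOCK_MAX_PROVIDERS];
static int  mock_nproviders = 0;

/* Look a provider up by name; returns NULL if it was never registered. */
const MockProvider *
mock_get_provider(const char *name)
{
    int         i;

    for (i = 0; i < mock_nproviders; i++)
    {
        if (strncmp(mock_table[i].name, name, MOCK_NAMELEN) == 0)
            return &mock_table[i];
    }
    return NULL;
}

/*
 * Copy *provider into the table.
 * Returns 0 on success, -1 on a duplicate name or a full table.
 */
int
mock_register_provider(const MockProvider *provider)
{
    assert(provider != NULL);

    if (mock_get_provider(provider->name) != NULL)
        return -1;              /* duplicate name */
    if (mock_nproviders >= MOCK_MAX_PROVIDERS)
        return -1;              /* table full */

    /* The table keeps its own copy, so the caller may discard its struct. */
    memcpy(&mock_table[mock_nproviders], provider, sizeof(MockProvider));
    mock_nproviders++;
    return 0;
}
```

Because the entry is copied, a module can build its provider struct on the stack in its load-time hook and let it go out of scope afterwards; later lookups by name still succeed.
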
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c89d808..d48b3d7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3983,6 +4010,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..7dc1631 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2828,6 +2844,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 9f7f322..9f2b6bb 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..c7fcb80 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -389,6 +391,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -417,6 +422,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1236,6 +1244,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1307,6 +1318,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1330,6 +1344,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1399,6 +1416,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1452,6 +1472,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 9bca968..a7ebe7d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2268,7 +2268,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..48f5ad4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..c07e000 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2006,6 +2018,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in the CustomScan
+		 * plan structure. The custom-scan provider may read it, but modifying
+		 * it is not recommended.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *)functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan
+	 * according to the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..b10a2c9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -576,6 +577,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..74ff415 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2236,6 +2236,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient? Don't we need something special
+			 * if CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b79af7a..17827e0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by extensions.
+ *
+ * This function is never called from core PostgreSQL. The usual usage is
+ * invocation from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for supplying
+ * an adequate cost estimate here.
+ */
+CustomPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomPath *pathnode = makeNode(CustomPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index add5cd1..d099d16 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -145,6 +145,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2482,14 +2483,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node. (Note that the custom scan
+ * provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3747,6 +3753,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5414,6 +5428,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5684,6 +5710,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index eb78776..7fe0998 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports ExecCustomMarkPos and ExecCustomRestrPos.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *cscan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..621830a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..681c1b1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..85d088d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 4992bc0..7d9b0c0 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8aa40d0..527a060 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -877,6 +877,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the name under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a0bcc82..b99d841 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomPath *create_customscan_path(PlannerInfo *root,
+										  RelOptInfo *baserel,
+										  double rows,
+										  Cost startup_cost,
+										  Cost total_cost,
+										  List *pathkeys,
+										  Relids required_outer,
+										  const char *custom_name,
+										  uint32 custom_flags,
+										  List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..a561387 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {														\
+		if (add_scan_path_hook)										\
+			(*add_scan_path_hook)((root),(baserel),(rte));			\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
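
As a rough illustration of how an extension might use the add_scan_path_hook
declared above, here is a minimal, self-contained sketch. It is not part of
the patch. The struct definitions are stand-ins for the real PostgreSQL types
(their fields, npaths and is_plain_rel, are hypothetical), and demo_init,
demo_add_scan_path and demo_plan_rel are made-up names. Only the hook type and
the add_custom_scan_paths macro mirror what the patch adds. The sketch assumes
the usual PostgreSQL convention of saving the previous hook value and calling
it before doing one's own work.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real planner types (assumption: fields are invented) */
typedef struct PlannerInfo { int unused; } PlannerInfo;
typedef struct RelOptInfo { int npaths; } RelOptInfo;          /* counts paths offered */
typedef struct RangeTblEntry { int is_plain_rel; } RangeTblEntry;

/* Same shape as the hook type the patch adds to optimizer/paths.h */
typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *baserel,
                                        RangeTblEntry *rte);
add_scan_path_hook_type add_scan_path_hook = NULL;

/* Same macro as in the patch: a no-op unless a module installed a hook */
#define add_custom_scan_paths(root,baserel,rte)                \
    do {                                                       \
        if (add_scan_path_hook)                                \
            (*add_scan_path_hook)((root),(baserel),(rte));     \
    } while(0)

/* Hypothetical extension side: remember whoever was hooked before us */
static add_scan_path_hook_type prev_add_scan_path_hook = NULL;

static void
demo_add_scan_path(PlannerInfo *root, RelOptInfo *baserel, RangeTblEntry *rte)
{
    assert(baserel != NULL && rte != NULL);

    /* Give modules loaded earlier their turn first */
    if (prev_add_scan_path_hook)
        (*prev_add_scan_path_hook)(root, baserel, rte);

    /* Only offer a path for the kind of relation this provider handles */
    if (!rte->is_plain_rel)
        return;

    /*
     * Real code would build a path with create_customscan_path() and hand
     * it to add_path(); here we only count that a path was offered.
     */
    baserel->npaths++;
}

/* Analogue of an extension's _PG_init(); call it once only */
void
demo_init(void)
{
    prev_add_scan_path_hook = add_scan_path_hook;
    add_scan_path_hook = demo_add_scan_path;
}

/* Drives the macro the way set_plain_rel_pathlist() does in the patch */
int
demo_plan_rel(int is_plain_rel)
{
    PlannerInfo   root = {0};
    RelOptInfo    rel = {0};
    RangeTblEntry rte = { is_plain_rel };

    add_custom_scan_paths(&root, &rel, &rte);
    return rel.npaths;
}
```

Before demo_init() runs, the macro does nothing, which matches the patch's
intent that core behavior is unchanged when no provider is loaded. Because
each module chains to the previous hook, several providers can coexist, and
the planner still picks the cheapest path among all candidates.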

Attachment: pgsql-v9.4-custom-scan.part-3.v7.patch (application/octet-stream)

 contrib/postgres_fdw/deparse.c                 |  209 ++++-
 contrib/postgres_fdw/expected/postgres_fdw.out |   34 +-
 contrib/postgres_fdw/postgres_fdw.c            | 1075 +++++++++++++++++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   64 ++
 doc/src/sgml/postgres-fdw.sgml                 |   10 +
 src/backend/foreign/foreign.c                  |   29 +
 src/backend/nodes/bitmapset.c                  |   62 ++
 src/backend/optimizer/util/var.c               |   39 +
 src/include/foreign/foreign.h                  |    4 +
 src/include/nodes/bitmapset.h                  |    4 +
 src/include/optimizer/var.h                    |    1 +
 11 files changed, 1360 insertions(+), 171 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index e5e9c2d..85e98b5 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,8 +44,10 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/pathnode.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
@@ -88,6 +90,7 @@ typedef struct deparse_expr_cxt
 	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	bool		var_qualified;	/* column references need to be qualified */
 } deparse_expr_cxt;
 
 /*
@@ -106,6 +109,8 @@ static void deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -113,7 +118,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
 					 List *returningList,
 					 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-				 PlannerInfo *root);
+							 bool var_qualified, PlannerInfo *root);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseStringLiteral(StringInfo buf, const char *val);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
@@ -142,6 +147,7 @@ static void deparseArrayExpr(ArrayExpr *node, deparse_expr_cxt *context);
 void
 classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds)
 {
@@ -150,7 +156,7 @@ classifyConditions(PlannerInfo *root,
 	*remote_conds = NIL;
 	*local_conds = NIL;
 
-	foreach(lc, baserel->baserestrictinfo)
+	foreach(lc, restrictinfo_list)
 	{
 		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
 
@@ -244,7 +250,7 @@ foreign_expr_walker(Node *node,
 				 * Param's collation, ie it's not safe for it to have a
 				 * non-default collation.
 				 */
-				if (var->varno == glob_cxt->foreignrel->relid &&
+				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
 					var->varlevelsup == 0)
 				{
 					/* Var belongs to foreign table */
@@ -678,8 +684,8 @@ deparseSelectSql(StringInfo buf,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, baserel->relid, rel, true, false,
+					  attrs_used, retrieved_attrs);
 
 	/*
 	 * Construct FROM clause
@@ -702,12 +708,13 @@ deparseTargetList(StringInfo buf,
 				  PlannerInfo *root,
 				  Index rtindex,
 				  Relation rel,
+				  bool first,
+				  bool qualified,
 				  Bitmapset *attrs_used,
 				  List **retrieved_attrs)
 {
 	TupleDesc	tupdesc = RelationGetDescr(rel);
 	bool		have_wholerow;
-	bool		first;
 	int			i;
 
 	*retrieved_attrs = NIL;
@@ -716,7 +723,6 @@ deparseTargetList(StringInfo buf,
 	have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
 								  attrs_used);
 
-	first = true;
 	for (i = 1; i <= tupdesc->natts; i++)
 	{
 		Form_pg_attribute attr = tupdesc->attrs[i - 1];
@@ -733,7 +739,9 @@ deparseTargetList(StringInfo buf,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, i, root);
+			if (qualified)
+				appendStringInfo(buf, "r%d.", rtindex);
+			deparseColumnRef(buf, rtindex, i, false, root);
 
 			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
 		}
@@ -781,6 +789,8 @@ appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params)
 {
 	deparse_expr_cxt context;
@@ -795,6 +805,7 @@ appendWhereClause(StringInfo buf,
 	context.foreignrel = baserel;
 	context.buf = buf;
 	context.params_list = params;
+	context.var_qualified = qualified;
 
 	/* Make sure any constants in the exprs are printed portably */
 	nestlevel = set_transmission_modes();
@@ -805,7 +816,7 @@ appendWhereClause(StringInfo buf,
 
 		/* Connect expressions with "AND" and parenthesize each condition. */
 		if (is_first)
-			appendStringInfoString(buf, " WHERE ");
+			appendStringInfoString(buf, !is_join_on ? " WHERE " : " ON ");
 		else
 			appendStringInfoString(buf, " AND ");
 
@@ -852,7 +863,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				appendStringInfoString(buf, ", ");
 			first = false;
 
-			deparseColumnRef(buf, rtindex, attnum, root);
+			deparseColumnRef(buf, rtindex, attnum, false, root);
 		}
 
 		appendStringInfoString(buf, ") VALUES (");
@@ -912,7 +923,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 			appendStringInfoString(buf, ", ");
 		first = false;
 
-		deparseColumnRef(buf, rtindex, attnum, root);
+		deparseColumnRef(buf, rtindex, attnum, false, root);
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
@@ -968,8 +979,165 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
 				   &attrs_used);
 
 	appendStringInfoString(buf, " RETURNING ");
-	deparseTargetList(buf, root, rtindex, rel, attrs_used,
-					  retrieved_attrs);
+	deparseTargetList(buf, root, rtindex, rel, true, false,
+					  attrs_used, retrieved_attrs);
+}
+
+/*
+ * deparseRemoteJoinRelation
+ *
+ * The workhorse of deparseRemoteJoinSql. It deparses a relation, which may
+ * be either a regular foreign table or a join, into an SQL expression.
+ */
+static void
+deparseRemoteJoinRelation(StringInfo tlist_buf,
+						  StringInfo from_buf,
+						  StringInfo where_buf,
+						  PlannerInfo *root, Node *relinfo,
+						  List *target_list, List *local_conds,
+						  List **select_vars, List **select_params)
+{
+	/*
+	 * 'relinfo' is either a List or an Integer.
+	 * A List is a packed PgRemoteJoinInfo that references the outer and
+	 * inner relations of a join, so it has to be deparsed recursively.
+	 * An Integer is the rtindex of a particular foreign table.
+	 */
+	if (IsA(relinfo, List))
+	{
+		PgRemoteJoinInfo jinfo;
+
+		unpackPgRemoteJoinInfo(&jinfo, (List *)relinfo);
+
+		appendStringInfoChar(from_buf, '(');
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.outer_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		switch (jinfo.jointype)
+		{
+			case JOIN_INNER:
+				appendStringInfoString(from_buf, " JOIN ");
+				break;
+			case JOIN_LEFT:
+				appendStringInfoString(from_buf, " LEFT JOIN ");
+				break;
+			case JOIN_FULL:
+				appendStringInfoString(from_buf, " FULL JOIN ");
+				break;
+			case JOIN_RIGHT:
+				appendStringInfoString(from_buf, " RIGHT JOIN ");
+				break;
+			default:
+				elog(ERROR, "unexpected join type: %d", (int)jinfo.jointype);
+				break;
+		}
+		deparseRemoteJoinRelation(tlist_buf, from_buf, where_buf,
+								  root, jinfo.inner_rel,
+								  target_list, local_conds,
+								  select_vars, select_params);
+		if (jinfo.remote_conds)
+		{
+			RelOptInfo *joinrel = find_join_rel(root, jinfo.relids);
+			appendWhereClause(from_buf, root, joinrel,
+							  jinfo.remote_conds,
+							  true, true, true, select_params);
+		}
+		else
+		{
+			/* JOIN needs an ON clause, so emit a dummy one */
+			appendStringInfoString(from_buf, " ON true");
+		}
+		appendStringInfoChar(from_buf, ')');
+	}
+	else if (IsA(relinfo, Integer))
+	{
+		Index			rtindex = intVal(relinfo);
+		RangeTblEntry  *rte = planner_rt_fetch(rtindex, root);
+		RelOptInfo	   *baserel = root->simple_rel_array[rtindex];
+		Relation		rel;
+		TupleDesc		tupdesc;
+		Bitmapset	   *attrs_used = NULL;
+		List		   *retrieved_attrs = NIL;
+		ListCell	   *lc;
+		PgFdwRelationInfo *fpinfo;
+
+		rel = heap_open(rte->relid, NoLock);
+		deparseRelation(from_buf, rel);
+		appendStringInfo(from_buf, " r%d", rtindex);
+
+		pull_varattnos((Node *) target_list, rtindex, &attrs_used);
+		pull_varattnos((Node *) local_conds, rtindex, &attrs_used);
+		deparseTargetList(tlist_buf, root, rtindex, rel,
+						  (bool)(tlist_buf->len == 0), true,
+						  attrs_used, &retrieved_attrs);
+
+		/*
+		 * Only the columns referenced by the target list or local conditions
+		 * have to be fetched from the remote server, not all the columns.
+		 */
+		tupdesc = RelationGetDescr(rel);
+		foreach (lc, retrieved_attrs)
+		{
+			AttrNumber	anum = lfirst_int(lc);
+			Form_pg_attribute attr = tupdesc->attrs[anum - 1];
+
+			*select_vars = lappend(*select_vars,
+								   makeVar(rtindex,
+										   anum,
+										   attr->atttypid,
+										   attr->atttypmod,
+										   attr->attcollation,
+										   0));
+		}
+		/* deparse WHERE clause, to be appended later */
+		fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+		if (fpinfo->remote_conds)
+			appendWhereClause(where_buf, root, baserel,
+							  fpinfo->remote_conds,
+							  where_buf->len == 0, false, true,
+							  select_params);
+
+		heap_close(rel, NoLock);
+	}
+	else
+		elog(ERROR, "unexpected node type: %d", (int)nodeTag(relinfo));
+}
+
+/*
+ * deparseRemoteJoinSql
+ *
+ * It deparses a join tree to be executed on the remote server.
+ * The top-level 'relinfo' is assumed to represent a remote join, so it
+ * has to be a List object that packs a PgRemoteJoinInfo.
+ */
+void
+deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+					 List *relinfo,
+					 List *target_list,
+					 List *local_conds,
+					 List **select_vars,
+					 List **select_params)
+{
+	StringInfoData	tlist_buf;
+	StringInfoData	from_buf;
+	StringInfoData	where_buf;
+
+	Assert(IsA(relinfo, List));
+	initStringInfo(&tlist_buf);
+	initStringInfo(&from_buf);
+	initStringInfo(&where_buf);
+
+	deparseRemoteJoinRelation(&tlist_buf, &from_buf, &where_buf,
+							  root, (Node *)relinfo,
+							  target_list, local_conds,
+							  select_vars, select_params);
+	appendStringInfo(buf, "SELECT %s FROM %s%s",
+					 tlist_buf.len > 0 ? tlist_buf.data : "NULL",
+					 from_buf.data,
+					 where_buf.len > 0 ? where_buf.data : "");
+	pfree(tlist_buf.data);
+	pfree(from_buf.data);
 }
 
 /*
@@ -1060,7 +1228,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
  * If it has a column_name FDW option, use that instead of attribute name.
  */
 static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno,
+				 bool var_qualified, PlannerInfo *root)
 {
 	RangeTblEntry *rte;
 	char	   *colname = NULL;
@@ -1096,6 +1265,13 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
 	if (colname == NULL)
 		colname = get_relid_attribute_name(rte->relid, varattno);
 
+	/*
+	 * In a remote join, a column reference may be ambiguous unless it is
+	 * qualified with the alias of its relation.
+	 */
+	if (var_qualified)
+		appendStringInfo(buf, "r%d.", varno);
+
 	appendStringInfoString(buf, quote_identifier(colname));
 }
 
@@ -1243,11 +1419,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 
-	if (node->varno == context->foreignrel->relid &&
+	if (bms_is_member(node->varno, context->foreignrel->relids) &&
 		node->varlevelsup == 0)
 	{
 		/* Var belongs to foreign table */
-		deparseColumnRef(buf, node->varno, node->varattno, context->root);
+		deparseColumnRef(buf, node->varno, node->varattno,
+						 context->var_qualified, context->root);
 	}
 	else
 	{
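The recursion in deparseRemoteJoinRelation above can be illustrated with a minimal standalone sketch. It assumes simplified stand-in types: SketchRel, SketchBuf, sketch_append and sketch_deparse_rel are hypothetical names used only for illustration and are not part of this patch or of PostgreSQL. It builds only the FROM fragment, whereas the patch also collects the target list and the WHERE clause in separate buffers. Join conditions are taken here as already-deparsed strings.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's 'relinfo'; not PostgreSQL types. */
typedef enum { SK_INNER, SK_LEFT, SK_FULL, SK_RIGHT } SketchJoinType;

typedef struct SketchRel
{
	int			rtindex;		/* > 0 for a leaf table, 0 for a join */
	const char *relname;		/* leaf only: remote table name, pre-quoted */
	SketchJoinType jointype;	/* join only */
	const struct SketchRel *outer;	/* join only */
	const struct SketchRel *inner;	/* join only */
	const char *on_cond;		/* join only: NULL if no remote condition */
} SketchRel;

typedef struct SketchBuf
{
	char		data[512];
	size_t		len;
} SketchBuf;

/* Append formatted text, truncating silently if the buffer is full. */
static void
sketch_append(SketchBuf *buf, const char *fmt, ...)
{
	size_t		room = sizeof(buf->data) - buf->len;
	va_list		ap;
	int			n;

	va_start(ap, fmt);
	n = vsnprintf(buf->data + buf->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	buf->len += ((size_t) n < room) ? (size_t) n : room - 1;
}

/*
 * A leaf becomes "<table> r<rtindex>". A join becomes
 * "(<outer> <JOIN keyword> <inner> ON <cond>)", with "ON true" emitted
 * when there is no condition, because JOIN requires an ON clause.
 */
void
sketch_deparse_rel(SketchBuf *from, const SketchRel *rel)
{
	static const char *const join_kw[] = {
		" JOIN ", " LEFT JOIN ", " FULL JOIN ", " RIGHT JOIN "
	};

	if (rel->rtindex > 0)
	{
		sketch_append(from, "%s r%d", rel->relname, rel->rtindex);
		return;
	}
	sketch_append(from, "(");
	sketch_deparse_rel(from, rel->outer);
	sketch_append(from, "%s", join_kw[rel->jointype]);
	sketch_deparse_rel(from, rel->inner);
	sketch_append(from, " ON %s)", rel->on_cond ? rel->on_cond : "true");
}
```

Because every leaf is aliased as r<rtindex>, a self-join of one remote table still yields distinct names. That is why the patch qualifies column references with the same alias.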
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 38c6cf8..e6368c5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -450,17 +450,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
 -- parameterized remote path
 EXPLAIN (VERBOSE, COSTS false)
   SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-                                                 QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
- Nested Loop
-   Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-   ->  Foreign Scan on public.ft2 a
-         Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
-   ->  Foreign Scan on public.ft2 b
-         Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
-         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+                                                                                                             QUERY PLAN                                                                                                              
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON ((r1.c2 = r2."C 1"))) WHERE ((r1."C 1" = 47))
+(3 rows)
 
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
  c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
@@ -474,17 +469,12 @@ SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
 -- simple join
 PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
 EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Nested Loop
-   Output: t1.c3, t2.c3
-   ->  Foreign Scan on public.ft1 t1
-         Output: t1.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
-   ->  Foreign Scan on public.ft2 t2
-         Output: t2.c3
-         Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
-(8 rows)
+                                                           QUERY PLAN                                                            
+---------------------------------------------------------------------------------------------------------------------------------
+ Custom Scan (postgres-fdw)
+   Output: c3, c3
+   Remote SQL: SELECT r1.c3, r2.c3 FROM ("S 1"."T 1" r1 JOIN "S 1"."T 1" r2 ON true) WHERE ((r1."C 1" = 1)) AND ((r2."C 1" = 2))
+(3 rows)
 
 EXECUTE st1(1, 1);
   c3   |  c3   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fde1ec1..18caa1a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -47,40 +48,6 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /*
- * FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table.  This information is collected by postgresGetForeignRelSize.
- */
-typedef struct PgFdwRelationInfo
-{
-	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
-	List	   *remote_conds;
-	List	   *local_conds;
-
-	/* Bitmap of attr numbers we need to fetch from the remote server. */
-	Bitmapset  *attrs_used;
-
-	/* Cost and selectivity of local_conds. */
-	QualCost	local_conds_cost;
-	Selectivity local_conds_sel;
-
-	/* Estimated size and cost for a scan with baserestrictinfo quals. */
-	double		rows;
-	int			width;
-	Cost		startup_cost;
-	Cost		total_cost;
-
-	/* Options extracted from catalogs. */
-	bool		use_remote_estimate;
-	Cost		fdw_startup_cost;
-	Cost		fdw_tuple_cost;
-
-	/* Cached catalog information. */
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;			/* only set in use_remote_estimate mode */
-} PgFdwRelationInfo;
-
-/*
  * Indexes of FDW-private information stored in fdw_private lists.
  *
  * We store various information in ForeignScan.fdw_private to pass it from
@@ -129,6 +96,9 @@ enum FdwModifyPrivateIndex
 typedef struct PgFdwScanState
 {
 	Relation	rel;			/* relcache entry for the foreign table */
+	List	   *join_rels;		/* list of underlying relcache entries, if
+								 * remote join on top of CustomScan */
+	TupleDesc	scan_tupdesc;	/* tuple descriptor of scanned relation */
 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
 
 	/* extracted fdw_private data */
@@ -214,7 +184,8 @@ typedef struct PgFdwAnalyzeState
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry */
+	const char *relname;		/* name of the foreign table, if any */
+	TupleDesc	tupdesc;		/* tuple descriptor of scanned relation */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
 } ConversionLocation;
 
@@ -306,8 +277,8 @@ static void get_remote_estimate(const char *sql,
 static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
 						  EquivalenceClass *ec, EquivalenceMember *em,
 						  void *arg);
-static void create_cursor(ForeignScanState *node);
-static void fetch_more_data(ForeignScanState *node);
+static void create_cursor(PgFdwScanState *fsstate, ExprContext *econtext);
+static void fetch_more_data(PgFdwScanState *fsstate);
 static void close_cursor(PGconn *conn, unsigned int cursor_number);
 static void prepare_foreign_modify(PgFdwModifyState *fmstate);
 static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
@@ -323,12 +294,19 @@ static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
 
+void		_PG_init(void);
+
+/*
+ * Static variables
+ */
+static add_join_path_hook_type	add_join_path_next = NULL;
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
@@ -444,7 +422,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	 * Identify which baserestrictinfo clauses can be sent to the remote
 	 * server and which can't.
 	 */
-	classifyConditions(root, baserel,
+	classifyConditions(root, baserel, baserel->baserestrictinfo,
 					   &fpinfo->remote_conds, &fpinfo->local_conds);
 
 	/*
@@ -770,7 +748,7 @@ postgresGetForeignPlan(PlannerInfo *root,
 					 &retrieved_attrs);
 	if (remote_conds)
 		appendWhereClause(&sql, root, baserel, remote_conds,
-						  true, &params_list);
+						  true, false, false, &params_list);
 
 	/*
 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
@@ -844,84 +822,59 @@ postgresGetForeignPlan(PlannerInfo *root,
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
  */
-static void
-postgresBeginForeignScan(ForeignScanState *node, int eflags)
+static PgFdwScanState *
+commonBeginForeignScan(PlanState *ps, TupleDesc tupdesc,
+					   Oid serverid, Oid userid,
+					   char *remote_query, List *retrieved_attrs,
+					   List *remote_exprs)
 {
-	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
-	EState	   *estate = node->ss.ps.state;
 	PgFdwScanState *fsstate;
-	RangeTblEntry *rte;
-	Oid			userid;
-	ForeignTable *table;
-	ForeignServer *server;
-	UserMapping *user;
-	int			numParams;
-	int			i;
-	ListCell   *lc;
+	ForeignServer  *server;
+	UserMapping	   *user;
+	int				numParams;
+	int				i;
+	ListCell	   *lc;
 
-	/*
-	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
-	 */
-	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
-		return;
-
-	/*
-	 * We'll save private state in node->fdw_state.
-	 */
+	/* Allocate private state */
 	fsstate = (PgFdwScanState *) palloc0(sizeof(PgFdwScanState));
-	node->fdw_state = (void *) fsstate;
-
-	/*
-	 * Identify which user to do the remote access as.	This should match what
-	 * ExecCheckRTEPerms() does.
-	 */
-	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
-	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-	/* Get info about foreign table. */
-	fsstate->rel = node->ss.ss_currentRelation;
-	table = GetForeignTable(RelationGetRelid(fsstate->rel));
-	server = GetForeignServer(table->serverid);
-	user = GetUserMapping(userid, server->serverid);
+	fsstate->scan_tupdesc = tupdesc;
+	fsstate->query = remote_query;
+	fsstate->retrieved_attrs = retrieved_attrs;
 
 	/*
 	 * Get connection to the foreign server.  Connection manager will
-	 * establish new connection if necessary.
+	 * establish new connection on demand.
 	 */
+	server = GetForeignServer(serverid);
+	user = GetUserMapping(userid, serverid);
 	fsstate->conn = GetConnection(server, user, false);
 
 	/* Assign a unique ID for my cursor */
 	fsstate->cursor_number = GetCursorNumber(fsstate->conn);
 	fsstate->cursor_exists = false;
 
-	/* Get private info created by planner functions. */
-	fsstate->query = strVal(list_nth(fsplan->fdw_private,
-									 FdwScanPrivateSelectSql));
-	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
-											   FdwScanPrivateRetrievedAttrs);
-
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
-	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->batch_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											   "postgres_fdw tuple data",
 											   ALLOCSET_DEFAULT_MINSIZE,
 											   ALLOCSET_DEFAULT_INITSIZE,
 											   ALLOCSET_DEFAULT_MAXSIZE);
-	fsstate->temp_cxt = AllocSetContextCreate(estate->es_query_cxt,
+	fsstate->temp_cxt = AllocSetContextCreate(ps->state->es_query_cxt,
 											  "postgres_fdw temporary data",
 											  ALLOCSET_SMALL_MINSIZE,
 											  ALLOCSET_SMALL_INITSIZE,
 											  ALLOCSET_SMALL_MAXSIZE);
 
 	/* Get info we'll need for input data conversion. */
-	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->scan_tupdesc);
 
 	/* Prepare for output conversion of parameters used in remote query. */
-	numParams = list_length(fsplan->fdw_exprs);
+	numParams = list_length(remote_exprs);
 	fsstate->numParams = numParams;
 	fsstate->param_flinfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo) * numParams);
 
 	i = 0;
-	foreach(lc, fsplan->fdw_exprs)
+	foreach(lc, remote_exprs)
 	{
 		Node	   *param_expr = (Node *) lfirst(lc);
 		Oid			typefnoid;
@@ -940,17 +893,62 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 * benefit, and it'd require postgres_fdw to know more than is desirable
 	 * about Param evaluation.)
 	 */
-	fsstate->param_exprs = (List *)
-		ExecInitExpr((Expr *) fsplan->fdw_exprs,
-					 (PlanState *) node);
+	fsstate->param_exprs = (List *) ExecInitExpr((Expr *) remote_exprs, ps);
 
 	/*
 	 * Allocate buffer for text form of query parameters, if any.
 	 */
 	if (numParams > 0)
-		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
+		fsstate->param_values = palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+	return fsstate;
+}
+
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
+	PgFdwScanState *fsstate;
+	EState	   *estate = node->ss.ps.state;
+	Relation	rel;
+	char	   *remote_query;
+	List	   *retrieved_attrs;
+	RangeTblEntry *rte;
+	Oid			userid;
+	ForeignTable *table;
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Identify which user to do the remote access as.	This should match what
+	 * ExecCheckRTEPerms() does.
+	 */
+	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+	/* Get info about foreign table. */
+	rel = node->ss.ss_currentRelation;
+	table = GetForeignTable(RelationGetRelid(rel));
+
+	/* Get private info created by planner functions. */
+	remote_query = strVal(list_nth(fsplan->fdw_private,
+								   FdwScanPrivateSelectSql));
+	retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
+										FdwScanPrivateRetrievedAttrs);
+
+	fsstate = commonBeginForeignScan(&node->ss.ps, RelationGetDescr(rel),
+									 table->serverid, userid,
+									 remote_query, retrieved_attrs,
+									 fsplan->fdw_exprs);
+	fsstate->rel = rel;
+
+	node->fdw_state = fsstate;
 }
 
 /*
@@ -959,17 +957,15 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
  *		EOF.
  */
 static TupleTableSlot *
-postgresIterateForeignScan(ForeignScanState *node)
+commonIterateForeignScan(PgFdwScanState *fsstate, PlanState *ps,
+						 TupleTableSlot *slot)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-
 	/*
 	 * If this is the first call after Begin or ReScan, we need to create the
 	 * cursor on the remote side.
 	 */
 	if (!fsstate->cursor_exists)
-		create_cursor(node);
+		create_cursor(fsstate, ps->ps_ExprContext);
 
 	/*
 	 * Get some more tuples, if we've run out.
@@ -978,7 +974,7 @@ postgresIterateForeignScan(ForeignScanState *node)
 	{
 		/* No point in another fetch if we already detected EOF, though. */
 		if (!fsstate->eof_reached)
-			fetch_more_data(node);
+			fetch_more_data(fsstate);
 		/* If we didn't get any tuples, must be end of data. */
 		if (fsstate->next_tuple >= fsstate->num_tuples)
 			return ExecClearTuple(slot);
@@ -995,14 +991,22 @@ postgresIterateForeignScan(ForeignScanState *node)
 	return slot;
 }
 
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
 /*
  * postgresReScanForeignScan
  *		Restart the scan.
  */
 static void
-postgresReScanForeignScan(ForeignScanState *node)
+commonReScanForeignScan(PgFdwScanState *fsstate, PlanState *ps)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	char		sql[64];
 	PGresult   *res;
 
@@ -1016,7 +1020,7 @@ postgresReScanForeignScan(ForeignScanState *node)
 	 * be good enough.	If we've only fetched zero or one batch, we needn't
 	 * even rewind the cursor, just rescan what we have.
 	 */
-	if (node->ss.ps.chgParam != NULL)
+	if (ps->chgParam != NULL)
 	{
 		fsstate->cursor_exists = false;
 		snprintf(sql, sizeof(sql), "CLOSE c%u",
@@ -1051,19 +1055,21 @@ postgresReScanForeignScan(ForeignScanState *node)
 	fsstate->eof_reached = false;
 }
 
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
 /*
  * postgresEndForeignScan
  *		Finish scanning foreign table and dispose objects used for this scan
  */
 static void
-postgresEndForeignScan(ForeignScanState *node)
+commonEndForeignScan(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-
-	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
-	if (fsstate == NULL)
-		return;
-
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -1075,6 +1081,18 @@ postgresEndForeignScan(ForeignScanState *node)
 	/* MemoryContexts will be deleted automatically. */
 }
 
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	commonEndForeignScan(fsstate);
+}
+
 /*
  * postgresAddForeignUpdateTargets
  *		Add resjunk column(s) needed for update/delete on a foreign table
@@ -1704,10 +1722,10 @@ estimate_path_cost_size(PlannerInfo *root,
 						 &retrieved_attrs);
 		if (fpinfo->remote_conds)
 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
-							  true, NULL);
+							  true, false, false, NULL);
 		if (join_conds)
 			appendWhereClause(&sql, root, baserel, join_conds,
-							  (fpinfo->remote_conds == NIL), NULL);
+						  (fpinfo->remote_conds == NIL), false, false, NULL);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -1863,10 +1881,8 @@ ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  * Create cursor for node's query with current parameter values.
  */
 static void
-create_cursor(ForeignScanState *node)
+create_cursor(PgFdwScanState *fsstate, ExprContext *econtext)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
-	ExprContext *econtext = node->ss.ps.ps_ExprContext;
 	int			numParams = fsstate->numParams;
 	const char **values = fsstate->param_values;
 	PGconn	   *conn = fsstate->conn;
@@ -1953,9 +1969,8 @@ create_cursor(ForeignScanState *node)
  * Fetch some more rows from the node's cursor.
  */
 static void
-fetch_more_data(ForeignScanState *node)
+fetch_more_data(PgFdwScanState *fsstate)
 {
-	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
 	PGresult   *volatile res = NULL;
 	MemoryContext oldcontext;
 
@@ -1975,6 +1990,7 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		const char *relname = NULL;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
 		fetch_size = 100;
@@ -1993,11 +2009,15 @@ fetch_more_data(ForeignScanState *node)
 		fsstate->num_tuples = numrows;
 		fsstate->next_tuple = 0;
 
+		if (fsstate->rel)
+			relname = RelationGetRelationName(fsstate->rel);
+
 		for (i = 0; i < numrows; i++)
 		{
 			fsstate->tuples[i] =
 				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
+										   relname,
+										   fsstate->scan_tupdesc,
 										   fsstate->attinmeta,
 										   fsstate->retrieved_attrs,
 										   fsstate->temp_cxt);
@@ -2215,11 +2235,13 @@ store_returning_result(PgFdwModifyState *fmstate,
 	{
 		HeapTuple	newtup;
 
-		newtup = make_tuple_from_result_row(res, 0,
-											fmstate->rel,
-											fmstate->attinmeta,
-											fmstate->retrieved_attrs,
-											fmstate->temp_cxt);
+		newtup =
+			make_tuple_from_result_row(res, 0,
+									   RelationGetRelationName(fmstate->rel),
+									   RelationGetDescr(fmstate->rel),
+									   fmstate->attinmeta,
+									   fmstate->retrieved_attrs,
+									   fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
 		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
 	}
@@ -2507,11 +2529,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		 */
 		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
 
-		astate->rows[pos] = make_tuple_from_result_row(res, row,
-													   astate->rel,
-													   astate->attinmeta,
-													 astate->retrieved_attrs,
-													   astate->temp_cxt);
+		astate->rows[pos] =
+			make_tuple_from_result_row(res, row,
+									   RelationGetRelationName(astate->rel),
+									   RelationGetDescr(astate->rel),
+									   astate->attinmeta,
+									   astate->retrieved_attrs,
+									   astate->temp_cxt);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
@@ -2528,13 +2552,13 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 static HeapTuple
 make_tuple_from_result_row(PGresult *res,
 						   int row,
-						   Relation rel,
+						   const char *relname,
+						   TupleDesc tupdesc,
 						   AttInMetadata *attinmeta,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context)
 {
 	HeapTuple	tuple;
-	TupleDesc	tupdesc = RelationGetDescr(rel);
 	Datum	   *values;
 	bool	   *nulls;
 	ItemPointer ctid = NULL;
@@ -2561,7 +2585,8 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
+	errpos.relname = relname;
+	errpos.tupdesc = tupdesc;
 	errpos.cur_attno = 0;
 	errcallback.callback = conversion_error_callback;
 	errcallback.arg = (void *) &errpos;
@@ -2646,10 +2671,794 @@ static void
 conversion_error_callback(void *arg)
 {
 	ConversionLocation *errpos = (ConversionLocation *) arg;
-	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
 
-	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-		errcontext("column \"%s\" of foreign table \"%s\"",
-				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
-				   RelationGetRelationName(errpos->rel));
+	if (errpos->cur_attno > 0 && errpos->cur_attno <= errpos->tupdesc->natts)
+	{
+		Form_pg_attribute attr = errpos->tupdesc->attrs[errpos->cur_attno - 1];
+
+		if (errpos->relname)
+			errcontext("column \"%s\" of foreign table \"%s\"",
+					   NameStr(attr->attname), errpos->relname);
+		else
+			errcontext("column \"%s\" of remote join relation",
+					   NameStr(attr->attname));
+	}
+}
+
+/* ------------------------------------------------------------
+ *
+ * Remote JOIN support
+ *
+ * ------------------------------------------------------------
+ */
+enum PgRemoteJoinPrivateIndex
+{
+	PgCust_FdwServUserIds,	/* oid pair of foreign server and user */
+	PgCust_JoinRelids,		/* bitmapset of rtindexes to be joined */
+	PgCust_JoinType,		/* one of JOIN_* */
+	PgCust_OuterRel,		/* packed joinrel of outer relation */
+	PgCust_InnerRel,		/* packed joinrel of inner relation */
+	PgCust_RemoteConds,		/* remote conditions */
+	PgCust_LocalConds,		/* local conditions */
+	PgCust_SelectVars,		/* list of Var nodes to be fetched */
+	PgCust_SelectParams,	/* list of Var nodes being parameterized */
+	PgCust_SelectSql,		/* remote query being deparsed */
+};
+
+/*
+ * packPgRemoteJoinInfo
+ *
+ * pack PgRemoteJoinInfo into a List object to save as private datum
+ */
+List *
+packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo)
+{
+	List   *result = NIL;
+
+	/* PgCust_FdwServUserIds */
+	result = lappend(result, list_make2_oid(jinfo->fdw_server_oid,
+											jinfo->fdw_user_oid));
+	/* PgCust_JoinRelids */
+	result = lappend(result, makeString(bms_to_string(jinfo->relids)));
+	/* PgCust_JoinType */
+	result = lappend(result, makeInteger((long) jinfo->jointype));
+	/* PgCust_OuterRel */
+	result = lappend(result, jinfo->outer_rel);
+	/* PgCust_InnerRel */
+	result = lappend(result, jinfo->inner_rel);
+	/* PgCust_RemoteConds */
+	result = lappend(result, jinfo->remote_conds);
+	/* PgCust_LocalConds */
+	result = lappend(result, jinfo->local_conds);
+	/* PgCust_SelectVars */
+	result = lappend(result, jinfo->select_vars);
+	/* PgCust_SelectParams */
+	result = lappend(result, jinfo->select_params);
+	/* PgCust_SelectSql */
+	result = lappend(result, makeString(jinfo->select_qry));
+
+	return result;
+}
+
+/*
+ * unpackPgRemoteJoinInfo
+ *
+ * unpack a private datum to PgRemoteJoinInfo
+ */
+void
+unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo, List *custom_private)
+{
+	ListCell   *lc;
+	int			index = PgCust_FdwServUserIds;
+
+	memset(jinfo, 0, sizeof(PgRemoteJoinInfo));
+	foreach (lc, custom_private)
+	{
+		switch (index)
+		{
+			case PgCust_FdwServUserIds:
+				jinfo->fdw_server_oid = linitial_oid(lfirst(lc));
+				jinfo->fdw_user_oid = lsecond_oid(lfirst(lc));
+				break;
+			case PgCust_JoinRelids:
+				jinfo->relids = bms_from_string(strVal(lfirst(lc)));
+				break;
+			case PgCust_JoinType:
+				jinfo->jointype = (JoinType) intVal(lfirst(lc));
+				break;
+			case PgCust_OuterRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->outer_rel = lfirst(lc);
+				break;
+			case PgCust_InnerRel:
+				Assert(IsA(lfirst(lc), List) || IsA(lfirst(lc), Integer));
+				jinfo->inner_rel = lfirst(lc);
+				break;
+			case PgCust_RemoteConds:
+				jinfo->remote_conds = lfirst(lc);
+				break;
+			case PgCust_LocalConds:
+				jinfo->local_conds = lfirst(lc);
+				break;
+			case PgCust_SelectVars:
+				jinfo->select_vars = lfirst(lc);
+				break;
+			case PgCust_SelectParams:
+				jinfo->select_params = lfirst(lc);
+				break;
+			case PgCust_SelectSql:
+				jinfo->select_qry = strVal(lfirst(lc));
+				break;
+			default:
+				elog(ERROR, "unexpected member in remote join relinfo");
+		}
+		index++;
+	}
+}
+
+/*
+ * is_self_managed_relation
+ *
+ * It checks whether the supplied relation is either a foreign table or a
+ * remote join managed by postgres_fdw, and returns false if it is not.
+ * If it is a managed relation, some of its properties are also returned
+ * to the caller.
+ */
+static bool
+is_self_managed_relation(PlannerInfo *root, RelOptInfo *rel,
+						 Oid *fdw_server_oid, Oid *fdw_user_oid,
+						 Node **relinfo,
+						 List **remote_conds, List **local_conds)
+{
+	if (rel->reloptkind == RELOPT_BASEREL)
+	{
+		FdwRoutine			pgroutine;
+		PgFdwRelationInfo  *fpinfo;
+		RangeTblEntry	   *rte = planner_rt_fetch(rel->relid, root);
+
+		/* Is it a foreign table managed by postgres_fdw? */
+		memset(&pgroutine, 0, sizeof(FdwRoutine));
+		pgroutine.GetForeignRelSize = postgresGetForeignRelSize;
+
+		if (!is_fdw_managed_relation(rte->relid, &pgroutine))
+			return false;
+
+		/*
+		 * Also inform the caller of the server-id and local user-id.
+		 * Note that the remote user-id is determined from the pair of
+		 * server-id and local user-id at execution time, not at planning
+		 * time, so we may need to pay attention to a scenario in which
+		 * a plan is executed with a different user-id.
+		 * However, all we need to know here is whether both relations
+		 * will be accessed with the same credential; the actual user-id
+		 * is not required.
+		 * So, fdw_user_oid is InvalidOid, for comparison purposes,
+		 * when the scan runs with the credential of GetUserId().
+		 */
+		*fdw_user_oid = rte->checkAsUser;
+
+		fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+		*fdw_server_oid = fpinfo->server->serverid;
+		*remote_conds = fpinfo->remote_conds;
+		*local_conds = fpinfo->local_conds;
+
+		*relinfo = (Node *) makeInteger(rel->relid);
+
+		return true;
+	}
+	else if (rel->reloptkind == RELOPT_JOINREL)
+	{
+		ListCell   *cell;
+
+		foreach (cell, rel->pathlist)
+		{
+			CustomPath *cpath = lfirst(cell);
+
+			if (IsA(cpath, CustomPath) &&
+				strcmp(cpath->custom_name, "postgres-fdw") == 0)
+			{
+				PgRemoteJoinInfo	jinfo;
+
+				/*
+				 * Note that CustomScan(postgres-fdw) should be constructed
+				 * only when underlying foreign tables use identical server
+				 * and user-id for each.
+				 */
+				unpackPgRemoteJoinInfo(&jinfo, cpath->custom_private);
+				*fdw_server_oid = jinfo.fdw_server_oid;
+				*fdw_user_oid = jinfo.fdw_user_oid;
+				*remote_conds = jinfo.remote_conds;
+				*local_conds = jinfo.local_conds;
+
+				*relinfo = (Node *) cpath->custom_private;
+
+				return true;
+			}
+		}
+	}
+	return false;
+}
+
+/*
+ * estimate_remote_join_cost
+ *
+ * Calculates the cost of a remote join and stores it in the Path structure.
+ */
+static void
+estimate_remote_join_cost(PlannerInfo *root,
+						  CustomPath *cpath,
+						  PgRemoteJoinInfo *jinfo,
+						  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo	   *joinrel = cpath->path.parent;
+	ForeignServer  *server;
+	ListCell	   *lc;
+	Cost			startup_cost = DEFAULT_FDW_STARTUP_COST;
+	Cost			tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	Cost			total_cost;
+	QualCost		qual_cost;
+	Selectivity		local_sel;
+	Selectivity		remote_sel;
+	double			rows = joinrel->rows;
+	double			retrieved_rows;
+
+	server = GetForeignServer(jinfo->fdw_server_oid);
+	foreach(lc, server->options)
+	{
+		DefElem	   *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "fdw_startup_cost") == 0)
+			startup_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+			tuple_cost = strtod(defGetString(def), NULL);
+	}
+	cost_qual_eval(&qual_cost, jinfo->local_conds, root);
+	local_sel = clauselist_selectivity(root,
+									   jinfo->local_conds,
+									   0,
+									   JOIN_INNER,
+									   NULL);
+	remote_sel = clauselist_selectivity(root,
+										jinfo->remote_conds,
+										0,
+										jinfo->jointype,
+										sjinfo);
+	retrieved_rows = remote_sel * rows;
+
+	startup_cost += qual_cost.startup * retrieved_rows;
+	total_cost = startup_cost;
+	total_cost += tuple_cost * retrieved_rows;
+	total_cost += qual_cost.per_tuple * retrieved_rows;
+	total_cost += cpu_tuple_cost * local_sel * retrieved_rows;
+
+	cpath->path.rows = local_sel * retrieved_rows;
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = total_cost;
+}
+
+/*
+ * postgresAddJoinPaths
+ *
+ * A callback routine of add_join_path_hook. It checks whether this join can
+ * be run on the remote server, and adds a custom-scan path that launches
+ * a remote join instead of a pair of remote scans and a local join.
+ */
+static void
+postgresAddJoinPaths(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Oid			o_server_oid;
+	Oid			o_user_oid;
+	Node	   *o_relinfo;
+	List	   *o_local_conds;
+	List	   *o_remote_conds;
+	Oid			i_server_oid;
+	Oid			i_user_oid;
+	Node	   *i_relinfo;
+	List	   *i_local_conds;
+	List	   *i_remote_conds;
+	List	   *j_local_conds;
+	List	   *j_remote_conds;
+	ListCell   *lc;
+	Relids		required_outer;
+	PgRemoteJoinInfo jinfo;
+	CustomPath *cpath;
+
+	if (add_join_path_next)
+		(*add_join_path_next)(root, joinrel, outerrel, innerrel,
+							  jointype, sjinfo, restrictlist,
+							  mergeclause_list, semifactors,
+							  param_source_rels, extra_lateral_rels);
+
+	/* only regular SQL JOIN syntax is supported */
+	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+		jointype != JOIN_FULL  && jointype != JOIN_RIGHT)
+		return;
+
+	/* outerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, outerrel,
+								  &o_server_oid, &o_user_oid, &o_relinfo,
+								  &o_remote_conds, &o_local_conds))
+		return;
+
+	/* innerrel is managed by this extension? */
+	if (!is_self_managed_relation(root, innerrel,
+								  &i_server_oid, &i_user_oid, &i_relinfo,
+								  &i_remote_conds, &i_local_conds))
+		return;
+
+	/* Is remote query run with a common credential? */
+	if (o_server_oid != i_server_oid || o_user_oid != i_user_oid)
+		return;
+
+	/* unable to pull up local conditions any more */
+	if ((jointype == JOIN_LEFT && o_local_conds != NIL) ||
+		(jointype == JOIN_RIGHT && i_local_conds != NIL) ||
+		(jointype == JOIN_FULL && (o_local_conds != NIL ||
+								   i_local_conds != NIL)))
+		return;
+
+	classifyConditions(root, joinrel, restrictlist,
+					   &j_remote_conds, &j_local_conds);
+	/* pull-up local conditions, if any */
+	j_local_conds = list_concat(j_local_conds, o_local_conds);
+	j_local_conds = list_concat(j_local_conds, i_local_conds);
+
+	/*
+	 * Not supported to run remote join if whole-row reference is
+	 * included in either of target-list or local-conditions.
+	 *
+	 * XXX - Because we don't have reasonable way to reconstruct a RECORD
+	 * datum from individual columns once extracted. On the other hand, it
+	 * takes additional network bandwidth if we put whole-row reference on
+	 * the remote-join query.
+	 */
+	if (contain_wholerow_reference((Node *)joinrel->reltargetlist) ||
+		contain_wholerow_reference((Node *)j_local_conds))
+		return;
+
+	required_outer = pull_varnos((Node *) joinrel->reltargetlist);
+	foreach (lc, j_local_conds)
+	{
+		RestrictInfo   *rinfo = lfirst(lc);
+
+		required_outer = bms_union(required_outer,
+								   pull_varnos((Node *)rinfo->clause));
+	}
+	required_outer = bms_difference(required_outer, joinrel->relids);
+
+	/* OK, make a CustomScan node to run remote join */
+	cpath = makeNode(CustomPath);
+	cpath->path.pathtype = T_CustomScan;
+	cpath->path.parent = joinrel;
+	cpath->path.param_info = get_baserel_parampathinfo(root, joinrel,
+													   required_outer);
+	cpath->custom_name = pstrdup("postgres-fdw");
+	cpath->custom_flags = 0;
+
+	memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
+	jinfo.fdw_server_oid = o_server_oid;
+	jinfo.fdw_user_oid = o_user_oid;
+	jinfo.relids = joinrel->relids;
+	jinfo.jointype = jointype;
+	jinfo.outer_rel = o_relinfo;
+	jinfo.inner_rel = i_relinfo;
+	jinfo.remote_conds = j_remote_conds;
+	jinfo.local_conds = j_local_conds;
+
+	cpath->custom_private = packPgRemoteJoinInfo(&jinfo);
+
+	estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);
+
+	add_path(joinrel, &cpath->path);
+}
+
+/*
+ * postgresInitCustomScanPlan
+ *
+ * construction of CustomScan according to remote join path above.
+ */
+static void
+postgresInitCustomScanPlan(PlannerInfo *root,
+						   CustomScan *cscan_plan,
+						   CustomPath *cscan_path,
+						   List *tlist,
+						   List *scan_clauses)
+{
+	PgRemoteJoinInfo jinfo;
+	StringInfoData sql;
+	List	   *relinfo = cscan_path->custom_private;
+	List	   *local_conds = NIL;
+	List	   *remote_conds = NIL;
+	ListCell   *lc;
+
+	Assert(cscan_path->path.parent->reloptkind == RELOPT_JOINREL);
+	unpackPgRemoteJoinInfo(&jinfo, relinfo);
+
+	/* pulls expressions from RestrictInfo */
+	local_conds = extract_actual_clauses(jinfo.local_conds, false);
+	remote_conds = extract_actual_clauses(jinfo.remote_conds, false);
+
+	foreach (lc, scan_clauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		Assert(IsA(rinfo, RestrictInfo));
+
+		/* Ignore any pseudoconstants, they're dealt with elsewhere */
+		if (rinfo->pseudoconstant)
+			continue;
+
+		if (!list_member(remote_conds, rinfo->clause) &&
+			!list_member(local_conds, rinfo->clause))
+			local_conds = lappend(local_conds, rinfo->clause);
+	}
+
+	/* construct a remote join query */
+	initStringInfo(&sql);
+	deparseRemoteJoinSql(&sql, root, cscan_path->custom_private,
+						 tlist,
+						 local_conds,
+						 &jinfo.select_vars,
+						 &jinfo.select_params);
+	jinfo.local_conds = NIL;	/* never used any more */
+	jinfo.remote_conds = NIL;	/* never used any more */
+	jinfo.select_qry = sql.data;
+
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = local_conds;
+	cscan_plan->custom_exprs = remote_conds;
+	cscan_plan->custom_private = packPgRemoteJoinInfo(&jinfo);
+}
+
+/*
+ * fixup_remote_join_expr
+ *
+ * Var nodes that reference a relation of a remote join carry the varno of
+ * the underlying foreign tables. This is a problem because such a varno
+ * would eventually be replaced by a reference to the outer or inner
+ * relation, whereas the result of a remote join is stored in the
+ * scan-tuple-slot, which is neither outer nor inner.
+ * So, we replace the varno of these Var nodes with CUSTOM_VAR, a pseudo
+ * varno that references a tuple in the scan-tuple-slot.
+ */
+typedef struct {
+	PlannerInfo *root;
+	List   *select_vars;
+	int		rtoffset;
+} fixup_remote_join_context;
+
+static Node *
+fixup_remote_join_mutator(Node *node, fixup_remote_join_context *context)
+{
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *newvar = (Var *) copyObject(node);
+		ListCell   *lc;
+		AttrNumber	resno = 1;
+
+		/* remote columns are ordered according to the select_vars */
+		foreach (lc, context->select_vars)
+		{
+			Var	   *selvar = (Var *) lfirst(lc);
+
+			Assert(newvar->varlevelsup == 0);
+
+			if (newvar->varno == selvar->varno &&
+				newvar->varattno == selvar->varattno)
+			{
+				Assert(newvar->vartype == selvar->vartype);
+				Assert(newvar->vartypmod == selvar->vartypmod);
+				Assert(newvar->varcollid == selvar->varcollid);
+
+				newvar->varno = CUSTOM_VAR;
+				newvar->varattno = resno;
+
+				return (Node *) newvar;
+			}
+			resno++;
+		}
+		elog(ERROR, "referenced variable was not in select_vars");
+	}
+	if (IsA(node, CurrentOfExpr))
+	{
+		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+		Assert(cexpr->cvarno != INNER_VAR);
+		Assert(cexpr->cvarno != OUTER_VAR);
+		if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+			cexpr->cvarno += context->rtoffset;
+		return (Node *) cexpr;
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		/* At scan level, we should always just evaluate the contained expr */
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		return fixup_remote_join_mutator((Node *) phv->phexpr, context);
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node, fixup_remote_join_mutator,
+								   (void *) context);
+}
+
+static Node *
+fixup_remote_join_expr(Node *node, PlannerInfo *root,
+					   List *select_vars, int rtoffset)
+{
+	fixup_remote_join_context context;
+
+	context.root = root;
+	context.select_vars = select_vars;
+	context.rtoffset = rtoffset;
+
+	return fixup_remote_join_mutator(node, &context);
+}
+
+/*
+ * postgresSetPlanRefCustomScan
+ *
+ * Var nodes that reference columns of a remote join relation need special
+ * treatment, because we replace a join relation with a remote query that
+ * returns the result of a join executed remotely.
+ */
+static void
+postgresSetPlanRefCustomScan(PlannerInfo *root,
+							 CustomScan *csplan,
+							 int rtoffset)
+{
+	PgRemoteJoinInfo	jinfo;
+
+	Assert(csplan->scan.scanrelid == 0);
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	csplan->scan.plan.targetlist =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.targetlist,
+										root, jinfo.select_vars, rtoffset);
+	csplan->scan.plan.qual =
+		(List *) fixup_remote_join_expr((Node *)csplan->scan.plan.qual,
+										root, jinfo.select_vars, rtoffset);
+
+	if (rtoffset > 0)
+	{
+		ListCell   *lc;
+
+		foreach (lc, jinfo.select_vars)
+		{
+			Var	*var = lfirst(lc);
+
+			var->varno += rtoffset;
+		}
+	}
+}
+
+/*
+ * postgresBeginCustomScan
+ *
+ * Most of the logic is equivalent to postgresBeginForeignScan; however,
+ * some adjustment is needed because of the difference in nature.
+ * The biggest difference is that it has to open the underlying relations
+ * by itself and construct a tuple descriptor from the list of Vars to be
+ * fetched, because a custom scan (in this case, a scan on a remote join
+ * instead of a local join) has no particular relation behind it and thus
+ * has to manage these things itself.
+ */
+static void
+postgresBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *csplan = (CustomScan *) node->ss.ps.plan;
+	EState		   *estate = node->ss.ps.state;
+	PgRemoteJoinInfo jinfo;
+	PgFdwScanState *fsstate;
+	TupleDesc		tupdesc;
+	List		   *join_rels = NIL;
+	List		   *att_names = NIL;
+	List		   *att_types = NIL;
+	List		   *att_typmods = NIL;
+	List		   *att_collations = NIL;
+	List		   *retrieved_attrs = NIL;
+	ListCell	   *lc;
+	Oid				userid;
+	int				i;
+
+	unpackPgRemoteJoinInfo(&jinfo, csplan->custom_private);
+
+	/*
+	 * ss_ScanTupleSlot of ScanState has to be correctly initialized
+	 * even if this invocation is EXPLAIN (without ANALYZE), because
+	 * Var node with CUSTOM_VAR references its TupleDesc to get
+	 * virtual attribute name on the scanned slot.
+	 */
+	ExecInitScanTupleSlot(estate, &node->ss);
+	foreach (lc, jinfo.select_vars)
+	{
+		Oid		reloid;
+		char   *attname;
+		Var	   *var = lfirst(lc);
+
+		Assert(IsA(var, Var));
+		reloid = getrelid(var->varno, estate->es_range_table);
+		attname = get_relid_attribute_name(reloid, var->varattno);
+
+		att_names = lappend(att_names, makeString(attname));
+		att_types = lappend_oid(att_types, var->vartype);
+		att_typmods = lappend_int(att_typmods, var->vartypmod);
+		att_collations = lappend_oid(att_collations, var->varcollid);
+
+		retrieved_attrs = lappend_int(retrieved_attrs,
+									  list_length(retrieved_attrs) + 1);
+	}
+	tupdesc = BuildDescFromLists(att_names, att_types,
+								 att_typmods, att_collations);
+	ExecAssignScanType(&node->ss, tupdesc);
+
+	/*
+	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
+	 */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/*
+	 * Needs to open underlying relations by itself
+	 */
+	while ((i = bms_first_member(jinfo.relids)) >= 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, i, eflags);
+
+		join_rels = lappend(join_rels, rel);
+	}
+
+	/*
+	 * Determine the user-id. The current user-id is applied unless the
+	 * reference has some special configuration.
+	 */
+	userid = OidIsValid(jinfo.fdw_user_oid) ? jinfo.fdw_user_oid : GetUserId();
+
+	/* common part to begin remote query execution */
+	fsstate = commonBeginForeignScan(&node->ss.ps, tupdesc,
+									 jinfo.fdw_server_oid, userid,
+									 jinfo.select_qry,
+									 retrieved_attrs,
+									 jinfo.select_params);
+	/* the underlying relations have to be saved as well */
+	fsstate->join_rels = join_rels;
+
+	node->custom_state = fsstate;
+}
+
+/*
+ * postgresExecCustomAccess
+ *
+ * Access method to fetch a tuple from the remote join query.
+ * It does the same job that postgresIterateForeignScan() does for
+ * queries on a single relation.
+ */
+static TupleTableSlot *
+postgresExecCustomAccess(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+	return commonIterateForeignScan(fsstate, &node->ss.ps, slot);
+}
+
+/*
+ * postgresExecCustomRecheck
+ *
+ * No need to recheck it again.
+ */
+static bool
+postgresExecCustomRecheck(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * postgresExecCustomScan
+ *
+ * Just a wrapper of regular ExecScan
+ */
+static TupleTableSlot *
+postgresExecCustomScan(CustomScanState *node)
+{
+	return ExecScan((ScanState *) node,
+					(ExecScanAccessMtd) postgresExecCustomAccess,
+					(ExecScanRecheckMtd) postgresExecCustomRecheck);
+}
+
+/*
+ * postgresEndCustomScan
+ *
+ * Same as postgresEndForeignScan, except that it closes the underlying
+ * relations by itself.
+ */
+static void
+postgresEndCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = (PgFdwScanState *) node->custom_state;
+	ListCell   *lc;
+
+	/* if fsstate is NULL, we are in EXPLAIN; nothing to do */
+	if (fsstate == NULL)
+		return;
+
+	/* cleanup resources used in common portion */
+	commonEndForeignScan(fsstate);
+
+	foreach (lc, fsstate->join_rels)
+		ExecCloseScanRelation(lfirst(lc));
+}
+
+/*
+ * postgresReScanCustomScan
+ *
+ * Same as what postgresReScanForeignScan() does.
+ */
+static void
+postgresReScanCustomScan(CustomScanState *node)
+{
+	PgFdwScanState *fsstate = node->custom_state;
+
+	commonReScanForeignScan(fsstate, &node->ss.ps);
+}
+
+/*
+ * postgresExplainCustomScan
+ *
+ * Callback routine for EXPLAIN. It just adds the remote query in verbose mode.
+ */
+static void
+postgresExplainCustomScan(CustomScanState *csstate,
+						  ExplainState *es)
+{
+	if (es->verbose)
+	{
+		PgRemoteJoinInfo jinfo;
+		CustomScan *cscan = (CustomScan *)csstate->ss.ps.plan;
+
+		unpackPgRemoteJoinInfo(&jinfo, cscan->custom_private);
+
+		ExplainPropertyText("Remote SQL", jinfo.select_qry, es);
+	}
+}
+
+/*
+ * _PG_init
+ *
+ * Entrypoint of this module; registers the custom-scan provider.
+ * No special registration is needed for the FDW portion.
+ */
+void
+_PG_init(void)
+{
+	CustomProvider	provider;
+
+	/* registration of hook on add_join_paths */
+	add_join_path_next = add_join_path_hook;
+	add_join_path_hook = postgresAddJoinPaths;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "postgres-fdw");
+	provider.InitCustomScanPlan   = postgresInitCustomScanPlan;
+	provider.SetPlanRefCustomScan = postgresSetPlanRefCustomScan;
+	provider.BeginCustomScan      = postgresBeginCustomScan;
+	provider.ExecCustomScan       = postgresExecCustomScan;
+	provider.EndCustomScan        = postgresEndCustomScan;
+	provider.ReScanCustomScan     = postgresReScanCustomScan;
+	provider.ExplainCustomScan    = postgresExplainCustomScan;
+
+	register_custom_provider(&provider);
 }
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 228345d..8f3645c 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -21,6 +21,41 @@
 #include "libpq-fe.h"
 
 /* in postgres_fdw.c */
+
+/*
+ * FDW-specific planner information kept in RelOptInfo.fdw_private for a
+ * foreign table.  This information is collected by postgresGetForeignRelSize.
+ */
+typedef struct PgFdwRelationInfo
+{
+	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
+	List	   *remote_conds;
+	List	   *local_conds;
+
+	/* Bitmap of attr numbers we need to fetch from the remote server. */
+	Bitmapset  *attrs_used;
+
+	/* Cost and selectivity of local_conds. */
+	QualCost	local_conds_cost;
+	Selectivity local_conds_sel;
+
+	/* Estimated size and cost for a scan with baserestrictinfo quals. */
+	double		rows;
+	int			width;
+	Cost		startup_cost;
+	Cost		total_cost;
+
+	/* Options extracted from catalogs. */
+	bool		use_remote_estimate;
+	Cost		fdw_startup_cost;
+	Cost		fdw_tuple_cost;
+
+	/* Cached catalog information. */
+	ForeignTable *table;
+	ForeignServer *server;
+	UserMapping *user;			/* only set in use_remote_estimate mode */
+} PgFdwRelationInfo;
+
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
 
@@ -41,6 +76,7 @@ extern int ExtractConnectionOptions(List *defelems,
 /* in deparse.c */
 extern void classifyConditions(PlannerInfo *root,
 				   RelOptInfo *baserel,
+				   List *restrictinfo_list,
 				   List **remote_conds,
 				   List **local_conds);
 extern bool is_foreign_expr(PlannerInfo *root,
@@ -56,6 +92,8 @@ extern void appendWhereClause(StringInfo buf,
 				  RelOptInfo *baserel,
 				  List *exprs,
 				  bool is_first,
+				  bool is_join_on,
+				  bool qualified,
 				  List **params);
 extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
@@ -69,8 +107,34 @@ extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
+extern void deparseRemoteJoinSql(StringInfo buf, PlannerInfo *root,
+								 List *relinfo,
+								 List *target_list,
+								 List *local_conds,
+								 List **select_vars,
+								 List **param_list);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 
+/* remote join support on top of custom-scan APIs */
+typedef struct
+{
+	Oid			fdw_server_oid;	/* server oid commonly used */
+	Oid			fdw_user_oid;	/* user oid commonly used */
+	Relids		relids;			/* bitmapset of range table indexes */
+	JoinType	jointype;		/* one of JOIN_* */
+	Node	   *outer_rel;		/* packed information of outer relation */
+	Node	   *inner_rel;		/* packed information of inner relation */
+	List	   *remote_conds;	/* condition to be run on remote server */
+	List	   *local_conds;	/* condition to be run on local server */
+	List	   *select_vars;	/* List of Var nodes to be fetched */
+	List	   *select_params;	/* List of Var nodes being parameterized */
+	char	   *select_qry;		/* remote query being deparsed */
+} PgRemoteJoinInfo;
+
+extern List *packPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo);
+extern void unpackPgRemoteJoinInfo(PgRemoteJoinInfo *jinfo,
+								   List *custom_private);
+
 #endif   /* POSTGRES_FDW_H */
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index e6f6e20..ed932e6 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -350,6 +350,16 @@
   </para>
 
   <para>
+   In addition, <productname>PostgreSQL</> 9.4 or later adaptively tries
+   to join relations managed by the same foreign server on the remote
+   node, if the supplied join condition can be run on the remote side.
+   It behaves as if a local custom scan node walks on a virtual relation
+   that consists of multiple relations joined remotely, so it usually
+   costs less than transferring the data of both relations and joining
+   them locally.
+  </para>
+
+  <para>
    The query that is actually sent to the remote server for execution can
    be examined using <command>EXPLAIN VERBOSE</>.
   </para>
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 6d548b7..c33d958 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -23,6 +23,7 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -621,3 +622,31 @@ get_foreign_server_oid(const char *servername, bool missing_ok)
 				 errmsg("server \"%s\" does not exist", servername)));
 	return oid;
 }
+
+/*
+ * is_fdw_managed_relation
+ *
+ * It checks whether the supplied relation is a foreign table managed
+ * by the module that has FdwRoutine, or not.
+ */
+bool
+is_fdw_managed_relation(Oid tableoid, const FdwRoutine *routines_self)
+{
+	FdwRoutine *routines;
+	char		relkind = get_rel_relkind(tableoid);
+
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		routines = GetFdwRoutineByRelId(tableoid);
+
+		/*
+		 * We assume that a particular callback implemented by one
+		 * extension is never shared with another extension.
+		 * So, we don't need to compare all the function pointers in the
+		 * FdwRoutine, but only one member.
+		 */
+		if (routines->GetForeignRelSize == routines_self->GetForeignRelSize)
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 3a6d0fb..c619d5d 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,65 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text
+ * representation for portability purpose.
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text representation: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i=result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+
+	return result;
+}
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index d629fcd..21ca783 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -73,6 +73,7 @@ static bool pull_varattnos_walker(Node *node, pull_varattnos_context *context);
 static bool pull_vars_walker(Node *node, pull_vars_context *context);
 static bool contain_var_clause_walker(Node *node, void *context);
 static bool contain_vars_of_level_walker(Node *node, int *sublevels_up);
+static bool contain_wholerow_reference_walker(Node *node, void *context);
 static bool locate_var_of_level_walker(Node *node,
 						   locate_var_of_level_context *context);
 static bool pull_var_clause_walker(Node *node,
@@ -418,6 +419,44 @@ contain_vars_of_level_walker(Node *node, int *sublevels_up)
 								  (void *) sublevels_up);
 }
 
+/*
+ * contain_wholerow_reference
+ *
+ *    Recursively scan a clause to discover whether it contains any Var nodes
+ *    of whole-row reference in the current query level.
+ *
+ *    Returns true if any such Var found.
+ */
+bool
+contain_wholerow_reference(Node *node)
+{
+	return contain_wholerow_reference_walker(node, NULL);
+}
+
+static bool
+contain_wholerow_reference_walker(Node *node, void *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+
+		return contain_wholerow_reference_walker((Node *)rinfo->clause,
+												 context);
+	}
+	if (IsA(node, Var))
+	{
+		Var	   *var = (Var *) node;
+
+		if (var->varlevelsup == 0 && var->varattno == 0)
+			return true;
+		return false;
+	}
+	return expression_tree_walker(node,
+								  contain_wholerow_reference_walker,
+								  context);
+}
 
 /*
  * locate_var_of_level
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index ac080d7..2340a23 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -13,6 +13,7 @@
 #ifndef FOREIGN_H
 #define FOREIGN_H
 
+#include "foreign/fdwapi.h"
 #include "nodes/parsenodes.h"
 
 
@@ -81,4 +82,7 @@ extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
 extern Oid	get_foreign_data_wrapper_oid(const char *fdwname, bool missing_ok);
 extern Oid	get_foreign_server_oid(const char *servername, bool missing_ok);
 
+extern bool	is_fdw_managed_relation(Oid tableoid,
+									const FdwRoutine *routines_self);
+
 #endif   /* FOREIGN_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index f770608..fa8005d 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for string representation */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index fb99a12..f677ff5 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -36,6 +36,7 @@ extern void pull_varattnos(Node *node, Index varno, Bitmapset **varattnos);
 extern List *pull_vars_of_level(Node *node, int levelsup);
 extern bool contain_var_clause(Node *node);
 extern bool contain_vars_of_level(Node *node, int levelsup);
+extern bool contain_wholerow_reference(Node *node);
 extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, PVCAggregateBehavior aggbehavior,
 				PVCPlaceHolderBehavior phbehavior);

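For readers who want to try the text encoding used by bms_to_string /
bms_from_string in the patch above outside the backend, here is a minimal
standalone sketch. It is not the patch code: words_to_hex, hex_to_words and
HEX_PER_WORD are hypothetical names, it assumes 32-bit words (as the "%08x"
format in the patch does), it uses malloc instead of palloc, and it rejects
malformed input instead of issuing a WARNING.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD 8			/* assumption: 32-bit word = 8 hex digits */

/*
 * Encode nwords 32-bit words as hex text, most significant word first.
 * The caller frees the result. Returns NULL on allocation failure.
 */
static char *
words_to_hex(const uint32_t *words, int nwords)
{
	char	   *result = malloc((size_t) nwords * HEX_PER_WORD + 1);
	char	   *pos = result;
	int			i;

	if (result == NULL)
		return NULL;
	for (i = nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) words[i - 1]);
	return result;
}

/*
 * Decode hex text produced by words_to_hex into words[].
 * Returns the number of words, or -1 if the input is malformed
 * or does not fit into maxwords.
 */
static int
hex_to_words(const char *hex, uint32_t *words, int maxwords)
{
	size_t		len = strlen(hex);
	int			nwords;
	int			i;
	int			j;

	if (len == 0 || len % HEX_PER_WORD != 0)
		return -1;
	nwords = (int) (len / HEX_PER_WORD);
	if (nwords > maxwords)
		return -1;

	/* the first 8 digits belong to the highest word, so fill from the top */
	for (i = nwords; i > 0; i--)
	{
		uint32_t	word = 0;

		for (j = 0; j < HEX_PER_WORD; j++)
		{
			int			c = *hex++;
			int			digit;

			if (c >= '0' && c <= '9')
				digit = c - '0';
			else if (c >= 'a' && c <= 'f')
				digit = c - 'a' + 10;
			else if (c >= 'A' && c <= 'F')
				digit = c - 'A' + 10;
			else
				return -1;
			word = (word << 4) | (uint32_t) digit;
		}
		words[i - 1] = word;
	}
	return nwords;
}
```

Encoding two words and decoding the resulting string should give back the
same words in the same order; a string whose length is not a multiple of
eight, or which contains a non-hex character, is rejected.
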
#27

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kohei KaiGai (#26)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Kaigai-san,

Sorry to leave the thread for a while.

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

(1) Interface to add alternative paths in addition to built-in join paths

This patch adds "add_join_path_hook" to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. A custom-scan path added this way is assumed to scan
a (virtual) relation that is the result set of joining relations.
My concern is which arguments should be passed. This hook is declared as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

Likely, the arguments up to restrictlist should be passed to extensions,
because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be passed.
Probably, it makes sense to pass param_source_rels and extra_lateral_rels
to check whether the path is sensible for parameterized paths.
On the other hand, I doubt whether mergeclause_list is useful to pass.
(It may make sense if someone tries to write their own merge-join
implementation??)

I'd like to see ideas to improve the current interface specification.

I've read the code path that adds a custom join again, and I feel that
providing semifactors is not necessary for the first cut, because
it is used only in initial_cost_nestloop (final_cost_nestloop receives
semifactors but does not use it), and external modules would not
become that smart before the 9.5 development cycle. It seems complex
enough to postpone deciding whether it's essential for add_join_path_hook.
Do you have any concrete use case for the parameter?

mergeclause_list and param_source_rels seem little easier to use, but
anyway it should be documented how to use those parameters.

IMHO, minimal interface seems better than fully-fledged but half-baked
one, especially in the first-cut.
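
To make the "minimal interface" idea concrete, here is a rough sketch of how
an extension might install and chain such a hook. It is not taken from the
patch: the planner types are opaque stand-ins so the fragment compiles on its
own, the reduced argument list (only what add_paths_to_joinrel() receives) is
an assumption about where this discussion could land, and my_add_join_path,
install_join_path_hook and invoke_join_path_hook are hypothetical names.

```c
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles without the PostgreSQL headers. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct SpecialJoinInfo SpecialJoinInfo;
typedef struct List List;
typedef int JoinType;

/* ASSUMPTION: a reduced hook signature carrying only the arguments that
 * add_paths_to_joinrel() itself receives. */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist);

/* In the patch this variable lives in the backend; it is defined here only
 * to keep the sketch self-contained. */
add_join_path_hook_type add_join_path_hook = NULL;

/* Hypothetical extension state. */
static add_join_path_hook_type prev_add_join_path_hook = NULL;
int my_join_hook_calls = 0;

static void
my_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                 RelOptInfo *outerrel, RelOptInfo *innerrel,
                 JoinType jointype, SpecialJoinInfo *sjinfo,
                 List *restrictlist)
{
    /* Give any previously installed provider its turn first. */
    if (prev_add_join_path_hook)
        prev_add_join_path_hook(root, joinrel, outerrel, innerrel,
                                jointype, sjinfo, restrictlist);

    /* A real provider would cost its alternative and add a path here;
     * the sketch only counts invocations. */
    my_join_hook_calls++;
}

/* Would be called from the extension's _PG_init(). */
void
install_join_path_hook(void)
{
    prev_add_join_path_hook = add_join_path_hook;
    add_join_path_hook = my_add_join_path;
}

/* Stand-in for the call site inside add_paths_to_joinrel(). */
void
invoke_join_path_hook(PlannerInfo *root, RelOptInfo *joinrel,
                      RelOptInfo *outerrel, RelOptInfo *innerrel,
                      JoinType jointype, SpecialJoinInfo *sjinfo,
                      List *restrictlist)
{
    if (add_join_path_hook)
        add_join_path_hook(root, joinrel, outerrel, innerrel,
                           jointype, sjinfo, restrictlist);
}
```

Saving and calling the previous hook value is what lets several providers
coexist; every argument added to the signature has to be threaded through
each link of that chain.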

--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kohei KaiGai (#26)

1 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Folks,

Let me remind you of the custom-scan patches; they are the basis for
remote join in postgres_fdw, cache-only scan, the (upcoming) GPU
acceleration feature, and various alternative ways to scan/join relations.
Unfortunately, we have had only a small amount of discussion in this commit
fest, even though Hanada-san volunteered to move the patches into
"ready for committer" state at the CF-Nov.

I found some cosmetic flaws and missing .gitignore entries in the patches.
Please see the attached patch for details.

--
Shigeru HANADA

Attachments:

custom_scan_cosmetic_fix.patch (application/octet-stream)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c7fcb80..6201a97 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -392,7 +392,7 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	create_tidscan_paths(root, rel);
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
@@ -423,7 +423,7 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path */
 	set_cheapest(rel);
@@ -1245,7 +1245,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
@@ -1319,7 +1319,7 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 										   pathkeys, required_outer));
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
@@ -1345,7 +1345,7 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
@@ -1417,7 +1417,7 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
@@ -1473,7 +1473,7 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
 	/* Consider Custom scans */
-	add_custom_scan_paths(root,rel,rte);
+	add_custom_scan_paths(root, rel, rte);
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index a561387..b613946 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -35,10 +35,10 @@ typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
 										RangeTblEntry *rte);
 extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
 
-#define add_custom_scan_paths(root,baserel,rte)				\
+#define add_custom_scan_paths(root, baserel, rte)				\
 	do {														\
-		if (add_scan_path_hook)										\
-			(*add_scan_path_hook)((root),(baserel),(rte));			\
+		if (add_scan_path_hook)									\
+			(*add_scan_path_hook)((root), (baserel), (rte));	\
 	} while(0)
 
 /* Hook for plugins to add custom join path, in addition to default ones */
diff --git a/src/test/regress/expected/.gitignore b/src/test/regress/expected/.gitignore
index 93c56c8..7e35e74 100644
--- a/src/test/regress/expected/.gitignore
+++ b/src/test/regress/expected/.gitignore
@@ -2,6 +2,7 @@
 /copy.out
 /create_function_1.out
 /create_function_2.out
+/custom_scan.out
 /largeobject.out
 /largeobject_1.out
 /misc.out
diff --git a/src/test/regress/sql/.gitignore b/src/test/regress/sql/.gitignore
index 46c8112..8eeb461 100644
--- a/src/test/regress/sql/.gitignore
+++ b/src/test/regress/sql/.gitignore
@@ -2,6 +2,7 @@
 /copy.sql
 /create_function_1.sql
 /create_function_2.sql
+/custom_scan.sql
 /largeobject.sql
 /misc.sql
 /security_label.sql

#29

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Shigeru Hanada (#27)

Re: Custom Scan APIs (Re: Custom Plan node)

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments up to and including restrictlist should probably be passed to
extensions, because they are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be passed.
It probably makes sense to pass param_source_rels and extra_lateral_rels
so that an extension can check whether a path is sensible as a
parameterized path.
On the other hand, I doubt whether mergeclause_list is useful to pass.
(It may make sense if someone tries to implement their own merge-join
implementation??)

I'd like to hear ideas for improving the current interface specification.

I've read the code path that adds custom joins again, and I feel that
providing semifactors is not necessary for the first cut, because it is
used only in initial_cost_nestloop (final_cost_nestloop receives semifactors
but does not use it), and external modules are unlikely to become that
sophisticated before the 9.5 development cycle. It seems complex enough that
we can postpone determining whether it is essential for add_join_path_hook.
Do you have any concrete use case for the parameter?

The reason I asked the question above is that I wasn't 100% certain
about its usage. Indeed, semifactors has only limited use, and an extension
can quite easily reproduce it later (using clauselist_selectivity)
if it wants this factor. So I agree with removing semifactors
here.

mergeclause_list and param_source_rels seem a little easier to use, but
in any case how to use those parameters should be documented.

mergeclause_list alone might not be sufficient for an extension to determine
whether its own mergejoin is applicable here. See the comment below, which
is taken from the header of select_mergejoin_clauses.

| * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
| * this is a right/full join and there are nonmergejoinable join clauses.
| * The executor's mergejoin machinery cannot handle such cases, so we have
| * to avoid generating a mergejoin plan. (Note that this flag does NOT
| * consider whether there are actually any mergejoinable clauses. This is
| * correct because in some cases we need to build a clauseless mergejoin.
| * Simply returning NIL is therefore not enough to distinguish safe from
| * unsafe cases.)
|
It says that mergeclause_list == NIL is not a sufficient check to determine
whether mergejoin logic is applicable to the target join.
So there are probably two options for an extension that tries to implement
its own mergejoin logic: (1) pass both mergejoin_allowed and
mergeclause_list as arguments of the hook, or (2) redefine
select_mergejoin_clauses() as an extern function so that the variables can
be reproduced on demand. Which one is preferable?
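
The point of the quoted comment can be illustrated with a small sketch. It is
a deliberately simplified model, not the real select_mergejoin_clauses():
SketchJoinType, SketchClause and sketch_select_mergejoin_clauses are
hypothetical names, and the list of clauses is reduced to a count. It shows
two calls that both return "no mergejoinable clauses" while disagreeing on
whether a mergejoin is allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in join types; the real JoinType enum has more members. */
typedef enum
{
    SKETCH_JOIN_INNER,
    SKETCH_JOIN_LEFT,
    SKETCH_JOIN_FULL,
    SKETCH_JOIN_RIGHT
} SketchJoinType;

/* Stand-in for a join clause: all we track is whether it is mergejoinable. */
typedef struct SketchClause
{
    bool mergejoinable;
} SketchClause;

/*
 * Simplified model of the behavior described in the comment: returns the
 * number of mergejoinable clauses and reports through *mergejoin_allowed
 * whether a mergejoin may be generated at all.
 */
size_t
sketch_select_mergejoin_clauses(const SketchClause *clauses, size_t nclauses,
                                SketchJoinType jointype,
                                bool *mergejoin_allowed)
{
    size_t nmerge = 0;
    bool   have_nonmergeable = false;
    size_t i;

    for (i = 0; i < nclauses; i++)
    {
        if (clauses[i].mergejoinable)
            nmerge++;
        else
            have_nonmergeable = true;
    }

    /* Only right/full joins are unsafe when a clause is not mergejoinable. */
    if (jointype == SKETCH_JOIN_RIGHT || jointype == SKETCH_JOIN_FULL)
        *mergejoin_allowed = !have_nonmergeable;
    else
        *mergejoin_allowed = true;

    return nmerge;
}
```

An inner join with no clauses and a full join with one non-mergejoinable
clause both yield an empty result here, yet only the first is safe for a
(clauseless) mergejoin, which is why the flag has to travel with the list.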

BTW, I found duplicate definitions of clause_sides_match_join(), which is
invoked from select_mergejoin_clauses(), in joinpath.c and analyzejoins.c.
One of them should be eliminated, I think.

For param_source_rels and extra_lateral_rels, I'll add source code comments
around add_join_path_hook.
The first half of try_(nestloop|hashjoin|mergejoin)_path is probably useful
as an example for extension implementations.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
Sent: Tuesday, February 25, 2014 12:41 AM
To: Kohei KaiGai
Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Jim Mlodgenski; Robert Haas;
Tom Lane; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)


#30

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 12 years ago

In reply to: Kohei KaiGai (#26)

Re: Custom Scan APIs (Re: Custom Plan node)

On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Folks,

Let me remind you of the custom-scan patches; they are the basis for
remote join in postgres_fdw, cache-only scan, the (upcoming) GPU
acceleration feature, and various alternative ways to scan/join relations.
Unfortunately, we have had only a small amount of discussion in this commit
fest, even though Hanada-san volunteered to move the patches into
"ready for committer" state at the CF-Nov.

Sorry for jumping into this late.
Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think
it might be better for the custom scan hooks to create existing nodes if
they serve the purpose.

Before time runs out, I'd like to ask hackers' opinions on the potentially
arguable points (from my standpoint), in case they need to be fixed.
One is the hook definition for adding alternative join paths, and the other
is a special varno for when a custom scan replaces a join node.
I'd like to hear your opinions on them while we still have time to change
the design if needed.

(1) Interface to add alternative paths in addition to built-in join paths

This patch adds "add_join_path_hook" to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. A custom-scan path added this way is assumed to scan
a (virtual) relation that is the result set of joining relations.
My concern is which arguments should be passed. This hook is declared as
follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments up to and including restrictlist should probably be passed to
extensions, because they are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be passed.
It probably makes sense to pass param_source_rels and extra_lateral_rels
so that an extension can check whether a path is sensible as a
parameterized path.
On the other hand, I doubt whether mergeclause_list is useful to pass.
(It may make sense if someone tries to implement their own merge-join
implementation??)

I'd like to hear ideas for improving the current interface specification.

Since a custom node is an open implementation, it will be important to pass
as much information down to the hooks as possible, lest the hooks be
constrained. Since function signatures within the planner and optimizer
change from time to time, the custom node hook signatures will need
to change from time to time as well. That might turn out to be a
maintenance overhead.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

(2) CUSTOM_VAR for special Var reference
@@ -134,6 +134,7 @@ typedef struct Expr
#define    INNER_VAR       65000       /* reference to inner subplan */
#define    OUTER_VAR       65001       /* reference to outer subplan */
#define    INDEX_VAR       65002       /* reference to index column */
+#define    CUSTOM_VAR      65003       /* reference to custom column */
I added CUSTOM_VAR to handle the case where a custom scan replaces
a join.
Var nodes within a join plan are adjusted to refer to either ecxt_innertuple
or ecxt_outertuple of ExprContext. That causes trouble if a custom scan runs
instead of a built-in join, because the tuples it fetches are usually
stored in ecxt_scantuple, so the Var nodes need a
varno that is neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within a custom-scan node to point at
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
For example, a Var node with varno=CUSTOM_VAR and varattno=3 means
that the node references the third attribute of the tuple in ecxt_scantuple.
I think this is a reasonable solution; however, I'm not 100% certain
whether people have a more graceful idea for implementing it.
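
The varno example above can be modelled with a short sketch. This is a
simplified illustration, not executor code: the special varno values come
from the hunk above and the ExprContext field names from the description,
but SketchSlot, SketchExprContext, SketchVar and sketch_eval_var are
hypothetical stand-ins, and attributes are reduced to plain ints.

```c
#include <stddef.h>

/* Special varno values, as listed in the hunk above. */
#define INNER_VAR   65000   /* reference to inner subplan */
#define OUTER_VAR   65001   /* reference to outer subplan */
#define INDEX_VAR   65002   /* reference to index column */
#define CUSTOM_VAR  65003   /* proposed: reference to custom column */

/* Stand-in for a tuple slot: an array of integer attributes. */
typedef struct SketchSlot
{
    const int *values;
    int        natts;
} SketchSlot;

/* Stand-in for ExprContext, keeping only the three tuple pointers. */
typedef struct SketchExprContext
{
    SketchSlot *ecxt_scantuple;
    SketchSlot *ecxt_innertuple;
    SketchSlot *ecxt_outertuple;
} SketchExprContext;

/* Stand-in for a Var node. */
typedef struct SketchVar
{
    unsigned int varno;
    int          varattno;  /* 1-based attribute number */
} SketchVar;

/* Pick the slot a Var refers to, based on its varno. */
static SketchSlot *
sketch_slot_for_var(const SketchExprContext *econtext, const SketchVar *var)
{
    switch (var->varno)
    {
        case INNER_VAR:
            return econtext->ecxt_innertuple;
        case OUTER_VAR:
            return econtext->ecxt_outertuple;
        default:
            /* Everything else, including CUSTOM_VAR, reads the scan tuple. */
            return econtext->ecxt_scantuple;
    }
}

/* Fetch the referenced attribute. */
int
sketch_eval_var(const SketchExprContext *econtext, const SketchVar *var)
{
    SketchSlot *slot = sketch_slot_for_var(econtext, var);

    return slot->values[var->varattno - 1];
}
```

In this model a Var that still carried INNER_VAR or OUTER_VAR after a custom
scan replaced the join would read from a slot the custom scan never fills,
which is the problem the rewrite to CUSTOM_VAR is meant to avoid.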

If you have comments on the above two topics, or others, please share
your ideas.

Thanks,

2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hi Stephen,

Thanks for your comments.

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the
patches to actually use the new hooks were also making more changes to the
backend code, leaving me with the impression that the proposed interface
isn't terribly stable. Perhaps those changes should have just been in the
first patch, but they weren't and that certainly gave me pause.

Yes, the part-1 patch provides the interface through which the backend
code and extension code interact. The part-2 and part-3 portions are
contrib modules that implement their features on top of the
custom-scan API.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal that
I think we all share, this seems to be intending to add a completely
different feature which happens to be able to be used for that. For FDWs,
wouldn't we only present the FDW with the paths where the foreign tables
for that FDW, or perhaps just a given foreign server, are being joined?

FDW join push-down is one of the valuable use cases of this interface,
but not the only one. As you might know, my motivation is to implement a GPU
acceleration feature on top of this interface, offering an alternative way
to scan or join relations, or potentially to sort or aggregate.
It would probably be too much of a stretch to implement radix sort on top
of FDW. I'd like you to understand that the part-3 patch (FDW join
push-down) is a demonstration of an application of the custom-scan
interface, not something designed for a special purpose.

Right now, I have put all the logic for the interaction between CSI and the
FDW driver on the postgres_fdw side. It might be an idea to have common code
(such as the logic that checks whether both relations to be joined belong to
the same foreign server) on the backend side, as something like a gateway
between them.

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on
behalf of a particular data structure declared with CREATE FOREIGN TABLE.
In other words, the extension's responsibility is to generate a view of
"something" according to PostgreSQL's internal data structures, instead of
the object itself.
On the other hand, the custom-scan interface allows extensions to implement
alternative methods to scan or join particular relations, but its role is
not to act as a target referenced in queries. In other words, it provides
methods to access objects.
It is natural that the two features are similar, because both of them let
extensions hook into the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#31

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Ashutosh Bapat (#30)

Re: Custom Scan APIs (Re: Custom Plan node)

Sorry for jumping into this late.

Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think
it might be better for the custom scan hooks to create existing nodes if
they serve the purpose.

That does not work well, because the existing FDW infrastructure is designed
to operate on foreign tables, not regular tables. We would probably need to
revise many assumptions in the backend code if we redefined the purpose
of the FDW infrastructure. For example, ForeignScan is expected to return
tuples that match a TupleDesc identical to the table definition.
That does not fit the requirement if we replace a join node with ForeignScan,
because the TupleDesc of the joined relations is not predefined.

I'd like to treat these features as designed for separate purposes.
FDW is designed to mediate between an external data source and the internal
heap representation according to a foreign table definition. In other words,
its role is to generate the contents of a predefined database object on the
fly.
On the other hand, custom-scan is designed to implement alternative ways
to scan / join relations in addition to the built-in methods.

My motivation is to implement a GPU acceleration feature that works
transparently for applications. Thus, it has to work on regular tables,
because most applications store data in regular tables, not foreign ones.

Since a custom node is an open implementation, it will be important to pass
as much information down to the hooks as possible, lest the hooks be
constrained. Since function signatures within the planner and optimizer
change from time to time, the custom node hook signatures will need
to change from time to time as well. That might turn out to be a
maintenance overhead.

Yes, you are right. But a hook with many arguments that nobody uses also
creates maintenance overhead.
It probably makes sense to classify the arguments into those that cannot be
reproduced from other information, those that can be reproduced but only
with complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
- PlannerInfo *root
- RelOptInfo *joinrel
- RelOptInfo *outerrel
- RelOptInfo *innerrel
- JoinType jointype
- SpecialJoinInfo *sjinfo
- List *restrictlist

Below is the information we can reproduce, but only with complicated steps:
- List *mergeclause_list
- bool mergejoin_allowed
- Relids param_source_rels
- Relids extra_lateral_rels

Below is the information we can reproduce easily:
- SemiAntiJoinFactors *semifactors

I think the first two categories, or only the first category (if functions
to reproduce the second group are exposed), should be passed to extensions;
the priority of the third group is not high.
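
One possible way to keep the prototype stable while still passing the second
group is sketched below. It is only an illustration of the trade-off under
discussion, not part of the patch: JoinHookExtra, join_path_hook_sketch and
example_provider are hypothetical names, and the planner types are opaque
stand-ins so the fragment compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles without the PostgreSQL headers. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct SpecialJoinInfo SpecialJoinInfo;
typedef struct List List;
typedef struct Bitmapset *Relids;
typedef int JoinType;

/*
 * HYPOTHETICAL bundle for the "reproducible, but only with complicated
 * steps" group. New members could be appended later without touching the
 * hook prototype.
 */
typedef struct JoinHookExtra
{
    List   *mergeclause_list;
    bool    mergejoin_allowed;
    Relids  param_source_rels;
    Relids  extra_lateral_rels;
} JoinHookExtra;

/* The first group stays as plain arguments; the rest travels in 'extra'. */
typedef void (*join_path_hook_sketch)(PlannerInfo *root,
                                      RelOptInfo *joinrel,
                                      RelOptInfo *outerrel,
                                      RelOptInfo *innerrel,
                                      JoinType jointype,
                                      SpecialJoinInfo *sjinfo,
                                      List *restrictlist,
                                      const JoinHookExtra *extra);

/* Counts the calls in which a custom mergejoin could be considered. */
int merge_capable_calls = 0;

/* Example provider that looks at a single member and ignores the rest. */
void
example_provider(PlannerInfo *root, RelOptInfo *joinrel,
                 RelOptInfo *outerrel, RelOptInfo *innerrel,
                 JoinType jointype, SpecialJoinInfo *sjinfo,
                 List *restrictlist, const JoinHookExtra *extra)
{
    (void) root;
    (void) joinrel;
    (void) outerrel;
    (void) innerrel;
    (void) jointype;
    (void) sjinfo;
    (void) restrictlist;

    if (extra->mergejoin_allowed)
        merge_capable_calls++;
}
```

Providers that do not care about a member simply ignore it, so adding or
dropping a member of the second group would not force every extension to
change its function signature.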

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

Yes. I plan to support the plan nodes above, in addition to scan / join.
The custom-scan node is a thin abstraction over general executor behavior,
so I believe it is not hard to enhance this node without adding a new plan
node for each of them.
Of course, it will need separate hooks to add alternative paths at the
planner stage, but no individual plan nodes. (Sorry, it was unclear to me
what "hook" referred to.)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Ashutosh Bapat [mailto:ashutosh.bapat@enterprisedb.com]
Sent: Tuesday, February 25, 2014 5:59 PM
To: Kohei KaiGai
Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Shigeru Hanada; Jim
Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)


#32

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 12 years ago

In reply to: Kouhei Kaigai (#31)

Re: Custom Scan APIs (Re: Custom Plan node)

On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for jumping into this late.

Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think it
might be better for the custom scan hooks to create existing nodes if they
serve the purpose.

It does not work well because the existing FDW infrastructure is designed to
operate on foreign tables, not regular tables. We would probably need to
revise many of the assumptions in the surrounding code if we redefined the
purpose of the FDW infrastructure. For example, ForeignScan is expected to
return a tuple according to a TupleDesc that exactly matches the table
definition.
That does not fit the requirement if we replace a join node with ForeignScan,
because the TupleDesc of the joined relations is not predefined.

If one wants to push joins, aggregates, and grouping across to other data
sources capable of handling them, that will need to change. But, at the
same time, letting the custom scan node decide that doesn't seem
to be a very good idea. Still, custom scan nodes do show the
potential in adding these features.

I'd like to define these features as designed for separate purposes.
FDW is designed to mediate between an external data source and the internal
heap representation according to the foreign table definition. In other
words, its role is to generate the contents of a predefined database object
on the fly.
On the other hand, custom-scan is designed to implement alternative ways
to scan / join relations in addition to the methods supported by the built-in
features.

I'm motivated to implement a GPU acceleration feature that works
transparently for applications. Thus, it has to work on regular tables,
because most applications store data in regular tables, not foreign ones.

It looks like my description was misleading. In some cases, it might be
possible that the ultimate functionality that a particular instantiation of
the custom scan node provides is already available as a Plan node in PG, but
the PG optimizer is not able to optimize the operation that way. In such a
case, the custom scan node infrastructure should produce the corresponding
Path node and not implement that functionality itself.

Since a custom node is an open implementation, it will be important to pass
as much information down to the hooks as possible, lest the hooks be
constrained. Since the function signatures within the planner and optimizer
will change from time to time, the custom node hook signatures will need
to change from time to time as well. That might turn out to be a maintenance
overhead.

Yes, you are right about that as well. But a hook with many arguments that
nobody uses also adds maintenance overhead.
It probably makes sense to classify the arguments into those that cannot be
reproduced from other information, those that can be reproduced but only
through complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
- PlannerInfo *root
- RelOptInfo *joinrel
- RelOptInfo *outerrel
- RelOptInfo *innerrel
- JoinType jointype
- SpecialJoinInfo *sjinfo
- List *restrictlist

Most of this information is available through the corresponding RelOptInfo, or
we should make RelOptInfo contain all the information related to every
relation required to be computed during the query. So, any function which
creates paths can just take that RelOptInfo as an argument and produce the
path/s. That way there is less chance that the function signatures will change.

Below is the information we can reproduce, but only through complicated steps:
- List *mergeclause_list
- bool mergejoin_allowed
- Relids param_source_rels
- Relids extra_lateral_rels

Below is the information we can reproduce easily:
- SemiAntiJoinFactors *semifactors

I think the first two categories, or just the first category (if the
functions to reproduce the second group are exposed), should be passed to the
extension; the priority of the third group is not high.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

Yes. I plan to support the above plan nodes, in addition to scan / join.
The custom-scan node is a thin abstraction over general executor behavior,
so I believe it is not hard to enhance this node without adding a new plan
node for each of them.
Of course, it will need a separate hook to add an alternative path at the
planner stage, but no individual plan nodes. (Sorry, it was unclear to me
what "hook" means here.)

If we represent all operations like grouping, sorting, and aggregation as
some sort of relation, we can create paths for each of those relations like we
do (I am heavily borrowing from Tom's idea of pathifying those operations).
We will need far fewer hooks in the custom scan node.

BTW, from the patch, I do not see this change to be lightweight. I was
expecting more of a list of hooks to be defined by the user, with this
infrastructure just calling them at appropriate places.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Ashutosh Bapat [mailto:ashutosh.bapat@enterprisedb.com]
Sent: Tuesday, February 25, 2014 5:59 PM
To: Kohei KaiGai
Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Shigeru Hanada; Jim
Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Folks,

Let me remind you of the custom-scan patches; they are a base feature for
remote join in postgres_fdw, cache-only scan, the (upcoming) GPU
acceleration feature, and various alternative ways to scan/join
relations.
Unfortunately, we have had only a small amount of discussion in this commit
fest, even though Hanada-san volunteered to move the patches into the
"ready for committer" state at the CF-Nov.

Sorry for jumping into this late.

Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think it
might be better for the custom scan hooks to create existing nodes if they
serve the purpose.

Before time runs out, I'd like to ask for hackers' opinions about the points
I consider potentially arguable, in case they need to be fixed.
One is the hook definition to add an alternative join path, and the other
is a special varno for when a custom scan replaces a join node.
I'd like to hear your opinions about them while we still have time to change
the design if needed.

(1) Interface to add alternative paths in addition to built-in join paths

This patch adds "add_join_path_hook" to add_paths_to_joinrel to allow
extensions to provide an alternative scan path in addition to the built-in
join paths. The custom-scan path being added is assumed to perform a scan
on a (virtual) relation that is the result set of joining relations.
My concern is which arguments should be passed. This hook is declared
as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments down through restrictlist should probably be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be passed.
It probably makes sense to pass param_source_rels and extra_lateral_rels
so extensions can check whether the path is sensible for parameterized paths.
On the other hand, I doubt whether mergeclause_list is useful to deliver.
(It may make sense if someone tries to implement their own merge-join
implementation??)

I'd like to see ideas to improve the current interface specification.
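
As a rough illustration of how an extension would plug into a hook like this,
here is a minimal, self-contained sketch of the usual save-and-chain
convention for PostgreSQL hooks. It is not taken from the patch: the Demo*
types, demo_join_path_hook, and the trimmed five-argument signature are
hypothetical stand-ins, and path lists are reduced to counters so that the
example compiles without the backend headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real planner structs are far richer. */
typedef struct DemoPlannerInfo { int unused; } DemoPlannerInfo;
typedef struct DemoRelOptInfo
{
    int nbuiltin_paths;     /* paths added by the "core" code */
    int ncustom_paths;      /* paths added through the hook */
} DemoRelOptInfo;

/* Trimmed, hypothetical version of the hook signature quoted above. */
typedef void (*demo_join_path_hook_type)(DemoPlannerInfo *root,
                                         DemoRelOptInfo *joinrel,
                                         DemoRelOptInfo *outerrel,
                                         DemoRelOptInfo *innerrel,
                                         int jointype);

static demo_join_path_hook_type demo_join_path_hook = NULL;
static demo_join_path_hook_type prev_join_path_hook = NULL;

/* Extension callback: chain to any earlier hook, then add our own path. */
static void
my_add_join_paths(DemoPlannerInfo *root, DemoRelOptInfo *joinrel,
                  DemoRelOptInfo *outerrel, DemoRelOptInfo *innerrel,
                  int jointype)
{
    if (prev_join_path_hook != NULL)
        prev_join_path_hook(root, joinrel, outerrel, innerrel, jointype);
    assert(joinrel != NULL);
    joinrel->ncustom_paths++;   /* a real provider would call add_path() */
}

/* Plays the role of the module's _PG_init(): save the old hook, install ours. */
void
demo_module_init(void)
{
    prev_join_path_hook = demo_join_path_hook;
    demo_join_path_hook = my_add_join_paths;
}

/* Stand-in for the core function that invokes the hook after built-in paths. */
void
demo_add_paths_to_joinrel(DemoPlannerInfo *root, DemoRelOptInfo *joinrel,
                          DemoRelOptInfo *outerrel, DemoRelOptInfo *innerrel,
                          int jointype)
{
    joinrel->nbuiltin_paths += 3;   /* pretend: nestloop, merge, hash */
    if (demo_join_path_hook != NULL)
        demo_join_path_hook(root, joinrel, outerrel, innerrel, jointype);
}

/* Drive the sketch once and report how many custom paths were added. */
int
demo_run(void)
{
    DemoPlannerInfo root = {0};
    DemoRelOptInfo  outer = {0, 0}, inner = {0, 0}, join = {0, 0};

    /* start from a clean state, as at backend startup */
    demo_join_path_hook = NULL;
    prev_join_path_hook = NULL;

    demo_module_init();
    demo_add_paths_to_joinrel(&root, &join, &outer, &inner, 0);
    return join.ncustom_paths;
}
```

Every argument added to the real hook becomes part of what each such callback
must accept, which is why the choice of arguments matters for extension
authors.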

Since a custom node is an open implementation, it will be important to pass
as much information down to the hooks as possible, lest the hooks be
constrained. Since the function signatures within the planner and optimizer
will change from time to time, the custom node hook signatures will need
to change from time to time as well. That might turn out to be a maintenance
overhead.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000       /* reference to inner subplan */
 #define    OUTER_VAR       65001       /* reference to outer subplan */
 #define    INDEX_VAR       65002       /* reference to index column */
+#define    CUSTOM_VAR      65003       /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan overrides
join relations.
Var nodes within a join plan are adjusted to refer to either ecxt_innertuple
or ecxt_outertuple of ExprContext. That causes trouble if a custom scan runs
instead of a built-in join, because the tuples it fetches are usually
stored in ecxt_scantuple; thus its Var nodes also need a varno that is
neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within a custom-scan node to point to
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
For example, a Var node with varno=CUSTOM_VAR and varattno=3 means
that this node references the third attribute of the tuple in
ecxt_scantuple.
I think it is a reasonable solution; however, I'm not 100% certain
whether people have a more graceful idea for implementing it.
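
To make the varno dispatch concrete, the following is a minimal sketch of how
a Var with one of these special varno values could pick its source tuple. It
is an illustration only, not code from the patch: DemoSlot, DemoExprContext,
DemoVar, and the demo_* functions are hypothetical stand-ins (only the three
ecxt_* field names and the #define values come from the text above), and
tuples are reduced to int arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Special varno values from the diff above; CUSTOM_VAR is the proposed one. */
#define INNER_VAR   65000
#define OUTER_VAR   65001
#define INDEX_VAR   65002
#define CUSTOM_VAR  65003

/* Hypothetical stand-in for a tuple slot: an array of int columns. */
typedef struct DemoSlot
{
    const int *values;
    int        natts;
} DemoSlot;

/* Hypothetical stand-in for ExprContext, keeping the three field names. */
typedef struct DemoExprContext
{
    const DemoSlot *ecxt_scantuple;
    const DemoSlot *ecxt_innertuple;
    const DemoSlot *ecxt_outertuple;
} DemoExprContext;

typedef struct DemoVar
{
    int varno;
    int varattno;       /* 1-based attribute number */
} DemoVar;

/* Choose the slot a Var reads from, based on its varno. */
static const DemoSlot *
demo_slot_for_var(const DemoExprContext *econtext, const DemoVar *var)
{
    switch (var->varno)
    {
        case INNER_VAR:
            return econtext->ecxt_innertuple;
        case OUTER_VAR:
            return econtext->ecxt_outertuple;
        default:
            /* in this sketch, CUSTOM_VAR and everything else read the scan tuple */
            return econtext->ecxt_scantuple;
    }
}

/* Fetch the attribute a Var points at. */
static int
demo_eval_var(const DemoExprContext *econtext, const DemoVar *var)
{
    const DemoSlot *slot = demo_slot_for_var(econtext, var);

    assert(slot != NULL);
    assert(var->varattno >= 1 && var->varattno <= slot->natts);
    return slot->values[var->varattno - 1];
}

/* varno=CUSTOM_VAR, varattno=3: the third attribute of the scan tuple. */
int
demo_custom_var_lookup(void)
{
    static const int scan_vals[]  = {10, 20, 30};
    static const int inner_vals[] = {1, 2, 3};
    static const int outer_vals[] = {4, 5, 6};
    DemoSlot scan  = {scan_vals, 3};
    DemoSlot inner = {inner_vals, 3};
    DemoSlot outer = {outer_vals, 3};
    DemoExprContext econtext = {&scan, &inner, &outer};
    DemoVar var = {CUSTOM_VAR, 3};

    return demo_eval_var(&econtext, &var);
}
```

With the sample data, the lookup reads from the scan tuple rather than from
the inner or outer tuple, which is the behavior the rewritten Var nodes rely
on.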

If you have comments on the above two topics, or others, please share
your ideas.

Thanks,

2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hi Stephen,

Thanks for your comments.

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit and my first take away from it was that the patches
to actually use the new hooks were also making more changes to the backend
code, leaving me with the impression that the proposed interface isn't
terribly stable. Perhaps those changes should have just been in the first
patch, but they weren't and that certainly gave me pause.

Yes, the part-1 patch provides the interface portion for interaction
between the backend code and extension code. The part-2 and part-3
portions are contrib modules that implement their features on top of the
custom-scan API.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal that
I think we all share, this seems to be intending to add a completely different
feature which happens to be able to be used for that. For FDWs, wouldn't
we only present the FDW with the paths where the foreign tables for that
FDW, or perhaps just a given foreign server, are being joined?

FDW join push-down is one of the valuable use cases of this interface,
but not the only one. As you might know, my motivation is to implement a GPU
acceleration

feature on top of this interface, one that offers alternative ways to scan or
join relations, or potentially to sort or aggregate.
It would probably be too much of a stretch to implement radix sort on top
of FDW. I'd like you to understand that the part-3 patch (FDW join push-down)
is a demonstration of the custom-scan interface for applications, not
something designed for one special purpose.

Right now, I put all the logic for the interaction between CSI and the FDW
driver on the postgres_fdw side; it might be an idea to have common code
(such as the logic to check whether both relations to be joined belong to the
same foreign server) on the backend side as something like a gateway between
them.

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on
behalf of a particular data structure declared with CREATE FOREIGN TABLE.
In other words, the extension's responsibility is to generate a view of
"something" according to PostgreSQL's internal data structure, instead of the
object itself.

On the other hand, the custom-scan interface allows extensions to implement
alternative methods to scan or join particular relations, but its role is not
to act as a target referenced in queries. In other words, it is a set of
methods to access objects.
It is natural that both features are similar, because both of them intend to
let extensions hook the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--

KaiGai Kohei <kaigai@kaigai.gr.jp>


--

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#33

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Ashutosh Bapat (#32)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-02-25 20:32 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:

On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for jumping into this late.

Instead of a custom node, it might be a better idea to improve the FDW
infrastructure to push down joins. For starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think it
might be better for the custom scan hooks to create existing nodes if they
serve the purpose.

It does not work well because the existing FDW infrastructure is designed to
operate on foreign tables, not regular tables. We would probably need to
revise many of the assumptions in the surrounding code if we redefined the
purpose of the FDW infrastructure. For example, ForeignScan is expected to
return a tuple according to a TupleDesc that exactly matches the table
definition.
That does not fit the requirement if we replace a join node with ForeignScan,
because the TupleDesc of the joined relations is not predefined.

If one wants to push joins, aggregates, and grouping across to other data
sources capable of handling them, that will need to change. But, at the same
time, letting the custom scan node decide that doesn't seem to be
a very good idea. Still, custom scan nodes do show the
potential in adding these features.

Of course, the existing form of the custom-scan node is designed to support
scanning or joining relations, as a first step. It will also need some
enhancement to support other classes of execution node in a future version.
I'm not certain why that is problematic.

I'd like to define these features as designed for separate purposes.
FDW is designed to mediate between an external data source and the internal
heap representation according to the foreign table definition. In other
words, its role is to generate the contents of a predefined database object
on the fly.
On the other hand, custom-scan is designed to implement alternative ways
to scan / join relations in addition to the methods supported by the built-in
features.

I'm motivated to implement a GPU acceleration feature that works
transparently for applications. Thus, it has to work on regular tables,
because most applications store data in regular tables, not foreign ones.

It looks like my description was misleading. In some cases, it might be
possible that the ultimate functionality that a particular instantiation of
the custom scan node provides is already available as a Plan node in PG, but
the PG optimizer is not able to optimize the operation that way. In such a
case, the custom scan node infrastructure should produce the corresponding
Path node and not implement that functionality itself.

You are suggesting that CustomSort, CustomAgg, CustomAppend and
so on should be supported in a future version, for better integration with
the plan optimizer. Right?
That is probably a good idea if the optimizer needs to identify a CustomXXXX
node by its node tag, rather than by something else such as the custom-scan
provider name.
Right now, the custom-scan feature focuses on optimizing relation scans
and joins as its first scope, and does not need to identify the class of the
corresponding Path node.
Upthread in this discussion, I initially proposed separate
CustomScan and CustomJoin nodes; however, our consensus was that
CustomScan can act like a scan on the result set of joined
relations, so I dropped multiple node types from the first version.

Since a custom node is an open implementation, it will be important to pass
as much information down to the hooks as possible, lest the hooks be
constrained. Since the function signatures within the planner and optimizer
will change from time to time, the custom node hook signatures will need
to change from time to time as well. That might turn out to be a maintenance
overhead.

Yes, you are right about that as well. But a hook with many arguments that
nobody uses also adds maintenance overhead.
It probably makes sense to classify the arguments into those that cannot be
reproduced from other information, those that can be reproduced but only
through complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
- PlannerInfo *root
- RelOptInfo *joinrel
- RelOptInfo *outerrel
- RelOptInfo *innerrel
- JoinType jointype
- SpecialJoinInfo *sjinfo
- List *restrictlist

Most of this information is available through the corresponding RelOptInfo, or
we should make RelOptInfo contain all the information related to every
relation required to be computed during the query. So, any function which
creates paths can just take that RelOptInfo as an argument and produce the
path/s. That way there is less chance that the function signatures will change.

Hmm... That is inconvenient for writing extensions. I want the variables
in the first and second groups to be delivered to the hook, even though
they may undergo minor modifications in future releases.
Joining relations is at the heart of an RDBMS, so I'd like to believe these
arguments are among the most stable ones.

Below is the information we can reproduce, but only through complicated steps:
- List *mergeclause_list
- bool mergejoin_allowed
- Relids param_source_rels
- Relids extra_lateral_rels

Below is the information we can reproduce easily:
- SemiAntiJoinFactors *semifactors

I think the first two categories, or just the first category (if the
functions to reproduce the second group are exposed), should be passed to the
extension; the priority of the third group is not high.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

Yes. I plan to support the above plan nodes, in addition to scan / join.
The custom-scan node is a thin abstraction over general executor behavior,
so I believe it is not hard to enhance this node without adding a new plan
node for each of them.
Of course, it will need a separate hook to add an alternative path at the
planner stage, but no individual plan nodes. (Sorry, it was unclear to me
what "hook" means here.)

If we represent all operations like grouping, sorting, and aggregation as
some sort of relation, we can create paths for each of those relations like we
do (I am heavily borrowing from Tom's idea of pathifying those operations).
We will need far fewer hooks in the custom scan node.

BTW, from the patch, I do not see this change to be lightweight. I was
expecting more of a list of hooks to be defined by the user, with this
infrastructure just calling them at appropriate places.

Let's focus on scan and join, which we are currently working on.
Even if we need separate node types for grouping or sorting, it will not
be necessary to construct the whole framework from scratch.
For example, the definition of the CustomProvider table can be reused
for other classes of operations, because most of them are thin abstractions
of the existing executor interface.

Thanks,

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#34

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kouhei Kaigai (#29)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Kaigai-san,

2014-02-25 13:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

The reason why I asked the question above is that I haven't been 100% certain
about its usage. Indeed, semifactors is applied in limited cases, but it is
quite easy for an extension to reproduce later (using clauselist_selectivity)
if the extension wants this factor. So, I agree with removing semifactors
here.

Agreed. It would be nice to mention how to obtain semifactors for
people who want to implement advanced join overriding.

mergeclause_list and param_source_rels seem a little easier to use, but
in any case it should be documented how to use those parameters.

The mergeclause_list might not be sufficient for extensions to determine
whether their own mergejoin is applicable here. See the comment below, which
is at the head of select_mergejoin_clauses.

| * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
| * this is a right/full join and there are nonmergejoinable join clauses.
| * The executor's mergejoin machinery cannot handle such cases, so we have
| * to avoid generating a mergejoin plan. (Note that this flag does NOT
| * consider whether there are actually any mergejoinable clauses. This is
| * correct because in some cases we need to build a clauseless mergejoin.
| * Simply returning NIL is therefore not enough to distinguish safe from
| * unsafe cases.)
|
It says that mergeclause_list == NIL is not a sufficient check to determine
whether the mergejoin logic is applicable to the target join.
So, either of them is probably an option for extensions that try to implement

Perhaps you mean "both of them"?

their own mergejoin logic: (1) passing both mergejoin_allowed and
mergeclause_list as arguments of the hook, or (2) redefining
select_mergejoin_clauses() as an extern function to reproduce the variables on
demand. Which one is preferable?

I prefer (1), because exposing planner internals might block changes to
those internal functions. If (at the moment) that information is
enough for a CSP to override merge join, let's provide it as parameters.
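
To illustrate why option (1) passes both values, here is a small sketch of
the check an extension might make. It is only an illustration under my own
assumptions: DemoList, DemoMergeChoice, and the demo_* functions are
hypothetical, a NULL pointer plays the role of NIL, and the classification
follows the select_mergejoin_clauses comment quoted above rather than any
code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a List; NULL plays the role of NIL. */
typedef struct DemoList
{
    int length;
} DemoList;

typedef enum DemoMergeChoice
{
    DEMO_MJ_UNSAFE,         /* must not build a mergejoin at all */
    DEMO_MJ_CLAUSELESS,     /* allowed, but there are no mergejoinable clauses */
    DEMO_MJ_WITH_CLAUSES    /* allowed, with mergejoinable clauses */
} DemoMergeChoice;

/*
 * An empty clause list alone cannot tell the unsafe case from the
 * clauseless case, so the flag has to be consulted first.
 */
DemoMergeChoice
demo_classify_mergejoin(bool mergejoin_allowed,
                        const DemoList *mergeclause_list)
{
    if (!mergejoin_allowed)
        return DEMO_MJ_UNSAFE;
    if (mergeclause_list == NULL || mergeclause_list->length == 0)
        return DEMO_MJ_CLAUSELESS;
    return DEMO_MJ_WITH_CLAUSES;
}

/* Walk the three cases once; aborts if any classification is unexpected. */
void
demo_mergejoin_cases(void)
{
    DemoList two_clauses = {2};

    assert(demo_classify_mergejoin(false, NULL) == DEMO_MJ_UNSAFE);
    assert(demo_classify_mergejoin(true, NULL) == DEMO_MJ_CLAUSELESS);
    assert(demo_classify_mergejoin(true, &two_clauses) == DEMO_MJ_WITH_CLAUSES);
}
```

The first two cases both have an empty clause list and differ only in the
flag, which is the situation the quoted comment warns about.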

--
Shigeru HANADA


#35

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kohei KaiGai (#26)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi Kaigai-san,

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:

(1) Interface to add alternative paths in addition to built-in join paths

I found that create_customscan_path is not used at all in your patch.
I revised postgres_fdw.c to use it like this.

    ...
    /* Create join information which is stored as private information. */
    memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
    jinfo.fdw_server_oid = o_server_oid;
    jinfo.fdw_user_oid = o_user_oid;
    jinfo.relids = joinrel->relids;
    jinfo.jointype = jointype;
    jinfo.outer_rel = o_relinfo;
    jinfo.inner_rel = i_relinfo;
    jinfo.remote_conds = j_remote_conds;
    jinfo.local_conds = j_local_conds;

    /* OK, make a CustomScan node to run remote join */
    cpath = create_customscan_path(root,
                                   joinrel,
                                   0, 0, 0,    /* estimate later */
                                   NIL,
                                   required_outer,
                                   "postgres-fdw",
                                   0,
                                   packPgRemoteJoinInfo(&jinfo));

    estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);

    add_path(joinrel, &cpath->path);
    ...

This seems to work fine. Is this the right approach? If so, this portion
would be a good example of replacing a local join with a custom scan for
authors of custom scan providers. One thing I worry about is the case that
you've intentionally avoided calling create_customscan_path.

--
Shigeru HANADA

#36

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#31)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Instead of custom node, it might be better idea to improve FDW infrastructure
to push join. For the starters, is it possible for the custom scan node
hooks to create a ForeignScan node? In general, I think, it might be better
for the custom scan hooks to create existing nodes if they serve the purpose.

It does not work well because existing FDW infrastructure is designed to
perform on foreign tables, not regular tables. Probably, it needs to revise
much our assumption around the background code, if we re-define the purpose
of FDW infrastructure. For example, ForeignScan is expected to return a tuple
according to the TupleDesc that is exactly same with table definition.
It does not fit the requirement if we replace a join-node by ForeignScan
because its TupleDesc of joined relations is not predefined.

I'm not following this logic at all- how are you defining "foreign" from
"regular"? Certainly, in-memory-only tables which are sitting out in
some non-persistent GPU memory aren't "regular" by any PG definition.
Perhaps you can't make ForeignScan suddenly work as a join-node
replacement, but I've not seen where anyone has proposed that (directly-
I've implied it on occasion where a remote view can be used, but that's
not the same thing as having proper push-down support for joins).

I'd like to define these features as designed for separate purposes.

My previous complaint about this patch set has been precisely that each
piece seems to be custom-built and every patch needs more and more
backend changes. If every time someone wants to do something with this
CustomScan API, they need changes made to the backend code, then it's
not a generally useful external API. We really don't want to define
such an external API as then we have to deal with backwards
compatibility, particularly when it's all specialized to specific use
cases which are all different.

FDW is designed to intermediate an external data source and internal heap
representation according to foreign table definition. In other words, its
role is to generate contents of predefined database object on the fly.

There's certainly nothing in the FDW API which requires that the remote
side have an internal heap representation, as evidenced by the various
FDWs which already exist and certainly are not any kind of 'normal'
heap. Every query against the foreign relation goes through the FDW API
and can end up returning whatever the FDW author decides is appropriate
to return at that time, as long as it matches the tuple description-
which is absolutely necessary for any kind of sanity, imv.

On the other hand, custom-scan is designed to implement alternative ways
to scan / join relations in addition to the methods supported by built-in
feature.

I can see the usefulness in being able to push down aggregates or other
function-type calls to the remote side of an FDW and would love to see
work done along those lines, along with the ability to push down joins
to remote systems- but I'm not convinced that the claimed flexibility
with the CustomScan API is there, given the need to continue modifying
the backend code for each use-case, nor that there are particularly new
and inventive ways of saying "find me all the cases where set X overlaps
with set Y". I'm certainly open to the idea that we could have an FDW
API which allows us to ask exactly that question and let the remote side
cost it out and give us an answer for a pair of relations but that isn't
what this is. Note also that in any kind of aggregation push-down we
must be sure that the function is well-defined and that the FDW is on
the hook to ensure that the returned data is the same as if we ran the
same aggregate function locally, otherwise the results of a query might
differ based on if the aggregate was fired locally or remotely (which
could be influenced by costing- eg: the size of the relation or its
statistics).

I'm motivated to implement GPU acceleration feature that works transparently
for application. Thus, it has to be capable on regular tables, because most
of application stores data on regular tables, not foreign ones.

You want to persist that data in the GPU across multiple calls though,
which makes it unlike any kind of regular PG table and much more like
some foreign table. Perhaps the data is initially loaded from a local
table and then updated on the GPU card in some way when the 'real' table
is updated, but neither of those makes it a "regular" PG table.

Since a custom node is open implementation, it will be important to pass
as much information down to the hooks as possible; lest the hooks will be
constrained. Since the functions signatures within the planner, optimizer
will change from time to time, so the custom node hook signatures will need
to change from time to time. That might turn out to be maintenance overhead.

It's more than "from time-to-time", it was "for each use case in the
given patch set asking for this feature", which is why I'm pushing back
on it.

Yes. You are also right. But it also makes maintenance overhead if hook has
many arguments nobody uses.

I can agree with this- there should be a sensible API if we're going to
do this.

Probably, it makes sense to list up the arguments that cannot be reproduced
from other information, can be reproduced but complicated steps, and can be
reproduced easily.

This really strikes me as the wrong approach for an FDW join-pushdown
API, which should be geared around giving the remote side an opportunity
on a case-by-case basis to cost out joins using whatever methods it has
available to implement them. I've outlined above the reasons I don't
agree with just making the entire planner/optimizer pluggable.

Thanks,

Stephen

#37

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#23)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Yes, the part-1 patch provides a set of interface portion to interact
between the backend code and extension code. Rest of part-2 and part-3
portions are contrib modules that implements its feature on top of
custom-scan API.

Just to come back to this- the other two "contrib module" patches, at
least as I read over their initial submission, were *also* patching
portions of backend code which it was apparently discovered that they
needed. That's a good bit of my complaint regarding this approach.

FDW's join pushing down is one of the valuable use-cases of this interface,
but not all. As you might know, my motivation is to implement GPU acceleration
feature on top of this interface, that offers alternative way to scan or join
relations or potentially sort or aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would
be a more appropriate way to go, with the ability to move a working set
of data from shared buffers and on-disk representation of a relation
over to the GPU's memory, perform the operation, and then copy the
results back. If that's not possible or effective wrt performance, then
I think we need to look at managing the external GPU memory as a foreign
system through an FDW which happens to be updated through triggers or
similar. The same could potentially be done for memcached systems, etc.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a
"regular" table and stored off in a GPU's memory across subsequent
queries or even transactions?

Right now, I put all the logic to interact CSI and FDW driver on postgres_fdw
side, it might be an idea to have common code (like a logic to check whether
the both relations to be joined belongs to same foreign server) on the backend
side as something like a gateway of them.

Yes, that's what I was suggesting above- we should be asking the FDWs on
a case-by-case basis how to cost out the join between foreign tables
which they are responsible for. Asking two different FDWs servers to
cost out a join between their tables doesn't make any sense to me.

As an aside, what should be the scope of FDW interface?
In my understanding, it allows extension to implement "something" on behalf of
a particular data structure being declared with CREATE FOREIGN TABLE.

That's where it is today, but certainly not our end goal.

In other words, extension's responsibility is to generate a view of "something"
according to PostgreSQL' internal data structure, instead of the object itself.

The result of the FDW call needs to be something which PG understands
and can work with, otherwise we wouldn't be able to, say, run PL/pgsql
code on the result, or pass it into some other aggregate which we
decided was cheaper to run locally. Being able to push down aggregates
to the remote side of an FDW certainly fits in quite well with that.

On the other hand, custom-scan interface allows extensions to implement
alternative methods to scan or join particular relations, but it is not a role
to perform as a target being referenced in queries. In other words, it is methods
to access objects.

The custom-scan interface still needs to produce "something" according
to PG's internal data structures, so it's not clear to me where you're
going with this.

It is natural both features are similar because both of them intends extensions
to hook the planner and executor, however, its purpose is different.

I disagree as I don't really view FDWs as "hooks". A "hook" is more
like a trigger- sure, you can modify the data in transit, or throw an
error if you see an issue, but you don't get to redefine the world and
throw out what the planner or optimizer knows about the rest of what is
going on in the query.

Thanks,

Stephen

#38

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#36)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Instead of custom node, it might be better idea to improve FDW
infrastructure to push join. For the starters, is it possible for
the custom scan node hooks to create a ForeignScan node? In general,
I think, it might be better for the custom scan hooks to create existing
nodes if they serve the purpose.

It does not work well because existing FDW infrastructure is designed
to perform on foreign tables, not regular tables. Probably, it needs
to revise much our assumption around the background code, if we
re-define the purpose of FDW infrastructure. For example, ForeignScan
is expected to return a tuple according to the TupleDesc that is exactly
same with table definition.
It does not fit the requirement if we replace a join-node by
ForeignScan because its TupleDesc of joined relations is not predefined.

I'm not following this logic at all- how are you defining "foreign" from
"regular"? Certainly, in-memory-only tables which are sitting out in some
non-persistent GPU memory aren't "regular" by any PG definition.
Perhaps you can't make ForeignScan suddenly work as a join-node replacement,
but I've not seen where anyone has proposed that (directly- I've implied
it on occasion where a remote view can be used, but that's not the same
thing as having proper push-down support for joins).

By "regular" I mean ordinary tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table referenced in the user's query shall be an ordinary table.
In the past, Hanada-san proposed an enhancement of FDW to support
remote join, but it was eventually rejected.

I'd like to define these features as designed for separate purposes.

My previous complaint about this patch set has been precisely that each
piece seems to be custom-built and every patch needs more and more backend
changes. If every time someone wants to do something with this CustomScan
API, they need changes made to the backend code, then it's not a generally
useful external API. We really don't want to define such an external API
as then we have to deal with backwards compatibility, particularly when
it's all specialized to specific use cases which are all different.

The backend changes are just for convenience. We may be able to implement
functions to translate a Bitmapset from/to cstring form in postgres_fdw,
but does it make sense to maintain them individually?
I thought these functions would be useful to have in the backend in common,
but they are not fundamental functionality that the custom-scan interface
lacks.
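
For readers wondering what "translate Bitmapset from/to cstring form" might involve, here is a rough, self-contained sketch. It is not the code from the patch: it uses a plain 64-bit word instead of PostgreSQL's Bitmapset, the function names relids_to_cstring and cstring_to_relids are made up for illustration, and the "(b 1 3 5)" text shape is simply an assumed format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write the members of a 64-bit set as text such as "(b 1 3 5)".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
relids_to_cstring(uint64_t set, char *buf, size_t buflen)
{
    size_t      used;
    int         n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "(b");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (int bit = 0; bit < 64; bit++)
    {
        if (!(set & (UINT64_C(1) << bit)))
            continue;
        n = snprintf(buf + used, buflen - used, " %d", bit);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, ")");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return 0;
}

/*
 * Parse the text form back into a 64-bit set.
 * Returns 0 on success, -1 on malformed input.
 */
int
cstring_to_relids(const char *str, uint64_t *set)
{
    const char *p = str;
    uint64_t    result = 0;

    assert(str != NULL && set != NULL);
    if (strncmp(p, "(b", 2) != 0)
        return -1;
    p += 2;

    while (*p == ' ')
    {
        char       *end;
        long        bit = strtol(p + 1, &end, 10);

        if (end == p + 1 || bit < 0 || bit > 63)
            return -1;
        result |= UINT64_C(1) << bit;
        p = end;
    }

    if (strcmp(p, ")") != 0)
        return -1;
    *set = result;
    return 0;
}
```

Even a toy version like this needs bounds and syntax checks, which gives some feel for why each contrib module keeping its own copy is unattractive.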

FDW is designed to intermediate an external data source and internal
heap representation according to foreign table definition. In other
words, its role is to generate contents of predefined database object
on the fly.

There's certainly nothing in the FDW API which requires that the remote
side have an internal heap representation, as evidenced by the various FDWs
which already exist and certainly are not any kind of 'normal'
heap. Every query against the foreign relation goes through the FDW API
and can end up returning whatever the FDW author decides is appropriate
to return at that time, as long as it matches the tuple description- which
is absolutely necessary for any kind of sanity, imv.

Yes. That is my understanding of the role of an FDW driver.

On the other hand, custom-scan is designed to implement alternative
ways to scan / join relations in addition to the methods supported by
built-in feature.

I can see the usefulness in being able to push down aggregates or other
function-type calls to the remote side of an FDW and would love to see work
done along those lines, along with the ability to push down joins to remote
systems- but I'm not convinced that the claimed flexibility with the
CustomScan API is there, given the need to continue modifying the backend
code for each use-case, nor that there are particularly new and inventive
ways of saying "find me all the cases where set X overlaps with set Y".
I'm certainly open to the idea that we could have an FDW API which allows
us to ask exactly that question and let the remote side cost it out and
give us an answer for a pair of relations but that isn't what this is. Note
also that in any kind of aggregation push-down we must be sure that the
function is well-defined and that the FDW is on the hook to ensure that
the returned data is the same as if we ran the same aggregate function locally,
otherwise the results of a query might differ based on if the aggregate
was fired locally or remotely (which could be influenced by costing- eg:
the size of the relation or its statistics).

I can also understand the usefulness of join or aggregation into the remote
side in case of foreign table reference. In similar way, it is also useful
if we can push these CPU intensive operations into co-processors on regular
table references.
As I mentioned above, the backend changes in the part-2/-3 patches are just
minor stuff, and I thought they should not be implemented locally in a
contrib module.
Regarding the condition under which we can run remote aggregation, you are
right. Just as the current postgres_fdw pushes qualifiers down to the remote
side, we need to ensure the remote aggregate definition is identical to the
local one.

I'm motivated to implement GPU acceleration feature that works
transparently for application. Thus, it has to be capable on regular
tables, because most of application stores data on regular tables, not
foreign ones.

You want to persist that data in the GPU across multiple calls though, which
makes it unlike any kind of regular PG table and much more like some foreign
table. Perhaps the data is initially loaded from a local table and then
updated on the GPU card in some way when the 'real' table is updated, but
neither of those makes it a "regular" PG table.

No. What I want to implement is to read the regular table, transfer its
contents into the GPU's local memory for calculation, then receive the
calculation result. The in-memory cache (which I'm also working on) is
supplemental, because disk access is much slower and a row-oriented data
structure is not suitable for SIMD-style instructions.

Since a custom node is open implementation, it will be important to
pass as much information down to the hooks as possible; lest the
hooks will be constrained. Since the functions signatures within
the planner, optimizer will change from time to time, so the custom
node hook signatures will need to change from time to time. That might
turn out to be maintenance overhead.

It's more than "from time-to-time", it was "for each use case in the given
patch set asking for this feature", which is why I'm pushing back on it.

My patch set didn't change the interface itself. All it added was (probably)
useful utility routines to be placed on the backend, rather than contrib.

Yes. You are also right. But it also makes maintenance overhead if
hook has many arguments nobody uses.

I can agree with this- there should be a sensible API if we're going to
do this.

Probably, it makes sense to list up the arguments that cannot be
reproduced from other information, can be reproduced but complicated
steps, and can be reproduced easily.

This really strikes me as the wrong approach for an FDW join-pushdown API,
which should be geared around giving the remote side an opportunity on a
case-by-case basis to cost out joins using whatever methods it has available
to implement them. I've outlined above the reasons I don't agree with just
making the entire planner/optimizer pluggable.

I'm also inclined to have arguments that will provide enough information
for extensions to determine the best path for them.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#39

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#38)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

By "regular" I mean ordinary tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table referenced in the user's query shall be an ordinary table.
In the past, Hanada-san proposed an enhancement of FDW to support
remote join, but it was eventually rejected.

I'm not aware of the specifics around that proposal but I don't believe
we, as a community, have decided to reject the idea in general.

The backend changes are just for convenience. We may be able to implement
functions to translate a Bitmapset from/to cstring form in postgres_fdw,
but does it make sense to maintain them individually?

Perhaps not.

I thought these functions would be useful to have in the backend in common,
but they are not fundamental functionality that the custom-scan interface
lacks.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone can
use it, but we need to avoid interfaces which can't be stable due to
normal / ongoing changes to the backend code.

I can also understand the usefulness of join or aggregation into the remote
side in case of foreign table reference. In similar way, it is also useful
if we can push these CPU intensive operations into co-processors on regular
table references.

That's fine, if we can get data to and from those co-processors
efficiently enough that it's worth doing so. If moving the data to the
GPU's memory will take longer than running the actual aggregation, then
it doesn't make any sense for regular tables because then we'd have to
cache the data in the GPU's memory in some way across multiple queries,
which isn't something we're set up to do.
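
The break-even condition described here can be written down as a tiny sketch. The function name offload_is_worthwhile and the three timing inputs are assumptions made up for illustration; nothing in the patch set measures or exposes these numbers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Offloading a single query pays off only when copying the data to the
 * co-processor plus computing there beats computing on the CPU.
 * All inputs are hypothetical estimates in milliseconds.
 */
bool
offload_is_worthwhile(double transfer_ms, double gpu_compute_ms,
                      double cpu_compute_ms)
{
    assert(transfer_ms >= 0.0 && gpu_compute_ms >= 0.0 &&
           cpu_compute_ms >= 0.0);

    return transfer_ms + gpu_compute_ms < cpu_compute_ms;
}
```

When transfer_ms dominates, the inequality can only be satisfied by amortizing the copy over several queries, which is the caching problem raised above.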

As I mentioned above, the backend changes in the part-2/-3 patches are just
minor stuff, and I thought they should not be implemented locally in a
contrib module.

Fine- then propose them as generally useful additions, not as patches
which are supposed to just be for contrib modules using an already
defined interface. If you can make a case for that then perhaps this is
more practical.

Regarding the condition under which we can run remote aggregation, you are
right. Just as the current postgres_fdw pushes qualifiers down to the remote
side, we need to ensure the remote aggregate definition is identical to the
local one.

Of course.

No. What I want to implement is to read the regular table, transfer its
contents into the GPU's local memory for calculation, then receive the
calculation result. The in-memory cache (which I'm also working on) is
supplemental, because disk access is much slower and a row-oriented data
structure is not suitable for SIMD-style instructions.

Is that actually performant? Is it actually faster than processing the
data directly? The discussions that I've had with folks have cast a
great deal of doubt in my mind about just how well that kind of quick
turn-around to the GPU's memory actually works.

This really strikes me as the wrong approach for an FDW join-pushdown API,
which should be geared around giving the remote side an opportunity on a
case-by-case basis to cost out joins using whatever methods it has available
to implement them. I've outlined above the reasons I don't agree with just
making the entire planner/optimizer pluggable.

I'm also inclined to have arguments that will provide enough information
for extensions to determine the best path for them.

For join push-down, I proposed above that we have an interface to the
FDW which allows us to ask it how much each join of the tables which are
on a given FDW's server would cost if the FDW did it vs. pulling it back
and doing it locally. We could also pass all of the relations to the
FDW with the various join-quals and try to get an answer to everything,
but I'm afraid that'd simply end up duplicating the logic of the
optimizer into every FDW, which would be counter-productive.
Admittedly, getting the costing right isn't easy either, but it's not
clear to me how it'd make sense for the local server to be doing costing
for remote servers.
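
A minimal sketch of the kind of interface described in this paragraph, under loud assumptions: the callback type RemoteJoinCostFn, the helper prefer_remote_join and the toy callback toy_fdw_cost are all hypothetical and exist in no FDW API today. The only idea taken from the text is that the FDW is asked, per pair of relations, what the join would cost remotely, and the local side compares that with its own estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef double Cost;

/*
 * Hypothetical callback: the FDW estimates the cost of running the join
 * of two of its relations remotely. It returns false when it cannot
 * (or does not want to) run that join.
 */
typedef bool (*RemoteJoinCostFn) (int outer_relid, int inner_relid,
                                  Cost *remote_cost);

/* Choose the remote join only if the FDW offers it and it is cheaper. */
bool
prefer_remote_join(RemoteJoinCostFn ask_fdw, int outer_relid,
                   int inner_relid, Cost local_cost)
{
    Cost        remote_cost;

    assert(ask_fdw != NULL);
    if (!ask_fdw(outer_relid, inner_relid, &remote_cost))
        return false;
    return remote_cost < local_cost;
}

/* Toy callback used only to exercise the sketch. */
bool
toy_fdw_cost(int outer_relid, int inner_relid, Cost *remote_cost)
{
    if (outer_relid == inner_relid)
        return false;           /* refuses this pair, purely for the demo */
    *remote_cost = 120.0;
    return true;
}
```

The local optimizer keeps ownership of the join search; the FDW only answers a narrow costing question for one pair at a time.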

Thanks,

Stephen

#40

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#37)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Yes, the part-1 patch provides a set of interface portion to interact
between the backend code and extension code. Rest of part-2 and part-3
portions are contrib modules that implements its feature on top of
custom-scan API.

Just to come back to this- the other two "contrib module" patches, at least
as I read over their initial submission, were *also* patching portions of
backend code which it was apparently discovered that they needed. That's
a good bit of my complaint regarding this approach.

?? Sorry, are you still negative on the portion of backend patched
by the part-2 and part-3 portion??

FDW's join pushing down is one of the valuable use-cases of this
interface, but not all. As you might know, my motivation is to
implement GPU acceleration feature on top of this interface, that
offers alternative way to scan or join relations or potentially sort or
aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would be
a more appropriate way to go, with the ability to move a working set of
data from shared buffers and on-disk representation of a relation over to
the GPU's memory, perform the operation, and then copy the results back.

The approach is similar to Robert's work, except that it targets GPUs
instead of multicore CPUs. So I tried to review his work in order to apply
its facilities to my extension as well.

If that's not possible or effective wrt performance, then I think we need
to look at managing the external GPU memory as a foreign system through
an FDW which happens to be updated through triggers or similar. The same
could potentially be done for memcached systems, etc.

I did not have in mind exposing the GPU's local memory.
The supplemental mechanism I'm planning for data-load performance is just
a cache alongside regular tables.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a "regular"
table and stored off in a GPU's memory across subsequent queries or even
transactions?

It shall be handled on a case-by-case basis, I think. If a row-level lock is
required during the table scan, the custom-scan node shall return the tuple
located in the shared buffer instead of the cached tuple. Of course,
it is also an option for the custom-scan node to evaluate qualifiers on the
GPU with cached data and return the tuples identified by the ctids of the
cached tuples. Anyway, it is not a significant problem.

Right now, I put all the logic to interact CSI and FDW driver on
postgres_fdw side, it might be an idea to have common code (like a
logic to check whether the both relations to be joined belongs to same
foreign server) on the backend side as something like a gateway of them.

Yes, that's what I was suggesting above- we should be asking the FDWs on
a case-by-case basis how to cost out the join between foreign tables which
they are responsible for. Asking two different FDWs servers to cost out
a join between their tables doesn't make any sense to me.

OK, I'll move the portion that will be needed commonly for other FDWs into
the backend code.
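
As a rough idea of the common check that could live on the backend side, here is a small sketch. ForeignRelInfo and can_join_remotely are hypothetical names; the two fields mirror fdw_server_oid and fdw_user_oid from the PgRemoteJoinInfo snippet earlier in the thread. Requiring the same user as well as the same server is an assumption of this sketch, not something the thread has settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical per-relation information a gateway routine would look at. */
typedef struct ForeignRelInfo
{
    Oid         fdw_server_oid; /* foreign server the relation lives on */
    Oid         fdw_user_oid;   /* user the remote session would run as */
} ForeignRelInfo;

/*
 * A join can be considered for push-down only when both sides belong to
 * the same foreign server (and, in this sketch, the same user).
 */
bool
can_join_remotely(const ForeignRelInfo *outer, const ForeignRelInfo *inner)
{
    assert(outer != NULL && inner != NULL);

    return outer->fdw_server_oid == inner->fdw_server_oid &&
        outer->fdw_user_oid == inner->fdw_user_oid;
}
```

Only when this check passes would the FDW be asked to cost the join at all.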

As an aside, what should be the scope of FDW interface?
In my understanding, it allows extension to implement "something" on
behalf of a particular data structure being declared with CREATE FOREIGN
TABLE.

That's where it is today, but certainly not our end goal.

In other words, extension's responsibility is to generate a view of
"something" according to PostgreSQL' internal data structure, instead of
the object itself.

The result of the FDW call needs to be something which PG understands and
can work with, otherwise we wouldn't be able to, say, run PL/pgsql code
on the result, or pass it into some other aggregate which we decided was
cheaper to run locally. Being able to push down aggregates to the remote
side of an FDW certainly fits in quite well with that.

Yes. According to the previous discussion around postgres_fdw getting
merged, all we can trust on the remote side are built-in data types,
functions, operators and the like.

On the other hand, custom-scan interface allows extensions to
implement alternative methods to scan or join particular relations,
but it is not a role to perform as a target being referenced in
queries. In other words, it is methods to access objects.

The custom-scan interface still needs to produce "something" according to
PG's internal data structures, so it's not clear to me where you're going
with this.

The custom-scan node is intended to work on regular relations, not
only foreign tables. It means a special feature (like GPU acceleration)
can work transparently for most existing applications. Usually,
applications define regular tables for their work at installation, not
foreign tables. That is my biggest concern.

It is natural both features are similar because both of them intends
extensions to hook the planner and executor, however, its purpose is
different.

I disagree as I don't really view FDWs as "hooks". A "hook" is more like
a trigger- sure, you can modify the data in transit, or throw an error if
you see an issue, but you don't get to redefine the world and throw out
what the planner or optimizer knows about the rest of what is going on in
the query.

I might have worded that poorly. Anyway, I want plan nodes that let
extensions define their behavior, similar to ForeignScan but able to work
on regular relations. Also, not only custom-scan and foreign-scan but any
plan node works according to an interface for cooperating with other nodes,
so it is not strange that the two interfaces are similar.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#41

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kouhei Kaigai (#40)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-02-26 16:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Just to come back to this- the other two "contrib module" patches, at least
as I read over their initial submission, were *also* patching portions of
backend code which it was apparently discovered that they needed. That's
a good bit of my complaint regarding this approach.

?? Sorry, are you still negative on the portion of backend patched
by the part-2 and part-3 portion??

Perhaps he meant that the patches should be separated by feature. IMO
if exposing utilities is essential for the Custom Scan API in practical
terms, IOW to implement and maintain an extension which implements
the Custom Scan API, they should go into the first patch. IIUC the two
contrib modules are also PoCs for the API, so the part-2/3 patches should
contain only changes to contrib and its documentation.

Besides that, some typo fixes are mixed into the part-2 patch. They should
go into the part-1 patch where the typos were introduced.

--
Shigeru HANADA

#42

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#40)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Just to come back to this- the other two "contrib module" patches, at least
as I read over their initial submission, were *also* patching portions of
backend code which it was apparently discovered that they needed. That's
a good bit of my complaint regarding this approach.

?? Sorry, are you still negative on the portion of backend patched
by the part-2 and part-3 portion??

Pretty sure that I sent that prior to your last email, or at least
before I was to the end of it.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would be
a more appropriate way to go, with the ability to move a working set of
data from shared buffers and on-disk representation of a relation over to
the GPU's memory, perform the operation, and then copy the results back.

The approach is similar to Robert's work, except that it targets GPUs
instead of multicore CPUs. So I tried to review his work in order to apply
its facilities to my extension as well.

Good, I'd be very curious to hear how that might solve the issue for
you, instead of using the CustomScan approach.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a "regular"
table and stored off in a GPU's memory across subsequent queries or even
transactions?

It should be handled on a "case-by-case" basis, I think. If a row-level lock
is required during the table scan, the custom-scan node should return the
tuple located in the shared buffer instead of the cached tuple. Of course,
the custom-scan node also has the option of evaluating the qualifiers on the
GPU with cached data and returning the tuples identified by the ctids of the
cached tuples. Anyway, it is not a significant problem.

I think you're being a bit too hand-wavey here, but if we're talking
about pre-scanning the data using PG before sending it to the GPU and
then only performing a single statement on the GPU, we should be able to
deal with it. I'm worried about your ideas to try and cache things on
the GPU though, if you're not prepared to deal with locks happening in
shared memory on the rows you've got cached out on the GPU, or hint
bits, or the visibility map being updated, etc...

OK, I'll move the portion that will be needed commonly for other FDWs into
the backend code.

Alright- but realize that there may be objections there on the basis
that the code/structures which you're exposing aren't, and will not be,
stable. I'll have to go back and look at them myself, certainly, and
their history.

Yes. According to the previous discussion around postgres_fdw getting
merged, all we can trust on the remote side are built-in data types,
functions, operators, and other built-in stuff.

Well, we're going to need to expand that a bit for aggregates, I'm
afraid, but we should be able to define the API for those aggregates
very tightly based on what PG does today and require that any FDW
purporting to provide those aggregates do it the way PG does. Note
that this doesn't solve all the problems- we've got other issues with
regard to pushing aggregates down into FDWs that need to be solved.

The custom-scan node is intended to work on regular relations, not
only foreign tables. That means a special feature (like GPU acceleration)
can work transparently for most existing applications. Usually,
applications define regular tables, not foreign tables, for their work at
installation time. That is the most important point for me.

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the
background worker is where you should be focused, not on this Custom
Scan API. Consider that, once we've got proper background workers,
we're going to need new nodes which operate in parallel (or some other
rejiggering of the nodes- I don't pretend to know exactly what Robert is
thinking here, and I've apparently forgotten it if he's posted it
somewhere) and those interfaces may drive changes which would impact the
Custom Scan API- or worse, make us deprecate or regret having added it
because now we'll need to break backwards compatibility to add in the
parallel node capability to satisfy the more general non-GPU case.

I may have worded that poorly. Anyway, I want a plan node that lets an
extension define its behavior; it is similar to ForeignScan, but can also
operate on regular relations. Also, not only custom-scan and foreign-scan
but every plan node follows an interface for cooperating with other nodes,
so it is not strange that the two interfaces are similar.

It sounds a lot like you're trying to define, external to PG, what
Robert is already trying to get going *internal* to PG, and I really
don't want to end up in a situation where we've got a solution for the
uncommon case but aren't able to address the common case due to risk of
breaking backwards compatibility...

Thanks,

Stephen

#43

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Shigeru Hanada (#41)

Re: Custom Scan APIs (Re: Custom Plan node)

* Shigeru Hanada (shigeru.hanada@gmail.com) wrote:

Perhaps he meant to separate patches based on feature-based rule. IMO
if exposing utilities is essential for Custom Scan API in practical
meaning, IOW to implement and maintain an extension which implements
Custom Scan API, they should go into the first patch. IIUC the two
contrib modules are also PoCs for the API, so the part-2/3 patches should
contain only changes to contrib and its documentation.

That's what I was getting at, yes.

Besides that, some typo fixes are mixed into the part-2 patch. They should
go into the part-1 patch, where the typos were introduced.

Agreed.

Thanks,

Stephen

#44

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#39)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Regular here means ordinary tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table referenced in the user's query will be an ordinary table.
In the past, Hanada-san proposed an enhancement of FDW to support
remote-join, but it was eventually rejected.

I'm not aware of the specifics around that proposal but I don't believe
we, as a community, have decided to reject the idea in general.

IIUC, his approach was integration of join-pushdown within FDW APIs,
however, it does not mean the idea of remote-join is rejected.
I believe it is still one of our killer features if we can revise the
implementation.

Hanada-san, could you explain why your proposal was rejected
before?

I thought these functions were generally useful to have in the backend,
but they are not fundamental functionality that the custom-scan interface
lacks.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone can use
it, but we need to avoid interfaces which can't be stable due to normal
/ ongoing changes to the backend code.

The functions my patches want to expose are:
- get_restriction_qual_cost()
- fix_expr_common()

And, the functions my patches newly want are:
- bms_to_string()
- bms_from_string()

The first two functions are currently static because cost estimation
is done in costsize.c and reference fix-up is done in setrefs.c. Custom-scan
breaks that assumption, so I made them public.
Until now, every caller of these functions lived in the same file.

I can also understand the usefulness of pushing joins or aggregation to the
remote side when foreign tables are referenced. In a similar way, it is
also useful if we can push these CPU-intensive operations to
co-processors when regular tables are referenced.

That's fine, if we can get data to and from those co-processors efficiently
enough that it's worth doing so. If moving the data to the GPU's memory
will take longer than running the actual aggregation, then it doesn't make
any sense for regular tables because then we'd have to cache the data in
the GPU's memory in some way across multiple queries, which isn't something
we're set up to do.

When I made a prototype implementation on top of FDW using CUDA, it ran
a sequential scan 10 times faster than SeqScan on regular tables when the
qualifiers were complex enough.
The libraries used to communicate with the GPU (OpenCL/CUDA) have an
asynchronous data transfer mode that uses hardware DMA. It lets us hide the
cost of data transfer by pipelining, if there are enough records to be
transferred.
Also, the recent trend in semiconductor devices is integration of the GPU
with the CPU, sharing a common memory space. See Intel's Haswell, AMD's
Kaveri, or NVIDIA's Tegra K1. All of them share the same memory, so there is
no need to transfer the data to be processed. This trend is driven by
physics, because of the energy consumption of semiconductors. So, I'm
optimistic about my idea.
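
To put the pipelining argument in rough numbers, here is a minimal sketch of
a two-stage pipeline model in C. It is not taken from the prototype; the
function names and the per-chunk timing parameters are assumptions made only
for illustration. With overlapped transfers, the total time is governed by
the slower of the two stages rather than by their sum.

```c
#include <assert.h>

/* Total time when each chunk is transferred and then processed, one by one. */
static double
toy_serial_time(int nchunks, double xfer, double compute)
{
    assert(nchunks >= 0);
    return nchunks * (xfer + compute);
}

/*
 * Total time when the transfer of chunk N+1 overlaps with the processing of
 * chunk N (asynchronous DMA).  The first chunk pays both stages in full;
 * every later chunk only adds the slower of the two stages.
 */
static double
toy_pipelined_time(int nchunks, double xfer, double compute)
{
    double slower = (xfer > compute) ? xfer : compute;

    assert(nchunks >= 0);
    if (nchunks == 0)
        return 0.0;
    return xfer + compute + (nchunks - 1) * slower;
}
```

For example, with 10 chunks, 2 time units of transfer and 3 of computation
per chunk, the serial model gives 50 units and the pipelined model gives 32.
With a single chunk the two models are identical, which is why the benefit
depends on having enough records to transfer.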

As I mentioned above, the backend changes by the part-2/-3 patches are
just minor stuff, and I thought it should not be implemented by
contrib module locally.

Fine- then propose them as generally useful additions, not as patches which
are supposed to just be for contrib modules using an already defined
interface. If you can make a case for that then perhaps this is more
practical.

The need came from the contrib modules, which want to call static
functions or need a way to translate an existing data structure to/from
cstring. But, anyway, would a separate patch make sense?

No. What I want to implement is to read the regular table, transfer
its contents into the GPU's local memory for calculation, and then receive
the calculation result. The in-memory cache (which I'm also working on) is
supplemental, because disk access is much slower and a row-oriented
data structure is not suitable for SIMD-style instructions.

Is that actually performant? Is it actually faster than processing the
data directly? The discussions that I've had with folks have cast a great
deal of doubt in my mind about just how well that kind of quick turn-around
to the GPU's memory actually works.

See above.

This really strikes me as the wrong approach for an FDW
join-pushdown API, which should be geared around giving the remote
side an opportunity on a case-by-case basis to cost out joins using
whatever methods it has available to implement them. I've outlined
above the reasons I don't agree with just making the entire
planner/optimizer pluggable.

I'm also inclined to have arguments that will provide enough
information for extensions to determine the best path for them.

For join push-down, I proposed above that we have an interface to the FDW
which allows us to ask it how much each join of the tables which are on
a given FDW's server would cost if the FDW did it vs. pulling it back and
doing it locally. We could also pass all of the relations to the FDW with
the various join-quals and try to get an answer to everything, but I'm afraid
that'd simply end up duplicating the logic of the optimizer into every FDW,
which would be counter-productive.

Hmm... It seems to me we should follow the existing way of constructing
join paths, rather than adding special handling. Even if a query contains
three or more foreign tables managed by the same server, they should be
consolidated into one remote join as long as its cost is less than that of
the local ones.
So, I'd like to use the new add_join_path_hook to compute possible
join paths. If a remote join implemented by custom-scan is cheaper than a
local join, it will be chosen; the optimizer will then try joining this
custom-scan node with other foreign tables. If the remote join is still
cheaper, it will be consolidated again.

Admittedly, getting the costing right isn't easy either, but it's not clear
to me how it'd make sense for the local server to be doing costing for remote
servers.

Right now, I ignore the cost of execution on the remote server and focus on
the cost of transfer over the network. It might be an idea to discount the
CPU cost of remote execution.
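
The transfer-only costing described here could be sketched roughly as
follows. This is a toy model, not the code in the patch: ToyRelSize, the
cost-per-byte parameter and the per-tuple CPU parameter are all hypothetical
names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical size estimate for a relation or a join result. */
typedef struct ToyRelSize
{
    double rows;    /* estimated number of rows */
    double width;   /* estimated average row width in bytes */
} ToyRelSize;

/* Network cost of shipping a whole relation to the local server. */
static double
toy_transfer_cost(ToyRelSize rel, double cost_per_byte)
{
    assert(rel.rows >= 0 && rel.width >= 0);
    return rel.rows * rel.width * cost_per_byte;
}

/* Local join: fetch both inputs, then pay local CPU per input tuple. */
static double
toy_local_join_cost(ToyRelSize outer, ToyRelSize inner,
                    double cost_per_byte, double cpu_per_tuple)
{
    return toy_transfer_cost(outer, cost_per_byte)
         + toy_transfer_cost(inner, cost_per_byte)
         + cpu_per_tuple * (outer.rows + inner.rows);
}

/* Remote join: only the join result crosses the network; remote CPU is ignored. */
static double
toy_remote_join_cost(ToyRelSize result, double cost_per_byte)
{
    return toy_transfer_cost(result, cost_per_byte);
}
```

Because the remote CPU cost is treated as zero in this model, the remote join
wins whenever the join result is no larger than the combined inputs. Adding a
remote CPU term would make the comparison less one-sided.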

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#45

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#42)

Re: Custom Scan APIs (Re: Custom Plan node)

If you're looking to just use GPU acceleration for improving
individual queries, I would think that Robert's work around backend
workers would be a more appropriate way to go, with the ability to
move a working set of data from shared buffers and on-disk
representation of a relation over to the GPU's memory, perform the
operation, and then copy the results back.

The approach is similar to Robert's work, except that it targets GPUs
instead of multicore CPUs. So, I tried to review his work in order to apply
its facilities to my extension as well.

Good, I'd be very curious to hear how that might solve the issue for you,
instead of using the CustomScan approach..

I (plan to) use custom-scan, of course. Once a relation is referenced
and the optimizer decides that GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or the
in-memory cache, if one exists), then moves it to a shared memory buffer to
hand it to the GPU management background worker, which launches asynchronous
DMA transfers one by one. After that, the custom-scan node receives the
filtered records via the shared memory buffer, so it can construct the
tuples to be returned to the upper node.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go
hunt down the rows. How is that going to work here, if this is both
a "regular" table and stored off in a GPU's memory across subsequent
queries or even transactions?

It should be handled on a "case-by-case" basis, I think. If a row-level lock
is required during the table scan, the custom-scan node should return the
tuple located in the shared buffer instead of the cached tuple. Of course,
the custom-scan node also has the option of evaluating the qualifiers on the
GPU with cached data and returning the tuples identified by the ctids of the
cached tuples. Anyway, it is not a significant problem.

I think you're being a bit too hand-wavey here, but if we're talking about
pre-scanning the data using PG before sending it to the GPU and then only
performing a single statement on the GPU, we should be able to deal with
it.

It's what I want to implement.

I'm worried about your ideas to try and cache things on the GPU though,
if you're not prepared to deal with locks happening in shared memory on
the rows you've got cached out on the GPU, or hint bits, or the visibility
map being updated, etc...

No state/information remains on the GPU side. Anything related
to PG internals is the CPU's job.

OK, I'll move the portion that will be needed commonly for other FDWs
into the backend code.

Alright- but realize that there may be objections there on the basis that
the code/structures which you're exposing aren't, and will not be, stable.
I'll have to go back and look at them myself, certainly, and their history.

I see, but that is part of the process of getting code merged.

Yes. According to the previous discussion around postgres_fdw getting
merged, all we can trust on the remote side are built-in data types,
functions, operators, and other built-in stuff.

Well, we're going to need to expand that a bit for aggregates, I'm afraid,
but we should be able to define the API for those aggregates very tightly
based on what PG does today and require that any FDW purporting to provide
those aggregates do it the way PG does. Note that this doesn't solve all
the problems- we've got other issues with regard to pushing aggregates down
into FDWs that need to be solved.

I see. It probably needs more detailed investigation.

The custom-scan node is intended to work on regular relations, not
only foreign tables. That means a special feature (like GPU
acceleration) can work transparently for most existing
applications. Usually, applications define regular tables, not foreign
tables, for their work at installation time. That is the most important
point for me.

The line between a foreign table and a local one is becoming blurred already,
but still, if this is the goal then I really think the background worker
is where you should be focused, not on this Custom Scan API. Consider that,
once we've got proper background workers, we're going to need new nodes
which operate in parallel (or some other rejiggering of the nodes- I don't
pretend to know exactly what Robert is thinking here, and I've apparently
forgotten it if he's posted it
somewhere) and those interfaces may drive changes which would impact the
Custom Scan API- or worse, make us deprecate or regret having added it
because now we'll need to break backwards compatibility to add in the
parallel node capability to satisfy the more general non-GPU case.

The custom-scan API is a thin abstraction over the plan node interface,
not tightly coupled to a particular use case such as GPU or remote-join.
So, I'm quite optimistic about its future maintainability.
Also, please recall the discussion at the last developer meeting.
The purpose of custom-scan (we hadn't named it at that time) is to avoid
unnecessary forks of the project by people who want to implement their own
special feature but have no supported facilities for enhancing the
optimizer/executor.
Even if we get an in-core parallel execution feature on CPUs, it still
makes sense to allow unique implementations that may be suitable
for a specific domain.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#46

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Kouhei Kaigai (#44)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-02-26 17:31 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

IIUC, his approach was integration of join-pushdown within FDW APIs,
however, it does not mean the idea of remote-join is rejected.
I believe it is still one of our killer features if we can revise the
implementation.

Hanada-san, could you explain why your proposal was rejected
before?

IIUC it was not rejected, just returned-with-feedback. We could not
reach consensus about how join-push-down should work. One point of
discussion was multiple paths for a joinrel, but it was not a very serious
one. Here is the tail of the thread.

/messages/by-id/4F058241.2000606@enterprisedb.com

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Hmm, so you're saying that the FDW function needs to be able to return
multiple paths for a single joinrel. Fair enough, and that's not
specific to remote joins. Even a single-table foreign scan could be
implemented differently depending on whether you prefer fast-start or
cheapest total.

... or ordered vs unordered, etc. Yeah, good point, we already got this
wrong with the PlanForeignScan API. Good thing we didn't promise that
would be stable.

This discussion withered down here...

I think the advice to Shigeru-san is to work on the API. We didn't reach a
consensus on what exactly it should look like, but at least you need to be
able to return multiple paths for a single joinrel, and should look at
fixing the PlanForeignScan API to allow that too.
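
As a rough illustration of why a single joinrel may need to keep several
paths, the sketch below keeps every path that is not beaten on both startup
cost and total cost. It is a standalone toy, not PostgreSQL's add_path():
the ToyPath and ToyJoinRel types, the fixed-size array, and the two-cost
comparison are assumptions made for illustration, and the real planner
compares more properties (ordering, for instance).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PATHS 8

/* Hypothetical path with only the two costs discussed above. */
typedef struct ToyPath
{
    double startup_cost;    /* cost before the first row is returned */
    double total_cost;      /* cost to return all rows */
} ToyPath;

typedef struct ToyJoinRel
{
    ToyPath paths[TOY_MAX_PATHS];
    size_t  npaths;
} ToyJoinRel;

/* True if a is at least as cheap as b on both costs. */
static int
toy_dominates(const ToyPath *a, const ToyPath *b)
{
    return a->startup_cost <= b->startup_cost &&
           a->total_cost <= b->total_cost;
}

/* Returns 1 if newpath was kept, 0 if an existing path already dominates it. */
static int
toy_add_path(ToyJoinRel *rel, ToyPath newpath)
{
    size_t keep = 0;

    for (size_t i = 0; i < rel->npaths; i++)
        if (toy_dominates(&rel->paths[i], &newpath))
            return 0;

    /* Drop existing paths that the new one dominates. */
    for (size_t i = 0; i < rel->npaths; i++)
        if (!toy_dominates(&newpath, &rel->paths[i]))
            rel->paths[keep++] = rel->paths[i];
    rel->npaths = keep;

    assert(rel->npaths < TOY_MAX_PATHS);
    rel->paths[rel->npaths++] = newpath;
    return 1;
}
```

A fast-start path and a cheapest-total path can both survive in such a list,
which is the situation a one-path-per-joinrel API cannot express.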

And I gave up for lack of time, IOW in order to finish the more fundamental
portions of the FDW API.

/messages/by-id/4F39FC1A.7090202@gmail.com

--
Shigeru HANADA

#47

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#44)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

IIUC, his approach was integration of join-pushdown within FDW APIs,
however, it does not mean the idea of remote-join is rejected.

For my part, trying to consider doing remote joins *without* going
through FDWs is just nonsensical. What are you joining remotely if not
two foreign tables? With regard to the GPU approach, if that model
works whereby the normal PG tuples are read off disk, fed over to the
GPU, processed, then returned back to the user through PG, then I
wouldn't consider it really a 'remote' join but rather simply a new
execution node inside of PG which is planned and costed just like the
others. We've been over the discussion already about trying to make
that a pluggable system but the, very reasonable, push-back on that has
been if it's really possible and really makes sense to be pluggable. It
certainly doesn't *have* to be- PostgreSQL is written in C, as we all
know, and plenty of C code talks to GPUs and shuffles memory around- and
that's almost exactly what Robert is working on supporting with regular
CPUs and PG backends already.

In many ways, trying to conflate this idea of using-GPUs-to-do-work with
the idea of remote-FDW-joins has really disillusioned me with regard to
the CustomScan approach.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone can use
it, but we need to avoid interfaces which can't be stable due to normal
/ ongoing changes to the backend code.

The functions my patches want to expose are:
- get_restriction_qual_cost()
- fix_expr_common()

I'll try and find time to go look at these in more detail later this
week. I have reservations about exposing the current estimates on costs
as we may want to adjust them in the future- but such adjustments may
need to be made in balance with other changes throughout the system and
an external module which depends on one result from the qual costing
might end up having problems with the costing changes because the
extension author wasn't aware of the other changes happening in other
areas of the costing.

I'm talking about this from a "beyond-just-the-GUCs" point of view, I
realize that the extension author could go look at the GUC settings, but
it's entirely reasonable to believe we'll make changes to the default
GUC settings along with how they're used in the future.

And, the functions my patches newly want are:
- bms_to_string()
- bms_from_string()

Offhand, these look fine, if there's really an external use for them.
Will try to look at them in more detail later.

That's fine, if we can get data to and from those co-processors efficiently
enough that it's worth doing so. If moving the data to the GPU's memory
will take longer than running the actual aggregation, then it doesn't make
any sense for regular tables because then we'd have to cache the data in
the GPU's memory in some way across multiple queries, which isn't something
we're set up to do.

When I made a prototype implementation on top of FDW using CUDA, it ran
a sequential scan 10 times faster than SeqScan on regular tables when the
qualifiers were complex enough.
The libraries used to communicate with the GPU (OpenCL/CUDA) have an
asynchronous data transfer mode that uses hardware DMA. It lets us hide the
cost of data transfer by pipelining, if there are enough records to be
transferred.

That sounds very interesting and certainly figuring out the costing to
support that model will be tricky. Also, shuffling the data around in
that way will also be interesting. It strikes me that it'll be made
more difficult if we're trying to do it through the limitations of a
pre-defined API between the core code and an extension.

Also, the recent trend in semiconductor devices is integration of the GPU
with the CPU, sharing a common memory space. See Intel's Haswell, AMD's
Kaveri, or NVIDIA's Tegra K1. All of them share the same memory, so there is
no need to transfer the data to be processed. This trend is driven by
physics, because of the energy consumption of semiconductors. So, I'm
optimistic about my idea.

And this just makes me wonder why the focus isn't on the background
worker approach instead of trying to do this all in an extension.

The need came from the contrib modules, which want to call static
functions or need a way to translate an existing data structure to/from
cstring. But, anyway, would a separate patch make sense?

I haven't had a chance to go back and look into the functions in detail,
but offhand I'd say the bms ones are probably fine while the others
would need more research as to if they make sense to expose to an
extension.

Hmm... It seems to me we should follow the existing way of constructing
join paths, rather than adding special handling. Even if a query contains
three or more foreign tables managed by the same server, they should be
consolidated into one remote join as long as its cost is less than that of
the local ones.

I'm not convinced that it's going to be that simple, but I'm certainly
interested in the general idea.

So, I'd like to use the new add_join_path_hook to compute possible
join paths. If a remote join implemented by custom-scan is cheaper than a
local join, it will be chosen; the optimizer will then try joining this
custom-scan node with other foreign tables. If the remote join is still
cheaper, it will be consolidated again.

And I'm still unconvinced that trying to make this a hook and
implemented by an extension makes sense.

Admittedly, getting the costing right isn't easy either, but it's not clear
to me how it'd make sense for the local server to be doing costing for remote
servers.

Right now, I ignore the cost of execution on the remote server and focus on
the cost of transfer over the network. It might be an idea to discount the
CPU cost of remote execution.

Pretty sure we're going to need to consider the remote processing cost
of the join as well..

Thanks,

Stephen

#48

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#45)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

I (plan to) use custom-scan, of course. Once a relation is referenced
and the optimizer decides that GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or the
in-memory cache, if one exists), then moves it to a shared memory buffer to
hand it to the GPU management background worker, which launches asynchronous
DMA transfers one by one. After that, the custom-scan node receives the
filtered records via the shared memory buffer, so it can construct the
tuples to be returned to the upper node.

Alright- but have you discussed this with Robert? We're going to be
whacking things around for parallel support with new nodes and more
built-in helper functionality for doing this work and I'm not anxious to
have CustomScan end up being a legacy interface that we're required to
pull forward because we accepted it before things had settled.

I'm worried about your ideas to try and cache things on the GPU though,
if you're not prepared to deal with locks happening in shared memory on
the rows you've got cached out on the GPU, or hint bits, or the visibility
map being updated, etc...

No state/information remains on the GPU side. Anything related
to PG internals is the CPU's job.

Right, good, I'm glad to hear that this approach is for doing things at
only an individual statement level and it's good to know that it can be
performant at that level now.

Well, we're going to need to expand that a bit for aggregates, I'm afraid,
but we should be able to define the API for those aggregates very tightly
based on what PG does today and require that any FDW purporting to provide
those aggregates do it the way PG does. Note that this doesn't solve all
the problems- we've got other issues with regard to pushing aggregates down
into FDWs that need to be solved.

I see. It probably needs more detailed investigation.

These issues will hopefully not be a problem (or at least, one that can
be worked around) for non-FDW implementations which are part of core and
implemented in a similar way to the existing aggregates.. Where the
scan node could continue to be a simple SeqScan as it is today.

The custom-scan API is a thin abstraction over the plan node interface,
not tightly coupled to a particular use case such as GPU or remote-join.
So, I'm quite optimistic about its future maintainability.

I don't see how you can be when there hasn't been any discussion that
I've seen about how parallel query execution is going to change things
for us.

Also, please recall the discussion at the last developer meeting.
The purpose of custom-scan (we hadn't named it at that time) is to avoid
unnecessary forks of the project by people who want to implement their own
special feature but have no supported facilities for enhancing the
optimizer/executor.
Even if we get an in-core parallel execution feature on CPUs, it still
makes sense to allow unique implementations that may be suitable
for a specific domain.

The issue here is that we're going to be expected to maintain an
interface once we provide it and so that isn't something we should be
doing lightly. Particularly when it's as involved as this kind of
change is with what's going on in the backend where we are nearly 100%
sure to be changing things in the next release or two.

Thanks,

Stephen

#49

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#47)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

IIUC, his approach was integration of join-pushdown within FDW APIs,
however, it does not mean the idea of remote-join is rejected.

For my part, trying to consider doing remote joins *without* going through
FDWs is just nonsensical. What are you joining remotely if not two foreign
tables?

It is a case to be joined locally. If a query has two foreign tables managed
by the same server, this pair will be found while the optimizer tries
the various possible combinations.

With regard to the GPU approach, if that model works whereby the
normal PG tuples are read off disk, fed over to the GPU, processed, then
returned back to the user through PG, then I wouldn't consider it really
a 'remote' join but rather simply a new execution node inside of PG which
is planned and costed just like the others. We've been over the discussion
already about trying to make that a pluggable system but the, very reasonable,
push-back on that has been if it's really possible and really makes sense
to be pluggable. It certainly doesn't *have* to be- PostgreSQL is written
in C, as we all know, and plenty of C code talks to GPUs and shuffles memory
around- and that's almost exactly what Robert is working on supporting with
regular CPUs and PG backends already.

In many ways, trying to conflate this idea of using-GPUs-to-do-work with
the idea of remote-FDW-joins has really disillusioned me with regard to
the CustomScan approach.

Are you suggesting that I focus on the GPU stuff, rather than killing two
birds with one stone? That may be an approach; however, the two have a common
part, because the plan node for remote-join also returns tuples to its upper
node. From the viewpoint of the upper node, it looks like a black box that
returns tuples joining two underlying relations. On the other hand, there is
another black box that returns tuples by scanning or joining underlying
relations with GPU assistance. The implementation details of both are
invisible to the upper node, but their external interface is common. The
custom-scan node can provide a pluggable way for both use cases.
Anyway, I'm less motivated by the remote-join feature than by the
GPU-acceleration stuff. If it is better to drop FDW's remote-join stuff from
the custom-scan scope, I won't object.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone
can use it, but we need to avoid interfaces which can't be stable
due to normal / ongoing changes to the backend code.

The functions my patches want to expose are:
- get_restriction_qual_cost()
- fix_expr_common()

I'll try and find time to go look at these in more detail later this week.
I have reservations about exposing the current estimates on costs as we
may want to adjust them in the future- but such adjustments may need to
be made in balance with other changes throughout the system and an external
module which depends on one result from the qual costing might end up having
problems with the costing changes because the extension author wasn't aware
of the other changes happening in other areas of the costing.

That is also my point. If the cost estimation logic is revised in
the future, it causes a problem if an extension has copied and pasted the
code.

I'm talking about this from a "beyond-just-the-GUCs" point of view, I
realize that the extension author could go look at the GUC settings, but
it's entirely reasonable to believe we'll make changes to the default GUC
settings along with how they're used in the future.

Is the GUC something like a Boolean that shows whether the new costing model
is applied or not? If so, an extension needs to keep two cost estimation
logics within its code, doesn't it?
If the GUC is something like a weight, I also think it makes sense.

And, the functions my patches newly want are:
- bms_to_string()
- bms_from_string()

Offhand, these look fine, if there's really an external use for them.
Will try to look at them in more detail later.

At the least, it makes sense to carry a bitmap data structure in the private
field of custom-scan, because every plan node has to be safe to copy with
copyObject().
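
As an illustration only (this is not the code in the patch, and the text format
is an assumption), a set of integers can be flattened to a string and rebuilt
later. That is the kind of round trip bms_to_string() and bms_from_string() are
meant to provide, so the value can sit in a field that is copied as plain data.
The names members_to_string and members_from_string are hypothetical:

```python
# Hypothetical sketch: round-trip a set of non-negative integers through a
# plain string. The "b 1 3 5" text format is an assumption for illustration,
# not the format used by the patch.

def members_to_string(members):
    """Serialize a set of non-negative ints as 'b' followed by sorted members."""
    if any(m < 0 for m in members):
        raise ValueError("members must be non-negative")
    return " ".join(["b"] + [str(m) for m in sorted(members)])


def members_from_string(text):
    """Rebuild the set from the string produced by members_to_string()."""
    tokens = text.split()
    if not tokens or tokens[0] != "b":
        raise ValueError("not a serialized member set")
    return {int(tok) for tok in tokens[1:]}
```

Because the serialized form is an ordinary string, copying the node that holds
it needs no knowledge of the bitmap representation.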

That's fine, if we can get data to and from those co-processors
efficiently enough that it's worth doing so. If moving the data to
the GPU's memory will take longer than running the actual
aggregation, then it doesn't make any sense for regular tables
because then we'd have to cache the data in the GPU's memory in some
way across multiple queries, which isn't something we're set up to do.

When I made a prototype implementation on top of FDW, using CUDA, it
ran sequential scans 10 times faster than SeqScan on regular
tables, provided the qualifiers were complex enough.
Libraries for communicating with the GPU (OpenCL/CUDA) have an asynchronous data
transfer mode that uses hardware DMA. It hides the cost of data
transfer through pipelining, if there are enough records to be
transferred.
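
A rough way to see why pipelining hides the transfer cost (a back-of-the-envelope
model, not a measurement; the per-chunk times are made-up parameters): with
asynchronous transfer, moving the next chunk overlaps with computing on the
current one, so after the first transfer only the slower of the two stages
stays visible.

```python
# Simplified two-stage pipeline model. All times are hypothetical inputs in
# arbitrary units; nothing here is taken from the prototype's measurements.

def serial_time(n_chunks, transfer, compute):
    """Each chunk is transferred, then computed, with no overlap."""
    return n_chunks * (transfer + compute)


def pipelined_time(n_chunks, transfer, compute):
    """Transfer of chunk i+1 overlaps with computation on chunk i."""
    if n_chunks == 0:
        return 0.0
    # First transfer and last computation cannot overlap with anything;
    # in between, the slower stage sets the pace.
    return transfer + (n_chunks - 1) * max(transfer, compute) + compute
```

With a single chunk the two functions agree, which matches the remark that the
benefit appears only when there are enough records to keep the pipeline busy.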

That sounds very interesting and certainly figuring out the costing to
support that model will be tricky. Also, shuffling the data around in that
way will also be interesting. It strikes me that it'll be made more
difficult if we're trying to do it through the limitations of a pre-defined
API between the core code and an extension.

The data shuffling is done on the extension side, so from the core's point of
view PG just picks up tuples from a box that handles the underlying table scan in
some way.

Also, the recent trend in semiconductor devices is integrating the GPU with
the CPU so that they share a common memory space. See Intel's Haswell, AMD's
Kaveri, or NVIDIA's Tegra K1. All of them share the same memory, so there is no
need to transfer the data to be processed. This trend is driven by
physical limits on semiconductor energy consumption. So, I'm
optimistic about my idea.

And this just makes me wonder why the focus isn't on the background worker
approach instead of trying to do this all in an extension.

The GPU portion of the above processors has a different instruction set from the CPU,
so we cannot utilize its parallel execution capability even if we launch
tons of background workers, which run existing CPU instructions.

Hmm... It seems to me we should follow the existing manner of
constructing join paths, rather than special handling. Even if a query
contains three or more foreign tables managed by the same server, they shall
be consolidated into one remote join as long as its cost is less than
the local ones.

I'm not convinced that it's going to be that simple, but I'm certainly
interested in the general idea.

That is implemented in my part-3 patch: the add_join_path hook adds a custom-scan
path that joins two foreign tables, a foreign table and a custom-scan, or
two custom-scans, if all of them are managed by the same foreign server.
As long as its execution cost is reasonable, this allows running a remote join
that contains three or more relations.

So, I'd like to use the new add_join_path_hook to compute
possible join paths. If a remote join implemented by custom-scan is
cheaper than a local join, it shall be chosen, and then the optimizer will try
joining other foreign tables with this custom-scan node. If the
remote join is still cheaper, it shall be consolidated again.
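
The consolidation described above can be sketched as follows. This is a toy
model under stated assumptions, not the planner code or the patch: Path,
local_join, remote_join_hook and cheapest_join are hypothetical names, and the
cost constants are placeholders chosen only so that the remote join wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    rels: frozenset          # names of the relations this path covers
    server: Optional[str]    # remote server able to produce it, if any
    cost: float
    kind: str                # "scan", "local join" or "remote join"


def local_join(outer, inner):
    # Placeholder cost: both inputs plus a fixed local join overhead.
    return Path(outer.rels | inner.rels, None,
                outer.cost + inner.cost + 50.0, "local join")


def remote_join_hook(outer, inner):
    """Offer a remote join only when both inputs live on the same server."""
    if outer.server is None or outer.server != inner.server:
        return None
    # Placeholder cost: assumed cheaper because fewer rows cross the network.
    return Path(outer.rels | inner.rels, outer.server,
                outer.cost + inner.cost + 10.0, "remote join")


def cheapest_join(outer, inner):
    """Collect the built-in path and any hook-provided path, keep the cheapest."""
    candidates = [local_join(outer, inner)]
    offered = remote_join_hook(outer, inner)
    if offered is not None:
        candidates.append(offered)
    return min(candidates, key=lambda p: p.cost)


ft1 = Path(frozenset({"ft1"}), "srv", 100.0, "scan")
ft2 = Path(frozenset({"ft2"}), "srv", 100.0, "scan")
ft3 = Path(frozenset({"ft3"}), "srv", 100.0, "scan")
t1 = Path(frozenset({"t1"}), None, 100.0, "scan")

join12 = cheapest_join(ft1, ft2)       # remote join, still owned by "srv"
join123 = cheapest_join(join12, ft3)   # consolidated again into one remote join
mixed = cheapest_join(join12, t1)      # a local table forces a local join
```

Because the remote-join path keeps its server tag, the same rule applies again
at the next join level, which is how three or more foreign tables end up in a
single remote join without special handling.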

And I'm still unconvinced that trying to make this a hook and implemented
by an extension makes sense.

The postgresAddJoinPaths() in my part-3 patch is doing that. Of course,
some portion of its code might be better supported in the core backend.
However, I don't think the overall design is less reasonable than special handling.

Admittedly, getting the costing right isn't easy either, but it's
not clear to me how it'd make sense for the local server to be doing
costing for remote servers.

Right now, I ignore the cost of running the remote server and focus on the
cost of transfer over the network. It might be an idea to discount the CPU
cost of remote execution.

Pretty sure we're going to need to consider the remote processing cost of
the join as well..

I also think so, even though it is not done yet.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#48)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

I (plan to) use custom-scan, of course. Once a relation is referenced
and the optimizer decides GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or an in-memory cache,
if one exists), then moves it to a shared memory buffer to deliver it to the GPU
management background worker, which launches asynchronous DMA one by one.
After that, the custom-scan node receives the filtered records via the
shared-memory buffer, so it can construct tuples to be returned to the upper
node.
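
The data flow described here can be sketched roughly as below. It is only an
analogy under loud assumptions: a Python thread stands in for the GPU
management background worker, queue.Queue stands in for the shared memory
buffers, and custom_scan and worker are hypothetical names. Nothing in it is
GPU-specific or taken from the patch.

```python
import queue
import threading


def worker(inbox, outbox, predicate):
    """Stand-in for the management worker: filter each chunk it receives."""
    while True:
        chunk = inbox.get()
        if chunk is None:          # sentinel: no more chunks
            outbox.put(None)
            return
        outbox.put([row for row in chunk if predicate(row)])


def custom_scan(rows, predicate, chunk_size=4):
    """Hand chunks to the worker, then yield the filtered rows to the caller."""
    inbox, outbox = queue.Queue(), queue.Queue()
    helper = threading.Thread(target=worker, args=(inbox, outbox, predicate))
    helper.start()
    # Producer side: read the underlying rows and queue them chunk by chunk.
    for start in range(0, len(rows), chunk_size):
        inbox.put(rows[start:start + chunk_size])
    inbox.put(None)
    # Consumer side: receive filtered chunks and return rows to the upper node.
    while True:
        chunk = outbox.get()
        if chunk is None:
            break
        yield from chunk
    helper.join()
```

The upper node only iterates over custom_scan(); how the rows were filtered
stays hidden behind that interface.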

Alright- but have you discussed this with Robert? We're going to be
whacking things around for parallel support with new nodes and more built-in
helper functionality for doing this work and I'm not anxious to have
CustomScan end up being a legacy interface that we're required to pull
forward because we accepted it before things had settled.

I briefly introduced my idea of using GPUs to him at Ottawa last year,
though I'm not certain he remembers it.
At least, the idea of the custom-scan node came from the discussion at that
time.

The custom-scan API is a thin abstraction over the plan node
interface, not tightly coupled with a particular use case like GPU,
remote-join and so on. So, I'm quite optimistic about its future
maintainability.

I don't see how you can be when there hasn't been any discussion that I've
seen about how parallel query execution is going to change things for us.

If parallel query execution changes the whole structure of plan nodes,
it will also affect the interface of custom-scan because it is a thin
abstraction of the plan node. However, if the parallel execution feature is
implemented as new plan nodes in addition to the existing ones, I cannot
imagine a scenario that affects the structure of another plan node.

Also, please recall the discussion at the last developer meeting.
The purpose of custom-scan (we didn't name it at that time) is to
avoid unnecessary forks of the project by people who want to implement
their own special feature but have no facility for enhancing
the optimizer/executor.
Even if we have an in-core parallel execution feature on the CPU, it also
makes sense to allow unique implementations that may be suitable
for a specific domain.

The issue here is that we're going to be expected to maintain an interface
once we provide it and so that isn't something we should be doing lightly.
Particularly when it's as involved as this kind of change is with what's
going on in the backend where we are nearly 100% sure to be changing things
in the next release or two.

The FDW APIs have also been revised several times in recent releases. If we could
design a "perfect" interface from the beginning, that would be best, but it is usually hard
to do.
Also, the custom-scan interface is almost symmetric with the existing plan node
structures, so its stability is relatively high, I think.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#51

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#50)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

* Stephen Frost (sfrost@snowman.net) wrote:

I don't see how you can be when there hasn't been any discussion that I've
seen about how parallel query execution is going to change things for us.

If parallel query execution changes the whole structure of plan nodes,
it will also affect the interface of custom-scan because it is a thin
abstraction of the plan node. However, if the parallel execution feature is
implemented as new plan nodes in addition to the existing ones, I cannot
imagine a scenario that affects the structure of another plan node.

Let's just say that I have doubts that we'll be able to implement
parallel execution *without* changing the plan node interface in some
way which will require, hopefully minor, changes to all of the nodes.
The issue is that even a minor change would break the custom-scan API
and we'd immediately be in the boat of dealing with complaints regarding
backwards compatibility. Perhaps we can hand-wave that, and we've had
some success changing hook APIs between major releases, but such changes
may also be in ways which wouldn't break in obvious ways or even
possibly be changes which have to be introduced into back-branches.
Parallel query is going to be brand-new real soon and it's reasonable to
think we'll need to make bug-fix changes to it after it's out which
might even involve changes to the API which is developed for it.

The issue here is that we're going to be expected to maintain an interface
once we provide it and so that isn't something we should be doing lightly.
Particularly when it's as involved as this kind of change is with what's
going on in the backend where we are nearly 100% sure to be changing things
in the next release or two.

The FDW APIs have also been revised several times in recent releases. If we could
design a "perfect" interface from the beginning, that would be best, but it is usually hard
to do.

Sure, but FDWs also have a *much* broader set of use-cases, in my view,
which is also why I was pushing to work on join-push-down to happen
there instead of having this kind of a hook interface, which I don't
think we'd want to directly expose as part of the 'official FDW API' as
it ends up putting all the work on the FDW with little aid, making it
terribly likely to end up with a bunch of duplicated code in the FDWs
from the backend to deal with it, particularly for individuals writing
FDWs who aren't familiar with what PG already has.

Also, the custom-scan interface is almost symmetric with the existing plan node
structures, so its stability is relatively high, I think.

Perhaps it will come to pass that parallel query execution doesn't
require any changes to the plan node structure, but that's not the horse
that I'd bet on at this point.

Thanks,

Stephen

#52

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#51)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-01 0:36 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

* Stephen Frost (sfrost@snowman.net) wrote:

I don't see how you can be when there hasn't been any discussion that I've
seen about how parallel query execution is going to change things for us.

If parallel query execution changes the whole structure of plan nodes,
it will also affect the interface of custom-scan because it is a thin
abstraction of the plan node. However, if the parallel execution feature is
implemented as new plan nodes in addition to the existing ones, I cannot
imagine a scenario that affects the structure of another plan node.

Let's just say that I have doubts that we'll be able to implement
parallel execution *without* changing the plan node interface in some
way which will require, hopefully minor, changes to all of the nodes.
The issue is that even a minor change would break the custom-scan API
and we'd immediately be in the boat of dealing with complaints regarding
backwards compatibility. Perhaps we can hand-wave that, and we've had
some success changing hook APIs between major releases, but such changes
may also be in ways which wouldn't break in obvious ways or even
possibly be changes which have to be introduced into back-branches.
Parallel query is going to be brand-new real soon and it's reasonable to
think we'll need to make bug-fix changes to it after it's out which
might even involve changes to the API which is developed for it.

Even if we change the plan-node interface in a future release,
it shall not be a change that makes the existing stuff impossible.
The custom-scan API is designed to provide an alternative way to scan
or join relations, in addition to the existing logic like SeqScan or
NestLoop. If such a change breaks the plan-node interfaces so that the existing
stuff can no longer be implemented, it is problematic for everything, not only
the custom-scan node. I don't think a change that makes it impossible
to implement the existing logic will be merged. Most likely, the new parallel
execution feature can work together with the existing sequential logic and
the custom-scan interface.

BTW, this kind of discussion feels like talking with a ghost because
we cannot see the new interface for parallel execution
right now, so we cannot concretely investigate whether it will cause a
really serious backward incompatibility or not.
My bet is on a minor one. I cannot imagine a plan-node interface that does
not support the existing non-parallel SeqScan, NestLoop and so on.

The issue here is that we're going to be expected to maintain an interface
once we provide it and so that isn't something we should be doing lightly.
Particularly when it's as involved as this kind of change is with what's
going on in the backend where we are nearly 100% sure to be changing things
in the next release or two.

The FDW APIs have also been revised several times in recent releases. If we could
design a "perfect" interface from the beginning, that would be best, but it is usually hard
to do.

Sure, but FDWs also have a *much* broader set of use-cases, in my view,
which is also why I was pushing to work on join-push-down to happen
there instead of having this kind of a hook interface, which I don't
think we'd want to directly expose as part of the 'official FDW API' as
it ends up putting all the work on the FDW with little aid, making it
terribly likely to end up with a bunch of duplicated code in the FDWs
from the backend to deal with it, particularly for individuals writing
FDWs who aren't familiar with what PG already has.

It might not have been a good idea to use postgres_fdw as the basis of a
proof of concept to demonstrate that custom-scan can effectively host
an alternative way to join, one that fetches the result set of a remote join
as if it were a relation scan, even though it did demonstrate that this is possible.
So, I don't mind the part-3 portion (remote join of postgres_fdw on
top of custom-scan) being dropped from the submission.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#53

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kohei KaiGai (#52)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai,

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

BTW, this kind of discussion feels like talking with a ghost because
we cannot see the new interface for parallel execution
right now, so we cannot concretely investigate whether it will cause a
really serious backward incompatibility or not.

Yeah, it would certainly be nice if we had all of the answers up-front.
What I keep hoping for is that someone who has been working on this area
(eg: Robert) would speak up...

My bet is on a minor one. I cannot imagine a plan-node interface that does
not support the existing non-parallel SeqScan, NestLoop and so on.

Sure you can- because once we change the interface, we're probably going
to go through and make everything use the new one rather than have to
special-case things. That's more-or-less exactly my point here because
having an external hook like CustomScan would make that kind of
wholesale change more difficult.

That does *not* mean I'm against using GPUs and GPU optimizations. What
it means is that I'd rather see that done in core, which would allow us
to simply change that interface along with the rest when doing wholesale
changes and not have to worry about backwards compatibility and breaking
other people's code.

Thanks,

Stephen

#54

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#53)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-01 22:38 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

KaiGai,

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

BTW, this kind of discussion feels like talking with a ghost because
we cannot see the new interface for parallel execution
right now, so we cannot concretely investigate whether it will cause a
really serious backward incompatibility or not.

Yeah, it would certainly be nice if we had all of the answers up-front.
What I keep hoping for is that someone who has been working on this area
(eg: Robert) would speak up...

I'd also like to see his opinion.

My bet is on a minor one. I cannot imagine a plan-node interface that does
not support the existing non-parallel SeqScan, NestLoop and so on.

Sure you can- because once we change the interface, we're probably going
to go through and make everything use the new one rather than have to
special-case things. That's more-or-less exactly my point here because
having an external hook like CustomScan would make that kind of
wholesale change more difficult.

I think we should follow the general rule in the case of custom-scan as well.
In other words, it is the extension author's responsibility to keep up with the
latest specification of the interfaces.
For example, I have an extension module that is unable to work on the
latest PG code because of interface changes at ProcessUtility_hook.
Is that a matter of backward incompatibility? Definitely not. It should be
my job.

That does *not* mean I'm against using GPUs and GPU optimizations. What
it means is that I'd rather see that done in core, which would allow us
to simply change that interface along with the rest when doing wholesale
changes and not have to worry about backwards compatibility and breaking
other people's code.

I also have to introduce some background from a previous discussion.
Now we have two options for GPU programming: CUDA or OpenCL.
Their libraries and drivers are both provided under proprietary licenses,
so they do not fit the core implementation of PostgreSQL, but rather
extensions that are installed at the user's own responsibility.
Because of this, I brought up a discussion about a pluggable
planner/executor node (as a basis of GPU acceleration) at the last
developer meeting, and have since worked on the custom-scan node interface.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#55

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kohei KaiGai (#54)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai,

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

Now we have two options for GPU programming: CUDA or OpenCL.
Their libraries and drivers are both provided under proprietary licenses,
so they do not fit the core implementation of PostgreSQL, but rather
extensions that are installed at the user's own responsibility.

Being able to work with libraries which are not BSD licensed doesn't
change the license under which PostgreSQL code is released. Nor does it
require PostgreSQL to be licensed in any different way from how it is
today. Where it would get a bit ugly, I agree, is for the packagers who
have to decide if they want to build against those libraries or not. We
might be able to make things a bit easier by having a startup-time
determination of if these nodes are able to be used or not. This isn't
unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible,
a situation which causes a great deal of difficulty already due to the
whole readline nonsense- but it's difficulty for the packagers, not for
the PostgreSQL project, per se.

Because of this, I brought up a discussion about a pluggable
planner/executor node (as a basis of GPU acceleration) at the last
developer meeting, and have since worked on the custom-scan node interface.

And I'm still unconvinced of this approach and worry that it's going to
break more often than it works. That's my 2c on it, but I won't get in
the way if someone else wants to step up and support it. As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

Thanks,

Stephen

#56

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#55)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-02 9:51 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

KaiGai,

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

Now we have two options for GPU programming: CUDA or OpenCL.
Their libraries and drivers are both provided under proprietary licenses,
so they do not fit the core implementation of PostgreSQL, but rather
extensions that are installed at the user's own responsibility.

Being able to work with libraries which are not BSD licensed doesn't
change the license under which PostgreSQL code is released. Nor does it
require PostgreSQL to be licensed in any different way from how it is
today. Where it would get a bit ugly, I agree, is for the packagers who
have to decide if they want to build against those libraries or not. We
might be able to make things a bit easier by having a startup-time
determination of if these nodes are able to be used or not. This isn't
unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible,
a situation which causes a great deal of difficulty already due to the
whole readline nonsense- but it's difficulty for the packagers, not for
the PostgreSQL project, per se.

As you mentioned, it is a headache for packagers, and it does not make
sense for us if packagers disable the feature that requires proprietary
drivers. In fact, Fedora / RHEL do not permit distributing software
under a non-open-source license. Obviously, NVIDIA's CUDA
is a library distributed under a proprietary license, and thus out of
scope for the Linux distributors. It also leads them to turn off any
feature that must be linked with proprietary drivers.
All we can do is implement these features as an extension, then offer
users the option to use it or not.

Because of this, I brought up a discussion about a pluggable
planner/executor node (as a basis of GPU acceleration) at the last
developer meeting, and have since worked on the custom-scan node interface.

And I'm still unconvinced of this approach and worry that it's going to
break more often than it works. That's my 2c on it, but I won't get in
the way if someone else wants to step up and support it. As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

It's the right approach for enhancing FDW functionality; I have never opposed it.

What I'd like to implement is GPU acceleration that works on
regular tables, not only foreign tables. Also, according to the record
of the developer meeting, a pluggable executor node is also a feature
required by the PG-XC folks to make their project work with upstream.
I think the custom-scan feature is capable of hosting these requirements,
and does not prevent enhancement of the FDW features.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#57

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kohei KaiGai (#56)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

As you mentioned, it is a headache for packagers, and it does not make
sense for us if packagers disable the feature that requires proprietary
drivers.

No, I disagree with that. I don't expect this use-case to be very
common to begin with and telling individuals that they have to compile
it themselves is certainly not out of the question.

In fact, Fedora / RHEL do not permit distributing software
under a non-open-source license.

I'm pretty confident you can get RPMs for those distributions.

Obviously, NVIDIA's CUDA
is a library distributed under a proprietary license, and thus out of
scope for the Linux distributors.

This also doesn't make any sense to me- certainly the CUDA libraries are
available under Debian derivatives, along with open-source wrappers for
them like pycuda.

It also leads them to turn off any
feature that must be linked with proprietary drivers.
All we can do is implement these features as an extension, then offer
users the option to use it or not.

No, we can tell individuals who want it that they're going to need to
build with support for it. We don't offer OpenSSL as an extension (I
certainly wish we did- and had a way to replace it w/ GNUTLS or one of
the better licensed options).

What I'd like to implement is GPU acceleration that works on
regular tables, not only foreign tables. Also, according to the record
of the developer meeting, a pluggable executor node is also a feature
required by the PG-XC folks to make their project work with upstream.
I think the custom-scan feature is capable of hosting these requirements,
and does not prevent enhancement of the FDW features.

I think you're conflating things again- while it might be possible to
use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown,
*I* don't think it's the right approach. Regarding the PG-XC
requirement, I expect they're looking for FDW join/aggregate-pushdown
and also see that it *could* be done w/ CustomScan.

We could punt on the whole thing and drop in hooks which could be used
to replace everything done from the planner through to the executor and
then anyone *could* implement any of the above, and parallel query too.
That doesn't make it the right approach.

Thanks,

Stephen

#58

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#42)

Re: Custom Scan APIs (Re: Custom Plan node)

On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote:

The custom-scan node is intended to work on regular relations, not
only foreign tables. It means a special feature (like GPU acceleration)
can work transparently for most existing applications. Usually,
they define regular tables for their work at installation, not foreign
tables. That is my biggest concern.

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the
background worker is where you should be focused, not on this Custom
Scan API. Consider that, once we've got proper background workers,
we're going to need new nodes which operate in parallel (or some other
rejiggering of the nodes- I don't pretend to know exactly what Robert is
thinking here, and I've apparently forgotten it if he's posted it
somewhere) and those interfaces may drive changes which would impact the
Custom Scan API- or worse, make us deprecate or regret having added it
because now we'll need to break backwards compatibility to add in the
parallel node capability to satisfy the more general non-GPU case.

This critique seems pretty odd to me. I haven't had the time to look
at this patch set, but I don't see why anyone would want to use the
background worker facility for GPU acceleration, which is what
KaiGai's trying to accomplish here. Surely you want, if possible, to
copy the data directly from the user backend into the GPU's working
memory. What would the use of a background worker be? We definitely
need background workers to make use of additional *CPUs*, but I don't
see what good they are in leveraging *GPUs*.

I seriously doubt there's any real conflict with parallelism here.
Parallelism may indeed add more ways to scan a relation
(ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
shouldn't have a Custom Scan node too. Indeed, my principal concern
about this patch set isn't that it's too specialized, as you seem to
be worrying about, but that it's aiming to satisfy *too many* use
cases. I think FDW join pushdown is a fairly different problem from
adding a custom scan type, and I doubt one patch should try to solve
both problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#59

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#47)

Re: Custom Scan APIs (Re: Custom Plan node)

On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

IIUC, his approach was to integrate join-pushdown within the FDW APIs;
however, that does not mean the idea of remote-join is rejected.

For my part, trying to consider doing remote joins *without* going
through FDWs is just nonsensical.

That is, of course, true by definition, but I think it's putting the
focus in the wrong place. It's possible that there are other cases
when a scan might be a plausible path for a joinrel even if there are no
foreign tables in play. For example, you could cache the joinrel
output and then inject a cache scan as a path for the joinrel.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#51)

Re: Custom Scan APIs (Re: Custom Plan node)

On Fri, Feb 28, 2014 at 10:36 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

* Stephen Frost (sfrost@snowman.net) wrote:

I don't see how you can be when there hasn't been any discussion that I've
seen about how parallel query execution is going to change things for us.

If parallel query execution changes the whole structure of plan nodes,
it will also affect the interface of custom-scan because it is a thin
abstraction of the plan node. However, if the parallel execution feature is
implemented as new plan nodes in addition to the existing ones, I cannot
imagine a scenario that affects the structure of another plan node.

Let's just say that I have doubts that we'll be able to implement
parallel execution *without* changing the plan node interface in some
way which will require, hopefully minor, changes to all of the nodes.
The issue is that even a minor change would break the custom-scan API
and we'd immediately be in the boat of dealing with complaints regarding
backwards compatibility. Perhaps we can hand-wave that, and we've had
some success changing hook APIs between major releases, but such changes
may also be in ways which wouldn't break in obvious ways or even
possibly be changes which have to be introduced into back-branches.
Parallel query is going to be brand-new real soon and it's reasonable to
think we'll need to make bug-fix changes to it after it's out which
might even involve changes to the API which is developed for it.

For what it's worth, and I can't claim to have all the answers here,
this doesn't match my expectation. I think we'll do two kinds of
parallelism. One will be parallelism within nodes, like parallel sort
or parallel seqscan. Any node we parallelize this way is likely to be
heavily rewritten, or else to get a sister that looks quite different
from the original. The other kind of parallelism will involve pushing
a whole subtree of the plan into a different node. In this case we'll
need to pass data between nodes in some different way (this was one of
the major reasons I designed the shm_mq stuff) but the nodes
themselves should change little if at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#61

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#58)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote:

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the
background worker is where you should be focused, not on this Custom
Scan API. Consider that, once we've got proper background workers,
we're going to need new nodes which operate in parallel (or some other
rejiggering of the nodes- I don't pretend to know exactly what Robert is
thinking here, and I've apparently forgotten it if he's posted it
somewhere) and those interfaces may drive changes which would impact the
Custom Scan API- or worse, make us deprecate or regret having added it
because now we'll need to break backwards compatibility to add in the
parallel node capability to satisfy the more general non-GPU case.

This critique seems pretty odd to me. I haven't had the time to look
at this patch set, but I don't see why anyone would want to use the
background worker facility for GPU acceleration, which is what
KaiGai's trying to accomplish here.

Eh, that didn't come out quite right. I had intended it to be more
along the lines of "look at what Robert's doing".

I was trying to point out that parallel query execution is coming soon
thanks to the work on background worker and that parallel query
execution might drive changes to the way nodes interact in the executor
driving a need to change the API. In other words, CustomScan could
easily end up being broken by that change and I'd rather we not have to
worry about such breakage.

I seriously doubt there's any real conflict with parallelism here.
Parallelism may indeed add more ways to scan a relation
(ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
shouldn't have a Custom Scan node too.

What about parallel execution through the tree itself, rather than just
at specific end nodes like SeqScan and IndexScan? Or parallel
aggregates? I agree that simple parallel SeqScan/IndexScan isn't going
to change any of the interfaces, but surely we're going for more than
that. Indeed, I'm wishing that I had found more time to spend on just
a simple select-based Append node which could parallelize I/O across
tablespaces and FDWs underneath the Append.

Indeed, my principal concern
about this patch set isn't that it's too specialized, as you seem to
be worrying about, but that it's aiming to satisfy *too many* use
cases. I think FDW join pushdown is a fairly different problem from
adding a custom scan type, and I doubt one patch should try to solve
both problems.

Yeah, I've voiced those same concerns later in the thread also,
specifically that this punts on nearly everything and expects the
implementor to figure it all out. We should be able to do better wrt
FDW join-pushdown, etc.

Thanks,

Stephen

#62

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#61)

Re: Custom Scan APIs (Re: Custom Plan node)

On Sat, Mar 1, 2014 at 8:49 PM, Stephen Frost <sfrost@snowman.net> wrote:

This critique seems pretty odd to me. I haven't had the time to look
at this patch set, but I don't see why anyone would want to use the
background worker facility for GPU acceleration, which is what
KaiGai's trying to accomplish here.

Eh, that didn't come out quite right. I had intended it to be more
along the lines of "look at what Robert's doing".

I was trying to point out that parallel query execution is coming soon
thanks to the work on background worker and that parallel query
execution might drive changes to the way nodes interact in the executor
driving a need to change the API. In other words, CustomScan could
easily end up being broken by that change and I'd rather we not have to
worry about such breakage.

I think the relation is pretty tangential. I'm worried about the
possibility that the Custom Scan API is broken *ab initio*, but I'm
not worried about a conflict with parallel query.

I seriously doubt there's any real conflict with parallelism here.
Parallelism may indeed add more ways to scan a relation
(ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
shouldn't have a Custom Scan node too.

What about parallel execution through the tree itself, rather than just
at specific end nodes like SeqScan and IndexScan? Or parallel
aggregates? I agree that simple parallel SeqScan/IndexScan isn't going
to change any of the interfaces, but surely we're going for more than
that. Indeed, I'm wishing that I had found more time to spend on just
a simple select-based Append node which could parallelize I/O across
tablespaces and FDWs underneath the Append.

Well, as I said in another recent reply that you probably got after
sending this, when you split between nodes, that mostly just has to do
with how you funnel the tuples from one node to another. The nodes
themselves probably don't even need to know. Or at least that's what
I'd hope.

I don't see that parallelizing Append is any easier than any other
problem in this space. There's no parallel I/O facility, so you need
a background worker per append branch to wait on I/O. And you have
all the problems of making sure that the workers have the same
snapshot, making sure they don't self-deadlock, etc. that you have for
any other case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#63

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#57)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-02 10:29 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:

As you mentioned, it is a headache for packagers, and it does not help
us if a packager disables a feature because it requires proprietary
drivers.

No, I disagree with that. I don't expect this use-case to be very
common to begin with and telling individuals that they have to compile
it themselves is certainly not out of the question.

In fact, Fedora / RHEL do not permit distributing software under a
non-open-source license.

I'm pretty confident you can get RPMs for those distributions.

Obviously, NVIDIA's CUDA is a library distributed under a proprietary
license, and is thus out of scope for the Linux distributors.

This also doesn't make any sense to me- certainly the CUDA libraries are
available under Debian derivatives, along with open-source wrappers for
them like pycuda.

This also leads them to turn off any feature that has to be linked
with proprietary drivers.
All we can do is implement these features as an extension, and then give
users the option to use it or not.

No, we can tell individuals who want it that they're going to need to
build with support for it. We don't offer OpenSSL as an extension (I
certainly wish we did- and had a way to replace it w/ GNUTLS or one of
the better licensed options).

I know there are some alternative ways. However, they require users to
have additional knowledge and make extra setup effort, and users lose the
benefit of using the distributor's Linux if alternative RPMs are required.
I don't want to recommend such a complicated setup procedure to users.

What I'd like to implement is GPU acceleration that works on
regular tables, not only foreign tables. Also, according to the backlog
of the developer meeting, a pluggable executor node is a feature the
PG-XC folks also need in order to align their project with upstream.
I think the custom-scan feature can host these requirements,
and does not prevent enhancement of the FDW features.

I think you're conflating things again- while it might be possible to
use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown,
*I* don't think it's the right approach. Regarding the PG-XC
requirement, I expect they're looking for FDW join/aggregate-pushdown
and also see that it *could* be done w/ CustomScan.

The reason why I submitted the part-3 patch (which enhances postgres_fdw
for remote-join using custom-scan) is that it easily demonstrates the use of
join replacement by a special scan, with a minimal amount of patch to be
reviewed. If we have another idea to demonstrate it, I won't stick to the
remote-join feature on foreign tables.
Regarding PG-XC, I don't know their exact needs because I didn't
attend the cluster meeting, but someone mentioned a pluggable
plan/exec node in this context.

We could punt on the whole thing and drop in hooks which could be used
to replace everything done from the planner through to the executor and
then anyone *could* implement any of the above, and parallel query too.
That doesn't make it the right approach.

That is a problem I pointed out at the last developer meeting. Because we
have no way for an extension to enhance a part of the plan / exec logic, an
extension has to replace the whole planner / executor using hooks. That is
painful for authors of extensions.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#64

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#62)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

I don't see that parallelizing Append is any easier than any other
problem in this space. There's no parallel I/O facility, so you need
a background worker per append branch to wait on I/O. And you have
all the problems of making sure that the workers have the same
snapshot, making sure they don't self-deadlock, etc. that you have for
any other case.

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it. It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.
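
To make the select() idea concrete for the FDW case, here is a minimal sketch, assuming each Append branch is backed by a socket-like descriptor (for instance the descriptor libpq's PQsocket() returns for a remote connection). Pipes stand in for the connections, and first_ready() and demo_first_ready() are hypothetical names used only for illustration, not PostgreSQL functions. It says nothing about local disk I/O into shared buffers, which has no descriptor to wait on in this way.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Block until at least one descriptor in fds[0..n-1] is readable and
 * return the index of the first readable one, or -1 on error.
 */
int
first_ready(const int *fds, int n)
{
    fd_set      readable;
    int         maxfd = -1;
    int         i;

    assert(n > 0);

    FD_ZERO(&readable);
    for (i = 0; i < n; i++)
    {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* No timeout: wait until some branch has data for us. */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    for (i = 0; i < n; i++)
    {
        if (FD_ISSET(fds[i], &readable))
            return i;
    }
    return -1;
}

/*
 * Demo: three pipes play the role of three remote connections.
 * Only branch 1 has produced data, so it is reported as ready.
 */
int
demo_first_ready(void)
{
    int         pipes[3][2];
    int         read_ends[3];
    int         winner;
    int         i;

    for (i = 0; i < 3; i++)
    {
        if (pipe(pipes[i]) != 0)
            return -1;
        read_ends[i] = pipes[i][0];
    }

    /* Pretend the second remote server answered first. */
    if (write(pipes[1][1], "x", 1) != 1)
        return -1;

    winner = first_ready(read_ends, 3);

    for (i = 0; i < 3; i++)
    {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    return winner;
}
```

The loop only tells the caller which branch has bytes waiting; turning those bytes into tuples is still done serially by the caller, which is the CPU cost mentioned above.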

Thanks,

Stephen

#65

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#60)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

For what it's worth, and I can't claim to have all the answers here,
this doesn't match my expectation. I think we'll do two kinds of
parallelism. One will be parallelism within nodes, like parallel sort
or parallel seqscan. Any node we parallelize this way is likely to be
heavily rewritten, or else to get a sister that looks quite different
from the original.

Sure.

The other kind of parallelism will involve pushing
a whole subtree of the plan into a different node. In this case we'll
need to pass data between nodes in some different way (this was one of
the major reasons I designed the shm_mq stuff) but the nodes
themselves should change little if at all.

It's that "some different way" of passing data between the nodes that
makes me worry, but I hope you're right and we won't actually need to
change the interfaces or the nodes very much.

Thanks,

Stephen

#66

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Robert Haas (#59)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-02 10:38 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

IIUC, his approach was integration of join-pushdown within FDW APIs,
however, it does not mean the idea of remote-join is rejected.

For my part, trying to consider doing remote joins *without* going
through FDWs is just nonsensical.

That is, of course, true by definition, but I think it's putting the
focus in the wrong place. It's possible that there are other cases
when a scan might be a plausible path for a joinrel even if there are no
foreign tables in play. For example, you could cache the joinrel
output and then inject a cache scan as a path for the joinrel.

That might be a way to demonstrate the use of a custom-scan node,
rather than the (ad-hoc) enhancement of postgres_fdw.
As I have discussed in another thread, it is possible to switch a heap
reference to a cache reference on the fly, which would be a possible
use case for a custom-scan node.

So, I'm inclined to drop the postgres_fdw portion from my submission
to focus on the custom-scan capability.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#67

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#64)

Re: Custom Scan APIs (Re: Custom Plan node)

On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

I don't see that parallelizing Append is any easier than any other
problem in this space. There's no parallel I/O facility, so you need
a background worker per append branch to wait on I/O. And you have
all the problems of making sure that the workers have the same
snapshot, making sure they don't self-deadlock, etc. that you have for
any other case.

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it. It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.

I can't really make sense of this. In general, what's under each
branch of an append node is an arbitrary plan tree, and the only
operation you can count on being able to do for each is "get me the
next tuple" (i.e. ExecProcNode). Append has no idea whether that
involves disk I/O or for what blocks. But even if it did, there's no
standard API for issuing an asynchronous read(), which is how we get
blocks into shared buffers. We do have an API for requesting the
prefetch of a block on platforms with posix_fadvise(), but can't find
out whether it's completed using select(), and even if you could you
still have to do the actual read() afterwards.

For FDWs, one idea might be to kick off the remote query at
ExecInitNode() time rather than ExecProcNode() time, at least if the
remote query doesn't depend on parameters that aren't available until
run time. That actually would allow multiple remote queries to run
simultaneously or in parallel with local work. It would also run them
in cases where the relevant plan node is never executed, which would
be bad but perhaps rare enough not to worry about. Or we could add a
new API like ExecPrefetchNode() that tells nodes to prepare to have
tuples pulled, and they can do things like kick off asynchronous
queries. But I still don't see any clean way for the Append node to
find out which one is ready to return results first.
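
A toy model of the ExecPrefetchNode() idea, as a rough sketch only: ToyScan, ToyAppend and the toy_* functions are hypothetical stand-ins, not executor structures, and "started" merely represents "the remote query has been sent". The point it illustrates is that the Append can ask every child to start work before it pulls the first tuple from any of them.

```c
#include <assert.h>

/* Stand-in for a scan whose work can be kicked off ahead of time. */
typedef struct ToyScan
{
    int         started;    /* has the (pretend) remote query been sent? */
    int         next_row;   /* next row number to return */
    int         nrows;      /* total rows this scan produces */
} ToyScan;

/* Stand-in for an Append over several such scans. */
typedef struct ToyAppend
{
    ToyScan    *children;
    int         nchildren;
    int         current;    /* child currently being drained */
    int         prefetched; /* have we told the children to start? */
} ToyAppend;

/* "Prepare to have tuples pulled": here it only marks the scan started. */
void
toy_prefetch(ToyScan *scan)
{
    scan->started = 1;
}

/* Return 1 and store a row in *row, or return 0 at end of scan. */
int
toy_next(ToyScan *scan, int *row)
{
    if (!scan->started)
        toy_prefetch(scan);     /* lazy start if nobody prefetched */
    if (scan->next_row >= scan->nrows)
        return 0;
    *row = scan->next_row++;
    return 1;
}

/* Start all children on the first call, then drain them in order. */
int
toy_append_next(ToyAppend *append, int *row)
{
    int         i;

    assert(append->nchildren >= 0);

    if (!append->prefetched)
    {
        for (i = 0; i < append->nchildren; i++)
            toy_prefetch(&append->children[i]);
        append->prefetched = 1;
    }

    while (append->current < append->nchildren)
    {
        if (toy_next(&append->children[append->current], row))
            return 1;
        append->current++;
    }
    return 0;
}

/* Pull one tuple, then count how many children have been started. */
int
toy_append_demo(void)
{
    ToyScan     kids[3] = {{0, 0, 2}, {0, 0, 2}, {0, 0, 2}};
    ToyAppend   append = {kids, 3, 0, 0};
    int         row = -1;
    int         started = 0;
    int         i;

    if (!toy_append_next(&append, &row))
        return -1;
    for (i = 0; i < 3; i++)
        started += kids[i].started;
    return started;
}
```

Note that this toy Append still drains its children strictly in order, so it does nothing about the open question of learning which child is ready first.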

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#68

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#67)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it. It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.

I can't really make sense of this.

Sorry, that was a bit hand-wavey since I had posted about it previously
here:
/messages/by-id/20131104032604.GB2706@tamriel.snowman.net

It'd clearly be more involved than "just build a select() loop" and
would require adding an async mechanism. I had been thinking about this
primarily with the idea of FDWs and you're right that it'd require more
thought to deal with getting data into/through shared_buffers. Still,
we seqscan into a ring buffer, I'd think we could make it work but it
would require additional work.

For FDWs, one idea might be to kick off the remote query at
ExecInitNode() time rather than ExecProcNode() time, at least if the
remote query doesn't depend on parameters that aren't available until
run time.

Right, I had speculated about that also (option #2 in my earlier email).

That actually would allow multiple remote queries to run
simultaneously or in parallel with local work. It would also run them
in cases where the relevant plan node is never executed, which would
be bad but perhaps rare enough not to worry about.

This was my primary concern, along with the fact that we explicitly say
"don't do that" in the docs for the FDW API.

Or we could add a
new API like ExecPrefetchNode() that tells nodes to prepare to have
tuples pulled, and they can do things like kick off asynchronous
queries. But I still don't see any clean way for the Append node to
find out which one is ready to return results first.

Yeah, that's tricky.

Thanks,

Stephen

#69

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#68)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Mar 3, 2014 at 10:43 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it. It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.

I can't really make sense of this.

Sorry, that was a bit hand-wavey since I had posted about it previously
here:
/messages/by-id/20131104032604.GB2706@tamriel.snowman.net

Huh, somehow I can't remember reading that... but I didn't think I had
missed any posts, either. Evidently I did.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#70

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#69)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

/messages/by-id/20131104032604.GB2706@tamriel.snowman.net

Huh, somehow I can't remember reading that... but I didn't think I had
missed any posts, either. Evidently I did.

You and everyone else- you'll note it got exactly zero responses..

Ah well. :)

Thanks,

Stephen

#71

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Stephen Frost (#55)

Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai,

* Stephen Frost (sfrost@snowman.net) wrote:

And I'm still unconvinced of this approach and worry that it's going to
break more often than it works. That's my 2c on it, but I won't get in
the way if someone else wants to step up and support it.

Alright, having heard from Robert (thanks!) regarding his thoughts
(which are pretty similar to my own, though he doesn't anticipate issues
with API changes), I'm going to step back a bit from the above position.

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support. In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

For one thing, an example where you could have this CustomScan node
calling other nodes underneath would be interesting. I realize the CTID
scan can't do that directly but I would think your GPU-based system
could; after all, if you're running a join or an aggregate with the GPU,
the rows could come from nearly anything. Have you considered that, or
is the expectation that users will just go off and access the heap
and/or whatever indexes directly, like ctidscan does? How would such a
requirement be handled?

Thanks,

Stephen

#72

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Stephen Frost (#71)

Re: Custom Scan APIs (Re: Custom Plan node)

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

In accordance with the above, what I'd like to see with this patch is removal
of the postgres_fdw changes and any changes which were for that support.

I don't object to this approach. It might be useful to *demonstrate* how a
custom-scan node works as a replacement for a join, however,

In addition, I'd like to understand why 'ctidscan' makes any sense to have
as an example of what to use this for- if that's valuable, why wouldn't
we simply implement that in core? I do want an example in contrib of how
to properly use this capability, but I don't think that's it.

Do you think it makes sense for my submission to contain only the interface
portion, without a working example? The purpose of the ctidscan module is,
like postgres_fdw, to demonstrate the use of the custom-scan interface with
a sufficiently small amount of code. If tons of example code were attached,
nobody would want to review the patch.
The "cache_scan" module that Haribabu and I are discussing in another
thread might also be a good demonstration of the custom-scan interface;
however, its code is a bit larger than ctidscan.

For one thing, an example where you could have this CustomScan node calling
other nodes underneath would be interesting. I realize the CTID scan can't
do that directly but I would think your GPU-based system could; after all,
if you're running a join or an aggregate with the GPU, the rows could come
from nearly anything. Have you considered that, or is the expectation that
users will just go off and access the heap and/or whatever indexes directly,
like ctidscan does? How would such a requirement be handled?

When a custom-scan node has underlying nodes, they are invoked using
ExecProcNode, as built-in nodes do, so the custom-scan node can fetch the
tuples that come from the underlying nodes. Of course, a custom-scan provider
can also return tuples that come from somewhere else as if they came from the
underlying relation. That is the responsibility of the extension module. In
some cases, it is required to return junk system attributes, like ctid, for
row-level locks or table updates. That is also the responsibility of the
extension module (or it should not add a custom path if the custom-scan
provider cannot do what is required).
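
As a rough illustration of a custom node pulling tuples from an underlying node through a single "get next tuple" entry point, here is a toy model. ToyNode, toy_exec_proc_node() and the other toy_* names are hypothetical; the real executor uses PlanState nodes and TupleTableSlots rather than plain integers, and the even-number filter merely stands in for whatever work the provider does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyNode ToyNode;

struct ToyNode
{
    /* Returns 1 and stores a tuple in *tuple, or 0 when exhausted. */
    int         (*next) (ToyNode *self, int *tuple);
    ToyNode    *child;      /* underlying node, or NULL for a leaf */
    int         position;   /* leaf only: next value to emit */
    int         limit;      /* leaf only: stop before this value */
};

/* Single dispatch point, playing the role ExecProcNode has in the executor. */
int
toy_exec_proc_node(ToyNode *node, int *tuple)
{
    assert(node != NULL && node->next != NULL);
    return node->next(node, tuple);
}

/* Leaf "scan": emits position, position + 1, ..., limit - 1. */
int
toy_seq_next(ToyNode *self, int *tuple)
{
    if (self->position >= self->limit)
        return 0;
    *tuple = self->position++;
    return 1;
}

/*
 * "Custom" node: pulls from whatever sits underneath it and passes on
 * only even values. It neither knows nor cares what kind of node the
 * child is.
 */
int
toy_custom_next(ToyNode *self, int *tuple)
{
    int         candidate;

    while (toy_exec_proc_node(self->child, &candidate))
    {
        if (candidate % 2 == 0)
        {
            *tuple = candidate;
            return 1;
        }
    }
    return 0;
}

/* Leaf yields 0..5; the custom node passes 0, 2 and 4, so the sum is 6. */
int
toy_custom_demo(void)
{
    ToyNode     leaf = {toy_seq_next, NULL, 0, 6};
    ToyNode     custom = {toy_custom_next, &leaf, 0, 0};
    int         tuple;
    int         sum = 0;

    while (toy_exec_proc_node(&custom, &tuple))
        sum += tuple;
    return sum;
}
```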

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#73

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 12 years ago

In reply to: Stephen Frost (#68)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Mar 3, 2014 at 9:13 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it. It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.

I can't really make sense of this.

Sorry, that was a bit hand-wavey since I had posted about it previously
here:

/messages/by-id/20131104032604.GB2706@tamriel.snowman.net

It'd clearly be more involved than "just build a select() loop" and
would require adding an async mechanism. I had been thinking about this
primarily with the idea of FDWs and you're right that it'd require more
thought to deal with getting data into/through shared_buffers. Still,
we seqscan into a ring buffer, I'd think we could make it work but it
would require additional work.

For FDWs, one idea might be to kick off the remote query at
ExecInitNode() time rather than ExecProcNode() time, at least if the
remote query doesn't depend on parameters that aren't available until
run time.

Right, I had speculated about that also (option #2 in my earlier email).

During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, they would be fired while EXPLAINing a query as well.
We want to avoid that. Instead, we could run EXPLAIN on that query at the
foreign server. But not all foreign servers would be able to EXPLAIN the
query, e.g. file_fdw. Or we could avoid firing the query during
ExecInitNode() altogether when it is called for EXPLAIN (except, perhaps,
for EXPLAIN ANALYZE).
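
The last option can be sketched as a flag check at initialization time. PostgreSQL's executor already passes an eflags bitmask to node initialization, with EXEC_FLAG_EXPLAIN_ONLY set for a plain EXPLAIN; the toy below only mimics that. TOY_EXPLAIN_ONLY, ToyForeignScan and toy_begin_foreign_scan() are hypothetical names, and "query_sent" stands in for actually contacting the foreign server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag, modelled on the executor's EXEC_FLAG_EXPLAIN_ONLY. */
#define TOY_EXPLAIN_ONLY 0x0001

typedef struct ToyForeignScan
{
    int         query_sent; /* 1 once the (pretend) remote query is fired */
} ToyForeignScan;

/*
 * Initialize the scan. The remote query is kicked off early, except
 * when we are only explaining the plan and will never fetch tuples.
 */
void
toy_begin_foreign_scan(ToyForeignScan *scan, int eflags)
{
    assert(scan != NULL);

    scan->query_sent = 0;
    if (eflags & TOY_EXPLAIN_ONLY)
        return;             /* plain EXPLAIN: do not touch the remote side */

    scan->query_sent = 1;   /* real run: start the remote query now */
}

/* Returns 1 if only the real run fired a query, 0 otherwise. */
int
toy_explain_demo(void)
{
    ToyForeignScan explain_only;
    ToyForeignScan real_run;

    toy_begin_foreign_scan(&explain_only, TOY_EXPLAIN_ONLY);
    toy_begin_foreign_scan(&real_run, 0);

    return explain_only.query_sent == 0 && real_run.query_sent == 1;
}
```

Under this scheme EXPLAIN ANALYZE behaves like a real run, since it does execute the plan.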

That actually would allow multiple remote queries to run
simultaneously or in parallel with local work. It would also run them
in cases where the relevant plan node is never executed, which would
be bad but perhaps rare enough not to worry about.

This was my primary concern, along with the fact that we explicitly say
"don't do that" in the docs for the FDW API.

Or we could add a
new API like ExecPrefetchNode() that tells nodes to prepare to have
tuples pulled, and they can do things like kick off asynchronous
queries. But I still don't see any clean way for the Append node to
find out which one is ready to return results first.

Yeah, that's tricky.

Thanks,

Stephen

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#74

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Kouhei Kaigai (#40)

2 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

As suggested, I dropped the postgres_fdw enhancement built on top of the
custom-scan interface from my submission, and also moved the supplemental
functions into the part-1 portion (implementation of the custom-scan interface).
Even though the ctidscan module is still under discussion, I have kept it in
because it is useful as a demonstration / example of a custom-scan node.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


-----Original Message-----
From: Kaigai Kouhei(海外浩平)
Sent: Tuesday, March 04, 2014 8:26 AM
To: 'Stephen Frost'; Kohei KaiGai
Cc: Shigeru Hanada; Jim Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter
Eisentraut
Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW
execution and I don't see this CustomScan approach as the right
answer to any of those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support.

I don't object to this approach. It might be useful to *demonstrate* how a
custom-scan node works as a replacement for a join, however,

In addition, I'd like to understand why 'ctidscan' makes any sense to
have as an example of what to use this for- if that's valuable, why
wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

Do you think it makes sense for my submission to contain only the interface
portion, without a working example? The purpose of the ctidscan module is,
like postgres_fdw, to demonstrate the use of the custom-scan interface with
a sufficiently small amount of code. If tons of example code were attached,
nobody would want to review the patch.
The "cache_scan" module that Haribabu and I are discussing in another thread
might also be a good demonstration of the custom-scan interface; however, its
code is a bit larger than ctidscan.

For one thing, an example where you could have this CustomScan node
calling other nodes underneath would be interesting. I realize the
CTID scan can't do that directly but I would think your GPU-based
system could; after all, if you're running a join or an aggregate with
the GPU, the rows could come from nearly anything. Have you
considered that, or is the expectation that users will just go off and
access the heap and/or whatever indexes directly, like ctidscan does?
How would such a requirement be handled?

When a custom-scan node has underlying nodes, they are invoked using
ExecProcNode, as built-in nodes do, so the custom-scan node can fetch the
tuples that come from the underlying nodes. Of course, a custom-scan provider
can also return tuples that come from somewhere else as if they came from the
underlying relation. That is the responsibility of the extension module. In
some cases, it is required to return junk system attributes, like ctid, for
row-level locks or table updates. That is also the responsibility of the
extension module (or it should not add a custom path if the custom-scan
provider cannot do what is required).

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-2.v9.patch (application/octet-stream)

 contrib/Makefile                           |   1 +
 contrib/ctidscan/Makefile                  |  14 +
 contrib/ctidscan/ctidscan.c                | 760 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/ctidscan.sgml                 | 108 ++++
 doc/src/sgml/custom-scan.sgml              |   8 +-
 doc/src/sgml/filelist.sgml                 |   1 +
 src/include/catalog/pg_operator.h          |   4 +
 src/test/regress/GNUmakefile               |  15 +-
 src/test/regress/input/custom_scan.source  |  49 ++
 src/test/regress/output/custom_scan.source | 290 +++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 13 files changed, 1246 insertions(+), 8 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index c90fe29..3c1987d 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -12,6 +12,7 @@ SUBDIRS = \
 		btree_gist	\
 		chkpass		\
 		citext		\
+		ctidscan	\
 		cube		\
 		dblink		\
 		dict_int	\
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..708c5b7
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,14 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..72bbf17
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,760 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of Custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override part of the executor. This extension focuses on
+ * workloads that fetch records whose tid is larger or smaller than a
+ * particular value. When inequality operators on ctid are given, this
+ * module constructs a custom scan path that skips the records that need
+ * not be read. If that path is the cheapest one, it is used to run the
+ * query. The Custom Scan APIs call back into this extension when the
+ * executor fetches underlying records; it uses the existing heap_getnext()
+ * but seeks to the first record to be read before fetching it.
+ *
+ * Portions Copyright (c) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+extern void		_PG_init(void);
+
+PG_MODULE_MAGIC;
+
+static add_scan_path_hook_type	add_scan_path_next;
+
+#define IsCTIDVar(node,rtindex)											\
+	((node) != NULL &&													\
+	 IsA((node), Var) &&												\
+	 ((Var *) (node))->varno == (rtindex) &&							\
+	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber &&	\
+	 ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clause can be used to determine
+ * the zone to be scanned. If one or more usable clauses are found, it
+ * returns a list of them; otherwise it returns NIL.
+ * The caller can assume that all returned conditions are chained with the
+ * boolean AND operator, so every one of them narrows down the scope of
+ * the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+	if (is_opclause(expr))
+	{
+		OpExpr *op = (OpExpr *) expr;
+		Node   *arg1;
+		Node   *arg2;
+		Node   *other = NULL;
+
+		/* only inequality operators are candidate */
+		if (op->opno != TIDLessOperator &&
+			op->opno != TIDLessEqualOperator &&
+			op->opno != TIDGreaterOperator &&
+			op->opno != TIDGreaterEqualOperator)
+			return NULL;
+
+		if (list_length(op->args) != 2)
+			return NIL;
+
+		arg1 = linitial(op->args);
+		arg2 = lsecond(op->args);
+
+		if (IsCTIDVar(arg1, varno))
+			other = arg2;
+		else if (IsCTIDVar(arg2, varno))
+			other = arg1;
+		else
+			return NULL;
+		if (exprType(other) != TIDOID)
+			return NULL;	/* probably can't happen */
+		/* The other argument must be a pseudoconstant */
+		if (!is_pseudo_constant_clause(other))
+			return NULL;
+
+		return list_make1(copyObject(op));
+	}
+	else if (and_clause(expr))
+	{
+		List	   *rlst = NIL;
+		ListCell   *lc;
+
+		foreach(lc, ((BoolExpr *) expr)->args)
+		{
+			List   *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+			rlst = list_concat(rlst, temp);
+		}
+		return rlst;
+	}
+	return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost of scanning the target relation according to the
+ * given restriction clauses. The scan logic is almost the same as that of
+ * SeqScan, because it uses the regular heap_getnext(); the difference is
+ * the number of tuples scanned when the restriction clauses are effective.
+*/
+static void
+CTidEstimateCosts(PlannerInfo *root,
+				  RelOptInfo *baserel,
+				  CustomPath *cpath)
+{
+	List	   *ctidquals = cpath->custom_private;
+	ListCell   *lc;
+	double		ntuples;
+	ItemPointerData ip_min;
+	ItemPointerData ip_max;
+	bool		has_min_val = false;
+	bool		has_max_val = false;
+	BlockNumber	num_pages;
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+	Cost		cpu_per_tuple;
+	QualCost	qpqual_cost;
+	QualCost	ctid_qual_cost;
+	double		spc_random_page_cost;
+
+	/* Should only be applied to base relations */
+	Assert(baserel->relid > 0);
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	/* Mark the path with the correct row estimate */
+	if (cpath->path.param_info)
+		cpath->path.rows = cpath->path.param_info->ppi_rows;
+	else
+		cpath->path.rows = baserel->rows;
+
+	/* Estimate how many tuples we may retrieve */
+	ItemPointerSet(&ip_min, 0, 0);
+	ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+	foreach (lc, ctidquals)
+	{
+		OpExpr	   *op = lfirst(lc);
+		Oid			opno;
+		Node	   *other;
+
+		Assert(is_opclause(op));
+		if (IsCTIDVar(linitial(op->args), baserel->relid))
+		{
+			opno = op->opno;
+			other = lsecond(op->args);
+		}
+		else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+		{
+			/* To simplify, act as if the Var node were the 1st argument */
+			opno = get_commutator(op->opno);
+			other = linitial(op->args);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		if (IsA(other, Const))
+		{
+			ItemPointer	ip = (ItemPointer)(((Const *) other)->constvalue);
+
+			/*
+			 * Just a rough estimate; we don't distinguish strict inequality
+			 * from inequality-or-equal operators.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+				case TIDLessEqualOperator:
+					if (ItemPointerCompare(ip, &ip_max) < 0)
+						ItemPointerCopy(ip, &ip_max);
+					has_max_val = true;
+					break;
+				case TIDGreaterOperator:
+				case TIDGreaterEqualOperator:
+					if (ItemPointerCompare(ip, &ip_min) > 0)
+						ItemPointerCopy(ip, &ip_min);
+					has_min_val = true;
+					break;
+				default:
+					elog(ERROR, "unexpected operator code: %u", op->opno);
+					break;
+			}
+		}
+	}
+
+	/* estimated number of tuples in this relation */
+	ntuples = baserel->tuples;
+
+	if (has_min_val && has_max_val)
+	{
+		/* case of both side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min) ? bnum_max - bnum_min + 1 : 1;
+	}
+	else if (has_min_val)
+	{
+		/* case of only lower side being bounded */
+		BlockNumber	bnum_max = baserel->pages;
+		BlockNumber	bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+		bnum_min = Max(bnum_min, 0);
+		num_pages = (bnum_max >= bnum_min) ? bnum_max - bnum_min + 1 : 1;
+	}
+	else if (has_max_val)
+	{
+		/* case of only upper side being bounded */
+		BlockNumber	bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+		BlockNumber	bnum_min = 0;
+
+		bnum_max = Min(bnum_max, baserel->pages);
+		num_pages = (bnum_max >= bnum_min) ? bnum_max - bnum_min + 1 : 1;
+	}
+	else
+	{
+		/*
+		 * Just a rough estimate. We assume half of the records will be
+		 * read using this restriction clause, but the actual number is
+		 * unknown until the executor runs it.
+		 */
+		num_pages = Max((baserel->pages + 1) / 2, 1);
+	}
+	ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+	/*
+	 * The TID qual expressions will be computed once, any other baserestrict
+	 * quals once per retrieved tuple.
+	 */
+	cost_qual_eval(&ctid_qual_cost, ctidquals, root);
+
+	/* fetch estimated page cost for tablespace containing table */
+	get_tablespace_page_costs(baserel->reltablespace,
+							  &spc_random_page_cost,
+							  NULL);
+
+	/* disk costs --- assume each tuple on a different page */
+	run_cost += spc_random_page_cost * ntuples;
+
+	/* Add scanning CPU costs */
+	get_restriction_qual_cost(root, baserel,
+							  cpath->path.param_info,
+							  &qpqual_cost);
+
+	/*
+	 * We don't decrease cost for the inequality operators, because
+	 * it is subset of qpquals and still in.
+	 */
+	startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+	cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+		ctid_qual_cost.per_tuple;
+	run_cost += cpu_per_tuple * ntuples;
+
+	cpath->path.startup_cost = startup_cost;
+	cpath->path.total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CTidAddScanPath
+ *
+ * It adds a custom scan path if inequality operators on ctid are given for
+ * the relation to be scanned, so that the number of tuples can be reduced.
+ */
+static void
+CTidAddScanPath(PlannerInfo *root,
+				RelOptInfo *baserel,
+				RangeTblEntry *rte)
+{
+	char		relkind;
+	List	   *rlst = NIL;
+	ListCell   *lc;
+
+	/* Give other extensions a chance to add a path */
+	if (add_scan_path_next)
+		(*add_scan_path_next)(root, baserel, rte);
+
+	/* All we support is regular relations */
+	if (rte->rtekind != RTE_RELATION)
+		return;
+	relkind = get_rel_relkind(rte->relid);
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_MATVIEW &&
+		relkind != RELKIND_TOASTVALUE)
+		return;
+
+	/* walk on the restrict info */
+	foreach (lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		List		 *temp;
+
+		if (!IsA(rinfo, RestrictInfo))
+			continue;		/* probably should never happen */
+		temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+		rlst = list_concat(rlst, temp);
+	}
+
+	/*
+	 * OK, at least part of the restriction clause helps to reduce the
+	 * number of tuples, so we add the custom scan path provided by this
+	 * module.
+	 */
+	if (rlst != NIL)
+	{
+		CustomPath *cpath = makeNode(CustomPath);
+		Relids		required_outer;
+
+		/*
+		 * We don't support pushing join clauses into the quals of a ctidscan,
+		 * but it could still have required parameterization due to LATERAL
+		 * refs in its tlist.
+		 */
+		required_outer = baserel->lateral_relids;
+
+		cpath->path.pathtype = T_CustomScan;
+		cpath->path.parent = baserel;
+		cpath->path.param_info = get_baserel_parampathinfo(root, baserel,
+														   required_outer);
+		cpath->custom_name = pstrdup("ctidscan");
+		cpath->custom_flags = CUSTOM__SUPPORT_BACKWARD_SCAN;
+		cpath->custom_private = rlst;
+
+		CTidEstimateCosts(root, baserel, cpath);
+
+		add_path(baserel, &cpath->path);
+	}
+}
+
+/*
+ * CTidInitCustomScanPlan
+ *
+ * It initializes the given CustomScan plan object according to the CustomPath
+ * being chosen by the optimizer.
+ */
+static void
+CTidInitCustomScanPlan(PlannerInfo *root,
+					   CustomScan *cscan_plan,
+					   CustomPath *cscan_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	List	   *ctidquals = cscan_path->custom_private;
+
+	/* should be a base relation */
+	Assert(cscan_path->path.parent->relid > 0);
+	Assert(cscan_path->path.parent->rtekind == RTE_RELATION);
+
+	/* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+	scan_clauses = extract_actual_clauses(scan_clauses, false);
+
+	/*
+	 * Most of the initialization was done in nodeCustomScan.c. So, all
+	 * we need to do is attach the slightly adjusted clauses and the
+	 * private data; in this case, the list of restriction clauses.
+	 */
+	cscan_plan->scan.plan.targetlist = tlist;
+	cscan_plan->scan.plan.qual = scan_clauses;
+	cscan_plan->custom_private = ctidquals;
+}
+
+/*
+ * CTidScanState
+ *
+ * State of custom-tid scan during its execution.
+ */
+typedef struct {
+	Index			scanrelid;		/* range table index of the relation */
+	ItemPointerData	ip_min;			/* minimum ItemPointer */
+	ItemPointerData	ip_max;			/* maximum ItemPointer */
+	int32			ip_min_comp;	/* comparison policy to ip_min */
+	int32			ip_max_comp;	/* comparison policy to ip_max */
+	bool			ip_needs_eval;	/* true, if needs to seek again */
+	List		   *ctid_quals;		/* list of ExprState for inequality ops */
+} CTidScanState;
+
+static bool
+CTidEvalScanZone(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	ExprContext	   *econtext = node->ss.ps.ps_ExprContext;
+	ListCell	   *lc;
+
+	/*
+	 * ItemPointerCompare() returns 1, 0 or -1. A forward scan ends when
+	 * comparing the fetched tid with ip_max yields a result greater than
+	 * ip_max_comp; a backward scan ends when comparing it with ip_min
+	 * yields a result less than ip_min_comp. The initial values below
+	 * (ip_max_comp = 1 and ip_min_comp = -1) can therefore never trigger
+	 * termination, which makes the scan "open ended", as if no bound
+	 * had been set.
+	 */
+	ctss->ip_min_comp = -1;
+	ctss->ip_max_comp = 1;
+
+	/* Walks on the inequality operators */
+	foreach (lc, ctss->ctid_quals)
+	{
+		FuncExprState  *fexstate = (FuncExprState *) lfirst(lc);
+		OpExpr		   *op = (OpExpr *)fexstate->xprstate.expr;
+		Node		   *arg1 = linitial(op->args);
+		Node		   *arg2 = lsecond(op->args);
+		Oid				opno;
+		ExprState	   *exstate;
+		ItemPointer		itemptr;
+		bool			isnull;
+
+		if (IsCTIDVar(arg1, ctss->scanrelid))
+		{
+			exstate = (ExprState *) lsecond(fexstate->args);
+			opno = op->opno;
+		}
+		else if (IsCTIDVar(arg2, ctss->scanrelid))
+		{
+			exstate = (ExprState *) linitial(fexstate->args);
+			opno = get_commutator(op->opno);
+		}
+		else
+			elog(ERROR, "could not identify CTID variable");
+
+		itemptr = (ItemPointer)
+			DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+													  econtext,
+													  &isnull,
+													  NULL));
+		if (!isnull)
+		{
+			/*
+			 * OK, we obtained a particular TID that fetched records must
+			 * be larger than, less than or equal to; this determines the
+			 * upper or lower bound of this scan.
+			 */
+			switch (opno)
+			{
+				case TIDLessOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) <= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = -1;
+					}
+					break;
+				case TIDLessEqualOperator:
+					if (ctss->ip_max_comp > 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_max) < 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_max);
+						ctss->ip_max_comp = 0;
+					}
+					break;
+				case TIDGreaterOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) >= 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 1;	/* stop at tid <= ip_min */
+					}
+					break;
+				case TIDGreaterEqualOperator:
+					if (ctss->ip_min_comp < 0 ||
+						ItemPointerCompare(itemptr, &ctss->ip_min) > 0)
+					{
+						ItemPointerCopy(itemptr, &ctss->ip_min);
+						ctss->ip_min_comp = 0;	/* stop at tid < ip_min */
+					}
+					break;
+				default:
+					elog(ERROR, "unsupported operator");
+					break;
+			}
+		}
+		else
+		{
+			/*
+			 * The whole set of restriction clauses, chained with boolean
+			 * AND operators, cannot be true if one of the clauses yields
+			 * NULL. So, we can stop the evaluation immediately and tell the
+			 * caller that it does not make sense to scan any more.
+			 */
+			return false;
+		}
+	}
+	return true;
+}
+
+/*
+ * CTidBeginCustomScan
+ *
+ * It initializes the given CustomScanState according to the CustomScan plan.
+ */
+static void
+CTidBeginCustomScan(CustomScanState *node, int eflags)
+{
+	CustomScan	   *cscan = (CustomScan *)node->ss.ps.plan;
+	Index			scanrelid = ((Scan *)node->ss.ps.plan)->scanrelid;
+	EState		   *estate = node->ss.ps.state;
+	CTidScanState  *ctss;
+
+	/* Do nothing more in the EXPLAIN (no ANALYZE) case. */
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	/* Begin a sequential scan; the start position is sought later */
+	node->ss.ss_currentScanDesc
+		= heap_beginscan(node->ss.ss_currentRelation,
+						 estate->es_snapshot, 0, NULL);
+
+	/* init CTidScanState */
+	ctss = palloc0(sizeof(CTidScanState));
+	ctss->scanrelid = scanrelid;
+	ctss->ctid_quals = (List *)
+		ExecInitExpr((Expr *)cscan->custom_private, &node->ss.ps);
+	ctss->ip_needs_eval = true;
+
+	node->custom_state = ctss;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so the next
+ * heap_getnext() fetches a record from there. It returns true if the
+ * caller should override the scan direction with NoMovementScanDirection,
+ * or false if a backward scan should simply proceed without seeking.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+	BlockNumber		bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+	ItemPointerData	save_mctid;
+	int				save_mindex;
+
+	Assert(direction == BackwardScanDirection ||
+		   direction == ForwardScanDirection);
+
+	/*
+	 * If the block number is out of range, it is obvious that no tuples
+	 * will be fetched by a forward scan. On the other hand, a backward
+	 * scan needs nothing special.
+	 * Note that heap_getnext() returns a NULL tuple just after
+	 * heap_rescan() if NoMovementScanDirection is given. The caller of
+	 * this function overrides the scan direction if 'true' is returned,
+	 * which terminates the scan immediately.
+	 */
+	if (bnum >= scan->rs_nblocks)
+	{
+		heap_rescan(scan, NULL);
+		/* Terminate this scan immediately */
+		if (direction == ForwardScanDirection)
+			return true;
+		/* Otherwise, let the backward scan start as usual */
+		return false;
+	}
+
+	/* save the marked position */
+	ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+	save_mindex = scan->rs_mindex;
+
+	/*
+	 * Ensure the block that includes the position shall be loaded on
+	 * heap_restrpos(). Because heap_restrpos() internally calls
+	 * heapgettup() or heapgettup_pagemode() that kicks heapgetpage()
+	 * when rs_cblock is different from the block number being pointed
+	 * by rs_mctid, it makes sense to put invalid block number not to
+	 * match previous value.
+	 */
+	scan->rs_cblock = InvalidBlockNumber;
+
+	/* Put a pseudo value as if heap_markpos() save a position. */
+	ItemPointerCopy(pos, &scan->rs_mctid);
+	if (scan->rs_pageatatime)
+		scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+	/* Seek to the point */
+	heap_restrpos(scan);
+
+	/* restore the marked position */
+	ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+	scan->rs_mindex = save_mindex;
+
+	return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecScan(). It fetches a tuple from the underlying heap
+ * scan that was started from the point according to the tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+	HeapScanDesc	scan = node->ss.ss_currentScanDesc;
+	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+	EState		   *estate = node->ss.ps.state;
+	ScanDirection	direction = estate->es_direction;
+	HeapTuple		tuple;
+
+	if (ctss->ip_needs_eval)
+	{
+		/* Terminate this scan if the result set is obviously empty. */
+		if (!CTidEvalScanZone(node))
+			return NULL;
+
+		if (direction == ForwardScanDirection)
+		{
+			/* seek to the point if a lower bound was determined */
+			if (ctss->ip_min_comp != -1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_min, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else if (direction == BackwardScanDirection)
+		{
+			/* seek to the point if an upper bound was determined */
+			if (ctss->ip_max_comp != 1)
+			{
+				if (CTidSeekPosition(scan, &ctss->ip_max, direction))
+					direction = NoMovementScanDirection;
+			}
+			else if (scan->rs_inited)
+				heap_rescan(scan, NULL);
+		}
+		else
+			elog(ERROR, "unexpected scan direction");
+
+		ctss->ip_needs_eval = false;
+	}
+
+	/*
+	 * get the next tuple from the table
+	 */
+	tuple = heap_getnext(scan, direction);
+	if (!HeapTupleIsValid(tuple))
+		return NULL;
+
+	/*
+	 * Check whether the fetched tuple has passed the upper bound of a
+	 * forward scan, or the lower bound of a backward scan.
+	 */
+	if (direction == ForwardScanDirection)
+	{
+		if (ItemPointerCompare(&tuple->t_self,
+							   &ctss->ip_max) > ctss->ip_max_comp)
+			return NULL;
+	}
+	else if (direction == BackwardScanDirection)
+	{
+		if (ItemPointerCompare(&scan->rs_ctup.t_self,
+							   &ctss->ip_min) < ctss->ip_min_comp)
+			return NULL;
+	}
+	ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+	return slot;
+}
+
+/*
+ * CTidRecheckCustomScan
+ *
+ * Recheck method of ExecScan(). We don't need recheck logic.
+ */
+static bool
+CTidRecheckCustomScan(CustomScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+/*
+ * CTidExecCustomScan
+ *
+ * It fetches a tuple from the underlying heap scan, following the
+ * ExecScan() convention.
+ */
+static TupleTableSlot *
+CTidExecCustomScan(CustomScanState *node)
+{
+	return ExecScan(&node->ss,
+					(ExecScanAccessMtd) CTidAccessCustomScan,
+					(ExecScanRecheckMtd) CTidRecheckCustomScan);
+}
+
+/*
+ * CTidEndCustomScan
+ *
+ * It terminates custom tid scan.
+ */
+static void
+CTidEndCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	/* if ctss != NULL, we started underlying heap-scan */
+	if (ctss)
+		heap_endscan(node->ss.ss_currentScanDesc);
+}
+
+/*
+ * CTidReScanCustomScan
+ *
+ * It rewinds the current position of the scan. Setting ip_needs_eval makes
+ * the next ExecScan() call recalculate the starting point and rewind the
+ * underlying heap scan.
+ */
+static void
+CTidReScanCustomScan(CustomScanState *node)
+{
+	CTidScanState  *ctss = node->custom_state;
+
+	ctss->ip_needs_eval = true;
+
+	ExecScanReScan(&node->ss);
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	CustomProvider		provider;
+
+	/* registration of callback on add scan path */
+	add_scan_path_next = add_scan_path_hook;
+	add_scan_path_hook = CTidAddScanPath;
+
+	/* registration of custom scan provider */
+	memset(&provider, 0, sizeof(provider));
+	snprintf(provider.name, sizeof(provider.name), "ctidscan");
+	provider.InitCustomScanPlan   = CTidInitCustomScanPlan;
+	provider.BeginCustomScan      = CTidBeginCustomScan;
+	provider.ExecCustomScan       = CTidExecCustomScan;
+	provider.EndCustomScan        = CTidEndCustomScan;
+	provider.ReScanCustomScan     = CTidReScanCustomScan;
+
+	register_custom_provider(&provider);
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 336ba0c..7042d76 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -109,6 +109,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &btree-gist;
  &chkpass;
  &citext;
+ &ctidscan;
  &cube;
  &dblink;
  &dict-int;
diff --git a/doc/src/sgml/ctidscan.sgml b/doc/src/sgml/ctidscan.sgml
new file mode 100644
index 0000000..d010d5c
--- /dev/null
+++ b/doc/src/sgml/ctidscan.sgml
@@ -0,0 +1,108 @@
+<!-- doc/src/sgml/ctidscan.sgml -->
+
+<sect1 id="ctidscan" xreflabel="ctidscan">
+ <title>ctidscan</title>
+
+ <indexterm zone="ctidscan">
+  <primary>ctidscan</primary>
+ </indexterm>
+
+ <para>
+  The <filename>ctidscan</> module provides additional logic to scan
+  regular relations when the <literal>WHERE</> clause contains inequality
+  operators that compare something with the <literal>ctid</> system column.
+  It also serves as a proof-of-concept implementation built on the
+  custom-scan APIs, which allow extensions to extend the core executor.
+ </para>
+
+ <sect2>
+  <title>Overview</title>
+  <para>
+   Once this module is loaded, it registers itself as a custom-scan provider.
+   It can then offer an additional scan path on regular relations, using
+   qualifiers that reference the <literal>ctid</> system column.
+  </para>
+  <para>
+   For example, the query below usually falls back to a sequential scan if
+   this module is not loaded.
+<programlisting>
+SELECT ctid,* FROM my_table WHERE ctid > '(100,0)'::tid;
+</programlisting>
+   On the other hand, the <filename>ctidscan</> module can construct an
+   alternative scan plan that uses inequality operators involving the
+   <literal>ctid</> system column to reduce the number of rows processed.
+   Obviously, it makes no sense to read tuples located on the 99th page
+   or earlier. So, it moves the internal scan pointer to
+   <literal>(100,0)</> at the beginning of the scan, even though it
+   internally uses the same logic as a sequential scan.
+  </para>
+  <para>
+   Usually, <productname>PostgreSQL</> runs queries with inequality operators
+   that involve the <literal>ctid</> system column using a sequential scan,
+   as follows.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                       QUERY PLAN
+--------------------------------------------------------
+ Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   This works, but it wastes I/O on the pages that contain only records to
+   be skipped.
+  </para>
+  <para>
+   On the other hand, the alternative scan path implemented by
+   <filename>ctidscan</> is more efficient: it skips the first
+   100 pages before starting the sequential scan, as follows.
+<programlisting>
+postgres=# load 'ctidscan';
+LOAD
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid;
+                              QUERY PLAN
+----------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=37)
+   Filter: (ctid > '(100,0)'::tid)
+(2 rows)
+</programlisting>
+   The optimizer internally compares all the candidate scan paths, then
+   chooses the path with the cheapest cost. The custom-scan path provided by
+   <filename>ctidscan</> is usually cheaper than a sequential scan because
+   fewer tuples have to be processed.
+  </para>
+  <para>
+   Of course, it is not chosen if a path cheaper than the above custom-scan
+   path exists. For instance, an index scan based on an equality condition
+   is usually cheaper, so the optimizer adopts it instead of the sequential
+   scan or the custom scan provided by <filename>ctidscan</>.
+<programlisting>
+postgres=# EXPLAIN SELECT * FROM t1 WHERE ctid > '(100,0)'::tid AND a = 100;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=37)
+   Index Cond: (a = 100)
+   Filter: (ctid > '(100,0)'::tid)
+(3 rows)
+</programlisting>
+  </para>
+  <para>
+   Its usage is quite simple. All you need to do is load
+   <filename>ctidscan</> into <productname>PostgreSQL</> using the
+   <xref linkend="sql-load"> command, or the
+   <xref linkend="guc-shared-preload-libraries">,
+   <xref linkend="guc-local-preload-libraries"> or
+   <xref linkend="guc-session-preload-libraries"> parameter, whichever
+   is most convenient.
+  </para>
+  <para>
+   This module currently has no configurable parameters.
+  </para>
+ </sect2>
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@kaigai.gr.jp</email>
+  </para>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index b57d82f..f53902d 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -18,7 +18,7 @@
   Overall, there are four major tasks that a custom-scan provider should 
   implement. The first task is the registration of custom-scan provider itself.
   Usually, this needs to be done once at the <literal>_PG_init()</literal> 
-  entrypoint when the module is loading. The remaing three tasks are all done
+  entrypoint when the module is loading. The remaining three tasks are all done
   when a query is planning and executing. The second task is the submission of
   candidate paths to either scan or join relations with an adequate cost for
   the core planner. Then, the planner will choose the cheapest path from all of
@@ -50,7 +50,7 @@
      <para>
       This custom scan in this module replaces a local join of foreign tables
       managed by <literal>postgres_fdw</literal> with a scan that fetches
-      remotely joined relations. It demostrates the way to implement a custom
+      remotely joined relations. It demonstrates the way to implement a custom
       scan node that performs join nodes.
      </para>
     </listitem>
@@ -145,7 +145,7 @@ typedef struct CustomPath
   <sect2 id="custom-scan-plan">
    <title>Construction of custom plan node</title>
    <para>
-    Once <literal>CustomPath</literal> was choosen by the query planner,
+    Once <literal>CustomPath</literal> was chosen by the query planner,
     it calls back to its associated to the custom scan provider to complete 
     setting up the <literal>CustomScan</literal> plan node according to the
     path information.
@@ -160,7 +160,7 @@ InitCustomScanPlan(PlannerInfo *root,
     The query planner does basic initialization on the <literal>cscan_plan</>
     being allocated, then the custom scan provider can apply final 
     initialization. <literal>cscan_path</> is the path node that was 
-    constructed on the previous stage then was choosen.
+    constructed on the previous stage then was chosen.
     <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
     on the <literal>Plan</> portion in the <literal>cscan_plan</>.
     Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index d63b1a8..aa2be4b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -105,6 +105,7 @@
 <!ENTITY btree-gist      SYSTEM "btree-gist.sgml">
 <!ENTITY chkpass         SYSTEM "chkpass.sgml">
 <!ENTITY citext          SYSTEM "citext.sgml">
+<!ENTITY ctidscan        SYSTEM "ctidscan.sgml">
 <!ENTITY cube            SYSTEM "cube.sgml">
 <!ENTITY dblink          SYSTEM "dblink.sgml">
 <!ENTITY dict-int        SYSTEM "dict-int.sgml">
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index e07d6d9..0f4ba9f 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -161,15 +161,19 @@ DESCR("equal");
 #define TIDEqualOperator   387
 DATA(insert OID = 402 (  "<>"	   PGNSP PGUID b f f	27	27	16 402 387 tidne neqsel neqjoinsel ));
 DESCR("not equal");
+#define TIDNotEqualOperator	402
 DATA(insert OID = 2799 (  "<"	   PGNSP PGUID b f f	27	27	16 2800 2802 tidlt scalarltsel scalarltjoinsel ));
 DESCR("less than");
 #define TIDLessOperator    2799
 DATA(insert OID = 2800 (  ">"	   PGNSP PGUID b f f	27	27	16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
 DESCR("greater than");
+#define TIDGreaterOperator		2800
 DATA(insert OID = 2801 (  "<="	   PGNSP PGUID b f f	27	27	16 2802 2800 tidle scalarltsel scalarltjoinsel ));
 DESCR("less than or equal");
+#define TIDLessEqualOperator	2801
 DATA(insert OID = 2802 (  ">="	   PGNSP PGUID b f f	27	27	16 2801 2799 tidge scalargtsel scalargtjoinsel ));
 DESCR("greater than or equal");
+#define TIDGreaterEqualOperator	2802
 
 DATA(insert OID = 410 ( "="		   PGNSP PGUID b t t	20	20	16 410 411 int8eq eqsel eqjoinsel ));
 DESCR("equal");
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index b084e0a..3030a3e 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -90,6 +90,7 @@ regress_data_files = \
 
 install-tests: all install install-lib installdirs-tests
 	$(MAKE) -C $(top_builddir)/contrib/spi install
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan install
 	for file in $(regress_data_files); do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(pkglibdir)/regress/'$$file || exit; \
 	done
@@ -98,9 +99,9 @@ installdirs-tests: installdirs
 	$(MKDIR_P)  $(patsubst $(srcdir)/%/,'$(DESTDIR)$(pkglibdir)/regress/%',$(sort $(dir $(regress_data_files))))
 
 
-# Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
+# Get some extra C modules from contrib/spi, dummy_seclabel and ctidscan
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) ctidscan$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +112,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+ctidscan$(DLSUFFIX): $(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/ctidscan/ctidscan$(DLSUFFIX): | submake-contrib-ctidscan ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-ctidscan:
+	$(MAKE) -C $(top_builddir)/contrib/ctidscan
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-ctidscan
 
 # Tablespace setup
 
diff --git a/src/test/regress/input/custom_scan.source b/src/test/regress/input/custom_scan.source
new file mode 100644
index 0000000..a5a205d
--- /dev/null
+++ b/src/test/regress/input/custom_scan.source
@@ -0,0 +1,49 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
\ No newline at end of file
diff --git a/src/test/regress/output/custom_scan.source b/src/test/regress/output/custom_scan.source
new file mode 100644
index 0000000..fc13e9f
--- /dev/null
+++ b/src/test/regress/output/custom_scan.source
@@ -0,0 +1,290 @@
+--
+-- Regression Tests for Custom Scan APIs
+--
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int primary key,
+    y   text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check Plans if no special extension is loaded.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Seq Scan on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+LOAD '@libdir@/ctidscan@DLSUFFIX@';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+           QUERY PLAN           
+--------------------------------
+ Index Scan using t1_pkey on t1
+   Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+           QUERY PLAN           
+--------------------------------
+ Seq Scan on t1
+   Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+             QUERY PLAN             
+------------------------------------
+ Tid Scan on t1
+   TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Custom Scan (ctidscan) on t1
+   Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+                  QUERY PLAN                  
+----------------------------------------------
+ Merge Join
+   Merge Cond: (t1.ctid = t2.ctid)
+   ->  Sort
+         Sort Key: t1.ctid
+         ->  Custom Scan (ctidscan) on t1
+               Filter: (ctid < '(2,10)'::tid)
+   ->  Sort
+         Sort Key: t2.ctid
+         ->  Custom Scan (ctidscan) on t2
+               Filter: (ctid > '(1,75)'::tid)
+(10 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (0,1)   |   1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2)   |   2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3)   |   3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4)   |   4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5)   |   5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6)   |   6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7)   |   7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8)   |   8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9)   |   9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10)  |  10 | d3d9446802a44259755d38e6d163e820
+ (0,11)  |  11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12)  |  12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13)  |  13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14)  |  14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15)  |  15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16)  |  16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17)  |  17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18)  |  18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19)  |  19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20)  |  20 | 98f13708210194c475687be6106a3b84
+ (0,21)  |  21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22)  |  22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23)  |  23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24)  |  24 | 1ff1de774005f8da13f42943881c655f
+ (0,25)  |  25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26)  |  26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27)  |  27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28)  |  28 | 33e75ff09dd601bbe69f351039152189
+ (0,29)  |  29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30)  |  30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31)  |  31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32)  |  32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33)  |  33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34)  |  34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35)  |  35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36)  |  36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37)  |  37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38)  |  38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39)  |  39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40)  |  40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41)  |  41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42)  |  42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43)  |  43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44)  |  44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45)  |  45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46)  |  46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47)  |  47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48)  |  48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49)  |  49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50)  |  50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51)  |  51 | 2838023a778dfaecdc212708f721b788
+ (0,52)  |  52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53)  |  53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54)  |  54 | a684eceee76fc522773286a895bc8436
+ (0,55)  |  55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56)  |  56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57)  |  57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58)  |  58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59)  |  59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60)  |  60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61)  |  61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62)  |  62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63)  |  63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64)  |  64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65)  |  65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66)  |  66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67)  |  67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68)  |  68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69)  |  69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70)  |  70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71)  |  71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72)  |  72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73)  |  73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74)  |  74 | ad61ab143223efbc24c7d2583be69251
+ (0,75)  |  75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76)  |  76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77)  |  77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78)  |  78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79)  |  79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80)  |  80 | f033ab37c30201f73f142449d037028d
+ (0,81)  |  81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82)  |  82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83)  |  83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84)  |  84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85)  |  85 | 3ef815416f775098fe977004015c6193
+ (0,86)  |  86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87)  |  87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88)  |  88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89)  |  89 | 7647966b7343c29048673252e490f736
+ (0,90)  |  90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91)  |  91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92)  |  92 | 92cc227532d17e56e07902b254dfad10
+ (0,93)  |  93 | 98dce83da57b0395e163467c9dae521b
+ (0,94)  |  94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95)  |  95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96)  |  96 | 26657d5ff9020d2abefe558796b99584
+ (0,97)  |  97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98)  |  98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99)  |  99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1)   | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2)   | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3)   | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4)   | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5)   | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6)   | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7)   | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8)   | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9)   | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10)  | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11)  | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12)  | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13)  | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14)  | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15)  | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16)  | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17)  | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18)  | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19)  | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b 
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+  ctid   |  a  |                b                 
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1)   | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2)   | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3)   | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4)   | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5)   | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6)   | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7)   | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8)   | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9)   | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10)  | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+  ctid  |  a  |                b                 |  x  |                                y                                 
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1)  | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2)  | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3)  | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4)  | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5)  | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6)  | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7)  | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8)  | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9)  | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE:  drop cascades to 2 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2e3eba8..827acc4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ ignore: random
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete custom_scan
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 4f1dede..df391b8 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -92,6 +92,7 @@ test: btree_index
 test: hash_index
 test: update
 test: delete
+test: custom_scan
 test: namespace
 test: prepared_xacts
 test: privileges
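
As a rough illustration of what the ctid tests above exercise: a scan
restricted by inequality quals on ctid only has to visit a sub-range of
blocks, while the Filter shown in the plans still rechecks every tuple.
The following is a minimal standalone C sketch of that idea, not the code
in contrib/ctidscan. The names Tid, TidQual, BlockRange and
ctid_block_range are hypothetical, and it assumes item offsets start at 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block, offset). */
typedef struct { uint32_t block; uint16_t offset; } Tid;

typedef enum { TID_LT, TID_LE, TID_GT, TID_GE } TidOp;

/* One qualifier of the form "ctid <op> bound". */
typedef struct { TidOp op; Tid bound; } TidQual;

/* Half-open block range [first, end) that the scan has to visit. */
typedef struct { uint32_t first; uint32_t end; } BlockRange;

/*
 * Narrow [0, nblocks) using ANDed inequality quals on ctid.  The result is
 * conservative: every matching tuple lies inside the range, but tuples in
 * the boundary blocks must still be rechecked against the quals.
 */
BlockRange
ctid_block_range(const TidQual *quals, int nquals, uint32_t nblocks)
{
    BlockRange r = {0, nblocks};

    assert(nquals >= 0);
    for (int i = 0; i < nquals; i++)
    {
        uint32_t b = quals[i].bound.block;
        uint16_t off = quals[i].bound.offset;
        uint32_t end;

        switch (quals[i].op)
        {
            case TID_GT:
            case TID_GE:
                /* Blocks before b cannot contain a match. */
                if (b > r.first)
                    r.first = b;
                break;
            case TID_LT:
            case TID_LE:
                /* Blocks after b cannot match; block b itself is excluded
                 * only when no offset >= 1 can satisfy the bound. */
                if (b >= nblocks)
                    end = nblocks;
                else if (quals[i].op == TID_LT ? off <= 1 : off == 0)
                    end = b;
                else
                    end = b + 1;
                if (end < r.end)
                    r.end = end;
                break;
        }
    }
    if (r.first > r.end)
        r.first = r.end;        /* normalize an empty range */
    return r;
}
```

Assuming t1 occupies 4 blocks, as the output above suggests, the BETWEEN
'(2,115)' AND '(3,10)' case maps to blocks [2, 4) and the ctid > '(4,0)'
case maps to an empty range.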

pgsql-v9.4-custom-scan.part-1.v9.patch (application/octet-stream)

 doc/src/sgml/custom-scan.sgml           | 295 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  99 +++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  34 +++-
 src/backend/executor/execProcnode.c     |  14 ++
 src/backend/executor/execQual.c         |  10 +-
 src/backend/executor/execUtils.c        |   4 +-
 src/backend/executor/nodeCustom.c       | 252 +++++++++++++++++++++++++++
 src/backend/nodes/bitmapset.c           |  61 +++++++
 src/backend/nodes/copyfuncs.c           |  30 ++++
 src/backend/nodes/outfuncs.c            |  19 ++
 src/backend/nodes/print.c               |   4 +
 src/backend/optimizer/path/allpaths.c   |  23 +++
 src/backend/optimizer/path/costsize.c   |   7 +-
 src/backend/optimizer/path/joinpath.c   |  18 ++
 src/backend/optimizer/plan/createplan.c | 104 +++++++++++
 src/backend/optimizer/plan/setrefs.c    |  27 ++-
 src/backend/optimizer/plan/subselect.c  |  10 ++
 src/backend/utils/adt/ruleutils.c       |  44 ++++-
 src/include/executor/executor.h         |   3 +-
 src/include/executor/nodeCustom.h       |  94 ++++++++++
 src/include/nodes/bitmapset.h           |   4 +
 src/include/nodes/execnodes.h           |  17 ++
 src/include/nodes/nodes.h               |   3 +
 src/include/nodes/plannodes.h           |  16 ++
 src/include/nodes/primnodes.h           |   1 +
 src/include/nodes/relation.h            |  16 ++
 src/include/optimizer/cost.h            |   4 +
 src/include/optimizer/paths.h           |  25 +++
 src/include/optimizer/planmain.h        |   1 +
 src/test/regress/expected/.gitignore    |   1 +
 src/test/regress/sql/.gitignore         |   1 +
 34 files changed, 1225 insertions(+), 20 deletions(-)
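
For readers skimming the documentation in this patch: the registration step
it describes (a name plus a table of callbacks, copied into an internal table
so the caller need not keep its own copy) can be modeled roughly as below.
This is a standalone sketch under stated assumptions, not the patch's
implementation. DemoProvider, demo_register_provider, demo_lookup_provider
and demo_begin are hypothetical names, and rejecting duplicate names is an
assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROVIDERS 16
#define DEMO_NAME_LEN      64

/* Hypothetical, cut-down analogue of a provider's callback table. */
typedef struct DemoProvider
{
    char  name[DEMO_NAME_LEN];
    void (*begin_scan)(void *state);    /* required in this sketch */
    void (*explain)(void *state);       /* optional, may be NULL */
} DemoProvider;

/* Internal table; registered entries are copied here. */
static DemoProvider demo_table[DEMO_MAX_PROVIDERS];
static int          demo_count = 0;

/* Find a registered provider by name, or return NULL. */
const DemoProvider *
demo_lookup_provider(const char *name)
{
    for (int i = 0; i < demo_count; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return &demo_table[i];
    return NULL;
}

/* Copy *p into the internal table; returns 0 on success, -1 on error. */
int
demo_register_provider(const DemoProvider *p)
{
    if (p == NULL || p->begin_scan == NULL)
        return -1;              /* required callback missing */
    if (p->name[0] == '\0' || memchr(p->name, '\0', DEMO_NAME_LEN) == NULL)
        return -1;              /* empty or unterminated name */
    if (demo_lookup_provider(p->name) != NULL)
        return -1;              /* name must be unique (assumption) */
    if (demo_count >= DEMO_MAX_PROVIDERS)
        return -1;              /* table full */
    demo_table[demo_count++] = *p;      /* struct copy */
    return 0;
}

/* Trivial callback used only to exercise the sketch. */
static void
demo_begin(void *state)
{
    (void) state;
}
```

Because the entry is copied, a caller could build the structure on the stack
inside its initialization function and let it go out of scope afterwards.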

diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..b57d82f
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,295 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+  <primary>custom scan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-scan API enables an extension to provide alternative ways to scan
+  or join relations while leveraging the cost-based optimizer. The API consists
+  of a set of callbacks, registered under a unique name, that are invoked during
+  query planning and execution. A custom-scan provider should implement these
+  callback functions according to the expectations of the API.
+ </para>
+ <para>
+  Overall, there are four major tasks that a custom-scan provider should
+  implement. The first task is the registration of the custom-scan provider
+  itself. Usually, this is done once in the <literal>_PG_init()</literal>
+  entrypoint when the module is loaded. The remaining three tasks are all done
+  while a query is planned and executed. The second task is the submission of
+  candidate paths to either scan or join relations, each with an adequate cost
+  estimate, to the core planner. The planner then chooses the cheapest path
+  among all the candidates. If the custom path survives, the planner starts the
+  third task: construction of a <literal>CustomScan</literal> plan node, placed
+  in the query plan tree instead of a built-in plan node. The last task is the
+  execution of the provider's implementation in response to invocations by the
+  core executor.
+ </para>
+ <para>
+  Some contrib modules use the custom-scan API. They may serve as good
+  examples for new development.
+  <variablelist>
+   <varlistentry>
+    <term><xref linkend="ctidscan"></term>
+    <listitem>
+     <para>
+      The custom scan in this module enables a scan to skip earlier pages or
+      terminate before the end of the relation when an inequality operator on
+      the <literal>ctid</literal> system column narrows down the range to be
+      scanned, instead of a sequential scan that reads the relation from
+      beginning to end.
+     </para>
+    </listitem>
+   </varlistentry>
+   <varlistentry>
+    <term><xref linkend="postgres-fdw"></term>
+    <listitem>
+     <para>
+      The custom scan in this module replaces a local join of foreign tables
+      managed by <literal>postgres_fdw</literal> with a scan that fetches
+      remotely joined relations. It demonstrates how to implement a custom
+      scan node that takes the place of a join node.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Currently, only scans and joins are fully supported with integrated
+  cost-based query optimization through the custom scan API. You might be able
+  to implement other operations, such as sort or aggregation, by manipulating
+  the planned tree; however, the extension is then responsible for handling
+  this replacement correctly. There is no support for it in the core.
+ </para>
+
+ <sect1 id="custom-scan-spec">
+  <title>Custom Scan API Functions and Callbacks</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom scan provider</title>
+   <para>
+    The first task for a custom scan provider is the registration of a set of
+    callbacks under a unique name. Usually, this is done once upon module
+    loading in the <literal>_PG_init()</literal> entrypoint.
+<programlisting>
+void
+register_custom_provider(const CustomProvider *provider);
+</programlisting>
+    Its argument, a <literal>CustomProvider</literal> structure, contains
+    a name and a set of callback function pointers, some of which are
+    optional.
+    Once registered, it is copied into an internal table, so the caller
+    does not need to keep this structure any more.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-path">
+   <title>Submission of custom paths</title>
+   <para>
+    The query planner finds the best way to scan or join relations from various
+    potential paths using a combination of scan algorithms and target
+    relations. Prior to this selection, the planner lists all of the potential
+    paths for a target relation (if it is a base relation) or a pair of
+    relations (if it is a join). The <literal>add_scan_path_hook</> and
+    <literal>add_join_path_hook</> allow extensions to add alternative scan
+    paths in addition to the built-in paths.
+    If a custom-scan provider can submit a potential scan path for the
+    supplied relation, it shall construct a <literal>CustomPath</> object
+    with appropriate parameters.
+<programlisting>
+typedef struct CustomPath
+{
+    Path        path;
+    const char *custom_name;        /* name of custom scan provider */
+    int         custom_flags;       /* CUSTOM__* flags in nodeCustom.h */
+    List       *custom_private;     /* can be used for private data */
+} CustomPath;
+</programlisting>
+    Its <literal>path</> field is common to all path nodes and stores
+    the cost estimate. In addition, <literal>custom_name</> is the name of
+    the registered custom scan provider, <literal>custom_flags</> is a set of
+    the flags below, and <literal>custom_private</> can be used to store
+    private data of the custom scan provider.
+   </para>
+   <para>
+    <variablelist>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_MARK_RESTORE</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports the
+        <literal>ExecMarkPosCustomScan</> and
+        <literal>ExecRestorePosCustomScan</> methods.
+        The custom scan provider is then responsible for marking and
+        restoring a particular position.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><literal>CUSTOM__SUPPORT_BACKWARD_SCAN</></term>
+      <listitem>
+       <para>
+        It informs the query planner that this custom scan node supports
+        backward scans.
+        The custom scan provider is then responsible for scanning in the
+        backward direction.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-plan">
+   <title>Construction of custom plan node</title>
+   <para>
+    Once a <literal>CustomPath</literal> has been chosen by the query planner,
+    the planner calls back to the associated custom scan provider to complete
+    setting up the <literal>CustomScan</literal> plan node according to the
+    path information.
+<programlisting>
+void
+InitCustomScanPlan(PlannerInfo *root,
+                   CustomScan *cscan_plan,
+                   CustomPath *cscan_path,
+                   List *tlist,
+                   List *scan_clauses);
+</programlisting>
+    The query planner does basic initialization of the allocated
+    <literal>cscan_plan</>, then the custom scan provider can apply final
+    initialization. <literal>cscan_path</> is the path node that was
+    constructed in the previous stage and then chosen.
+    <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
+    to the <literal>Plan</> portion of the <literal>cscan_plan</>.
+    Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
+    be checked during a relation scan. Its expression portion will also be
+    assigned to the <literal>Plan</> portion, but can be eliminated from
+    this list if the custom scan provider can handle these checks by itself.
+   </para>
+   <para>
+    After construction of the plan node, it is often necessary to adjust the
+    <literal>varno</> of <literal>Var</> nodes that reference a particular scan
+    node. For example, a Var node in the target list of a join node originally
+    references a particular relation underlying the join; however, it has to
+    be adjusted to either an inner or an outer reference.
+<programlisting>
+void
+SetPlanRefCustomScan(PlannerInfo *root,
+                     CustomScan *cscan_plan,
+                     int rtoffset);
+</programlisting>
+    This callback is optional if the custom scan node is a vanilla relation
+    scan, because there is nothing special to do. Otherwise, it must be
+    handled by the custom scan provider, for example when a custom scan
+    replaces a join of two or more relations.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-exec">
+   <title>Execution of custom scan node</title>
+   <para>
+    The query executor also invokes the associated callbacks to begin, execute
+    and end the custom scan, following the executor's usual protocol.
+   </para>
+   <para>
+<programlisting>
+void
+BeginCustomScan(CustomScanState *csstate, int eflags);
+</programlisting>
+    It begins execution of the custom scan at executor startup.
+    It allows the custom scan provider to do any initialization needed for this
+    plan; however, it is not a good idea to start the actual scan here.
+    (That should be done on the first invocation of <literal>ExecCustomScan</>
+    instead.)
+    The <literal>custom_state</> field of <literal>CustomScanState</> is
+    intended to hold the private state managed by the custom scan
+    provider. Also, <literal>eflags</> holds flag bits for the executor's
+    operating mode for this plan node. Note that the custom scan provider
+    should not perform any externally visible action if
+    <literal>EXEC_FLAG_EXPLAIN_ONLY</> is given.
+   </para>
+
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches one tuple from the underlying relation (or relations, if joining)
+    according to the custom logic. Unlike the <literal>IterateForeignScan</>
+    method for foreign tables, it is also responsible for checking whether the
+    next tuple satisfies the qualifiers of this scan.
+    The usual way to implement this method is for the callback to act merely as
+    an entrypoint that applies <literal>ExecQual</> using its own access method.
+   </para>
+
+   <para>
+<programlisting>
+Node *
+MultiExecCustomScan(CustomScanState *csstate);
+</programlisting>
+    It fetches multiple tuples from the underlying relation (or relations, if
+    joining) according to the custom logic. Pay attention to the data format (and
+    to how it is returned), since it depends on the type of the upper node.
+   </para>
+
+   <para>
+<programlisting>
+void
+EndCustomScan(CustomScanState *csstate);
+</programlisting>
+    It ends the scan and releases privately allocated resources.
+    It is usually not important to release memory in the per-execution memory
+    context, so this callback needs to take care only of resources that the
+    provider manages itself, outside the framework.
+   </para>
+  </sect2>
+
+  <sect2 id="custom-scan-misc">
+   <title>Miscellaneous jobs</title>
+   <para>
+<programlisting>
+void
+ReScanCustomScan(CustomScanState *csstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed,
+    so the restarted scan need not return exactly the same tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomScan(CustomScanState *csstate);
+</programlisting>
+    It saves the current position of the custom scan somewhere in its private
+    state.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+RestorePosCustom(CustomScanState *csstate);
+</programlisting>
+    It rewinds the custom scan to the position that was previously saved
+    by <literal>MarkPosCustomScan</>.
+    Note that this callback is optional; it is needed only when
+    <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomScan(CustomScanState *csstate,
+                  ExplainState *es);
+</programlisting>
+    It prints additional <command>EXPLAIN</> output for a custom scan plan.
+    This callback is expected to call <literal>ExplainPropertyText</> to
+    add fields to the <command>EXPLAIN</> output.
+    The flag fields in <literal>ExplainState</> indicate what should be
+    printed, and the <literal>CustomScanState</> provides
+    run-time statistics in the <command>EXPLAIN ANALYZE</> case.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 09de4bd..d63b1a8 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan  SYSTEM "custom-scan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..ed76d33 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-scan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 08f3167..2a6136d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -90,6 +91,7 @@ static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_customscan_info(CustomScanState *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -721,6 +723,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				*rels_used = bms_add_member(*rels_used,
+											((Scan *) plan)->scanrelid);
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -847,6 +854,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -935,6 +944,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			snprintf(namebuf, sizeof(namebuf), "Custom Scan (%s)",
+					 ((CustomScan *) plan)->custom_name);
+			pname = pstrdup(namebuf);
+			sname = "Custom Scan";
+			custom_name = ((CustomScan *) plan)->custom_name;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1036,6 +1052,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom Provider", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1051,6 +1069,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomScan:
+			if (((Scan *) plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1347,6 +1369,29 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			if (((CustomScan *) plan)->functions != NIL && es->verbose)
+			{
+				List	   *fexprs = NIL;
+				ListCell   *lc;
+
+				foreach(lc, ((CustomScan *) plan)->functions)
+				{
+					RangeTblFunction *rtfunc = (RangeTblFunction *) lfirst(lc);
+
+					fexprs = lappend(fexprs, rtfunc->funcexpr);
+				}
+				/* We rely on show_expression to insert commas as needed */
+				show_expression((Node *) fexprs,
+								"Function Call", planstate, ancestors,
+								es->verbose, es);
+			}
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_customscan_info((CustomScanState *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1986,6 +2031,19 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomScan node.
+ */
+static void
+show_customscan_info(CustomScanState *cstate, ExplainState *es)
+{
+	CustomProvider *provider = cstate->custom_provider;
+
+	/* Let custom scan provider emit whatever fields it wants */
+	if (provider->ExplainCustomScan != NULL)
+		provider->ExplainCustomScan(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2158,6 +2216,47 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_JOIN)
+			{
+				objectname = rte->eref->aliasname;
+				objecttag = "Join Alias";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				List	   *functions = ((CustomScan *) plan)->functions;
+
+				if (functions && list_length(functions) == 1)
+				{
+					RangeTblFunction *rtfunc = linitial(functions);
+
+					if (IsA(rtfunc->funcexpr, FuncExpr))
+					{
+						FuncExpr   *funcexpr = (FuncExpr *) rtfunc->funcexpr;
+						Oid			funcid = funcexpr->funcid;
+
+						objectname = get_func_name(funcid);
+						if (es->verbose)
+							namespace =
+								get_namespace_name(get_func_namespace(funcid));
+					}
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..2443e24 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomMarkPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecCustomRestrPos((CustomScanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -379,9 +392,9 @@ ExecRestrPos(PlanState *node)
  * and valuesscan support is actually useless code at present.)
  */
 bool
-ExecSupportsMarkRestore(NodeTag plantype)
+ExecSupportsMarkRestore(Path *path)
 {
-	switch (plantype)
+	switch (path->pathtype)
 	{
 		case T_SeqScan:
 		case T_IndexScan:
@@ -392,6 +405,14 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_Sort:
 			return true;
 
+		case T_CustomScan:		/* pathtype of a CustomPath, cf. create_scan_plan */
+			{
+				int	flags = ((CustomPath *) path)->custom_flags;
+				if (flags & CUSTOM__SUPPORT_MARK_RESTORE)
+					return true;
+				return false;
+			}
+
 		case T_Result:
 
 			/*
@@ -465,6 +486,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomScan:
+			{
+				int		flags = ((CustomScan *) node)->custom_flags;
+
+				if (flags & CUSTOM__SUPPORT_BACKWARD_SCAN)
+					return TargetListSupportsBackwardScan(node->targetlist);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..b4a7411 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomScan:
+			result = (PlanState *) ExecInitCustomScan((CustomScan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +448,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			result = ExecCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +688,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecEndCustomScan((CustomScanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0eba025..e71ce9b 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -592,7 +592,7 @@ ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -680,7 +680,7 @@ ExecEvalScalarVarFast(ExprState *exprstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -732,7 +732,7 @@ ExecEvalWholeRowVar(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -915,7 +915,7 @@ ExecEvalWholeRowFast(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
@@ -991,7 +991,7 @@ ExecEvalWholeRowSlow(WholeRowVarExprState *wrvstate, ExprContext *econtext,
 			slot = econtext->ecxt_outertuple;
 			break;
 
-			/* INDEX_VAR is handled by default case */
+			/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 		default:				/* get the tuple from the relation being
 								 * scanned */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 46895b2..58d7190 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -578,7 +578,7 @@ ExecBuildProjectionInfo(List *targetList,
 						projInfo->pi_lastOuterVar = attnum;
 					break;
 
-					/* INDEX_VAR is handled by default case */
+					/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 				default:
 					varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
@@ -638,7 +638,7 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 					projInfo->pi_lastOuterVar = attnum;
 				break;
 
-				/* INDEX_VAR is handled by default case */
+				/* INDEX_VAR and CUSTOM_VAR are handled by default case */
 
 			default:
 				if (projInfo->pi_lastScanVar < attnum)
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..2d89d7a
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,252 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan, scan and join node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/nodeCustom.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* static variables */
+static HTAB *custom_provider_hash = NULL;
+
+/*
+ * register_custom_provider
+ *
+ * It registers a custom execution provider, which consists of a set of
+ * callbacks and is identified by a unique name.
+ */
+void
+register_custom_provider(const CustomProvider *provider)
+{
+	CustomProvider *entry;
+	bool			found;
+
+	if (!custom_provider_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomProvider);
+
+		custom_provider_hash = hash_create("custom execution providers",
+										   32,
+										   &ctl,
+										   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_provider_hash,
+						provider->name,
+						HASH_ENTER, &found);
+	if (found)
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("duplicate custom execution provider \"%s\"",
+						provider->name)));
+
+	Assert(strcmp(provider->name, entry->name) == 0);
+	memcpy(entry, provider, sizeof(CustomProvider));
+}
+
+/*
+ * get_custom_provider
+ *
+ * It finds a registered custom execution provider by its name.
+ */
+CustomProvider *
+get_custom_provider(const char *custom_name)
+{
+	CustomProvider *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_provider_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomProvider *) hash_search(custom_provider_hash,
+										   custom_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						custom_name)));
+
+	return entry;
+}
+
+/*
+ * ExecInitCustomScan
+ *
+ * Allocates the CustomScanState and performs various initialization steps.
+ * Note that some initialization steps are skipped if scanrelid is zero
+ * (which means this custom scan plan is not associated with a particular
+ * relation in the range-table list).
+ */
+CustomScanState *
+ExecInitCustomScan(CustomScan *node, EState *estate, int eflags)
+{
+	CustomProvider	   *provider = get_custom_provider(node->custom_name);
+	CustomScanState	   *csstate;
+	Plan			   *plan = &node->scan.plan;
+	Index				scanrelid = node->scan.scanrelid;
+
+	/*
+	 * Create state structure
+	 */
+	csstate = makeNode(CustomScanState);
+	csstate->ss.ps.plan = plan;
+	csstate->ss.ps.state = estate;
+	csstate->custom_provider = provider;
+	csstate->custom_flags = node->custom_flags;
+	csstate->custom_state = NULL;
+
+	/*
+	 * Miscellaneous initialization
+	 */
+	ExecAssignExprContext(estate, &csstate->ss.ps);
+
+	/*
+	 * Initialization of child expressions
+	 */
+	csstate->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist, &csstate->ss.ps);
+	csstate->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual, &csstate->ss.ps);
+
+	/*
+	 * tuple table initialization
+	 *
+	 * Note that ss_ScanTupleSlot is set up only when scanrelid refers to
+	 * a particular relation. Otherwise, the custom-scan provider must
+	 * initialize it itself if it uses ss_ScanTupleSlot internally.
+	 * If the provider replaces the varno of Var nodes with CUSTOM_VAR, the
+	 * slot must be set up so that EXPLAIN can look up attribute names.
+	 */
+	ExecInitResultTupleSlot(estate, &csstate->ss.ps);
+	if (scanrelid > 0)
+		ExecInitScanTupleSlot(estate, &csstate->ss);
+
+	/*
+	 * open the base relation and acquire appropriate lock on it,
+	 * if this custom scan is connected with a particular relation.
+	 * Also, assign its scan type according to the table definition.
+	 */
+	if (scanrelid > 0)
+	{
+		Relation	rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+
+		csstate->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&csstate->ss, RelationGetDescr(rel));
+
+		csstate->ss.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&csstate->ss.ps);
+
+	if (scanrelid > 0)
+		ExecAssignScanProjectionInfo(&csstate->ss);
+	else
+		ExecAssignProjectionInfo(&csstate->ss.ps, NULL);
+
+	/*
+	 * Final initialization is done by the BeginCustomScan callback.
+	 * The extension may override any of the initialization above, if
+	 * needed.
+	 */
+	csstate->custom_provider->BeginCustomScan(csstate, eflags);
+
+	return csstate;
+}
+
+/*
+ * ExecCustomScan
+ *
+ * Just an entry point to the ExecCustomScan callback. All the work of
+ * fetching a tuple is the job of the custom-scan provider.
+ */
+TupleTableSlot *
+ExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->ExecCustomScan(csstate);
+}
+
+/*
+ * MultiExecCustomScan
+ *
+ * Also just an entry point to the MultiExecCustomScan callback. All the
+ * work of fetching multiple tuples (in the form the upper node expects) is
+ * the job of the custom-scan provider.
+ */
+Node *
+MultiExecCustomScan(CustomScanState *csstate)
+{
+	return csstate->custom_provider->MultiExecCustomScan(csstate);
+}
+
+/*
+ * ExecEndCustomScan
+ *
+ * It releases all the resources allocated on this scan.
+ */
+void
+ExecEndCustomScan(CustomScanState *csstate)
+{
+	/* Let the custom-scan provider shut down */
+	csstate->custom_provider->EndCustomScan(csstate);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&csstate->ss.ps);
+
+	/* Clean out the tuple table, if exists */
+	ExecClearTuple(csstate->ss.ps.ps_ResultTupleSlot);
+	if (csstate->ss.ss_ScanTupleSlot)
+		ExecClearTuple(csstate->ss.ss_ScanTupleSlot);
+
+	/* close the relation, if opened */
+	if (csstate->ss.ss_currentRelation)
+		ExecCloseScanRelation(csstate->ss.ss_currentRelation);
+}
+
+/*
+ * ExecReScanCustomScan
+ */
+void
+ExecReScanCustomScan(CustomScanState *csstate)
+{
+	csstate->custom_provider->ReScanCustomScan(csstate);
+}
+
+/*
+ * ExecCustomMarkPos
+ */
+void
+ExecCustomMarkPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->MarkPosCustomScan(csstate);
+}
+
+/*
+ * ExecCustomRestrPos
+ */
+void
+ExecCustomRestrPos(CustomScanState *csstate)
+{
+	Assert((csstate->custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0);
+	csstate->custom_provider->RestorePosCustom(csstate);
+}
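
To illustrate the registration and dispatch pattern used by nodeCustom.c
above, here is a minimal standalone sketch. It is not the patch's code: it
assumes a small fixed-size array instead of a dynahash table, it reports a
duplicate name with a return value instead of raising an error, and
DemoProvider, demo_register_provider, demo_get_provider and
demo_count_callback are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROVIDERS	8
#define DEMO_NAME_LEN		64

/* Hypothetical stand-in for CustomProvider: a name plus one callback. */
typedef struct DemoProvider
{
	char		name[DEMO_NAME_LEN];
	int			(*exec_scan) (void *state);
} DemoProvider;

static DemoProvider demo_registry[DEMO_MAX_PROVIDERS];
static int	demo_registry_used = 0;

/* Look up a provider by name; returns NULL if it was never registered. */
static DemoProvider *
demo_get_provider(const char *name)
{
	int			i;

	for (i = 0; i < demo_registry_used; i++)
	{
		if (strcmp(demo_registry[i].name, name) == 0)
			return &demo_registry[i];
	}
	return NULL;
}

/* Copy the provider into the registry; -1 on duplicate name or overflow. */
static int
demo_register_provider(const DemoProvider *provider)
{
	assert(provider != NULL && provider->exec_scan != NULL);

	if (demo_registry_used >= DEMO_MAX_PROVIDERS ||
		demo_get_provider(provider->name) != NULL)
		return -1;

	demo_registry[demo_registry_used++] = *provider;
	return 0;
}

/* Example callback: increments an int counter passed as private state. */
static int
demo_count_callback(void *state)
{
	int		   *counter = (int *) state;

	return ++(*counter);
}
```

A caller would register a provider once, look it up by name when a plan node
is initialized, and then invoke the stored callback for each tuple request,
which mirrors how ExecInitCustomScan and ExecCustomScan cooperate in the
patch.
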
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 3a6d0fb..3a1465e 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -865,3 +865,64 @@ bms_hash_value(const Bitmapset *a)
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
 								   (lastword + 1) * sizeof(bitmapword)));
 }
+
+/*
+ * bms_to_string / bms_from_string - transform bitmapset to/from text form
+ */
+char *
+bms_to_string(Bitmapset *a)
+{
+	char   *result;
+	char   *pos;
+	int		i;
+
+	if (bms_is_empty(a))
+		return NULL;
+
+	result = palloc(a->nwords * (BITS_PER_BITMAPWORD / 4) + 1);
+	for (i = a->nwords, pos = result; i > 0; i--)
+		pos += sprintf(pos, "%08x", a->words[i - 1]);
+
+	return result;
+}
+
+Bitmapset *
+bms_from_string(const char *a)
+{
+	Bitmapset  *result;
+	Size		len;
+	int			nwords;
+	int			i, offset = 0;
+
+	if (a == NULL)
+		return NULL;
+
+	len = strlen(a);
+	if (len % (BITS_PER_BITMAPWORD / 4) != 0)
+		elog(WARNING, "strange bitmapset text form: %s", a);
+
+	nwords = (len + BITS_PER_BITMAPWORD / 4 - 1) / (BITS_PER_BITMAPWORD / 4);
+	result = palloc(BITMAPSET_SIZE(nwords));
+	result->nwords = nwords;
+
+	for (i = result->nwords; i > 0; i--)
+	{
+		bitmapword	word = 0;
+
+		do {
+			int		c = a[offset++];
+
+			if (c >= '0' && c <= '9')
+				word = (word << 4) | (c - '0');
+			else if (c >= 'a' && c <= 'f')
+				word = (word << 4) | (c - 'a' + 10);
+			else if (c >= 'A' && c <= 'F')
+				word = (word << 4) | (c - 'A' + 10);
+			else
+				elog(ERROR, "invalid hexadecimal digit");
+		} while ((len - offset) % (BITS_PER_BITMAPWORD / 4) != 0);
+
+		result->words[i - 1] = word;
+	}
+	return result;
+}
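
The text form produced by bms_to_string and parsed by bms_from_string above
can be tried outside the backend with the following standalone sketch. It is
an illustration under stated assumptions, not the patch's code: it assumes
32-bit words written as 8 hex digits each, most significant word first, and
it uses a hypothetical fixed-size WordSet struct with hypothetical
wordset_to_hex and wordset_from_hex helpers in place of Bitmapset, palloc
and elog.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEX_PER_WORD	8		/* one 32-bit word is 8 hex digits */
#define MAX_WORDS		4

/* Hypothetical stand-in for Bitmapset; not the PostgreSQL struct. */
typedef struct WordSet
{
	int			nwords;
	uint32_t	words[MAX_WORDS];
} WordSet;

/* Write the words as hex, highest word first. Caller frees the result. */
static char *
wordset_to_hex(const WordSet *a)
{
	char	   *result;
	char	   *pos;
	int			i;

	assert(a->nwords >= 0 && a->nwords <= MAX_WORDS);
	result = malloc((size_t) a->nwords * HEX_PER_WORD + 1);
	if (result == NULL)
		return NULL;
	pos = result;
	for (i = a->nwords; i > 0; i--)
		pos += sprintf(pos, "%08x", (unsigned int) a->words[i - 1]);
	*pos = '\0';				/* also covers the nwords == 0 case */
	return result;
}

/* Parse the hex form back; returns 0 on success, -1 on malformed input. */
static int
wordset_from_hex(const char *s, WordSet *out)
{
	size_t		len = strlen(s);
	size_t		off = 0;
	int			i;

	if (len == 0 || len % HEX_PER_WORD != 0 || len / HEX_PER_WORD > MAX_WORDS)
		return -1;
	out->nwords = (int) (len / HEX_PER_WORD);
	for (i = out->nwords; i > 0; i--)
	{
		uint32_t	word = 0;
		int			k;

		for (k = 0; k < HEX_PER_WORD; k++)
		{
			int			c = (unsigned char) s[off++];

			if (c >= '0' && c <= '9')
				word = (word << 4) | (uint32_t) (c - '0');
			else if (c >= 'a' && c <= 'f')
				word = (word << 4) | (uint32_t) (c - 'a' + 10);
			else if (c >= 'A' && c <= 'F')
				word = (word << 4) | (uint32_t) (c - 'A' + 10);
			else
				return -1;
		}
		out->words[i - 1] = word;
	}
	return 0;
}
```

Unlike the patch, this sketch rejects a string whose length is not a multiple
of the per-word digit count instead of warning and continuing; that is a
choice made for the example only.
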
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c89d808..d48b3d7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,33 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	COPY_NODE_FIELD(subqry_plan);
+	COPY_NODE_FIELD(functions);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3983,6 +4010,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..7dc1631 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,22 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_INT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+
+	WRITE_NODE_FIELD(subqry_plan);
+	WRITE_NODE_FIELD(functions);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -2828,6 +2844,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 9f7f322..9f2b6bb 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -333,6 +333,10 @@ print_expr(const Node *expr, const List *rtable)
 				relname = "INDEX";
 				attname = "?";
 				break;
+			case CUSTOM_VAR:
+				relname = "CUSTOM";
+				attname = "?";
+				break;
 			default:
 				{
 					RangeTblEntry *rte;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..6201a97 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -389,6 +391,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -417,6 +422,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1236,6 +1244,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1307,6 +1318,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1330,6 +1344,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1399,6 +1416,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1452,6 +1472,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root, rel, rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 9bca968..c42dc9e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -132,9 +132,6 @@ static MergeScanSelCache *cached_scansel(PlannerInfo *root,
 static void cost_rescan(PlannerInfo *root, Path *path,
 			Cost *rescan_startup_cost, Cost *rescan_total_cost);
 static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
-static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
-						  ParamPathInfo *param_info,
-						  QualCost *qpqual_cost);
 static bool has_indexed_join_quals(NestPath *joinpath);
 static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
 				   List *quals);
@@ -2268,7 +2265,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * it off does not entitle us to deliver an invalid plan.
 	 */
 	else if (innersortkeys == NIL &&
-			 !ExecSupportsMarkRestore(inner_path->pathtype))
+			 !ExecSupportsMarkRestore(inner_path))
 		path->materialize_inner = true;
 
 	/*
@@ -3157,7 +3154,7 @@ cost_qual_eval_walker(Node *node, cost_qual_eval_context *context)
  * some of the quals.  We assume baserestrictcost was previously set by
  * set_baserel_size_estimates().
  */
-static void
+void
 get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
 						  ParamPathInfo *param_info,
 						  QualCost *qpqual_cost)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..48f5ad4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths provided by a custom execution provider.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..c07e000 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,6 +78,9 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+										  CustomPath *best_path,
+										  List *tlist, List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -233,6 +237,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
@@ -409,6 +414,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *) create_customscan_plan(root,
+												   (CustomPath *) best_path,
+												   tlist,
+												   scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -2006,6 +2018,98 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan = makeNode(CustomScan);
+	RelOptKind		reloptkind = best_path->path.parent->reloptkind;
+	RangeTblEntry  *rte;
+	Index			scan_relid;
+
+	if (reloptkind == RELOPT_BASEREL ||
+		reloptkind == RELOPT_OTHER_MEMBER_REL)
+	{
+		scan_relid = best_path->path.parent->relid;
+
+		rte = planner_rt_fetch(scan_relid, root);
+		/*
+		 * For EXPLAIN output, we save various information in CustomScan plan
+		 * structure. Custom-scan provider can utilize them, but it is not
+		 * recommended to adjust them.
+		 */
+		if (rte->rtekind == RTE_SUBQUERY)
+		{
+			if (best_path->path.param_info)
+			{
+				List   *subplan_params
+					= best_path->path.parent->subplan_params;
+				process_subquery_nestloop_params(root, subplan_params);
+			}
+			scan_plan->subqry_plan = best_path->path.parent->subplan;
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			List   *functions = rte->functions;
+
+			if (best_path->path.param_info)
+				functions = (List *)
+					replace_nestloop_params(root, (Node *)functions);
+			scan_plan->functions = functions;
+		}
+	}
+	else if (reloptkind == RELOPT_JOINREL)
+		scan_relid = 0;
+	else
+		elog(ERROR, "unexpected reloptkind: %d", (int)reloptkind);
+
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+	scan_plan->scan.plan.targetlist = NULL;	/* to be set by callback */
+	scan_plan->scan.plan.qual = NULL;		/* to be set by callback */
+	scan_plan->scan.plan.lefttree = NULL;
+	scan_plan->scan.plan.righttree = NULL;
+	scan_plan->scan.scanrelid = scan_relid;
+
+	scan_plan->custom_name = pstrdup(best_path->custom_name);
+	scan_plan->custom_flags = best_path->custom_flags;
+	scan_plan->custom_private = NIL;
+	scan_plan->custom_exprs = NULL;
+
+	/*
+	 * Let the custom scan provider set up this custom-scan plan according to
+	 * the given path information.
+	 */
+	provider->InitCustomScanPlan(root, scan_plan,
+								 best_path, tlist, scan_clauses);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params in the qual
+	 * and custom_exprs expressions.  We do this last so that the custom scan
+	 * provider doesn't have to be involved.  (Note that parts of custom_exprs
+	 * could have come from join clauses, so doing this beforehand on the
+	 * scan_clauses wouldn't work.)
+	 */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..ee3fbab 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -576,6 +577,30 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomScan:
+			{
+				CustomScan	   *splan = (CustomScan *) plan;
+				CustomProvider *provider
+					= get_custom_provider(splan->custom_name);
+
+				if (provider->SetPlanRefCustomScan)
+					provider->SetPlanRefCustomScan(root, splan, rtoffset);
+				else if (splan->scan.scanrelid > 0)
+				{
+					splan->scan.scanrelid += rtoffset;
+					splan->scan.plan.targetlist =
+						fix_scan_list(root, splan->scan.plan.targetlist,
+									  rtoffset);
+					splan->scan.plan.qual =
+						fix_scan_list(root, splan->scan.plan.qual, rtoffset);
+					splan->custom_exprs =
+						fix_scan_list(root, splan->custom_exprs, rtoffset);
+				}
+				else
+					elog(ERROR, "No implementation to set plan reference");
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1057,7 +1082,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..74ff415 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2236,6 +2236,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomScan:
+			finalize_primnode((Node *) ((CustomScan *) plan)->custom_exprs,
+							  &context);
+			context.paramids = bms_add_members(context.paramids, scan_params);
+			/*
+			 * XXX - Is this sufficient?  Don't we need something special if
+			 * CustomScan overrides FunctionScan or SubqueryScan?
+			 */
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index add5cd1..d099d16 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -145,6 +145,7 @@ typedef struct
 	List	   *outer_tlist;	/* referent for OUTER_VAR Vars */
 	List	   *inner_tlist;	/* referent for INNER_VAR Vars */
 	List	   *index_tlist;	/* referent for INDEX_VAR Vars */
+	TupleDesc	custom_tupdesc;	/* referent for CUSTOM_VAR Vars */
 } deparse_namespace;
 
 /*
@@ -2482,14 +2483,19 @@ deparse_context_for(const char *aliasname, Oid relid)
  * deparse_context_for_planstate	- Build deparse context for a plan
  *
  * When deparsing an expression in a Plan tree, we might have to resolve
- * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
- * provide the parent PlanState node.  Then OUTER_VAR and INNER_VAR references
- * can be resolved by drilling down into the left and right child plans.
+ * special varno (OUTER_VAR, INNER_VAR, INDEX_VAR or CUSTOM_VAR) references.
+ * To do this, the caller must provide the parent PlanState node.  Then
+ * OUTER_VAR and INNER_VAR references can be resolved by drilling down into
+ * the left and right child plans.
  * Similarly, INDEX_VAR references can be resolved by reference to the
  * indextlist given in the parent IndexOnlyScan node.  (Note that we don't
  * currently support deparsing of indexquals in regular IndexScan or
  * BitmapIndexScan nodes; for those, we can only deparse the indexqualorig
  * fields, which won't contain INDEX_VAR Vars.)
+ * Also, CUSTOM_VAR references can be resolved by reference to the TupleDesc
+ * of ss_ScanTupleSlot in the CustomScanState node.  (Note that the custom
+ * scan provider is responsible for initializing ss_ScanTupleSlot with an
+ * appropriate TupleDesc, most likely one constructed by ExecTypeFromTL.)
  *
  * Note: planstate really ought to be declared as "PlanState *", but we use
  * "Node *" to avoid having to include execnodes.h in builtins.h.
@@ -3747,6 +3753,14 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
 		dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
 	else
 		dpns->index_tlist = NIL;
+
+	/* custom_tupdesc is set only if it's a CustomScan */
+	if (IsA(ps, CustomScanState) &&
+		((CustomScanState *)ps)->ss.ss_ScanTupleSlot)
+		dpns->custom_tupdesc =
+			((CustomScanState *)ps)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	else
+		dpns->custom_tupdesc = NULL;
 }
 
 /*
@@ -5414,6 +5428,18 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 
 		return NULL;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		attname = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+		appendStringInfoString(buf, quote_identifier(attname));
+
+		return attname;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
@@ -5684,6 +5710,18 @@ get_name_for_var_field(Var *var, int fieldno,
 
 		return result;
 	}
+	else if (var->varno == CUSTOM_VAR && dpns->custom_tupdesc)
+	{
+		TupleDesc	tupdesc = dpns->custom_tupdesc;
+		const char *result;
+
+		Assert(netlevelsup == 0);
+		Assert(var->varattno > 0 && var->varattno <= tupdesc->natts);
+
+		result = NameStr(tupdesc->attrs[var->varattno - 1]->attname);
+
+		return result;
+	}
 	else
 	{
 		elog(ERROR, "bogus varno: %d", var->varno);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index eb78776..7fe0998 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -16,6 +16,7 @@
 
 #include "executor/execdesc.h"
 #include "nodes/parsenodes.h"
+#include "nodes/relation.h"
 
 
 /*
@@ -102,7 +103,7 @@ extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
 extern void ExecReScan(PlanState *node);
 extern void ExecMarkPos(PlanState *node);
 extern void ExecRestrPos(PlanState *node);
-extern bool ExecSupportsMarkRestore(NodeTag plantype);
+extern bool ExecSupportsMarkRestore(Path *path);
 extern bool ExecSupportsBackwardScan(Plan *node);
 extern bool ExecMaterializesOutput(NodeTag plantype);
 
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..a484f8b
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,94 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "commands/explain.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+/*
+ * Definition of the custom execution provider callbacks
+ */
+typedef void (*InitCustomScanPlan_function)(PlannerInfo *root,
+											CustomScan *cscan_plan,
+											CustomPath *cscan_path,
+											List *tlist,
+											List *scan_clauses);
+typedef void (*SetPlanRefCustomScan_function)(PlannerInfo *root,
+											  CustomScan *cscan_plan,
+											  int rtoffset);
+typedef void (*BeginCustomScan_function)(CustomScanState *csstate, int eflags);
+typedef TupleTableSlot *(*ExecCustomScan_function)(CustomScanState *csstate);
+typedef Node *(*MultiExecCustomScan_function)(CustomScanState *csstate);
+typedef void (*EndCustomScan_function)(CustomScanState *csstate);
+
+typedef void (*ReScanCustomScan_function)(CustomScanState *csstate);
+typedef void (*MarkPosCustomScan_function)(CustomScanState *csstate);
+typedef void (*RestorePosCustom_function)(CustomScanState *csstate);
+
+typedef void (*ExplainCustomScan_function)(CustomScanState *csstate,
+										   ExplainState *es);
+
+typedef struct CustomProvider
+{
+	char							name[NAMEDATALEN];
+
+	InitCustomScanPlan_function		InitCustomScanPlan;
+	SetPlanRefCustomScan_function	SetPlanRefCustomScan;
+
+	BeginCustomScan_function		BeginCustomScan;
+	ExecCustomScan_function			ExecCustomScan;
+	MultiExecCustomScan_function	MultiExecCustomScan;
+	EndCustomScan_function			EndCustomScan;
+
+	ReScanCustomScan_function		ReScanCustomScan;
+	MarkPosCustomScan_function		MarkPosCustomScan;
+	RestorePosCustom_function		RestorePosCustom;
+
+	ExplainCustomScan_function		ExplainCustomScan;
+} CustomProvider;
+
+/* Flags of CustomScan */
+
+/*
+ * CUSTOM__SUPPORT_MARK_RESTORE informs the optimizer that this custom scan
+ * provider supports the ExecCustomMarkPos and ExecCustomRestrPos callbacks.
+ */
+#define CUSTOM__SUPPORT_MARK_RESTORE			0x0001
+
+/*
+ * CUSTOM__SUPPORT_BACKWARD_SCAN informs the optimizer that this custom scan
+ * provider is designed to support backward scans.
+ */
+#define CUSTOM__SUPPORT_BACKWARD_SCAN			0x0002
+
+/*
+ * Registration and lookup of custom execution providers
+ */
+extern void register_custom_provider(const CustomProvider *provider);
+
+extern CustomProvider *get_custom_provider(const char *custom_name);
+
+/*
+ * General executor code
+ */
+extern CustomScanState *ExecInitCustomScan(CustomScan *csstate,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomScan(CustomScanState *csstate);
+extern Node *MultiExecCustomScan(CustomScanState *csstate);
+extern void ExecEndCustomScan(CustomScanState *csstate);
+
+extern void ExecReScanCustomScan(CustomScanState *csstate);
+extern void ExecCustomMarkPos(CustomScanState *csstate);
+extern void ExecCustomRestrPos(CustomScanState *csstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index f770608..a860a4e 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -93,4 +93,8 @@ extern int	bms_first_member(Bitmapset *a);
 /* support for hashtables using Bitmapsets as keys: */
 extern uint32 bms_hash_value(const Bitmapset *a);
 
+/* support for text form */
+extern char *bms_to_string(Bitmapset *a);
+extern Bitmapset *bms_from_string(const char *a);
+
 #endif   /* BITMAPSET_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..621830a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,23 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomScanState information
+ *
+ *		CustomScan nodes are used to scan various relations using custom
+ *		logic.
+ * ----------------
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	int			custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..681c1b1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,7 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomScan,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +108,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomScanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +226,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..85d088d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -479,6 +479,22 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ * ----------------
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* a set of CUSTOM__* flags */
+	List	   *custom_private;		/* private data for CSP  */
+	List	   *custom_exprs;		/* expressions that CSP may execute */
+
+	Plan	   *subqry_plan;		/* valid, if RTE_SUBQUERY */
+	List	   *functions;			/* valid, if RTE_FUNCTION */
+} CustomScan;
 
 /*
  * ==========
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 4992bc0..7d9b0c0 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR		65000		/* reference to inner subplan */
 #define    OUTER_VAR		65001		/* reference to outer subplan */
 #define    INDEX_VAR		65002		/* reference to index column */
+#define    CUSTOM_VAR		65003		/* reference to custom column */
 
 #define IS_SPECIAL_VARNO(varno)		((varno) >= INNER_VAR)
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 8aa40d0..527a060 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -877,6 +877,22 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_name is the identifier under which the custom scan provider was
+ * registered. custom_flags is a set of CUSTOM__* bits that control its
+ * behavior. custom_private lets the extension store private data, which
+ * must be safe for copyObject().
+ */
+typedef struct CustomPath
+{
+	Path		path;
+	const char *custom_name;		/* name of custom scan provider */
+	int			custom_flags;		/* CUSTOM__* flags in nodeCustom.h */
+	List	   *custom_private;		/* can be used for private data */
+} CustomPath;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index ec1605d..8857206 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -143,6 +143,10 @@ extern void final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 extern void cost_subplan(PlannerInfo *root, SubPlan *subplan, Plan *plan);
 extern void cost_qual_eval(QualCost *cost, List *quals, PlannerInfo *root);
 extern void cost_qual_eval_node(QualCost *cost, Node *qual, PlannerInfo *root);
+extern void get_restriction_qual_cost(PlannerInfo *root,
+									  RelOptInfo *baserel,
+									  ParamPathInfo *param_info,
+									  QualCost *qpqual_cost);
 extern void compute_semi_anti_join_factors(PlannerInfo *root,
 							   RelOptInfo *outerrel,
 							   RelOptInfo *innerrel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..b613946 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root, baserel, rte)				\
+	do {														\
+		if (add_scan_path_hook)									\
+			(*add_scan_path_hook)((root), (baserel), (rte));	\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..064640c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -127,6 +127,7 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/test/regress/expected/.gitignore b/src/test/regress/expected/.gitignore
index 93c56c8..7e35e74 100644
--- a/src/test/regress/expected/.gitignore
+++ b/src/test/regress/expected/.gitignore
@@ -2,6 +2,7 @@
 /copy.out
 /create_function_1.out
 /create_function_2.out
+/custom_scan.out
 /largeobject.out
 /largeobject_1.out
 /misc.out
diff --git a/src/test/regress/sql/.gitignore b/src/test/regress/sql/.gitignore
index 46c8112..8eeb461 100644
--- a/src/test/regress/sql/.gitignore
+++ b/src/test/regress/sql/.gitignore
@@ -2,6 +2,7 @@
 /copy.sql
 /create_function_1.sql
 /create_function_2.sql
+/custom_scan.sql
 /largeobject.sql
 /misc.sql
 /security_label.sql
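
For readers following the patch, the provider registration and lookup declared in nodeCustom.h can be pictured with a small standalone model. This is only a sketch under assumptions: the demo_* names, the DemoProvider struct, the fixed-size table and the return codes are all hypothetical, and this excerpt does not show how the real register_custom_provider() or get_custom_provider() behave on duplicates or lookup misses. The flag helper merely illustrates how a CUSTOM__* bit might be tested; the body of the reworked ExecSupportsMarkRestore() is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64			/* PostgreSQL's default value */
#define DEMO_MAX_PROVIDERS 8	/* hypothetical capacity, for this sketch only */

/* Flag values as defined in the patch's nodeCustom.h */
#define CUSTOM__SUPPORT_MARK_RESTORE	0x0001
#define CUSTOM__SUPPORT_BACKWARD_SCAN	0x0002

/* Cut-down stand-in for CustomProvider: the name plus one opaque callback */
typedef struct DemoProvider
{
	char		name[NAMEDATALEN];
	void	   *(*exec_scan) (void *csstate);
} DemoProvider;

static DemoProvider demo_providers[DEMO_MAX_PROVIDERS];
static int	demo_num_providers = 0;

/* Look a provider up by name; returns NULL when nothing matches */
DemoProvider *
demo_get_custom_provider(const char *custom_name)
{
	int			i;

	assert(custom_name != NULL);
	for (i = 0; i < demo_num_providers; i++)
	{
		if (strncmp(demo_providers[i].name, custom_name, NAMEDATALEN) == 0)
			return &demo_providers[i];
	}
	return NULL;
}

/* Copy the provider into the table; 0 on success, -1 if duplicate or full */
int
demo_register_custom_provider(const DemoProvider *provider)
{
	assert(provider != NULL);
	if (demo_num_providers >= DEMO_MAX_PROVIDERS ||
		demo_get_custom_provider(provider->name) != NULL)
		return -1;
	demo_providers[demo_num_providers++] = *provider;
	return 0;
}

/* Test one capability bit in a custom_flags word */
bool
demo_supports_mark_restore(int custom_flags)
{
	return (custom_flags & CUSTOM__SUPPORT_MARK_RESTORE) != 0;
}
```

Looking providers up by name, rather than storing a function pointer in the plan node, matches the patch's choice of keeping only custom_name in CustomPath and CustomScan, which keeps those nodes safe for copyObject().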

#75

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#71)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support. In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

I suggested that example to KaiGai at last year's PGCon. It may
indeed be something we want to have in core, but right now we don't.

More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them. In other words, I
think we should be worrying about whether KaiGai's found all of the
places that need to be modified to support a custom scan, and whether
the modifications he's made to each of those places are correct and
adequate. Whether he's picked the best possible example does not
strike me as a matter of principal concern, and it's far too late to
tell him he's got to go pick a different one at this point anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#76

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Kouhei Kaigai (#72)

Re: Custom Scan APIs (Re: Custom Plan node)

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:

Do you think it would make sense if my submission were only the interface
portion, without a working example?

No, we're pretty strongly against putting in interfaces which don't have
working examples in contrib- for one thing, we want to know when we
break it.

The purpose of the ctidscan module is, similar to
postgres_fdw, to demonstrate the usage of the custom-scan interface with
a sufficiently small amount of code. If tons of example code were attached,
nobody would want to review the patch.

I gathered that's why it was included. Is the plan to eventually submit
something larger to go into -contrib which will use this interface? Or
will it always be external?

The "cache_scan" module that Haribabu and I are discussing in another
thread might also be a good demonstration of the custom-scan interface;
however, its code is somewhat larger than ctidscan's.

That does sound interesting though I'm curious about the specifics...

For one thing, an example where you could have this CustomScan node calling
other nodes underneath would be interesting. I realize the CTID scan can't
do that directly but I would think your GPU-based system could; after all,
if you're running a join or an aggregate with the GPU, the rows could come
from nearly anything. Have you considered that, or is the expectation that
users will just go off and access the heap and/or whatever indexes directly,
like ctidscan does? How would such a requirement be handled?

When a custom-scan node has underlying nodes, it invokes them using
ExecProcNode, as built-in nodes do, and can then fetch the tuples that
come from the underlying nodes. Of course, a custom-scan provider can also
return tuples that come from somewhere else as if they came from the
underlying relation; that is the responsibility of the extension module.
In some cases it will be required to return junk system attributes, like
ctid, for row-level locks or table updates. That is also the responsibility
of the extension module (or it should not add a custom path if the
custom-scan provider cannot perform as required).
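
The pull model described in the paragraph above can be sketched outside the backend as follows. This is a simplified stand-in, not executor code: DemoPlanState, DemoTuple, demo_exec_proc_node() and the even-value filter are hypothetical names and behavior chosen for illustration, with demo_exec_proc_node() playing the role that ExecProcNode() plays for real PlanState nodes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoTuple
{
	int			value;
} DemoTuple;

typedef struct DemoPlanState DemoPlanState;
struct DemoPlanState
{
	DemoTuple  *(*exec) (DemoPlanState *self);	/* node's "next tuple" routine */
	DemoPlanState *child;		/* underlying node, or NULL for a leaf */
	void	   *private_state;	/* node-specific data */
};

/* Stand-in for ExecProcNode(): dispatch to the node's own routine */
DemoTuple *
demo_exec_proc_node(DemoPlanState *node)
{
	assert(node != NULL);
	return node->exec(node);
}

/* Leaf node state: hands out the tuples of an array, then NULL */
typedef struct DemoArraySource
{
	DemoTuple  *tuples;
	int			ntuples;
	int			next;
} DemoArraySource;

DemoTuple *
demo_array_exec(DemoPlanState *self)
{
	DemoArraySource *src = (DemoArraySource *) self->private_state;

	if (src->next >= src->ntuples)
		return NULL;
	return &src->tuples[src->next++];
}

/*
 * Custom-scan-like node: pulls tuples from its child one at a time and
 * returns only those with an even value, the way a provider might apply
 * its own filtering on top of whatever node sits underneath.
 */
DemoTuple *
demo_custom_exec(DemoPlanState *self)
{
	for (;;)
	{
		DemoTuple  *tup = demo_exec_proc_node(self->child);

		if (tup == NULL)
			return NULL;		/* child is exhausted */
		if (tup->value % 2 == 0)
			return tup;
	}
}
```

Because the custom node only sees its child through the dispatch function, the rows could come from any kind of node underneath, which is the situation Stephen asks about for a GPU-based provider.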

Right, tons of work to do to make it all fit together and play nice-
what I was trying to get at is: has this actually been done? Is the GPU
extension that you're talking about as the use-case for this been
written? How does it handle all of the above? Or are we going through
all these gyrations in vain hope that it'll actually all work when
someone tries to use it for something real?

Thanks,

Stephen

#77

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 12 years ago

In reply to: Robert Haas (#75)

Re: Custom Scan APIs (Re: Custom Plan node)

On Tue, Mar 4, 2014 at 7:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support. In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

I suggested that example to KaiGai at last year's PGCon. It may
indeed be something we want to have in core, but right now we don't.

More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them. In other words, I
think we should be worrying about whether KaiGai's found all of the
places that need to be modified to support a custom scan, and whether
the modifications he's made to each of those places are correct and
adequate. Whether he's picked the best possible example does not
strike me as a matter of principal concern, and it's far too late to
tell him he's got to go pick a different one at this point anyway.

There are many places in the planner and optimizer code where we create
various types of paths, and the number of such paths is again significant,
if not large. If we want the custom scan contrib module to work in all
those cases (which seems to be the intention here), then we have to expose
that many hooks. I don't think all of those hooks have been identified.
The second problem is that the functions which create those paths have
signatures complicated enough to be difficult to expose as hooks. Take the
example of the join hook that was exposed. These function signatures do get
changed from time to time, and thus the corresponding hooks need to be
changed too. This is going to be a maintenance burden.

So, unless we have some way of exposing these hooks such that the
definitions of the hooks are independent of the internal function
signatures, supporting custom scan looks difficult.
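
One way to picture a hook definition that is less tied to internal function signatures is to pass the volatile arguments through a single struct, so that adding a field does not change the hook's prototype. The sketch below is only an illustration of that idea and is not something proposed in this thread; DemoJoinPathExtra, demo_add_join_path_hook and the other demo_* names are hypothetical, and void pointers stand in for the planner types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the arguments most likely to change over time */
typedef struct DemoJoinPathExtra
{
	int			jointype;
	void	   *sjinfo;
	void	   *restrictlist;
	void	   *mergeclause_list;
	void	   *semifactors;
	void	   *param_source_rels;
	void	   *extra_lateral_rels;
} DemoJoinPathExtra;

/* The hook prototype stays fixed even if DemoJoinPathExtra gains fields */
typedef void (*demo_add_join_path_hook_type) (void *root,
											  void *joinrel,
											  void *outerrel,
											  void *innerrel,
											  DemoJoinPathExtra *extra);

demo_add_join_path_hook_type demo_add_join_path_hook = NULL;

int			demo_hook_calls = 0;

/* Example plugin callback: just counts how often it is invoked */
void
demo_count_join_hook(void *root, void *joinrel, void *outerrel,
					 void *innerrel, DemoJoinPathExtra *extra)
{
	(void) root;
	(void) joinrel;
	(void) outerrel;
	(void) innerrel;
	assert(extra != NULL);
	demo_hook_calls++;
}

/* Stand-in for add_paths_to_joinrel(): calls the hook only if one is set */
void
demo_add_paths_to_joinrel(void *root, void *joinrel, void *outerrel,
						  void *innerrel, DemoJoinPathExtra *extra)
{
	/* ... built-in join path generation would happen here ... */
	if (demo_add_join_path_hook)
		(*demo_add_join_path_hook) (root, joinrel, outerrel, innerrel, extra);
}
```

This does not remove the maintenance burden entirely, since plugins that read the struct still depend on its fields, but it limits the breakage to source-level changes in the fields actually used.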

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#78

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Ashutosh Bapat (#73)

Re: Custom Scan APIs (Re: Custom Plan node)

* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote:

During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, those would be fired while EXPLAINing a query as well.
We want to avoid that. Instead, we can run EXPLAIN on that query at the
foreign server. But again, not all foreign servers would be able to EXPLAIN
the query, e.g. file_fdw. Or we could avoid firing the query during
ExecInitNode() altogether if it's for EXPLAIN (except, perhaps, for EXPLAIN
ANALYZE).

Agreed that we wouldn't want to actually run a query when it's just
being explain'd. If the FDW can't tell the difference then we'd need to
address that, of course. A similar issue would, presumably, be around
prepare/execute, though I haven't looked yet. These kinds of issues are
why it was option '#2' instead of '#1'. :) I'm not sure they're able to
be addressed. :/
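
For what it's worth, the "don't fire the query under plain EXPLAIN" behavior can be sketched with the eflags argument that begin callbacks such as BeginCustomScan receive. The fragment below is a standalone model, not backend code: DemoScanState and demo_begin_custom_scan() are hypothetical, and EXEC_FLAG_EXPLAIN_ONLY is redefined locally, assumed to mirror the flag of the same name in executor.h that is set for EXPLAIN without ANALYZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy; assumed to match executor.h (EXPLAIN, no ANALYZE) */
#define EXEC_FLAG_EXPLAIN_ONLY	0x0001

/* Hypothetical scan state: records whether remote work was started */
typedef struct DemoScanState
{
	bool		remote_query_sent;
	void	   *custom_state;
} DemoScanState;

/*
 * Begin callback in the style of BeginCustomScan(csstate, eflags).
 * Under plain EXPLAIN it returns before doing any expensive or remote
 * work, leaving custom_state unset.
 */
void
demo_begin_custom_scan(DemoScanState *csstate, int eflags)
{
	assert(csstate != NULL);

	csstate->remote_query_sent = false;
	csstate->custom_state = NULL;

	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
		return;					/* EXPLAIN only: nothing to execute */

	/* A real provider would open connections or send the query here */
	csstate->remote_query_sent = true;
}
```

A provider written this way has to tolerate custom_state being NULL in its explain and end callbacks, since those may still run after an EXPLAIN-only begin.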

Thanks,

Stephen

#79

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Robert Haas (#75)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-04 23:09 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:

As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support. In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

I suggested that example to KaiGai at last year's PGCon. It may
indeed be something we want to have in core, but right now we don't.

More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them. In other words, I
think we should be worrying about whether KaiGai's found all of the
places that need to be modified to support a custom scan, and whether
the modifications he's made to each of those places are correct and
adequate. Whether he's picked the best possible example does not
strike me as a matter of principal concern, and it's far too late to
tell him he's got to go pick a different one at this point anyway.

That is definitely the point to be discussed here. Even though I *believe*
I have placed the callbacks needed to implement alternative joins and scans,
others may reach a different conclusion from their own viewpoint.

At least, I could implement a custom scan as an alternative to a join
using postgres_fdw; however, it is uncertain whether I have covered
all the possible cases we should care about.
So, I'd like to see comments from others.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#80

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#75)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them.

Right- I was trying to hit on that in my email this morning.

Thanks,

Stephen

#81

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Stephen Frost (#78)

Re: Custom Scan APIs (Re: Custom Plan node)

Stephen Frost <sfrost@snowman.net> writes:

* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote:

During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, those would be fired while EXPLAINing a query as well.
We want to avoid that. Instead, we can run EXPLAIN on that query at the foreign
server. But again, not all foreign servers would be able to EXPLAIN the
query, e.g. file_fdw. Alternatively, we could avoid firing queries from
ExecInitNode() altogether when it is called for EXPLAIN (except perhaps for
EXPLAIN ANALYZE).

Agreed that we wouldn't want to actually run a query when it's just
being explain'd. If the FDW can't tell the difference then we'd need to
address that, of course.

EXEC_FLAG_EXPLAIN_ONLY ...

regards, tom lane
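
A minimal sketch of the check this flag makes possible, assuming a hypothetical provider whose begin callback receives the executor's eflags. RemoteScanState and remote_scan_begin are illustrative names that do not appear in any patch in this thread, and the flag is redefined locally as a stand-in so the fragment compiles outside the PostgreSQL tree (in the backend it comes from executor/executor.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the executor flag normally provided by executor/executor.h. */
#define EXEC_FLAG_EXPLAIN_ONLY 0x0001

/* Hypothetical per-scan state of a provider that talks to a remote server. */
typedef struct RemoteScanState
{
    bool query_sent;    /* has the remote query been started? */
} RemoteScanState;

/*
 * Hypothetical begin callback.  For a plain EXPLAIN the executor passes
 * EXEC_FLAG_EXPLAIN_ONLY, so the provider returns before doing any work
 * on the remote side.  EXPLAIN ANALYZE does not set the flag, so the
 * query is started as usual.
 */
void
remote_scan_begin(RemoteScanState *state, int eflags)
{
    assert(state != NULL);

    state->query_sent = false;

    if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
        return;                 /* plan is only being displayed */

    /* A real provider would open its connection and send the query here. */
    state->query_sent = true;
}
```

The same early return would apply whether the remote query is started from the begin callback or deferred to the first fetch; only the begin callback sees eflags directly.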


#82

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#76)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-04 23:10 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

The "cache_scan" module that Haribabu and I are discussing in another
thread might also be a good demonstration of the custom-scan interface,
although its code is somewhat larger than ctidscan's.

That does sound interesting though I'm curious about the specifics...

This module caches some of the columns, but not all, and so can hold a
much larger number of records in a given amount of RAM than the
standard buffer cache.
It is built on top of the custom-scan node, and also uses a new
hook that fires a callback on page vacuuming to invalidate its cache entries.
(By the way, I originally designed this module to demonstrate the on-vacuum
hook, because I had already written ctidscan and the postgres_fdw enhancement
for the custom-scan node.)

For one thing, an example where you could have this CustomScan node calling
other nodes underneath would be interesting. I realize the CTID scan can't
do that directly but I would think your GPU-based system could; after all,
if you're running a join or an aggregate with the GPU, the rows could come
from nearly anything. Have you considered that, or is the expectation that
users will just go off and access the heap and/or whatever indexes directly,
like ctidscan does? How would such a requirement be handled?

When a custom-scan node has underlying nodes, it invokes them with
ExecProcNode, as built-in nodes do, and so can fetch the tuples that
come from those nodes. Of course, a custom-scan provider can also return
tuples that come from somewhere else as if they came from the underlying
relation; that is the responsibility of the extension module. In some cases,
it will be required to return junk system attributes, such as ctid, for
row-level locks or table updates. That is also the responsibility of the
extension module (or it should not add a custom path if the custom-scan
provider cannot behave as required).
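
The pull model described here can be sketched as follows. This is a self-contained toy rather than executor code: NodeSketch, exec_proc_node and the two callbacks are hypothetical stand-ins for a plan-state node, ExecProcNode and a provider's exec routine, and a "tuple" is just a pointer to an int.

```c
#include <assert.h>
#include <stddef.h>

typedef struct NodeSketch NodeSketch;

/* Toy plan-state node: an exec callback plus an optional child node. */
struct NodeSketch
{
    const int  *(*exec)(NodeSketch *self);  /* next tuple, or NULL at end */
    NodeSketch *child;                      /* underlying node, if any */
    const int  *rows;                       /* leaf only: backing array */
    size_t      nrows;                      /* leaf only: array length */
    size_t      pos;                        /* leaf only: read position */
};

/* Stand-in for ExecProcNode(): dispatch to the node's own exec routine. */
const int *
exec_proc_node(NodeSketch *node)
{
    assert(node != NULL && node->exec != NULL);
    return node->exec(node);
}

/* Leaf node: returns array elements one at a time. */
const int *
array_scan_exec(NodeSketch *self)
{
    if (self->pos >= self->nrows)
        return NULL;
    return &self->rows[self->pos++];
}

/*
 * "Custom" node: pulls tuples from its child exactly as a built-in node
 * would, and returns only those it is interested in (even values here).
 */
const int *
custom_even_filter_exec(NodeSketch *self)
{
    const int *tuple;

    while ((tuple = exec_proc_node(self->child)) != NULL)
    {
        if (*tuple % 2 == 0)
            return tuple;
    }
    return NULL;
}
```

Because the custom node only ever calls exec_proc_node() on its child, it does not care whether the rows come from a heap scan, an index scan or another custom node, which is the property Stephen asks about.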

Right, tons of work to do to make it all fit together and play nice-
what I was trying to get at is: has this actually been done? Is the GPU
extension that you're talking about as the use-case for this been
written?

It's a chicken-and-egg problem, because the implementation of the extension
module fully depends on the interface offered by the backend. Unlike the
commit-fest, there is no deadline for my extension module, so I put higher
priority on submitting the custom-scan node than on the extension.
However, the GPU extension is not purely theoretical. I implemented
a prototype using the FDW APIs, and it accelerated sequential scans when the
query had sufficiently complicated qualifiers.

See the movie (from 2:45). The table t1 is a regular table, and t2 is a foreign
table. Both of them have the same contents; however, the response time of the
query is much shorter when GPU acceleration is working.
http://www.youtube.com/watch?v=xrUBffs9aJ0
So, I'm confident that GPU acceleration will give a performance gain once it
can run on regular tables, not only foreign tables.

How does it handle all of the above? Or are we going through
all these gyrations in vain hope that it'll actually all work when
someone tries to use it for something real?

I'm not talking about anything difficult. If a junk attribute requires the
"ctid" of the tuple to be returned, the custom-scan provider reads a tuple of
the underlying relation and includes the correct item pointer. If the custom
scan is designed to run on the cache, all it needs to do is reconstruct a
tuple with the correct item pointer (so the cache needs to hold the ctid as
well). That is all I did in the cache_scan module.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#83

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Tom Lane (#81)

Re: Custom Scan APIs (Re: Custom Plan node)

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote:

During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, those would be fired while EXPLAINing a query as well.
We want to avoid that. Instead, we can run EXPLAIN on that query at the foreign
server. But again, not all foreign servers would be able to EXPLAIN the
query, e.g. file_fdw. Alternatively, we could avoid firing queries from
ExecInitNode() altogether when it is called for EXPLAIN (except perhaps for
EXPLAIN ANALYZE).

Agreed that we wouldn't want to actually run a query when it's just
being explain'd. If the FDW can't tell the difference then we'd need to
address that, of course.

EXEC_FLAG_EXPLAIN_ONLY ...

Yeah, figured there should be a way. Still not sure that kicking the
query off from ExecInitNode() is a good idea though. Perhaps it could
be optional somehow. I really like the idea of being able to make
Append work in an async mode where it's pulling data from multiple
sources at the same time, but it's a fair bit of work.

Thanks,

Stephen

#84

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#75)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support. In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core? I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

I suggested that example to KaiGai at last year's PGCon. It may
indeed be something we want to have in core, but right now we don't.

Alright- so do you feel that the simple ctidscan use-case is a
sufficient justification and example of how this can be generally
useful that we should be adding these hooks to core..? I'm willing to
work through the patch and clean it up this weekend if we agree that
it's useful and unlikely to immediately be broken by expected changes..

Thanks,

Stephen

#85

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Kohei KaiGai (#82)

Re: Custom Scan APIs (Re: Custom Plan node)

I apologize for not having paid much attention to this thread so far.
It kept getting stuck on my "to look at later" queue. Anyway, I've
taken a preliminary look at the v7 patch now.

While the patch seems roughly along the lines of what we talked about
last PGCon, I share Stephen's unease about a lot of the details. It's
not entirely clear that these hooks are really good for anything, and
it's even less clear what APIs the hook functions should be expected
to depend on. I really do not like the approach embodied in the later
patches of "oh, we'll just expose whatever static planner functions seem
convenient". That's not an approach that's available when doing actual
external development of an extension, and even if it were that doesn't
make it a good idea. The larger the exposed surface of functions the
harder it is to know what's safe to change.

Anyway, on to specifics:

* Please drop the whole register_custom_provider/get_custom_provider API.
There is no reason other than debugging for a provider to have a name at
all, and if we expect providers to have unique names then that creates a
collision risk for independently-created extensions. AFAICS, it's
sufficient for a function hooked into one of the add-a-path hooks to
include a pointer to a struct-of-function-pointers in the Path object it
returns, and similarly the CustomScan Plan object can contain a pointer
inserted when it's created. I don't object to having a name field in the
function pointer structs for debugging reasons, but I don't want any
lookups being done on it.

* The function-struct pointers should be marked const in the referencing
nodes, to indicate that the core code won't be modifying or copying them.
In practice they'd probably be statically allocated constants in the
extensions anyway.

* The patch does lots of violence to the separation between planner and
executor, starting with the decision to include nodes/relation.h in
executor.h. That will not do at all. I see that you did that because you
wanted to make ExecSupportsMarkRestore take a Path, but we need some other
answer. One slightly grotty answer is to invent two different customscan
Plan types, one that supports mark/restore and one that doesn't, so that
ExecSupportsMarkRestore could still just look at the Plan type tag.
(BTW, path->pathtype is supposed to contain the node tag of the Plan node
that the path would produce. Putting T_CustomPath in it is entirely
tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in
favor of some new function in the planner; but it's good to keep it in
execAmi.c since that has other knowledge of which plan types support
mark/restore.

* More generally, I'm not convinced that exactly one Path type and exactly
one Plan type is going to get us very far. It seems rather ugly to use
the same Plan type for both scan and join nodes, and what will happen if
somebody wants to build a custom Append node, or something else that has
neither zero nor two subplans?

* nodeCustom.h is being completely abused. That should only export the
functions in nodeCustom.c, which are going to be pretty much one-liners
anyway. The right place to put the function pointer struct definitions
is someplace else. I'd be inclined to start by separating the function
pointers into two structs, one for functions needed for a Path and one for
functions needed for a Plan, so that you don't have this problem of having
to import everything the planner knows into an executor header or vice
versa. Most likely you could just put the Path function pointer struct
declaration next to CustomPath in relation.h, and the one for Plans next
to CustomPlan (or the variants thereof) in plannodes.h.

* The set of fields provided in CustomScan seems nonsensical. I'm not
even sure that it should be derived from Scan; that's okay for plan types
that actually are scans of a base relation, but it's confusing overhead
for anything that's say a join, or a custom sort node, or anything like
that. Maybe one argument for multiple plan node types is that one would
be derived from Scan and one directly from Plan.

* More generally, what this definition for CustomScan exposes is that we
have no idea whatever what fields a custom plan node might need. I'm
inclined to think that what we should be assuming is that any custom path
or plan node is really an object of a struct type known only to its
providing extension, whose first field is the CustomPath or CustomPlan
struct known to the core backend. (Think C++ subclassing.) This would
imply that copyfuncs/equalfuncs/outfuncs support would have to be provided
by the extension, which is in principle possible if we add function
pointers for those operations to the struct linked to from the path/plan
object. (Notationally this might be a bit of a pain, since the macros
that we use in the functions in copyfuncs.c etc aren't public. Not sure
if it's worth exposing those somewhere, or if people should just
copy/paste them.) This approach would probably also avoid the need for
the klugy bitmapset representation you propose in patch 3.

* This also implies that create_customscan_plan is completely bogus.
A custom plan provider *must* provide a callback function for that,
because only it will know how big a node to palloc. There are far too
many assumptions in create_customscan_plan anyway; I think there is
probably nothing at all in that function that shouldn't be getting done
by the custom provider instead.

* Likewise, there is way too much hard-wired stuff in explain.c. It
should not be optional whether a custom plan provider provides an explain
support function, and that function should be doing pretty much everything
involved in printing the node.

* I don't believe in the hard-wired implementation in setrefs.c either.

* Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
That's at best badly thought out, and at worst a source of new bugs.
Under what circumstances is a custom plan node going to contain any Vars
that don't reduce to either scan or input-plan variables? None, because
that would imply that it was doing something unrelated to the requested
query.

* The API for add_join_path_hook seems overcomplex, as well as too full
of implementation details that should remain private to joinpath.c.
I particularly object to passing the mergeclause lists, which seem
unlikely to be of interest for non-mergejoin plan types anyway.
More generally, it seems likely that this hook is at the wrong level of
abstraction; it forces the hook function to concern itself with a lot of
stuff about join legality and parameterization (which I note your patch3
code fails to do; but that's not an optional thing).

* After a bit of reflection I think the best thing to do might be to ditch
add_join_path_hook for 9.4, on these grounds:
1. You've got enough to do to get the rest of the patch committable.
2. Like Stephen, I feel that the proposed usage as a substitute for
FDW-based foreign join planning is not the direction we want to travel.
3. Without that, the use case for new join nodes seems pretty marginal,
as opposed to say alternative sort-node implementations (a case not
supported by this patch, except to the extent that you could use them in
explicit-sort mergejoins if you duplicated large parts of joinpath.c).

* Getting nitpicky ... what is the rationale for the doubled underscore
in the CUSTOM__ flag names? That's just a typo waiting to happen IMO.

* Why is there more than one call site for add_scan_path_hook? I don't
see the need for the calling macro with the randomly inconsistent name,
either.

* The test arrangement for contrib/ctidscan is needlessly complex, and
testing it in the core tests is a bogus idea anyway. Why not just
let it contain its own test script like most other contrib modules?

That's all I've got for now. I've not really looked at the extension code
in either patch2 or patch3, just at the changes in the core code.

regards, tom lane


#86

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Stephen Frost (#84)

Re: Custom Scan APIs (Re: Custom Plan node)

On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:

Alright- so do you feel that the simple ctidscan use-case is a
sufficient justification and example of how this can be generally
useful that we should be adding these hooks to core..? I'm willing to
work through the patch and clean it up this weekend if we agree that
it's useful and unlikely to immediately be broken by expected changes..

Yeah, I think it's useful. But based on Tom's concurrently-posted
review, I think there's probably a good deal of work left here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#87

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Robert Haas (#86)

Re: Custom Scan APIs (Re: Custom Plan node)

* Robert Haas (robertmhaas@gmail.com) wrote:

On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:

Alright- so do you feel that the simple ctidscan use-case is a
sufficient justification and example of how this can be generally
useful that we should be adding these hooks to core..? I'm willing to
work through the patch and clean it up this weekend if we agree that
it's useful and unlikely to immediately be broken by expected changes..

Yeah, I think it's useful. But based on Tom's concurrently-posted
review, I think there's probably a good deal of work left here.

Yeah, it certainly looks like it.

KaiGai- will you have time to go over and address Tom's concerns..?

Thanks!

Stephen

#88

Kohei KaiGai

kaigai@kaigai.gr.jp

almost 12 years ago

In reply to: Stephen Frost (#87)

Re: Custom Scan APIs (Re: Custom Plan node)

2014-03-05 5:52 GMT+09:00 Stephen Frost <sfrost@snowman.net>:

* Robert Haas (robertmhaas@gmail.com) wrote:

On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:

Alright- so do you feel that the simple ctidscan use-case is a
sufficient justification and example of how this can be generally
useful that we should be adding these hooks to core..? I'm willing to
work through the patch and clean it up this weekend if we agree that
it's useful and unlikely to immediately be broken by expected changes..

Yeah, I think it's useful. But based on Tom's concurrently-posted
review, I think there's probably a good deal of work left here.

Yeah, it certainly looks like it.

KaiGai- will you have time to go over and address Tom's concerns..?

Yes, I need to. Let me work on it through the latter half of this week and
the weekend; I'd like to submit a revised patch by next Monday.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


#89

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Tom Lane (#85)

Re: Custom Scan APIs (Re: Custom Plan node)

Tom, thanks for your detailed comments.

I apologize for not having paid much attention to this thread so far.
It kept getting stuck on my "to look at later" queue. Anyway, I've taken
a preliminary look at the v7 patch now.

While the patch seems roughly along the lines of what we talked about last
PGCon, I share Stephen's unease about a lot of the details. It's not
entirely clear that these hooks are really good for anything, and it's even
less clear what APIs the hook functions should be expected to depend on.
I really do not like the approach embodied in the later patches of "oh,
we'll just expose whatever static planner functions seem convenient".
That's not an approach that's available when doing actual external
development of an extension, and even if it were that doesn't make it a
good idea. The larger the exposed surface of functions the harder it is
to know what's safe to change.

Hmm. Exposing a function needs a clear rationale, not just convenience.

Anyway, on to specifics:

* Please drop the whole register_custom_provider/get_custom_provider API.
There is no reason other than debugging for a provider to have a name at
all, and if we expect providers to have unique names then that creates a
collision risk for independently-created extensions. AFAICS, it's
sufficient for a function hooked into one of the add-a-path hooks to include
a pointer to a struct-of-function-pointers in the Path object it returns,
and similarly the CustomScan Plan object can contain a pointer inserted
when it's created. I don't object to having a name field in the function
pointer structs for debugging reasons, but I don't want any lookups being
done on it.

One thing I was worried about is how copyObject() and nodeToString() would
support the function pointer tables attached to a custom-scan node; however,
you suggest changing that assumption. So, I think these functions become
unnecessary.

* The function-struct pointers should be marked const in the referencing
nodes, to indicate that the core code won't be modifying or copying them.
In practice they'd probably be statically allocated constants in the
extensions anyway.

OK.
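
A rough sketch of the arrangement Tom describes, with no registry and no lookup by name: the path simply carries a const pointer to a statically allocated method table. All type and function names here (SketchPathMethods, SketchCustomPath, sketch_make_ctidscan_path) are hypothetical and are not taken from the patch; the cost formula is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table for a custom path. */
typedef struct SketchPathMethods
{
    const char *name;                           /* debugging only, never looked up */
    double    (*estimate_cost)(double ntuples); /* placeholder callback */
} SketchPathMethods;

/* Hypothetical path node as the core would see it. */
typedef struct SketchCustomPath
{
    double                   total_cost;
    const SketchPathMethods *methods;   /* core never modifies or copies this */
} SketchCustomPath;

/* Provider side: a callback and a statically allocated, constant table. */
static double
ctidscan_estimate_cost(double ntuples)
{
    return ntuples * 0.5;       /* placeholder cost model */
}

static const SketchPathMethods ctidscan_path_methods = {
    "ctidscan",
    ctidscan_estimate_cost
};

/*
 * What a function hooked into an add-a-path hook might do: build the path
 * and point it at the provider's own method table.
 */
SketchCustomPath
sketch_make_ctidscan_path(double ntuples)
{
    SketchCustomPath path;

    assert(ntuples >= 0.0);
    path.methods = &ctidscan_path_methods;
    path.total_cost = path.methods->estimate_cost(ntuples);
    return path;
}
```

Two independently written extensions can both call themselves "ctidscan" here without colliding, because nothing ever resolves a provider through its name.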

* The patch does lots of violence to the separation between planner and
executor, starting with the decision to include nodes/relation.h in
executor.h. That will not do at all. I see that you did that because you
wanted to make ExecSupportsMarkRestore take a Path, but we need some other
answer. One slightly grotty answer is to invent two different customscan
Plan types, one that supports mark/restore and one that doesn't, so that
ExecSupportsMarkRestore could still just look at the Plan type tag.
(BTW, path->pathtype is supposed to contain the node tag of the Plan node
that the path would produce. Putting T_CustomPath in it is entirely
tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in
favor of some new function in the planner; but it's good to keep it in
execAmi.c since that has other knowledge of which plan types support
mark/restore.

OK, I'll add one derivative plan node type, CustomScanMarkRestore for
instance.
It would probably be chosen in create_customscan_plan() according to a flag
set on the CustomPath.
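
The two-tag idea can be sketched as below, under the assumption that the path carries a flag and the plan tag is chosen from it at plan-creation time. The enum values, the flag and both functions are hypothetical stand-ins; the built-in tags listed are only illustrative and do not reproduce the real switch in execAmi.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node tags; only the last two are specific to custom scans. */
typedef enum SketchNodeTag
{
    T_SketchSort,
    T_SketchHashJoin,
    T_SketchCustomScan,
    T_SketchCustomScanMarkRestore
} SketchNodeTag;

/* Hypothetical flag a provider sets on its path. */
#define SKETCH_PATH_SUPPORTS_MARK_RESTORE 0x0001

typedef struct SketchFlaggedPath
{
    unsigned flags;
} SketchFlaggedPath;

/* Planner side: pick the plan node tag from the flag on the path. */
SketchNodeTag
sketch_custom_plan_tag(const SketchFlaggedPath *path)
{
    assert(path != NULL);
    if (path->flags & SKETCH_PATH_SUPPORTS_MARK_RESTORE)
        return T_SketchCustomScanMarkRestore;
    return T_SketchCustomScan;
}

/*
 * Executor side: the decision needs nothing but the plan type tag, so no
 * planner header has to be included by the executor.
 */
bool
sketch_supports_mark_restore(SketchNodeTag plantype)
{
    switch (plantype)
    {
        case T_SketchSort:
        case T_SketchCustomScanMarkRestore:
            return true;
        default:
            return false;
    }
}
```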

* More generally, I'm not convinced that exactly one Path type and exactly
one Plan type is going to get us very far. It seems rather ugly to use
the same Plan type for both scan and join nodes, and what will happen if
somebody wants to build a custom Append node, or something else that has
neither zero nor two subplans?

In the previous discussion, CustomJoin was considered pointless because only a
limited number of join algorithms are known (nest-loop, hash-join and
merge-join), unlike the variety of logic available for scanning relations.
Also, IIUC, someone didn't want to add a custom-something node type for each
built-in type.
So, at that time we concluded that a CustomScan node would replace built-in
joins and scans.
As for the Append node, CustomScan probably needs to be enhanced to hold a
list of subplans, or an individual CustomAppend node needs to be added;
alternatively, an opaque "CustomPlan" may be sufficient if it handles EXPLAIN
by itself.

* nodeCustom.h is being completely abused. That should only export the
functions in nodeCustom.c, which are going to be pretty much one-liners
anyway. The right place to put the function pointer struct definitions
is someplace else. I'd be inclined to start by separating the function
pointers into two structs, one for functions needed for a Path and one for
functions needed for a Plan, so that you don't have this problem of having
to import everything the planner knows into an executor header or vice versa.
Most likely you could just put the Path function pointer struct declaration
next to CustomPath in relation.h, and the one for Plans next to CustomPlan
(or the variants thereof) in plannodes.h.

Yes. I didn't have a clear idea of where we should put the interface
definitions. Probably, InitCustomScanPlan (or maybe CreateCustomScanPlan)
should be moved to relation.h, and the rest of the callbacks to plannodes.h.

* The set of fields provided in CustomScan seems nonsensical. I'm not even
sure that it should be derived from Scan; that's okay for plan types that
actually are scans of a base relation, but it's confusing overhead for
anything that's say a join, or a custom sort node, or anything like that.
Maybe one argument for multiple plan node types is that one would be derived
from Scan and one directly from Plan.

The reason CustomScan is derived from Scan is that some backend code
wants to know the rtindex of the relation referenced by this CustomScan.
The scanrelid of Scan is used in three places: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension according to your suggestion.
We need to investigate the usage in explain.c; ExplainPreScanNode() walks
the nodes to collect the relids referenced in the query. If we don't
want to add a callback for this specific usage, it is a reasonable choice
to show the backend the scanrelid associated with the CustomScan.
Would it be a confusing rule if the extension had to set it to 0 when no
particular relation is scanned by the CustomScan?

* More generally, what this definition for CustomScan exposes is that we
have no idea whatever what fields a custom plan node might need. I'm
inclined to think that what we should be assuming is that any custom path
or plan node is really an object of a struct type known only to its providing
extension, whose first field is the CustomPath or CustomPlan struct known
to the core backend. (Think C++ subclassing.) This would imply that
copyfuncs/equalfuncs/outfuncs support would have to be provided by the
extension, which is in principle possible if we add function pointers for
those operations to the struct linked to from the path/plan object.
(Notationally this might be a bit of a pain, since the macros that we use
in the functions in copyfuncs.c etc aren't public. Not sure if it's worth
exposing those somewhere, or if people should just copy/paste them.) This
approach would probably also avoid the need for the klugy bitmapset
representation you propose in patch 3.

It's a breakthrough!
Probably, the Path node needs a callback for outfuncs.c.
Also, the Plan node needs callbacks for copyfuncs.c and outfuncs.c.
I think the prototypes of these callback functions are not specific to the
custom-scan node, so they should be declared in nodes/nodes.h.

* This also implies that create_customscan_plan is completely bogus.
A custom plan provider *must* provide a callback function for that, because
only it will know how big a node to palloc. There are far too many
assumptions in create_customscan_plan anyway; I think there is probably
nothing at all in that function that shouldn't be getting done by the custom
provider instead.

OK, InitCustomScanPlan will probably become CreateCustomScanPlan, which
returns a palloc()'ed object of arbitrary size.
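
A sketch of the "subclassing" layout and of why both node creation and copying have to be provider callbacks: only the provider knows the full size of its node. Everything here is hypothetical (SketchCustomPlan, CtidPlan, the callback names), and malloc/calloc stand in for palloc so the fragment compiles outside the backend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchCustomPlan SketchCustomPlan;

/* Hypothetical method table the core reaches through the plan node. */
typedef struct SketchPlanMethods
{
    const char *name;                                     /* debugging only */
    SketchCustomPlan *(*copy)(const SketchCustomPlan *from);
} SketchPlanMethods;

/* The part of the node known to the core. */
struct SketchCustomPlan
{
    double                   startup_cost;
    const SketchPlanMethods *methods;
};

/* Provider-private node: the core struct must be the first field. */
typedef struct CtidPlan
{
    SketchCustomPlan base;
    unsigned         first_block;   /* private field the core knows nothing about */
} CtidPlan;

/* Provider's copy callback: only it knows sizeof(CtidPlan). */
static SketchCustomPlan *
ctid_copy(const SketchCustomPlan *from)
{
    CtidPlan *dst = malloc(sizeof(CtidPlan));

    if (dst == NULL)
        return NULL;
    memcpy(dst, from, sizeof(CtidPlan));    /* flat copy; no nested pointers */
    return &dst->base;
}

static const SketchPlanMethods ctid_plan_methods = { "ctidscan", ctid_copy };

/* Provider's create-plan callback: allocates the full-size node itself. */
SketchCustomPlan *
ctid_create_plan(unsigned first_block)
{
    CtidPlan *plan = calloc(1, sizeof(CtidPlan));

    if (plan == NULL)
        return NULL;
    plan->base.methods = &ctid_plan_methods;
    plan->first_block = first_block;
    return &plan->base;
}

/* Core side: copying is delegated without knowing the concrete type. */
SketchCustomPlan *
sketch_copy_custom_plan(const SketchCustomPlan *from)
{
    assert(from != NULL && from->methods != NULL);
    return from->methods->copy(from);
}

/* Provider-side accessor: downcast is valid because base is the first field. */
unsigned
ctid_plan_first_block(const SketchCustomPlan *plan)
{
    return ((const CtidPlan *) plan)->first_block;
}
```

An out-function callback for nodeToString()-style output would follow the same pattern as copy; it is omitted to keep the sketch short.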

* Likewise, there is way too much hard-wired stuff in explain.c. It should
not be optional whether a custom plan provider provides an explain support
function, and that function should be doing pretty much everything involved
in printing the node.

Probably, the hunk around the show_customscan_info() call can be moved
entirely to the extension side. If so, I want ExplainNode() to become an
extern function, because that allows extensions to print underlying plan nodes.

* I don't believe in the hard-wired implementation in setrefs.c either.

Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() becomes an extern function, I think that
is feasible.

* Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
That's at best badly thought out, and at worst a source of new bugs.
Under what circumstances is a custom plan node going to contain any Vars
that don't reduce to either scan or input-plan variables? None, because
that would imply that it was doing something unrelated to the requested
query.

I also want to get rid of it, if we have an alternative way to solve the issue.
The varno of a Var node originally holds the rt-index of the relation being
referenced; setrefs.c then adjusts it according to the offset and the type of
relation being referenced.
A Var node that references joined relations is replaced with
either INNER_VAR or OUTER_VAR according to the location of the underlying
relation. This eventually makes ExecEvalScalarVar() reference either
ecxt_innertuple or ecxt_outertuple; however, that is a problem if we have
already consolidated the tuples coming from both sides into one.
For example, the enhanced postgres_fdw fetches the result set of a remote
join query, so a tuple contains fields that come from both sides.
In this case, which varno would be suitable?

* The API for add_join_path_hook seems overcomplex, as well as too full
of implementation details that should remain private to joinpath.c.
I particularly object to passing the mergeclause lists, which seem unlikely
to be of interest for non-mergejoin plan types anyway.
More generally, it seems likely that this hook is at the wrong level of
abstraction; it forces the hook function to concern itself with a lot of
stuff about join legality and parameterization (which I note your patch3
code fails to do; but that's not an optional thing).

I'd like to hear whether you have an idea of where the hook should be located
and what kind of abstraction is suitable.

* After a bit of reflection I think the best thing to do might be to ditch
add_join_path_hook for 9.4, on these grounds:
1. You've got enough to do to get the rest of the patch committable.
2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based
foreign join planning is not the direction we want to travel.
3. Without that, the use case for new join nodes seems pretty marginal,
as opposed to say alternative sort-node implementations (a case not
supported by this patch, except to the extent that you could use them in
explicit-sort mergejoins if you duplicated large parts of joinpath.c).

You are suggesting an alternative merge join path that uses a sort node
on top of a custom-scan node, aren't you?
Such a path would probably be constructed from the existing MergeJoin path
with one or two CustomScan nodes that do the sorting, or the scan and the
sorting. Do I understand your suggestion correctly?

* Getting nitpicky ... what is the rationale for the doubled underscore
in the CUSTOM__ flag names? That's just a typo waiting to happen IMO.

Sorry, I intended the double underscore as a separator after the prefix
portion of this label.

* Why is there more than one call site for add_scan_path_hook? I don't
see the need for the calling macro with the randomly inconsistent name,
either.

Where is the best place to put it? Even though I cannot imagine a situation
where an extension runs a sub-query or CTE, their paths are constructed
during set_rel_size(), so I had to put the hook into each
set_xxxx_pathlist() function.

* The test arrangement for contrib/ctidscan is needlessly complex, and
testing it in the core tests is a bogus idea anyway. Why not just let it
contain its own test script like most other contrib modules?

I thought a regression test of the CustomScan interface would be useful in
the core tests. However, implementing it as a usual contrib regression test
seems a reasonable suggestion, so I'll adjust it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#90

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Kouhei Kaigai (#89)

Re: Custom Scan APIs (Re: Custom Plan node)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

* Please drop the whole register_custom_provider/get_custom_provider API.

One thing I was worried about is how copyObject() and nodeToString() would
support the set of function pointer tables around a custom-scan node;
however, you suggested changing that assumption. So, I think these
functions become unnecessary.

If we allow the extension to control copyObject behavior, it can do what
it likes with the function-struct pointer. I think the typical case would
be that it's a simple pointer to a never-copied static constant. But you
could imagine that it's a pointer to a struct embedded further down in the
custom path or plan node, if the extension wants different functions for
different plans.

If we had to support stringToNode() for paths or plans, things would get
much more complicated, but we don't (and there are already lots of other
things that would be difficult for that).

The reason why CustomScan is derived from Scan is that some backend code
wants to know the rtindex of the relation referenced by this CustomScan.
The scanrelid of Scan is used in three places: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension according to your suggestion.
We need to investigate the usage in explain.c; ExplainPreScanNode() walks
the nodes to collect the relids referenced in this query. If we don't
want to add a callback for this specific usage, it is a reasonable choice
to show the backend the scanrelid associated with the CustomScan.

I think we have to add another callback for that :-(. It's a pain since
it's such a trivial point; but the existing code cannot support a custom
node referencing more than one RTE, which seems possible for join or
append type cases.

Probably, the hunk around the show_customscan_info() call can be moved
entirely to the extension side. If so, I want ExplainNode() to be an extern
function, because that allows extensions to print the underlying plan nodes.

I haven't looked at what explain.c would have to expose to make this
workable, but yeah, we will probably have to export a few things.

Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() becomes an extern function, I think it
is feasible.

My recollection is that fix_scan_expr() is a bit specialized. Not sure
exactly what we'd have to export there --- but we'd have to do it anyway.
What you had in the patch was a hook that could be called, but no way
for it to do what it would likely need to do.

* Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).

A Var node that references joined relations is replaced with either
INNER_VAR or OUTER_VAR according to the location of the underlying
relation. That eventually makes ExecEvalScalarVar() reference either
ecxt_innertuple or ecxt_outertuple; however, this is problematic if we have
already consolidated the tuples coming from both sides into one.

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all;
if it is needed, then there would also be a need for an additional
tuple slot in executor contexts, which you haven't provided.

For example, the enhanced postgres_fdw fetches the result set of a remote
join query, so a tuple contains fields that come from both sides.
In this case, which varno would be suitable?

That would be a scan situation, and the vars could reference the scan
tuple slot. Which in fact was the implementation you were using, so
how is CUSTOM_VAR adding anything?

* Why is there more than one call site for add_scan_path_hook? I don't
see the need for the calling macro with the randomly inconsistent name,
either.

Where is the best place to put it? Even though I cannot imagine a situation
where an extension runs a sub-query or CTE, their paths are constructed
during set_rel_size(), so I had to put the hook into each
set_xxxx_pathlist() function.

Hm. We could still call the hook in set_rel_pathlist, if we were to
get rid of the individual calls to set_cheapest and do that in one
spot at the bottom of set_rel_pathlist (after the hook call). Calling
set_cheapest in one place seems more consistent anyway.
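
As a rough illustration of the arrangement described above (one hook call at
the bottom of set_rel_pathlist, followed by a single set_cheapest call), here
is a minimal self-contained sketch. Every name in it (DemoRel, DemoPath,
demo_scan_path_hook and so on) is a hypothetical stand-in invented for this
example; it is not the planner's actual types, nor the hook signature from
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Path and RelOptInfo; not the real planner types. */
typedef struct DemoPath
{
	double		total_cost;
	struct DemoPath *next;
} DemoPath;

typedef struct DemoRel
{
	DemoPath   *pathlist;		/* candidate paths */
	DemoPath   *cheapest;		/* filled in by demo_set_cheapest() */
} DemoRel;

/* Assumed hook shape: an extension may add paths to the rel. */
typedef void (*demo_scan_path_hook_type) (DemoRel *rel);
demo_scan_path_hook_type demo_scan_path_hook = NULL;

void
demo_add_path(DemoRel *rel, DemoPath *path)
{
	path->next = rel->pathlist;
	rel->pathlist = path;
}

/* Pick the lowest-cost path; called from exactly one place. */
void
demo_set_cheapest(DemoRel *rel)
{
	DemoPath   *p;

	assert(rel->pathlist != NULL);
	rel->cheapest = rel->pathlist;
	for (p = rel->pathlist; p != NULL; p = p->next)
	{
		if (p->total_cost < rel->cheapest->total_cost)
			rel->cheapest = p;
	}
}

void
demo_set_rel_pathlist(DemoRel *rel, DemoPath *builtin_path)
{
	/* 1. built-in path generation (seq scan, subquery, CTE, ...) */
	demo_add_path(rel, builtin_path);

	/* 2. single hook call site, whatever kind of relation this is */
	if (demo_scan_path_hook != NULL)
		demo_scan_path_hook(rel);

	/* 3. single set_cheapest call, after the extension has had its say */
	demo_set_cheapest(rel);
}

/* Example extension hook: offers one cheaper path. */
static DemoPath demo_custom_path = {100.0, NULL};

void
demo_ctidscan_hook(DemoRel *rel)
{
	demo_add_path(rel, &demo_custom_path);
}
```

With no hook installed the built-in path is chosen; once demo_ctidscan_hook
is installed, its cheaper path is the one picked by demo_set_cheapest().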

regards, tom lane

#91

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Tom Lane (#90)

Re: Custom Scan APIs (Re: Custom Plan node)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

* Please drop the whole register_custom_provider/get_custom_provider API.

One thing I was worried about is how copyObject() and nodeToString() would
support the set of function pointer tables around a custom-scan node;
however, you suggested changing that assumption. So, I think these
functions become unnecessary.

If we allow the extension to control copyObject behavior, it can do what
it likes with the function-struct pointer. I think the typical case would
be that it's a simple pointer to a never-copied static constant. But you
could imagine that it's a pointer to a struct embedded further down in the
custom path or plan node, if the extension wants different functions for
different plans.

If we had to support stringToNode() for paths or plans, things would get
much more complicated, but we don't (and there are already lots of other
things that would be difficult for that).

I expected to include simple function pointers for copying and text-output
as follows:

typedef struct {
Plan plan;
:
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
} Custom;

Then, sub-routine in copyfuncs.c shall be:
static Custom *
_copyCustom(const Custom *from)
{
return from->node_copy(from);
}

Do we share the same picture here? It allows a Custom node to be polymorphic
over the node it enhances.

Sorry, I got a little bit confused. Is the function-struct pointer you
mentioned something different from a usual function pointer?

The reason why CustomScan is derived from Scan is that some backend code
wants to know the rtindex of the relation referenced by this CustomScan.
The scanrelid of Scan is used in three places: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension according to your suggestion.
We need to investigate the usage in explain.c; ExplainPreScanNode() walks
the nodes to collect the relids referenced in this query. If we don't
want to add a callback for this specific usage, it is a reasonable choice
to show the backend the scanrelid associated with the CustomScan.

I think we have to add another callback for that :-(. It's a pain since
it's such a trivial point; but the existing code cannot support a custom
node referencing more than one RTE, which seems possible for join or append
type cases.

That is a more generic approach; I like it.
Probably, it removes the Scan characteristic of CustomScan from the core
backend's point of view. It would behave as an opaque Plan node that may
scan, join, sort or do something else, so a more appropriate name may be
CustomPlan or simply Custom.

Probably, the hunk around the show_customscan_info() call can be moved
entirely to the extension side. If so, I want ExplainNode() to be an extern
function, because that allows extensions to print the underlying plan nodes.

I haven't looked at what explain.c would have to expose to make this workable,
but yeah, we will probably have to export a few things.

OK,

Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() becomes an extern function, I think it
is feasible.

My recollection is that fix_scan_expr() is a bit specialized. Not sure
exactly what we'd have to export there --- but we'd have to do it anyway.
What you had in the patch was a hook that could be called, but no way for
it to do what it would likely need to do.

It probably needs to be exported. It walks the supplied node tree and
eventually calls record_plan_function_dependency() for each function it
finds. That should not be reinvented in the extension.
Anyway, my rework of the patch will make clear which static functions
need to be exposed. Please wait for a while.

* Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).

A Var node that references joined relations is replaced with either
INNER_VAR or OUTER_VAR according to the location of the underlying
relation. That eventually makes ExecEvalScalarVar() reference either
ecxt_innertuple or ecxt_outertuple; however, this is problematic if we have
already consolidated the tuples coming from both sides into one.

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
is needed, then there would also be a need for an additional tuple slot
in executor contexts, which you haven't provided.

For example, the enhanced postgres_fdw fetches the result set of a remote
join query, so a tuple contains fields that come from both sides.
In this case, which varno would be suitable?

That would be a scan situation, and the vars could reference the scan tuple
slot. Which in fact was the implementation you were using, so how is
CUSTOM_VAR adding anything?

Let me sort out the points.
If the custom node performs as a join node with two underlying relations, and
so can retrieve two tuples, there is no problem because INNER_VAR/OUTER_VAR
can reference the individual tuple slots.
Likewise, if the custom node performs as a scan node with one underlying
relation, there is no problem because all the Var nodes in the target list
reference attributes of that particular relation.
The confusing scenario is a custom node that performs as a scan node in
place of a built-in join: Var nodes in the target list may reference
multiple relations, but the node has only one tuple slot, as the
remote join in postgres_fdw does.
In this case, we may be able to use the existing INNER/OUTER/INDEX varnos if
we renumber the varattno and use an appropriate slot, instead of adding
a special varno for this.

I'd like to investigate it a little more, but I'm inclined to conclude that
CUSTOM_VAR might not be necessary, as you suggested.

* Why is there more than one call site for add_scan_path_hook? I
don't see the need for the calling macro with the randomly
inconsistent name, either.

Where is the best place to put it? Even though I cannot imagine a situation
where an extension runs a sub-query or CTE, their paths are constructed
during set_rel_size(), so I had to put the hook into each
set_xxxx_pathlist() function.

Hm. We could still call the hook in set_rel_pathlist, if we were to get
rid of the individual calls to set_cheapest and do that in one spot at the
bottom of set_rel_pathlist (after the hook call). Calling set_cheapest
in one place seems more consistent anyway.

OK, I'll try to move set_cheapest() into set_rel_pathlist from its
current positions, which are spread across several functions.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#92

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Kouhei Kaigai (#91)

Re: Custom Scan APIs (Re: Custom Plan node)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I expected to include simple function pointers for copying and text-output
as follows:

typedef struct {
Plan plan;
:
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
} Custom;

I was thinking more like

typedef struct CustomPathFuncs {
const char *name; /* used for debugging purposes only */
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
Path path;
const CustomPathFuncs *funcs;
... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will be)
the common case where an extension has a fixed set of support functions
for its custom paths and plans. It just declares a static constant
CustomPathFuncs struct, and puts a pointer to that into its paths.

If an extension really needs to set the support functions on a per-object
basis, it can do this:

typedef struct MyCustomPath {
CustomPath cpath;
CustomPathFuncs funcs;
... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

mypath->cpath.funcs = &mypath->funcs;
mypath->funcs.node_copy = MyCustomPathCopy;
... etc etc ...

In this case we're arguably wasting one pointer worth of space in the
path, but considering the number of function pointers such a path will be
carrying, I don't think that's much of an objection.
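
To make the two cases above concrete, here is a small self-contained sketch
of both: the shared static constant table and the per-object embedded table.
It is an illustration under stated assumptions only. The Demo* types are
hypothetical stand-ins for Path, CustomPath and CustomPathFuncs, and the
node_textout signature (returning a string) is invented for this example,
since the real callback signatures are not spelled out in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Path. */
typedef struct DemoPath
{
	double		total_cost;
} DemoPath;

struct DemoCustomPath;

/* Assumed callback shape; the real signatures are not given in the thread. */
typedef const char *(*DemoTextOut_function) (const struct DemoCustomPath *path);

typedef struct DemoCustomPathFuncs
{
	const char *name;			/* used for debugging purposes only */
	DemoTextOut_function node_textout;
} DemoCustomPathFuncs;

typedef struct DemoCustomPath
{
	DemoPath	path;
	const DemoCustomPathFuncs *funcs;
} DemoCustomPath;

/* Common case: one static constant table shared by every path. */
const char *
demo_fixed_textout(const DemoCustomPath *path)
{
	(void) path;
	return "fixed";
}

static const DemoCustomPathFuncs demo_fixed_funcs = {
	"demo_fixed", demo_fixed_textout
};

void
demo_init_fixed(DemoCustomPath *cpath)
{
	cpath->path.total_cost = 0.0;
	cpath->funcs = &demo_fixed_funcs;
}

/* Per-object case: the table is embedded in the extension's own struct. */
typedef struct DemoMyCustomPath
{
	DemoCustomPath cpath;		/* must be first, so the cast below is valid */
	DemoCustomPathFuncs funcs;
	const char *label;			/* private field of the extension */
} DemoMyCustomPath;

const char *
demo_my_textout(const DemoCustomPath *path)
{
	return ((const DemoMyCustomPath *) path)->label;
}

void
demo_init_my(DemoMyCustomPath *mypath, const char *label)
{
	mypath->cpath.path.total_cost = 0.0;
	mypath->cpath.funcs = &mypath->funcs;
	mypath->funcs.name = "demo_per_object";
	mypath->funcs.node_textout = demo_my_textout;
	mypath->label = label;
}

/* What the core side would do: dispatch through the pointer, nothing else. */
const char *
demo_core_textout(const DemoCustomPath *cpath)
{
	assert(cpath->funcs != NULL && cpath->funcs->node_textout != NULL);
	return cpath->funcs->node_textout(cpath);
}
```

The caller in demo_core_textout() does not need to know which of the two
layouts it was handed; it only follows the funcs pointer.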

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
is needed, then there would also be a need for an additional tuple slot
in executor contexts, which you haven't provided.

For example, the enhanced postgres_fdw fetches the result set of a
remote join query, so a tuple contains fields that come from both sides.
In this case, which varno would be suitable?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
solution. Consider for example a join where both tables supply columns
named "id" --- if you put them both in one tupledesc then there's no
non-kluge way to identify them.

Possibly the route to a solution involves adding another plan-node
callback function that ruleutils.c would use for printing Vars in custom
join nodes. Or maybe we could let the Vars keep their original RTE
numbers, though that would complicate life at execution time.

Anyway, if we're going to punt on add_join_path_hook for the time
being, this problem can probably be left to solve later. It won't
arise for simple table-scan cases, nor for single-input plan nodes
such as sorts.

regards, tom lane

#93

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Tom Lane (#92)

Re: Custom Scan APIs (Re: Custom Plan node)

I was thinking more like

typedef struct CustomPathFuncs {
const char *name; /* used for debugging purposes only */
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
Path path;
const CustomPathFuncs *funcs;
... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will be)
the common case where an extension has a fixed set of support functions
for its custom paths and plans. It just declares a static constant
CustomPathFuncs struct, and puts a pointer to that into its paths.

If an extension really needs to set the support functions on a per-object
basis, it can do this:

typedef struct MyCustomPath {
CustomPath cpath;
CustomPathFuncs funcs;
... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

mypath->cpath.funcs = &mypath->funcs;
mypath->funcs.node_copy = MyCustomPathCopy;
... etc etc ...

In this case we're arguably wasting one pointer worth of space in the path,
but considering the number of function pointers such a path will be carrying,
I don't think that's much of an objection.

That is exactly what I expected; no objection here.
Thanks for the clarification.

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if
it is needed, then there would also be a need for an additional tuple
slot in executor contexts, which you haven't provided.

For example, the enhanced postgres_fdw fetches the result set of a
remote join query, so a tuple contains fields that come from both sides.
In this case, which varno would be suitable?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the solution.
Consider for example a join where both tables supply columns named "id"
--- if you put them both in one tupledesc then there's no non-kluge way
to identify them.

Possibly the route to a solution involves adding another plan-node callback
function that ruleutils.c would use for printing Vars in custom join nodes.
Or maybe we could let the Vars keep their original RTE numbers, though that
would complicate life at execution time.

My preference is the former, because complication at execution time may
degrade performance.
Once two tuples are joined in a custom node, only the extension knows which
relation a particular attribute in the unified tuple originated from.
So, it seems reasonable to me that the extension has to provide a hint to
resolve the Var naming.
Probably, that means another callback that provides a translation table for
Var nodes that reference the custom plan but originated from either subtree.
(It looks like translated_vars in prepunion.c.)

For example, let's assume there is a Var node with INDEX_VAR in the tlist
of the custom plan. It eventually references ecxt_scantuple at execution
time, and this tuple slot holds a joined tuple originating from
two relations. If its varattno=9 came from the column varno=1/varattno=3,
we would like to print its original name. If we have a translation table
like translated_vars, it allows us to translate an attribute number on the
custom plan into its original one.
Even though it might be an abuse of INDEX_VAR, it seems to me a good idea.
Also, I don't want to redefine the meaning of INNER_VAR/OUTER_VAR,
because a custom plan may have both left and right subtrees, so it makes
sense to support the case where both tuple slots are available.
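
A minimal sketch of such a translation table, assuming a plain array indexed
by the attribute number of the joined tuple. DemoVarOrigin, DemoVarMap and
demo_translate_var are hypothetical names made up for this illustration;
they are not part of the patch and not the actual translated_vars
representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where one column of the joined tuple originally came from. */
typedef struct DemoVarOrigin
{
	int			varno;			/* range-table index of the source relation */
	int			varattno;		/* attribute number within that relation */
} DemoVarOrigin;

/* Hypothetical per-node translation table, indexed by (attno - 1). */
typedef struct DemoVarMap
{
	int			natts;
	const DemoVarOrigin *origins;
} DemoVarMap;

/*
 * Translate an attribute number of the joined tuple back to its origin.
 * Returns 1 and fills *origin on success, 0 if attno is out of range.
 */
int
demo_translate_var(const DemoVarMap *map, int attno, DemoVarOrigin *origin)
{
	assert(map != NULL && origin != NULL);
	if (attno < 1 || attno > map->natts)
		return 0;
	*origin = map->origins[attno - 1];
	return 1;
}

/* Example table: a 9-column joined tuple built from relations 1 and 2. */
static const DemoVarOrigin demo_origins[] = {
	{1, 1}, {1, 2}, {2, 1}, {2, 2}, {2, 3},
	{2, 4}, {2, 5}, {2, 6}, {1, 3}
};

const DemoVarMap demo_map = {9, demo_origins};
```

With this example table, attribute 9 of the joined tuple maps back to
varno=1/varattno=3, which is the case described above.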

Anyway, if we're going to punt on add_join_path_hook for the time being,
this problem can probably be left to solve later. It won't arise for simple
table-scan cases, nor for single-input plan nodes such as sorts.

Yes, it is a problem if the number of input plans is larger than 1.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

#94

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Tom Lane (#92)

2 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

The attached two patches are the revised custom-plan interface
and an example usage that implements the existing MergeJoin on top of
this interface.

According to the discussion last week, I revised the portions
where the custom node was expected to perform a particular kind of
task, like scanning a relation, and replaced them with polymorphism
through a set of callbacks set by the custom-plan provider.
So, the core backend can handle this custom-plan node as just
an abstract plan node, with no assumptions about what it does.
Even though the subject of this message says "custom-scan",
I'd like to name the interface "custom-plan" instead, because
whether it scans a particular relation is now entirely up to the
extension.

The definitions of the CustomXXXX data types were simplified:

typedef struct CustomPath
{
Path path;
const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomPlan
{
Plan plan;
const struct CustomPlanMethods *methods;
} CustomPlan;

typedef struct CustomPlanState
{
PlanState ps;
const CustomPlanMethods *methods;
} CustomPlanState;

Each type has a base class and a set of function pointers that
characterize the behavior of this custom-plan node.
In the usual use case, an extension is expected to extend these classes
with the private data fields it needs to implement its own
functionality.

Most of the methods are designed to work as a thin layer over
existing planner / executor functions, so the custom-plan provider
is responsible for implementing its methods to communicate with
the core backend the same way the built-in nodes do.

Regarding the topics we discussed last week:

* CUSTOM_VAR is gone.
The reason why CUSTOM_VAR was needed is that we have to handle EXPLAIN
output (including the names of referenced columns) even if
a custom-plan node replaces a join but has no underlying subplans
in its left/right subtrees.
A typical situation like this is the remote-join implementation with which
I tried to extend postgres_fdw on top of the previous interface.
It retrieves a flat result set from the remote join execution, and so
has no local subplan. On the other hand, EXPLAIN tries to find
the "actual" Var node in the underlying subplan if a Var node
has a special varno (INNER/OUTER/INDEX).
I added a special method to solve the problem. The GetSpecialCustomVar
method is called if a Var node of the custom plan has a special
varno; the custom-plan provider can then tell the core backend
which expression node this Var node references.
That allows the column name to be resolved without recursively walking
the subtrees, so it enables a custom-plan node that replaces
a part of the plan tree.
This method is optional, so the existing behavior applies if the
custom-plan provider does not do anything special.
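
The "optional method with a fallback" idea can be sketched as follows. This
is only an illustration: the Demo* types and the get_special_var callback
are hypothetical simplifications, and the real GetSpecialCustomVar signature
in the patch is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expression node that carries a column name. */
typedef struct DemoExpr
{
	const char *colname;
} DemoExpr;

struct DemoCustomPlanState;

/* Assumed callback shape; not the signature used by the patch. */
typedef const DemoExpr *(*DemoGetSpecialVar_function)
			(const struct DemoCustomPlanState *cps, int varattno);

typedef struct DemoCustomPlanMethods
{
	const char *name;
	DemoGetSpecialVar_function get_special_var; /* optional, may be NULL */
} DemoCustomPlanMethods;

typedef struct DemoCustomPlanState
{
	const DemoCustomPlanMethods *methods;
	const DemoExpr *subplan_tlist;	/* stand-in for the child's target list */
	int			subplan_natts;
} DemoCustomPlanState;

/* "Core" side: resolve a Var with a special varno to an expression. */
const DemoExpr *
demo_resolve_special_var(const DemoCustomPlanState *cps, int varattno)
{
	assert(cps != NULL && cps->methods != NULL);

	/* Ask the provider first, if it implements the optional method. */
	if (cps->methods->get_special_var != NULL)
	{
		const DemoExpr *expr = cps->methods->get_special_var(cps, varattno);

		if (expr != NULL)
			return expr;
	}

	/* Otherwise fall back to looking into the underlying subplan. */
	if (cps->subplan_tlist != NULL &&
		varattno >= 1 && varattno <= cps->subplan_natts)
		return &cps->subplan_tlist[varattno - 1];

	return NULL;
}

/* Example provider: a remote join with no local subplan at all. */
static const DemoExpr demo_remote_cols[] = {{"t1.id"}, {"t2.id"}};

const DemoExpr *
demo_remote_join_get_special_var(const DemoCustomPlanState *cps, int varattno)
{
	(void) cps;
	if (varattno < 1 || varattno > 2)
		return NULL;
	return &demo_remote_cols[varattno - 1];
}

const DemoCustomPlanMethods demo_remote_join_methods = {
	"demo_remote_join", demo_remote_join_get_special_var
};

/* Example provider that does nothing special: the fallback is used. */
const DemoCustomPlanMethods demo_plain_methods = {"demo_plain", NULL};
```

The remote-join provider can tell two columns both named "id" apart, because
it alone knows which side each attribute of the flat tuple came from.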

* Functions to be exposed, from static declaration

Right now, static functions are exposed ad hoc, on demand.
So, we need to investigate further which functions are needed and
which are not.
According to my trial with the part-2 patch, which is MergeJoin on top
of the custom-plan interface, the class of functions that recursively
walk the subplan tree has to be exposed: for example, ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan.
When a custom plan performs like the built-in Append node, it
keeps a list of sub-plans in its private field, so the core backend
cannot know that the sub-plans exist and thus cannot build the
subplans, produce EXPLAIN output for them, and so on.
It does not make sense to reimplement these functions on the extension side.
Also, createplan.c has many useful functions to construct plan nodes;
however, most of them are static because all the built-in plan nodes
are constructed by the routines in this file, so we didn't need to
expose them to others. I think the functions in createplan.c that are
called by the create_xxxx_plan() functions to construct plan nodes should
be exposed for the convenience of extensions.
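
To illustrate why the recursive walkers need to be callable from the
extension, here is a small sketch of an Append-like custom node whose
children are invisible to the core walker unless the provider calls back
into it. All names (DemoPlan, DemoCustomAppend, demo_prescan_node and so on)
are hypothetical stand-ins; the relid set is simplified to a bitmask and
this is not the real ExplainPreScanNode.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoTag
{
	DEMO_SCAN,
	DEMO_JOIN,
	DEMO_CUSTOM
} DemoTag;

/* Hypothetical stand-in for Plan. */
typedef struct DemoPlan
{
	DemoTag		tag;
	int			scanrelid;		/* only meaningful for DEMO_SCAN */
	struct DemoPlan *lefttree;
	struct DemoPlan *righttree;
} DemoPlan;

struct DemoCustomPlan;

/* Assumed callback: the provider reports relids of its private children. */
typedef void (*DemoPreScan_function) (const struct DemoCustomPlan *cplan,
									  unsigned int *relids);

typedef struct DemoCustomPlan
{
	DemoPlan	plan;			/* must be first */
	DemoPreScan_function prescan;	/* may be NULL */
} DemoCustomPlan;

/* "Core" walker, exported so that providers can call back into it. */
void
demo_prescan_node(const DemoPlan *plan, unsigned int *relids)
{
	if (plan == NULL)
		return;

	if (plan->tag == DEMO_SCAN)
	{
		assert(plan->scanrelid >= 0 && plan->scanrelid < 32);
		*relids |= 1u << plan->scanrelid;
	}
	else if (plan->tag == DEMO_CUSTOM)
	{
		const DemoCustomPlan *cplan = (const DemoCustomPlan *) plan;

		/* The core cannot see private children; only the provider can. */
		if (cplan->prescan != NULL)
			cplan->prescan(cplan, relids);
	}

	demo_prescan_node(plan->lefttree, relids);
	demo_prescan_node(plan->righttree, relids);
}

/* Provider side: an Append-like node with a private child list. */
typedef struct DemoCustomAppend
{
	DemoCustomPlan cplan;		/* must be first */
	int			nchildren;
	DemoPlan  **children;
} DemoCustomAppend;

void
demo_append_prescan(const DemoCustomPlan *cplan, unsigned int *relids)
{
	const DemoCustomAppend *app = (const DemoCustomAppend *) cplan;
	int			i;

	/* Reuse the core walker instead of reimplementing it. */
	for (i = 0; i < app->nchildren; i++)
		demo_prescan_node(app->children[i], relids);
}
```

Without the callback (prescan left NULL), the two child scans are never
visited and no relids are collected for them.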

* Definition of add_join_path_hook

I have no idea how to improve the definition and location of this
hook, so it is still at the tail of add_paths_to_joinrel().
Its definition was adjusted a bit according to the feedback on
pgsql-hackers. I omitted "mergeclause_list" and "semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.

* Hook location of add_scan_path_hook

I moved add_scan_path_hook and set_cheapest() into
set_base_rel_pathlists() from their various caller locations,
typically the set_xxxx_pathlist() functions.
That consolidates the place where custom paths are added for base
relations.

* CustomMergeJoin as a proof-of-concept

The contrib module in the part-2 portion is a merge-join implementation
on top of the custom-plan interface, even though 99% of its implementation
is identical to the built-in one.
Its purpose is to demonstrate that custom join logic can be implemented using
the custom-plan interface, even when the custom-plan node has underlying
sub-plans, unlike my previous examples.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-2.v10.patch (application/octet-stream)

 contrib/custmj/Makefile            |   17 +
 contrib/custmj/createplan.c        |  435 +++++++++
 contrib/custmj/custmj.c            |  691 +++++++++++++++
 contrib/custmj/custmj.h            |  148 ++++
 contrib/custmj/expected/custmj.out |  378 ++++++++
 contrib/custmj/joinpath.c          |  988 +++++++++++++++++++++
 contrib/custmj/nodeMergejoin.c     | 1694 ++++++++++++++++++++++++++++++++++++
 contrib/custmj/setrefs.c           |  326 +++++++
 contrib/custmj/sql/custmj.sql      |   79 ++
 9 files changed, 4756 insertions(+)

diff --git a/contrib/custmj/Makefile b/contrib/custmj/Makefile
new file mode 100644
index 0000000..9b264d4
--- /dev/null
+++ b/contrib/custmj/Makefile
@@ -0,0 +1,17 @@
+# contrib/custmj/Makefile
+
+MODULE_big = custmj
+OBJS = custmj.o joinpath.o createplan.o setrefs.o nodeMergejoin.o
+
+REGRESS = custmj
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/custmj
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/custmj/createplan.c b/contrib/custmj/createplan.c
new file mode 100644
index 0000000..e522d73
--- /dev/null
+++ b/contrib/custmj/createplan.c
@@ -0,0 +1,435 @@
+/*-------------------------------------------------------------------------
+ *
+ * createplan.c
+ *	  Routines to create the desired plan for processing a query.
+ *	  Planning is complete, we just need to convert the selected
+ *	  Path into a Plan.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/plan/createplan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <limits.h>
+#include <math.h>
+
+#include "access/skey.h"
+#include "catalog/pg_class.h"
+#include "foreign/fdwapi.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
+#include "optimizer/var.h"
+#include "parser/parse_clause.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "custmj.h"
+
+static MergeJoin *make_mergejoin(List *tlist,
+			   List *joinclauses, List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree, Plan *righttree,
+			   JoinType jointype);
+static Material *make_material(Plan *lefttree);
+
+/*
+ * create_gating_plan
+ *	  Deal with pseudoconstant qual clauses
+ *
+ * If the node's quals list includes any pseudoconstant quals, put them
+ * into a gating Result node atop the already-built plan.  Otherwise,
+ * return the plan as-is.
+ *
+ * Note that we don't change cost or size estimates when doing gating.
+ * The costs of qual eval were already folded into the plan's startup cost.
+ * Leaving the size alone amounts to assuming that the gating qual will
+ * succeed, which is the conservative estimate for planning upper queries.
+ * We certainly don't want to assume the output size is zero (unless the
+ * gating qual is actually constant FALSE, and that case is dealt with in
+ * clausesel.c).  Interpolating between the two cases is silly, because
+ * it doesn't reflect what will really happen at runtime, and besides which
+ * in most cases we have only a very bad idea of the probability of the gating
+ * qual being true.
+ */
+Plan *
+create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
+{
+	List	   *pseudoconstants;
+
+	/* Sort into desirable execution order while still in RestrictInfo form */
+	quals = order_qual_clauses(root, quals);
+
+	/* Pull out any pseudoconstant quals from the RestrictInfo list */
+	pseudoconstants = extract_actual_clauses(quals, true);
+
+	if (!pseudoconstants)
+		return plan;
+
+	return (Plan *) make_result(root,
+								plan->targetlist,
+								(Node *) pseudoconstants,
+								plan);
+}
+
+MergeJoin *
+create_mergejoin_plan(PlannerInfo *root,
+					  CustomMergePath *best_path,
+					  Plan *outer_plan,
+					  Plan *inner_plan)
+{
+	List	   *tlist = build_path_tlist(root, &best_path->cpath.path);
+	List	   *joinclauses;
+	List	   *otherclauses;
+	List	   *mergeclauses;
+	List	   *outerpathkeys;
+	List	   *innerpathkeys;
+	int			nClauses;
+	Oid		   *mergefamilies;
+	Oid		   *mergecollations;
+	int		   *mergestrategies;
+	bool	   *mergenullsfirst;
+	MergeJoin  *join_plan;
+	int			i;
+	ListCell   *lc;
+	ListCell   *lop;
+	ListCell   *lip;
+
+	/* Sort join qual clauses into best execution order */
+	/* NB: do NOT reorder the mergeclauses */
+	joinclauses = order_qual_clauses(root, best_path->joinrestrictinfo);
+
+	/* Get the join qual clauses (in plain expression form) */
+	/* Any pseudoconstant clauses are ignored here */
+	if (IS_OUTER_JOIN(best_path->jointype))
+	{
+		extract_actual_join_clauses(joinclauses,
+									&joinclauses, &otherclauses);
+	}
+	else
+	{
+		/* We can treat all clauses alike for an inner join */
+		joinclauses = extract_actual_clauses(joinclauses, false);
+		otherclauses = NIL;
+	}
+
+	/*
+	 * Remove the mergeclauses from the list of join qual clauses, leaving the
+	 * list of quals that must be checked as qpquals.
+	 */
+	mergeclauses = get_actual_clauses(best_path->path_mergeclauses);
+	joinclauses = list_difference(joinclauses, mergeclauses);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params.  There
+	 * should not be any in the mergeclauses.
+	 */
+	if (best_path->cpath.path.param_info)
+	{
+		joinclauses = (List *)
+			replace_nestloop_params(root, (Node *) joinclauses);
+		otherclauses = (List *)
+			replace_nestloop_params(root, (Node *) otherclauses);
+	}
+
+	/*
+	 * Rearrange mergeclauses, if needed, so that the outer variable is always
+	 * on the left; mark the mergeclause restrictinfos with correct
+	 * outer_is_left status.
+	 */
+	mergeclauses = get_switched_clauses(best_path->path_mergeclauses,
+							 best_path->outerjoinpath->parent->relids);
+
+	/*
+	 * Create explicit sort nodes for the outer and inner paths if necessary.
+	 * Make sure there are no excess columns in the inputs if sorting.
+	 */
+	if (best_path->outersortkeys)
+	{
+		disuse_physical_tlist(root, outer_plan, best_path->outerjoinpath);
+		outer_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									outer_plan,
+									best_path->outersortkeys,
+									-1.0);
+		outerpathkeys = best_path->outersortkeys;
+	}
+	else
+		outerpathkeys = best_path->outerjoinpath->pathkeys;
+
+	if (best_path->innersortkeys)
+	{
+		disuse_physical_tlist(root, inner_plan, best_path->innerjoinpath);
+		inner_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									inner_plan,
+									best_path->innersortkeys,
+									-1.0);
+		innerpathkeys = best_path->innersortkeys;
+	}
+	else
+		innerpathkeys = best_path->innerjoinpath->pathkeys;
+
+	/*
+	 * If specified, add a materialize node to shield the inner plan from the
+	 * need to handle mark/restore.
+	 */
+	if (best_path->materialize_inner)
+	{
+		Plan	   *matplan = (Plan *) make_material(inner_plan);
+
+		/*
+		 * We assume the materialize will not spill to disk, and therefore
+		 * charge just cpu_operator_cost per tuple.  (Keep this estimate in
+		 * sync with final_cost_mergejoin.)
+		 */
+		copy_plan_costsize(matplan, inner_plan);
+		matplan->total_cost += cpu_operator_cost * matplan->plan_rows;
+
+		inner_plan = matplan;
+	}
+
+	/*
+	 * Compute the opfamily/collation/strategy/nullsfirst arrays needed by the
+	 * executor.  The information is in the pathkeys for the two inputs, but
+	 * we need to be careful about the possibility of mergeclauses sharing a
+	 * pathkey (compare find_mergeclauses_for_pathkeys()).
+	 */
+	nClauses = list_length(mergeclauses);
+	Assert(nClauses == list_length(best_path->path_mergeclauses));
+	mergefamilies = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergecollations = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergestrategies = (int *) palloc(nClauses * sizeof(int));
+	mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));
+
+	lop = list_head(outerpathkeys);
+	lip = list_head(innerpathkeys);
+	i = 0;
+	foreach(lc, best_path->path_mergeclauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		EquivalenceClass *oeclass;
+		EquivalenceClass *ieclass;
+		PathKey    *opathkey;
+		PathKey    *ipathkey;
+		EquivalenceClass *opeclass;
+		EquivalenceClass *ipeclass;
+		ListCell   *l2;
+
+		/* fetch outer/inner eclass from mergeclause */
+		Assert(IsA(rinfo, RestrictInfo));
+		if (rinfo->outer_is_left)
+		{
+			oeclass = rinfo->left_ec;
+			ieclass = rinfo->right_ec;
+		}
+		else
+		{
+			oeclass = rinfo->right_ec;
+			ieclass = rinfo->left_ec;
+		}
+		Assert(oeclass != NULL);
+		Assert(ieclass != NULL);
+
+		/*
+		 * For debugging purposes, we check that the eclasses match the paths'
+		 * pathkeys.  In typical cases the merge clauses are one-to-one with
+		 * the pathkeys, but when dealing with partially redundant query
+		 * conditions, we might have clauses that re-reference earlier path
+		 * keys.  The case that we need to reject is where a pathkey is
+		 * entirely skipped over.
+		 *
+		 * lop and lip reference the first as-yet-unused pathkey elements;
+		 * it's okay to match them, or any element before them.  If they're
+		 * NULL then we have found all pathkey elements to be used.
+		 */
+		if (lop)
+		{
+			opathkey = (PathKey *) lfirst(lop);
+			opeclass = opathkey->pk_eclass;
+			if (oeclass == opeclass)
+			{
+				/* fast path for typical case */
+				lop = lnext(lop);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lop */
+				foreach(l2, outerpathkeys)
+				{
+					if (l2 == lop)
+						break;
+					opathkey = (PathKey *) lfirst(l2);
+					opeclass = opathkey->pk_eclass;
+					if (oeclass == opeclass)
+						break;
+				}
+				if (oeclass != opeclass)
+					elog(ERROR, "outer pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			opathkey = NULL;
+			opeclass = NULL;
+			foreach(l2, outerpathkeys)
+			{
+				opathkey = (PathKey *) lfirst(l2);
+				opeclass = opathkey->pk_eclass;
+				if (oeclass == opeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "outer pathkeys do not match mergeclauses");
+		}
+
+		if (lip)
+		{
+			ipathkey = (PathKey *) lfirst(lip);
+			ipeclass = ipathkey->pk_eclass;
+			if (ieclass == ipeclass)
+			{
+				/* fast path for typical case */
+				lip = lnext(lip);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lip */
+				foreach(l2, innerpathkeys)
+				{
+					if (l2 == lip)
+						break;
+					ipathkey = (PathKey *) lfirst(l2);
+					ipeclass = ipathkey->pk_eclass;
+					if (ieclass == ipeclass)
+						break;
+				}
+				if (ieclass != ipeclass)
+					elog(ERROR, "inner pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			ipathkey = NULL;
+			ipeclass = NULL;
+			foreach(l2, innerpathkeys)
+			{
+				ipathkey = (PathKey *) lfirst(l2);
+				ipeclass = ipathkey->pk_eclass;
+				if (ieclass == ipeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "inner pathkeys do not match mergeclauses");
+		}
+
+		/* pathkeys should match each other too (more debugging) */
+		if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
+			opathkey->pk_eclass->ec_collation != ipathkey->pk_eclass->ec_collation ||
+			opathkey->pk_strategy != ipathkey->pk_strategy ||
+			opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
+			elog(ERROR, "left and right pathkeys do not match in mergejoin");
+
+		/* OK, save info for executor */
+		mergefamilies[i] = opathkey->pk_opfamily;
+		mergecollations[i] = opathkey->pk_eclass->ec_collation;
+		mergestrategies[i] = opathkey->pk_strategy;
+		mergenullsfirst[i] = opathkey->pk_nulls_first;
+		i++;
+	}
+
+	/*
+	 * Note: it is not an error if we have additional pathkey elements (i.e.,
+	 * lop or lip isn't NULL here).  The input paths might be better-sorted
+	 * than we need for the current mergejoin.
+	 */
+
+	/*
+	 * Now we can build the mergejoin node.
+	 */
+	join_plan = make_mergejoin(tlist,
+							   joinclauses,
+							   otherclauses,
+							   mergeclauses,
+							   mergefamilies,
+							   mergecollations,
+							   mergestrategies,
+							   mergenullsfirst,
+							   outer_plan,
+							   inner_plan,
+							   best_path->jointype);
+
+	/* Costs of sort and material steps are included in path cost already */
+	copy_path_costsize(&join_plan->join.plan, &best_path->cpath.path);
+
+	return join_plan;
+}
+
+static MergeJoin *
+make_mergejoin(List *tlist,
+			   List *joinclauses,
+			   List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree,
+			   Plan *righttree,
+			   JoinType jointype)
+{
+	MergeJoin  *node = makeNode(MergeJoin);
+	Plan	   *plan = &node->join.plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = tlist;
+	plan->qual = otherclauses;
+	plan->lefttree = lefttree;
+	plan->righttree = righttree;
+	node->mergeclauses = mergeclauses;
+	node->mergeFamilies = mergefamilies;
+	node->mergeCollations = mergecollations;
+	node->mergeStrategies = mergestrategies;
+	node->mergeNullsFirst = mergenullsfirst;
+	node->join.jointype = jointype;
+	node->join.joinqual = joinclauses;
+
+	return node;
+}
+
+static Material *
+make_material(Plan *lefttree)
+{
+	Material   *node = makeNode(Material);
+	Plan	   *plan = &node->plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	return node;
+}
diff --git a/contrib/custmj/custmj.c b/contrib/custmj/custmj.c
new file mode 100644
index 0000000..ef64857
--- /dev/null
+++ b/contrib/custmj/custmj.c
@@ -0,0 +1,691 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/custmj/custmj.c
+ *
+ * Custom version of MergeJoin - an example implementation of MergeJoin
+ * logic on top of Custom-Plan interface, to demonstrate how to use this
+ * interface for joining relations.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "commands/explain.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/nodeFuncs.h"
+#include "executor/executor.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+PG_MODULE_MAGIC;
+
+/* declaration of local variables */
+static add_join_path_hook_type	add_join_path_orig = NULL;
+bool		enable_custom_mergejoin;
+
+/* callback table of custom merge join */
+CustomPathMethods			custmj_path_methods;
+CustomPlanMethods			custmj_plan_methods;
+
+/*
+ * custmjAddJoinPath
+ *
+ * A callback function that adds custom merge-join paths for the join of
+ * the supplied relations.
+ */
+static void
+custmjAddJoinPath(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  RelOptInfo *outerrel,
+				  RelOptInfo *innerrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  List *restrictlist,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels)
+{
+	List	   *mergeclause_list = NIL;
+	bool		mergejoin_allowed = true;
+	SemiAntiJoinFactors semifactors;
+
+	if (add_join_path_orig)
+		(*add_join_path_orig)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
+	/* nothing more to do if custom merge join is disabled */
+	if (!enable_custom_mergejoin)
+		return;
+
+	/*
+	 * Find potential mergejoin clauses.
+	 */
+   	mergeclause_list = select_mergejoin_clauses(root,
+												joinrel,
+												outerrel,
+												innerrel,
+												restrictlist,
+												jointype,
+												&mergejoin_allowed);
+	if (!mergejoin_allowed)
+		return;
+
+	/*
+	 * If it's SEMI or ANTI join, compute correction factors for cost
+	 * estimation.  These will be the same for all paths.
+	 */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+		compute_semi_anti_join_factors(root, outerrel, innerrel,
+									   jointype, sjinfo, restrictlist,
+									   &semifactors);
+
+	/*
+	 * 1. Consider mergejoin paths where both relations must be explicitly
+	 * sorted.  Skip this if we can't mergejoin.
+	 */
+	sort_inner_and_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo,
+						 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 2. Consider paths where the outer relation need not be explicitly
+	 * sorted. This includes both nestloops and mergejoins where the outer
+	 * path is already ordered.  Again, skip this if we can't mergejoin.
+	 * (That's okay because we know that nestloop can't handle right/full
+	 * joins at all, so it wouldn't work in the prohibited cases either.)
+	 */
+	match_unsorted_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo, &semifactors,
+						 param_source_rels, extra_lateral_rels);
+}
+
+/*
+ * CreateCustomMergeJoinPlan
+ *
+ * A method to populate a CustomPlan node from the supplied CustomPath
+ * node that was chosen by the planner.
+ */
+static CustomPlan *
+CreateCustomMergeJoinPlan(PlannerInfo *root, CustomPath *custom_path)
+{
+	CustomMergePath	   *cmpath = (CustomMergePath *) custom_path;
+	CustomMergeJoin	   *cmjoin;
+	MergeJoin		   *mjplan;
+	Plan			   *outer_plan;
+	Plan			   *inner_plan;
+
+	/* build plans for the underlying relations */
+	outer_plan = create_plan_recurse(root, cmpath->outerjoinpath);
+	inner_plan = create_plan_recurse(root, cmpath->innerjoinpath);
+
+	mjplan = create_mergejoin_plan(root, cmpath, outer_plan, inner_plan);
+
+	/*
+	 * If there are any pseudoconstant clauses attached to this node, insert a
+	 * gating Result node that evaluates the pseudoconstants as one-time
+	 * quals.
+	 */
+	if (root->hasPseudoConstantQuals)
+		mjplan = (MergeJoin *)
+			create_gating_plan(root, &mjplan->join.plan,
+							   cmpath->joinrestrictinfo);
+
+	/* construct a CustomMergeJoin plan */
+	cmjoin = palloc0(sizeof(CustomMergeJoin));
+	cmjoin->cplan.plan = mjplan->join.plan;
+	cmjoin->cplan.plan.type = T_CustomPlan;
+	cmjoin->cplan.methods = &custmj_plan_methods;
+	cmjoin->jointype = mjplan->join.jointype;
+	cmjoin->joinqual = mjplan->join.joinqual;
+	cmjoin->mergeclauses = mjplan->mergeclauses;
+	cmjoin->mergeFamilies = mjplan->mergeFamilies;
+	cmjoin->mergeCollations = mjplan->mergeCollations;
+	cmjoin->mergeStrategies = mjplan->mergeStrategies;
+	cmjoin->mergeNullsFirst = mjplan->mergeNullsFirst;
+	pfree(mjplan);
+
+	return &cmjoin->cplan;
+}
+
+/*
+ * TextOutCustomMergeJoinPath
+ *
+ * A method to support nodeToString for CustomPath node
+ */
+static void
+TextOutCustomMergeJoinPath(StringInfo str, Node *node)
+{
+	CustomMergePath	*cmpath = (CustomMergePath *) node;
+	char			*temp;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmpath->cpath.methods == &custmj_path_methods);
+	appendStringInfo(str, " :jointype %d", cmpath->jointype);
+	temp = nodeToString(cmpath->outerjoinpath);
+	appendStringInfo(str, " :outerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innerjoinpath);
+	appendStringInfo(str, " :innerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->joinrestrictinfo);
+	appendStringInfo(str, " :joinrestrictinfo %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->path_mergeclauses);
+	appendStringInfo(str, " :path_mergeclauses %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->outersortkeys);
+	appendStringInfo(str, " :outersortkeys %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innersortkeys);
+	appendStringInfo(str, " :innersortkeys %s", temp);
+	pfree(temp);
+	appendStringInfo(str, " :materialize_inner %s",
+					 cmpath->materialize_inner ? "true" : "false");
+}
+
+/*
+ * SetCustomMergeJoinRef
+ *
+ * A method to adjust varno/varattno in the expression clauses.
+ */
+static void
+SetCustomMergeJoinRef(PlannerInfo *root,
+					  CustomPlan *custom_plan,
+					  int rtoffset)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) custom_plan;
+	/* overall logic copied from set_join_references() */
+	Plan			*outer_plan = cmjoin->cplan.plan.lefttree;
+	Plan			*inner_plan = cmjoin->cplan.plan.righttree;
+	indexed_tlist	*outer_itlist;
+	indexed_tlist	*inner_itlist;
+
+	outer_itlist = build_tlist_index(outer_plan->targetlist);
+	inner_itlist = build_tlist_index(inner_plan->targetlist);
+
+	/* All join plans have tlist, qual, and joinqual */
+	cmjoin->cplan.plan.targetlist
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.targetlist,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->cplan.plan.qual
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.qual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->joinqual
+		= fix_join_expr(root,
+						cmjoin->joinqual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/* Now do join-type-specific stuff */
+	cmjoin->mergeclauses
+		= fix_join_expr(root,
+						cmjoin->mergeclauses,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/*
+	 * outer_itlist is kept so that the GetSpecialCustomVar method can be
+	 * tested; it lets the EXPLAIN command show the actual Var node
+	 * referenced by a special varno.
+	 */
+	cmjoin->outer_itlist = outer_itlist;
+
+	pfree(inner_itlist);
+}
+
+/*
+ * FinalizeCustomMergePlan
+ *
+ * A method to collect the parameter IDs referenced by the join clauses.
+ */
+static void
+FinalizeCustomMergePlan(PlannerInfo *root,
+						CustomPlan *custom_plan,
+						Bitmapset **p_paramids,
+						Bitmapset **p_valid_params,
+						Bitmapset **p_scan_params)
+{
+	CustomMergeJoin	   *cmjoin = (CustomMergeJoin *) custom_plan;
+	Bitmapset  *paramids = *p_paramids;
+
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->joinqual,
+								 paramids);
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->mergeclauses,
+								 paramids);
+	*p_paramids = paramids;
+}
+
+/*
+ * BeginCustomMergeJoin
+ *
+ * A method to populate CustomPlanState node according to the supplied
+ * CustomPlan node, and initialize this execution node itself.
+ */
+static CustomPlanState *
+BeginCustomMergeJoin(CustomPlan *cplan, EState *estate, int eflags)
+{
+	CustomMergeJoin		   *cmplan = (CustomMergeJoin *) cplan;
+	CustomMergeJoinState   *cmjs = palloc0(sizeof(CustomMergeJoinState));
+	MergeJoinState		   *mjs;
+
+	mjs = _ExecInitMergeJoin(cmplan, estate, eflags);
+	cmjs->cps.ps = mjs->js.ps;
+	cmjs->cps.ps.type = T_CustomPlanState;
+	cmjs->cps.methods = &custmj_plan_methods;
+	cmjs->jointype = mjs->js.jointype;
+	cmjs->joinqual = mjs->js.joinqual;
+	cmjs->mj_NumClauses = mjs->mj_NumClauses;
+	cmjs->mj_Clauses = mjs->mj_Clauses;
+	cmjs->mj_JoinState = mjs->mj_JoinState;
+	cmjs->mj_ExtraMarks = mjs->mj_ExtraMarks;
+	cmjs->mj_ConstFalseJoin = mjs->mj_ConstFalseJoin;
+	cmjs->mj_FillOuter = mjs->mj_FillOuter;
+	cmjs->mj_FillInner = mjs->mj_FillInner;
+	cmjs->mj_MatchedOuter = mjs->mj_MatchedOuter;
+	cmjs->mj_MatchedInner = mjs->mj_MatchedInner;
+	cmjs->mj_OuterTupleSlot = mjs->mj_OuterTupleSlot;
+	cmjs->mj_InnerTupleSlot = mjs->mj_InnerTupleSlot;
+	cmjs->mj_MarkedTupleSlot = mjs->mj_MarkedTupleSlot;
+	cmjs->mj_NullOuterTupleSlot = mjs->mj_NullOuterTupleSlot;
+	cmjs->mj_NullInnerTupleSlot = mjs->mj_NullInnerTupleSlot;
+	cmjs->mj_OuterEContext = mjs->mj_OuterEContext;
+	cmjs->mj_InnerEContext = mjs->mj_InnerEContext;
+	pfree(mjs);
+
+	/*
+	 * MEMO: When a custom-plan node replaces a join with a scan, as a
+	 * remote-join implementation does when it receives an already-joined
+	 * relation and scans it, the extension should adjust varno / varattno
+	 * of the Var nodes in the targetlist of the PlanState, instead of the
+	 * Plan.
+	 * The executor evaluates expression nodes in the targetlist of the
+	 * PlanState, whereas the EXPLAIN command shows Var names according to
+	 * the targetlist of the Plan.  EXPLAIN therefore does not work if the
+	 * Plan's targetlist was adjusted to reference the ecxt_scantuple of
+	 * ExprContext.
+
+	return &cmjs->cps;
+}
+
+/*
+ * ExecCustomMergeJoin
+ *
+ * A method to run this execution node
+ */
+static TupleTableSlot *
+ExecCustomMergeJoin(CustomPlanState *node)
+{
+	return _ExecMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * EndCustomMergeJoin
+ *
+ * A method to end this execution node
+ */
+static void
+EndCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecEndMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ReScanCustomMergeJoin
+ *
+ * A method to rescan this execution node
+ */
+static void
+ReScanCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecReScanMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ExplainCustomMergeJoinTargetRel
+ *
+ * A method to show target relation in EXPLAIN command.
+ */
+static void
+ExplainCustomMergeJoinTargetRel(CustomPlanState *node,
+								ExplainState *es)
+{
+	CustomMergeJoinState *cmjs = (CustomMergeJoinState *) node;
+	const char *jointype;
+
+	switch (cmjs->jointype)
+	{
+		case JOIN_INNER:
+			jointype = "Inner";
+			break;
+		case JOIN_LEFT:
+			jointype = "Left";
+			break;
+		case JOIN_FULL:
+			jointype = "Full";
+			break;
+		case JOIN_RIGHT:
+			jointype = "Right";
+			break;
+		case JOIN_SEMI:
+			jointype = "Semi";
+			break;
+		case JOIN_ANTI:
+			jointype = "Anti";
+			break;
+		default:
+			jointype = "???";
+			break;
+	}
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (cmjs->jointype != JOIN_INNER)
+			appendStringInfo(es->str, " %s Join", jointype);
+		else
+			appendStringInfoString(es->str, " Join");
+	}
+	else
+		ExplainPropertyText("Join Type", jointype, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_upper_qual(List *qual, const char *qlabel,
+				PlanState *planstate, List *ancestors,
+				ExplainState *es)
+{
+	bool	useprefix = (list_length(es->rtable) > 1 || es->verbose);
+	Node   *node;
+	List   *context;
+	char   *exprstr;
+
+	/* No work if empty qual */
+	if (qual == NIL)
+		return;
+
+	/* Convert AND list to explicit AND */
+	node = (Node *) make_ands_explicit(qual);
+
+	/* And show it */
+	context = deparse_context_for_planstate((Node *) planstate,
+											ancestors,
+											es->rtable,
+											es->rtable_names);
+	exprstr = deparse_expression(node, context, useprefix, false);
+
+	ExplainPropertyText(qlabel, exprstr, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_instrumentation_count(const char *qlabel, int which,
+						   PlanState *planstate, ExplainState *es)
+{
+	double		nfiltered;
+	double		nloops;
+
+	if (!es->analyze || !planstate->instrument)
+		return;
+
+	if (which == 2)
+		nfiltered = planstate->instrument->nfiltered2;
+	else
+		nfiltered = planstate->instrument->nfiltered1;
+	nloops = planstate->instrument->nloops;
+
+	/* In text mode, suppress zero counts; they're not interesting enough */
+	if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		if (nloops > 0)
+			ExplainPropertyFloat(qlabel, nfiltered / nloops, 0, es);
+		else
+			ExplainPropertyFloat(qlabel, 0.0, 0, es);
+	}
+}
+
+/*
+ * ExplainCustomMergeJoin
+ *
+ * A method to construct EXPLAIN output.
+ */
+static void
+ExplainCustomMergeJoin(CustomPlanState *node,
+					   List *ancestors,
+					   ExplainState *es)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)node->ps.plan;
+
+	show_upper_qual(cmjoin->mergeclauses,
+					"Merge Cond", &node->ps, ancestors, es);
+	show_upper_qual(cmjoin->joinqual,
+					"Join Filter", &node->ps, ancestors, es);
+	if (cmjoin->joinqual)
+		show_instrumentation_count("Rows Removed by Join Filter", 1,
+								   &node->ps, es);
+	show_upper_qual(cmjoin->cplan.plan.qual,
+					"Filter", &node->ps, ancestors, es);
+	if (cmjoin->cplan.plan.qual)
+		show_instrumentation_count("Rows Removed by Filter", 2,
+								   &node->ps, es);
+}
+
+/*
+ * GetRelidsCustomMergeJoin
+ *
+ * A method to report the range-table indexes of the underlying relations.
+ */
+static Bitmapset *
+GetRelidsCustomMergeJoin(CustomPlanState *node)
+{
+	Bitmapset  *result = NULL;
+
+	if (outerPlanState(&node->ps))
+		ExplainPreScanNode(outerPlanState(&node->ps), &result);
+	if (innerPlanState(&node->ps))
+		ExplainPreScanNode(innerPlanState(&node->ps), &result);
+
+	return result;
+}
+
+/*
+ * GetSpecialCustomMergeVar
+ *
+ * Test handler of GetSpecialCustomVar method.
+ * When a custom-plan node replaces a join node but does not have two
+ * underlying sub-plans, like a remote join feature that retrieves one
+ * flat result set, the EXPLAIN command cannot resolve the names of columns
+ * referenced by a special varno (INNER_VAR, OUTER_VAR or INDEX_VAR),
+ * because it tries to walk down to the underlying sub-plans that it
+ * expects to be there.  Such a custom-plan node has none, since it
+ * replaces a part of the plan sub-tree with a single custom-plan node.
+ * In this case, the custom-plan provider has to return the expression
+ * node that is referenced by the Var node with the special varno.
+ */
+static Node *
+GetSpecialCustomMergeVar(CustomPlanState *cpstate, Var *varnode)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)cpstate->ps.plan;
+	indexed_tlist	*itlist;
+	int		i;
+
+	if (varnode->varno != OUTER_VAR)
+		return NULL;
+
+	itlist = cmjoin->outer_itlist;
+	for (i=0; i < itlist->num_vars; i++)
+	{
+		if (itlist->vars[i].resno == varnode->varattno)
+		{
+			Var	   *newnode = copyObject(varnode);
+
+			newnode->varno = itlist->vars[i].varno;
+			newnode->varattno = itlist->vars[i].varattno;
+
+			elog(DEBUG2, "%s: (OUTER_VAR,%d) is reference to (%d,%d)",
+				 __FUNCTION__,
+				 varnode->varattno, newnode->varno, newnode->varattno);
+
+			return (Node *) newnode;
+		}
+	}
+	elog(ERROR, "outer_itlist has no entry for Var: %s",
+		 nodeToString(varnode));
+	return NULL;
+}
+
+/*
+ * TextOutCustomMergeJoin
+ *		nodeToString() support in CustomMergeJoin
+ */
+static void
+TextOutCustomMergeJoin(StringInfo str, const CustomPlan *node)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) node;
+	char   *temp;
+	int		i, num;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmjoin->cplan.methods == &custmj_plan_methods);
+	appendStringInfo(str, " :jointype %d", cmjoin->jointype);
+	temp = nodeToString(cmjoin->joinqual);
+	appendStringInfo(str, " :joinqual %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmjoin->mergeclauses);
+	appendStringInfo(str, " :mergeclauses %s", temp);
+	pfree(temp);
+
+	num = list_length(cmjoin->mergeclauses);
+	appendStringInfoString(str, " :mergeFamilies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeFamilies[i]);
+	appendStringInfoString(str, " :mergeCollations");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeCollations[i]);
+	appendStringInfoString(str, " :mergeStrategies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", cmjoin->mergeStrategies[i]);
+	appendStringInfoString(str, " :mergeNullsFirst");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", (int) cmjoin->mergeNullsFirst[i]);
+}
+
+/*
+ * CopyCustomMergeJoin
+ *		copyObject() support in CustomMergeJoin
+ */
+static CustomPlan *
+CopyCustomMergeJoin(const CustomPlan *from)
+{
+	const CustomMergeJoin *oldnode = (const CustomMergeJoin *) from;
+	CustomMergeJoin *newnode  = palloc(sizeof(CustomMergeJoin));
+	int		num;
+
+	/* copying the common fields */
+	CopyCustomPlanCommon((const Node *) oldnode, (Node *) newnode);
+
+	newnode->jointype = oldnode->jointype;
+	newnode->joinqual = copyObject(oldnode->joinqual);
+	newnode->mergeclauses = copyObject(oldnode->mergeclauses);
+	num = list_length(oldnode->mergeclauses);
+	newnode->mergeFamilies = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeFamilies,
+		   oldnode->mergeFamilies,
+		   sizeof(Oid) * num);
+	newnode->mergeCollations = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeCollations,
+		   oldnode->mergeCollations,
+		   sizeof(Oid) * num);
+	newnode->mergeStrategies = palloc(sizeof(int) * num);
+	memcpy(newnode->mergeStrategies,
+		   oldnode->mergeStrategies,
+		   sizeof(int) * num);
+	newnode->mergeNullsFirst = palloc(sizeof(bool) * num);
+	memcpy(newnode->mergeNullsFirst,
+		   oldnode->mergeNullsFirst,
+		   sizeof(bool) * num);
+	num = oldnode->outer_itlist->num_vars;
+	newnode->outer_itlist = palloc(offsetof(indexed_tlist, vars[num]));
+	memcpy(newnode->outer_itlist,
+		   oldnode->outer_itlist,
+		   offsetof(indexed_tlist, vars[num]));
+
+	return &newnode->cplan;
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	/* "enable_custom_mergejoin" controls availability of this module */
+	DefineCustomBoolVariable("enable_custom_mergejoin",
+							 "enables the planner's use of custom merge join",
+							 NULL,
+							 &enable_custom_mergejoin,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* methods of CustomMergeJoinPath */
+	memset(&custmj_path_methods, 0, sizeof(CustomPathMethods));
+	custmj_path_methods.CustomName = "CustomMergeJoin";
+	custmj_path_methods.CreateCustomPlan = CreateCustomMergeJoinPlan;
+	custmj_path_methods.TextOutCustomPath = TextOutCustomMergeJoinPath;
+
+	/* methods of CustomMergeJoinPlan */
+	memset(&custmj_plan_methods, 0, sizeof(CustomPlanMethods));
+	custmj_plan_methods.CustomName = "CustomMergeJoin";
+	custmj_plan_methods.SetCustomPlanRef = SetCustomMergeJoinRef;
+	custmj_plan_methods.SupportBackwardScan = NULL;
+	custmj_plan_methods.FinalizeCustomPlan = FinalizeCustomMergePlan;
+	custmj_plan_methods.BeginCustomPlan = BeginCustomMergeJoin;
+	custmj_plan_methods.ExecCustomPlan = ExecCustomMergeJoin;
+	custmj_plan_methods.EndCustomPlan = EndCustomMergeJoin;
+	custmj_plan_methods.ReScanCustomPlan = ReScanCustomMergeJoin;
+	custmj_plan_methods.ExplainCustomPlanTargetRel
+		= ExplainCustomMergeJoinTargetRel;
+	custmj_plan_methods.ExplainCustomPlan = ExplainCustomMergeJoin;
+	custmj_plan_methods.GetRelidsCustomPlan = GetRelidsCustomMergeJoin;
+	custmj_plan_methods.GetSpecialCustomVar = GetSpecialCustomMergeVar;
+	custmj_plan_methods.TextOutCustomPlan = TextOutCustomMergeJoin;
+	custmj_plan_methods.CopyCustomPlan = CopyCustomMergeJoin;
+
+	/* hook registration */
+	add_join_path_orig = add_join_path_hook;
+	add_join_path_hook = custmjAddJoinPath;
+
+	elog(INFO, "MergeJoin logic on top of CustomPlan interface");
+}
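
The hook registration at the end of _PG_init() follows the usual save-and-replace convention: remember whatever was installed before, then install this module's callback. Below is a minimal, self-contained sketch of that convention. Every name in it (demo_hook, demo_hook_orig, earlier_provider, custmj_like_provider) is a hypothetical stand-in, and the callback signature is deliberately simplified; it is not the real add_join_path_hook_type. The sketch also assumes the new callback forwards to the saved one, which this part of the patch does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified hook type; the real hook takes planner structs. */
typedef void (*demo_hook_type) (int *counter);

static demo_hook_type demo_hook = NULL;			/* plays add_join_path_hook */
static demo_hook_type demo_hook_orig = NULL;	/* plays add_join_path_orig */

/* A provider that some other module might have installed earlier. */
static void
earlier_provider(int *counter)
{
	*counter += 1;
}

/* This module's callback: forward to the saved hook first (assumption). */
static void
custmj_like_provider(int *counter)
{
	if (demo_hook_orig)
		demo_hook_orig(counter);
	*counter += 10;
}

/* Save-and-replace, mirroring the two assignments in _PG_init(). */
static void
demo_install(void)
{
	demo_hook_orig = demo_hook;
	demo_hook = custmj_like_provider;
}

/* Run the chain once, with or without an earlier provider installed. */
int
demo_run(int with_earlier)
{
	int			counter = 0;

	demo_hook = with_earlier ? earlier_provider : NULL;
	demo_hook_orig = NULL;
	demo_install();
	if (demo_hook)
		demo_hook(&counter);
	return counter;
}
```

With this arrangement, modules loaded earlier keep working, because each callback in the chain decides whether to pass control to its predecessor.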
diff --git a/contrib/custmj/custmj.h b/contrib/custmj/custmj.h
new file mode 100644
index 0000000..732bbff
--- /dev/null
+++ b/contrib/custmj/custmj.h
@@ -0,0 +1,148 @@
+/*
+ * definitions related to custom version of merge join
+ */
+#ifndef CUSTMJ_H
+#define CUSTMJ_H
+#include "nodes/nodes.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+typedef struct
+{
+	CustomPath	cpath;
+	/* fields come from JoinPath */
+	JoinType	jointype;
+	Path	   *outerjoinpath;	/* path for the outer side of the join */
+	Path	   *innerjoinpath;	/* path for the inner side of the join */
+	List	   *joinrestrictinfo;		/* RestrictInfos to apply to join */
+	/* fields come from MergePath */
+	List       *path_mergeclauses;      /* join clauses to be used for merge */
+	List       *outersortkeys;  /* keys for explicit sort, if any */
+	List       *innersortkeys;  /* keys for explicit sort, if any */
+	bool        materialize_inner;      /* add Materialize to inner? */
+} CustomMergePath;
+
+struct indexed_tlist;
+
+typedef struct
+{
+	CustomPlan	cplan;
+	/* fields come from Join */
+	JoinType	jointype;
+	List	   *joinqual;
+	/* fields come from MergeJoin */
+	List	   *mergeclauses;   /* mergeclauses as expression trees */
+	/* these are arrays, but have the same length as the mergeclauses list: */
+	Oid		   *mergeFamilies;  /* per-clause OIDs of btree opfamilies */
+	Oid		   *mergeCollations;    /* per-clause OIDs of collations */
+	int		   *mergeStrategies;    /* per-clause ordering (ASC or DESC) */
+	bool	   *mergeNullsFirst;    /* per-clause nulls ordering */
+	/* for transvar testing */
+	struct indexed_tlist *outer_itlist;
+} CustomMergeJoin;
+
+typedef struct
+{
+	CustomPlanState	cps;
+	/* fields come from JoinState */
+	JoinType	jointype;
+	List	   *joinqual;		/* JOIN quals (in addition to ps.qual) */
+	/* fields come from MergeJoinState */
+	int			mj_NumClauses;
+	MergeJoinClause mj_Clauses; /* array of length mj_NumClauses */
+	int			mj_JoinState;
+	bool		mj_ExtraMarks;
+	bool		mj_ConstFalseJoin;
+	bool		mj_FillOuter;
+	bool		mj_FillInner;
+	bool		mj_MatchedOuter;
+	bool		mj_MatchedInner;
+	TupleTableSlot *mj_OuterTupleSlot;
+	TupleTableSlot *mj_InnerTupleSlot;
+	TupleTableSlot *mj_MarkedTupleSlot;
+	TupleTableSlot *mj_NullOuterTupleSlot;
+	TupleTableSlot *mj_NullInnerTupleSlot;
+	ExprContext *mj_OuterEContext;
+	ExprContext *mj_InnerEContext;
+} CustomMergeJoinState;
+
+/* custmj.c */
+extern bool						enable_custom_mergejoin;
+extern CustomPathMethods		custmj_path_methods;
+extern CustomPlanMethods		custmj_plan_methods;
+
+extern void	_PG_init(void);
+
+/* joinpath.c */
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+extern void sort_inner_and_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+extern void match_unsorted_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 SemiAntiJoinFactors *semifactors,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+/* createplan.c */
+extern MergeJoin *create_mergejoin_plan(PlannerInfo *root,
+										CustomMergePath *best_path,
+										Plan *outer_plan,
+										Plan *inner_plan);
+extern Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
+
+/* setrefs.c */
+typedef struct tlist_vinfo
+{
+	Index		varno;			/* RT index of Var */
+	AttrNumber	varattno;		/* attr number of Var */
+	AttrNumber	resno;			/* TLE position of Var */
+} tlist_vinfo;
+
+typedef struct indexed_tlist
+{
+	List	   *tlist;			/* underlying target list */
+	int			num_vars;		/* number of plain Var tlist entries */
+	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
+	bool		has_non_vars;	/* are there other entries? */
+	/* array of num_vars entries: */
+	tlist_vinfo vars[1];		/* VARIABLE LENGTH ARRAY */
+} indexed_tlist;				/* VARIABLE LENGTH STRUCT */
+
+extern indexed_tlist *build_tlist_index(List *tlist);
+extern List *fix_join_expr(PlannerInfo *root,
+						   List *clauses,
+						   indexed_tlist *outer_itlist,
+						   indexed_tlist *inner_itlist,
+						   Index acceptable_rel,
+						   int rtoffset);
+/* nodeMergejoin.c */
+extern MergeJoinState *_ExecInitMergeJoin(CustomMergeJoin *node,
+										  EState *estate,
+										  int eflags);
+extern TupleTableSlot *_ExecMergeJoin(CustomMergeJoinState *node);
+extern void _ExecEndMergeJoin(CustomMergeJoinState *node);
+extern void _ExecReScanMergeJoin(CustomMergeJoinState *node);
+
+#endif	/* CUSTMJ_H */
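
indexed_tlist is a variable-length struct, so CopyCustomMergeJoin has to size its memcpy() with offsetof() rather than sizeof(). The sketch below illustrates that sizing rule on a hypothetical, simplified analogue (demo_vinfo, demo_tlist and the demo_* helpers are not part of the patch). It uses a C99 flexible array member instead of the vars[1] idiom, and plain calloc/malloc instead of palloc, so that it can be compiled outside the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogue of tlist_vinfo. */
typedef struct
{
	int			varno;
	int			varattno;
	int			resno;
} demo_vinfo;

/* Hypothetical analogue of indexed_tlist with a flexible array member. */
typedef struct
{
	int			num_vars;
	demo_vinfo	vars[];
} demo_tlist;

/* Bytes needed for a header plus num_vars trailing entries. */
size_t
demo_tlist_size(int num_vars)
{
	assert(num_vars >= 0);
	return offsetof(demo_tlist, vars) + (size_t) num_vars * sizeof(demo_vinfo);
}

/* Allocate a zeroed list and fill in some recognizable values. */
demo_tlist *
demo_tlist_make(int num_vars)
{
	demo_tlist *t = calloc(1, demo_tlist_size(num_vars));
	int			i;

	if (t == NULL)
		return NULL;
	t->num_vars = num_vars;
	for (i = 0; i < num_vars; i++)
	{
		t->vars[i].varno = 1;
		t->vars[i].varattno = i + 1;
		t->vars[i].resno = i + 1;
	}
	return t;
}

/* Copy header and trailing array in one memcpy of the computed size. */
demo_tlist *
demo_tlist_copy(const demo_tlist *src)
{
	size_t		sz = demo_tlist_size(src->num_vars);
	demo_tlist *dst = malloc(sz);

	if (dst != NULL)
		memcpy(dst, src, sz);
	return dst;
}

/* Returns 1 if a copy is a distinct object with identical contents. */
int
demo_roundtrip(int num_vars)
{
	demo_tlist *a = demo_tlist_make(num_vars);
	demo_tlist *b = a ? demo_tlist_copy(a) : NULL;
	int			ok = (a != NULL && b != NULL && a != b &&
					  memcmp(a, b, demo_tlist_size(num_vars)) == 0);

	free(a);
	free(b);
	return ok;
}
```

Using sizeof(demo_tlist) here would copy only the fixed header and silently drop the trailing entries, which is why the size is always derived from the element count.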
diff --git a/contrib/custmj/expected/custmj.out b/contrib/custmj/expected/custmj.out
new file mode 100644
index 0000000..19ba188
--- /dev/null
+++ b/contrib/custmj/expected/custmj.out
@@ -0,0 +1,378 @@
+-- regression test for custmj extension
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+INFO:  MergeJoin logic on top of CustomPlan interface
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Nested Loop
+   Output: (((t1.a + t2.x) + t3.n) % t4.s), md5((((t1.b || t2.y) || t3.m) || t4.t))
+   Join Filter: (t2.x = t1.a)
+   ->  Nested Loop
+         Output: t2.x, t2.y, t3.n, t3.m, t4.s, t4.t
+         Join Filter: (t3.m = t2.y)
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+               Filter: (t2.y ~~ '%cd%'::text)
+         ->  Materialize
+               Output: t3.n, t3.m, t4.s, t4.t
+               ->  Custom (CustomMergeJoin) Join
+                     Output: t3.n, t3.m, t4.s, t4.t
+                     Merge Cond: (t3.n = t4.s)
+                     Join Filter: (t3.m ~~ t4.t)
+                     ->  Index Scan using t3_pkey on public.t3
+                           Output: t3.n, t3.m
+                     ->  Sort
+                           Output: t4.s, t4.t
+                           Sort Key: t4.s
+                           ->  Seq Scan on public.t4
+                                 Output: t4.s, t4.t
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+         Filter: (t1.b ~~ '%ab%'::text)
+(25 rows)
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Nested Loop
+   Output: t1.a, t1.b, t3.n, t3.m
+   Join Filter: (t1.a = t3.n)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 100))
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+(8 rows)
+
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t3.n, t3.m
+   Merge Cond: (t3.n = t1.a)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 1000))
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(11 rows)
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+                   QUERY PLAN                   
+------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t2.x = t1.a)
+   ->  Custom (CustomMergeJoin) Semi Join
+         Output: t2.x, t2.y, t3.n
+         Merge Cond: (t2.x = ((t3.n % 100)))
+         ->  Sort
+               Output: t2.x, t2.y
+               Sort Key: t2.x
+               ->  Seq Scan on public.t2
+                     Output: t2.x, t2.y
+         ->  Sort
+               Output: t3.n, ((t3.n % 100))
+               Sort Key: ((t3.n % 100))
+               ->  Seq Scan on public.t3
+                     Output: t3.n, (t3.n % 100)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(21 rows)
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,2) is reference to (1,2)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
diff --git a/contrib/custmj/joinpath.c b/contrib/custmj/joinpath.c
new file mode 100644
index 0000000..9ef940b
--- /dev/null
+++ b/contrib/custmj/joinpath.c
@@ -0,0 +1,988 @@
+/*-------------------------------------------------------------------------
+ *
+ * joinpath.c
+ *	  Routines to find all possible paths for processing a set of joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/joinpath.c (derived from src/backend/optimizer/path/joinpath.c)
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "executor/executor.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "custmj.h"
+
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
+
+#define PATH_PARAM_BY_REL(path, rel)  \
+	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+
+/*
+ * try_nestloop_path
+ *	  Consider a nestloop join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_nestloop_path(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  SemiAntiJoinFactors *semifactors,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels,
+				  Path *outer_path,
+				  Path *inner_path,
+				  List *restrict_clauses,
+				  List *pathkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_nestloop_required_outer(outer_path,
+												  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * Do a precheck to quickly eliminate obviously-inferior paths.  We
+	 * calculate a cheap lower bound on the path's cost and then use
+	 * add_path_precheck() to see if the path is clearly going to be dominated
+	 * by some existing path for the joinrel.  If not, do the full pushup with
+	 * creating a fully valid path structure and submitting it to add_path().
+	 * The latter two steps are expensive enough to make this two-phase
+	 * methodology worthwhile.
+	 */
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_path,
+						  sjinfo, semifactors);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  sjinfo,
+									  semifactors,
+									  outer_path,
+									  inner_path,
+									  restrict_clauses,
+									  pathkeys,
+									  required_outer));
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * try_mergejoin_path
+ *	  Consider a merge join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_mergejoin_path(PlannerInfo *root,
+				   RelOptInfo *joinrel,
+				   JoinType jointype,
+				   SpecialJoinInfo *sjinfo,
+				   Relids param_source_rels,
+				   Relids extra_lateral_rels,
+				   Path *outer_path,
+				   Path *inner_path,
+				   List *restrict_clauses,
+				   List *pathkeys,
+				   List *mergeclauses,
+				   List *outersortkeys,
+				   List *innersortkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_non_nestloop_required_outer(outer_path,
+													  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * If the given paths are already well enough ordered, we can skip doing
+	 * an explicit sort.
+	 */
+	if (outersortkeys &&
+		pathkeys_contained_in(outersortkeys, outer_path->pathkeys))
+		outersortkeys = NIL;
+	if (innersortkeys &&
+		pathkeys_contained_in(innersortkeys, inner_path->pathkeys))
+		innersortkeys = NIL;
+
+	/*
+	 * See comments in try_nestloop_path().
+	 */
+	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
+						   outer_path, inner_path,
+						   outersortkeys, innersortkeys,
+						   sjinfo);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/* KG: adjust to create CustomMergePath, instead of MergePath */
+		CustomMergePath	   *cmpath;
+		MergePath		   *mpath
+			= create_mergejoin_path(root,
+									joinrel,
+									jointype,
+									&workspace,
+									sjinfo,
+									outer_path,
+									inner_path,
+									restrict_clauses,
+									pathkeys,
+									required_outer,
+									mergeclauses,
+									outersortkeys,
+									innersortkeys);
+
+		/* adjust cost according to enable_(custom)_mergejoin GUCs */
+		if (!enable_mergejoin && enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost -= disable_cost;
+			mpath->jpath.path.total_cost -= disable_cost;
+		}
+		else if (enable_mergejoin && !enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost += disable_cost;
+			mpath->jpath.path.total_cost += disable_cost;
+		}
+
+		/* construct CustomMergePath object */
+		cmpath = palloc0(sizeof(CustomMergePath));
+		cmpath->cpath.path = mpath->jpath.path;
+		cmpath->cpath.path.type = T_CustomPath;
+		cmpath->cpath.path.pathtype = T_CustomPlan;
+		cmpath->cpath.methods = &custmj_path_methods;
+		cmpath->jointype = mpath->jpath.jointype;
+		cmpath->outerjoinpath = mpath->jpath.outerjoinpath;
+		cmpath->innerjoinpath = mpath->jpath.innerjoinpath;
+		cmpath->joinrestrictinfo = mpath->jpath.joinrestrictinfo;
+		cmpath->path_mergeclauses = mpath->path_mergeclauses;
+		cmpath->outersortkeys = mpath->outersortkeys;
+		cmpath->innersortkeys = mpath->innersortkeys;
+		cmpath->materialize_inner = mpath->materialize_inner;
+
+		add_path(joinrel, &cmpath->cpath.path);
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.	If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static inline bool
+clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
+						RelOptInfo *innerrel)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrel->relids) &&
+		bms_is_subset(rinfo->right_relids, innerrel->relids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrel->relids) &&
+			 bms_is_subset(rinfo->right_relids, outerrel->relids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+/*
+ * sort_inner_and_outer
+ *	  Create mergejoin join paths by explicitly sorting both the outer and
+ *	  inner join relations on each available merge ordering.
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+sort_inner_and_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Path	   *outer_path;
+	Path	   *inner_path;
+	List	   *all_pathkeys;
+	ListCell   *l;
+
+	/*
+	 * We only consider the cheapest-total-cost input paths, since we are
+	 * assuming here that a sort is required.  We will consider
+	 * cheapest-startup-cost input paths later, and only if they don't need a
+	 * sort.
+	 *
+	 * This function intentionally does not consider parameterized input
+	 * paths, except when the cheapest-total is parameterized.	If we did so,
+	 * we'd have a combinatorial explosion of mergejoin paths of dubious
+	 * value.  This interacts with decisions elsewhere that also discriminate
+	 * against mergejoins with parameterized inputs; see comments in
+	 * src/backend/optimizer/README.
+	 */
+	outer_path = outerrel->cheapest_total_path;
+	inner_path = innerrel->cheapest_total_path;
+
+	/*
+	 * If either cheapest-total path is parameterized by the other rel, we
+	 * can't use a mergejoin.  (There's no use looking for alternative input
+	 * paths, since these should already be the least-parameterized available
+	 * paths.)
+	 */
+	if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
+		PATH_PARAM_BY_REL(inner_path, outerrel))
+		return;
+
+	/*
+	 * If unique-ification is requested, do it and then handle as a plain
+	 * inner join.
+	 */
+	if (jointype == JOIN_UNIQUE_OUTER)
+	{
+		outer_path = (Path *) create_unique_path(root, outerrel,
+												 outer_path, sjinfo);
+		Assert(outer_path);
+		jointype = JOIN_INNER;
+	}
+	else if (jointype == JOIN_UNIQUE_INNER)
+	{
+		inner_path = (Path *) create_unique_path(root, innerrel,
+												 inner_path, sjinfo);
+		Assert(inner_path);
+		jointype = JOIN_INNER;
+	}
+
+	/*
+	 * Each possible ordering of the available mergejoin clauses will generate
+	 * a differently-sorted result path at essentially the same cost.  We have
+	 * no basis for choosing one over another at this level of joining, but
+	 * some sort orders may be more useful than others for higher-level
+	 * mergejoins, so it's worth considering multiple orderings.
+	 *
+	 * Actually, it's not quite true that every mergeclause ordering will
+	 * generate a different path order, because some of the clauses may be
+	 * partially redundant (refer to the same EquivalenceClasses).	Therefore,
+	 * what we do is convert the mergeclause list to a list of canonical
+	 * pathkeys, and then consider different orderings of the pathkeys.
+	 *
+	 * Generating a path for *every* permutation of the pathkeys doesn't seem
+	 * like a winning strategy; the cost in planning time is too high. For
+	 * now, we generate one path for each pathkey, listing that pathkey first
+	 * and the rest in random order.  This should allow at least a one-clause
+	 * mergejoin without re-sorting against any other possible mergejoin
+	 * partner path.  But if we've not guessed the right ordering of secondary
+	 * keys, we may end up evaluating clauses as qpquals when they could have
+	 * been done as mergeclauses.  (In practice, it's rare that there's more
+	 * than two or three mergeclauses, so expending a huge amount of thought
+	 * on that is probably not worth it.)
+	 *
+	 * The pathkey order returned by select_outer_pathkeys_for_merge() has
+	 * some heuristics behind it (see that function), so be sure to try it
+	 * exactly as-is as well as making variants.
+	 */
+	all_pathkeys = select_outer_pathkeys_for_merge(root,
+												   mergeclause_list,
+												   joinrel);
+
+	foreach(l, all_pathkeys)
+	{
+		List	   *front_pathkey = (List *) lfirst(l);
+		List	   *cur_mergeclauses;
+		List	   *outerkeys;
+		List	   *innerkeys;
+		List	   *merge_pathkeys;
+
+		/* Make a pathkey list with this guy first */
+		if (l != list_head(all_pathkeys))
+			outerkeys = lcons(front_pathkey,
+							  list_delete_ptr(list_copy(all_pathkeys),
+											  front_pathkey));
+		else
+			outerkeys = all_pathkeys;	/* no work at first one... */
+
+		/* Sort the mergeclauses into the corresponding ordering */
+		cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
+														  outerkeys,
+														  true,
+														  mergeclause_list);
+
+		/* Should have used them all... */
+		Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
+
+		/* Build sort pathkeys for the inner side */
+		innerkeys = make_inner_pathkeys_for_merge(root,
+												  cur_mergeclauses,
+												  outerkeys);
+
+		/* Build pathkeys representing output sort order */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerkeys);
+
+		/*
+		 * And now we can make the path.
+		 *
+		 * Note: it's possible that the cheapest paths will already be sorted
+		 * properly.  try_mergejoin_path will detect that case and suppress an
+		 * explicit sort step, so we needn't do so here.
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outer_path,
+						   inner_path,
+						   restrictlist,
+						   merge_pathkeys,
+						   cur_mergeclauses,
+						   outerkeys,
+						   innerkeys);
+	}
+}
+
+/*
+ * match_unsorted_outer
+ *	  Creates possible join paths for processing a single join relation
+ *	  'joinrel' by employing either iterative substitution or
+ *	  mergejoining on each of its possible outer paths (considering
+ *	  only outer paths that are already ordered well enough for merging).
+ *
+ * We generate nestloop paths for each available outer path whenever the
+ * join type permits a nestloop.  There can be several per outer path: one
+ * for each member of the inner relation's cheapest_parameterized_paths
+ * list (which includes the unparameterized case), plus one using the
+ * materialized form of the cheapest-total inner path, if materialization
+ * is enabled and that path does not already materialize its output.
+ *
+ * We also consider mergejoins if mergejoin clauses are available.  We have
+ * two ways to generate the inner path for a mergejoin: sort the cheapest
+ * inner path, or use an inner path that is already suitably ordered for the
+ * merge.  If we have several mergeclauses, it could be that there is no inner
+ * path (or only a very expensive one) for the full list of mergeclauses, but
+ * better paths exist if we truncate the mergeclause list (thereby discarding
+ * some sort key requirements).  So, we consider truncations of the
+ * mergeclause list as well as the full list.  (Ideally we'd consider all
+ * subsets of the mergeclause list, but that seems way too expensive.)
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+match_unsorted_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	JoinType	save_jointype = jointype;
+	bool		nestjoinOK;
+	bool		useallclauses;
+	Path	   *inner_cheapest_total = innerrel->cheapest_total_path;
+	Path	   *matpath = NULL;
+	ListCell   *lc1;
+
+	/*
+	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
+	 * are doing a right or full mergejoin, we must use *all* the mergeclauses
+	 * as join clauses, else we will not have a valid plan.  (Although these
+	 * two flags are currently inverses, keep them separate for clarity and
+	 * possible future changes.)
+	 */
+	switch (jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_LEFT:
+		case JOIN_SEMI:
+		case JOIN_ANTI:
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			nestjoinOK = false;
+			useallclauses = true;
+			break;
+		case JOIN_UNIQUE_OUTER:
+		case JOIN_UNIQUE_INNER:
+			jointype = JOIN_INNER;
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) jointype);
+			nestjoinOK = false; /* keep compiler quiet */
+			useallclauses = false;
+			break;
+	}
+
+	/*
+	 * If inner_cheapest_total is parameterized by the outer rel, ignore it;
+	 * we will consider it below as a member of cheapest_parameterized_paths,
+	 * but the other possibilities considered in this routine aren't usable.
+	 */
+	if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+		inner_cheapest_total = NULL;
+
+	/*
+	 * If we need to unique-ify the inner path, we will consider only the
+	 * cheapest-total inner.
+	 */
+	if (save_jointype == JOIN_UNIQUE_INNER)
+	{
+		/* No way to do this with an inner path parameterized by outer rel */
+		if (inner_cheapest_total == NULL)
+			return;
+		inner_cheapest_total = (Path *)
+			create_unique_path(root, innerrel, inner_cheapest_total, sjinfo);
+		Assert(inner_cheapest_total);
+	}
+	else if (nestjoinOK)
+	{
+		/*
+		 * Consider materializing the cheapest inner path, unless
+		 * enable_material is off or the path in question materializes its
+		 * output anyway.
+		 */
+		if (enable_material && inner_cheapest_total != NULL &&
+			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
+			matpath = (Path *)
+				create_material_path(innerrel, inner_cheapest_total);
+	}
+
+	foreach(lc1, outerrel->pathlist)
+	{
+		Path	   *outerpath = (Path *) lfirst(lc1);
+		List	   *merge_pathkeys;
+		List	   *mergeclauses;
+		List	   *innersortkeys;
+		List	   *trialsortkeys;
+		Path	   *cheapest_startup_inner;
+		Path	   *cheapest_total_inner;
+		int			num_sortkeys;
+		int			sortkeycnt;
+
+		/*
+		 * We cannot use an outer path that is parameterized by the inner rel.
+		 */
+		if (PATH_PARAM_BY_REL(outerpath, innerrel))
+			continue;
+
+		/*
+		 * If we need to unique-ify the outer path, it's pointless to consider
+		 * any but the cheapest outer.  (XXX we don't consider parameterized
+		 * outers, nor inners, for unique-ified cases.  Should we?)
+		 */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+		{
+			if (outerpath != outerrel->cheapest_total_path)
+				continue;
+			outerpath = (Path *) create_unique_path(root, outerrel,
+													outerpath, sjinfo);
+			Assert(outerpath);
+		}
+
+		/*
+		 * The result will have this sort order (even if it is implemented as
+		 * a nestloop, and even if some of the mergeclauses are implemented by
+		 * qpquals rather than as true mergeclauses):
+		 */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerpath->pathkeys);
+
+		if (save_jointype == JOIN_UNIQUE_INNER)
+		{
+			/*
+			 * Consider nestloop join, but only with the unique-ified cheapest
+			 * inner path
+			 */
+			try_nestloop_path(root,
+							  joinrel,
+							  jointype,
+							  sjinfo,
+							  semifactors,
+							  param_source_rels,
+							  extra_lateral_rels,
+							  outerpath,
+							  inner_cheapest_total,
+							  restrictlist,
+							  merge_pathkeys);
+		}
+		else if (nestjoinOK)
+		{
+			/*
+			 * Consider nestloop joins using this outer path and various
+			 * available paths for the inner relation.  We consider the
+			 * cheapest-total paths for each available parameterization of the
+			 * inner relation, including the unparameterized case.
+			 */
+			ListCell   *lc2;
+
+			foreach(lc2, innerrel->cheapest_parameterized_paths)
+			{
+				Path	   *innerpath = (Path *) lfirst(lc2);
+
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  innerpath,
+								  restrictlist,
+								  merge_pathkeys);
+			}
+
+			/* Also consider materialized form of the cheapest inner path */
+			if (matpath != NULL)
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  matpath,
+								  restrictlist,
+								  merge_pathkeys);
+		}
+
+		/* Can't do anything else if outer path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+			continue;
+
+		/* Can't do anything else if inner rel is parameterized by outer */
+		if (inner_cheapest_total == NULL)
+			continue;
+
+		/* Look for useful mergeclauses (if any) */
+		mergeclauses = find_mergeclauses_for_pathkeys(root,
+													  outerpath->pathkeys,
+													  true,
+													  mergeclause_list);
+
+		/*
+		 * Done with this outer path if no chance for a mergejoin.
+		 *
+		 * Special corner case: for "x FULL JOIN y ON true", there will be no
+		 * join clauses at all.  Ordinarily we'd generate a clauseless
+		 * nestloop path, but since mergejoin is our only join type that
+		 * supports FULL JOIN without any join clauses, it's necessary to
+		 * generate a clauseless mergejoin path instead.
+		 */
+		if (mergeclauses == NIL)
+		{
+			if (jointype == JOIN_FULL)
+				 /* okay to try for mergejoin */ ;
+			else
+				continue;
+		}
+		if (useallclauses && list_length(mergeclauses) != list_length(mergeclause_list))
+			continue;
+
+		/* Compute the required ordering of the inner path */
+		innersortkeys = make_inner_pathkeys_for_merge(root,
+													  mergeclauses,
+													  outerpath->pathkeys);
+
+		/*
+		 * Generate a mergejoin on the basis of sorting the cheapest inner.
+		 * Since a sort will be needed, only cheapest total cost matters. (But
+		 * try_mergejoin_path will do the right thing if inner_cheapest_total
+		 * is already correctly sorted.)
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outerpath,
+						   inner_cheapest_total,
+						   restrictlist,
+						   merge_pathkeys,
+						   mergeclauses,
+						   NIL,
+						   innersortkeys);
+
+		/* Can't do anything else if inner path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_INNER)
+			continue;
+
+		/*
+		 * Look for presorted inner paths that satisfy the innersortkey list
+		 * --- or any truncation thereof, if we are allowed to build a
+		 * mergejoin using a subset of the merge clauses.  Here, we consider
+		 * both cheap startup cost and cheap total cost.
+		 *
+		 * Currently we do not consider parameterized inner paths here. This
+		 * interacts with decisions elsewhere that also discriminate against
+		 * mergejoins with parameterized inputs; see comments in
+		 * src/backend/optimizer/README.
+		 *
+		 * As we shorten the sortkey list, we should consider only paths that
+		 * are strictly cheaper than (in particular, not the same as) any path
+		 * found in an earlier iteration.  Otherwise we'd be intentionally
+		 * using fewer merge keys than a given path allows (treating the rest
+		 * as plain joinquals), which is unlikely to be a good idea.  Also,
+		 * eliminating paths here on the basis of compare_path_costs is a lot
+		 * cheaper than building the mergejoin path only to throw it away.
+		 *
+		 * If inner_cheapest_total is well enough sorted to have not required
+		 * a sort in the path made above, we shouldn't make a duplicate path
+		 * with it, either.  We handle that case with the same logic that
+		 * handles the previous consideration, by initializing the variables
+		 * that track cheapest-so-far properly.  Note that we do NOT reject
+		 * inner_cheapest_total if we find it matches some shorter set of
+		 * pathkeys.  That case corresponds to using fewer mergekeys to avoid
+		 * sorting inner_cheapest_total, whereas we did sort it above, so the
+		 * plans being considered are different.
+		 */
+		if (pathkeys_contained_in(innersortkeys,
+								  inner_cheapest_total->pathkeys))
+		{
+			/* inner_cheapest_total didn't require a sort */
+			cheapest_startup_inner = inner_cheapest_total;
+			cheapest_total_inner = inner_cheapest_total;
+		}
+		else
+		{
+			/* it did require a sort, at least for the full set of keys */
+			cheapest_startup_inner = NULL;
+			cheapest_total_inner = NULL;
+		}
+		num_sortkeys = list_length(innersortkeys);
+		if (num_sortkeys > 1 && !useallclauses)
+			trialsortkeys = list_copy(innersortkeys);	/* need modifiable copy */
+		else
+			trialsortkeys = innersortkeys;		/* won't really truncate */
+
+		for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
+		{
+			Path	   *innerpath;
+			List	   *newclauses = NIL;
+
+			/*
+			 * Look for an inner path ordered well enough for the first
+			 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
+			 * destructively, which is why we made a copy...
+			 */
+			trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   TOTAL_COST);
+			if (innerpath != NULL &&
+				(cheapest_total_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_total_inner,
+									TOTAL_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				/* Select the right mergeclauses, if we didn't already */
+				if (sortkeycnt < num_sortkeys)
+				{
+					newclauses =
+						find_mergeclauses_for_pathkeys(root,
+													   trialsortkeys,
+													   false,
+													   mergeclauses);
+					Assert(newclauses != NIL);
+				}
+				else
+					newclauses = mergeclauses;
+				try_mergejoin_path(root,
+								   joinrel,
+								   jointype,
+								   sjinfo,
+								   param_source_rels,
+								   extra_lateral_rels,
+								   outerpath,
+								   innerpath,
+								   restrictlist,
+								   merge_pathkeys,
+								   newclauses,
+								   NIL,
+								   NIL);
+				cheapest_total_inner = innerpath;
+			}
+			/* Same on the basis of cheapest startup cost ... */
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   STARTUP_COST);
+			if (innerpath != NULL &&
+				(cheapest_startup_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_startup_inner,
+									STARTUP_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				if (innerpath != cheapest_total_inner)
+				{
+					/*
+					 * Avoid rebuilding clause list if we already made one;
+					 * saves memory in big join trees...
+					 */
+					if (newclauses == NIL)
+					{
+						if (sortkeycnt < num_sortkeys)
+						{
+							newclauses =
+								find_mergeclauses_for_pathkeys(root,
+															   trialsortkeys,
+															   false,
+															   mergeclauses);
+							Assert(newclauses != NIL);
+						}
+						else
+							newclauses = mergeclauses;
+					}
+					try_mergejoin_path(root,
+									   joinrel,
+									   jointype,
+									   sjinfo,
+									   param_source_rels,
+									   extra_lateral_rels,
+									   outerpath,
+									   innerpath,
+									   restrictlist,
+									   merge_pathkeys,
+									   newclauses,
+									   NIL,
+									   NIL);
+				}
+				cheapest_startup_inner = innerpath;
+			}
+
+			/*
+			 * Don't consider truncated sortkeys if we need all clauses.
+			 */
+			if (useallclauses)
+				break;
+		}
+	}
+}
+
+/*
+ * select_mergejoin_clauses
+ *	  Select mergejoin clauses that are usable for a particular join.
+ *	  Returns a list of RestrictInfo nodes for those clauses.
+ *
+ * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
+ * this is a right/full join and there are nonmergejoinable join clauses.
+ * The executor's mergejoin machinery cannot handle such cases, so we have
+ * to avoid generating a mergejoin plan.  (Note that this flag does NOT
+ * consider whether there are actually any mergejoinable clauses.  This is
+ * correct because in some cases we need to build a clauseless mergejoin.
+ * Simply returning NIL is therefore not enough to distinguish safe from
+ * unsafe cases.)
+ *
+ * We also mark each selected RestrictInfo to show which side is currently
+ * being considered as outer.  These are transient markings that are only
+ * good for the duration of the current add_paths_to_joinrel() call!
+ *
+ * We examine each restrictinfo clause known for the join to see
+ * if it is mergejoinable and involves vars from the two sub-relations
+ * currently of interest.
+ */
+List *
+select_mergejoin_clauses(PlannerInfo *root,
+						 RelOptInfo *joinrel,
+						 RelOptInfo *outerrel,
+						 RelOptInfo *innerrel,
+						 List *restrictlist,
+						 JoinType jointype,
+						 bool *mergejoin_allowed)
+{
+	List	   *result_list = NIL;
+	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+	bool		have_nonmergeable_joinclause = false;
+	ListCell   *l;
+
+	foreach(l, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+
+		/*
+		 * If processing an outer join, only use its own join clauses in the
+		 * merge.  For inner joins we can use pushed-down clauses too. (Note:
+		 * we don't set have_nonmergeable_joinclause here because pushed-down
+		 * clauses will become otherquals not joinquals.)
+		 */
+		if (isouterjoin && restrictinfo->is_pushed_down)
+			continue;
+
+		/* Check that clause is a mergeable operator clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+		{
+			/*
+			 * The executor can handle extra joinquals that are constants, but
+			 * not anything else, when doing right/full merge join.  (The
+			 * reason to support constants is so we can do FULL JOIN ON
+			 * FALSE.)
+			 */
+			if (!restrictinfo->clause || !IsA(restrictinfo->clause, Const))
+				have_nonmergeable_joinclause = true;
+			continue;			/* not mergejoinable */
+		}
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer".
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* no good for these input relations */
+		}
+
+		/*
+		 * Insist that each side have a non-redundant eclass.  This
+		 * restriction is needed because various bits of the planner expect
+		 * that each clause in a merge be associatable with some pathkey in a
+		 * canonical pathkey list, but redundant eclasses can't appear in
+		 * canonical sort orderings.  (XXX it might be worth relaxing this,
+		 * but not enough time to address it for 8.3.)
+		 *
+		 * Note: it would be bad if this condition failed for an otherwise
+		 * mergejoinable FULL JOIN clause, since that would result in
+		 * undesirable planner failure.  I believe that is not possible
+		 * however; a variable involved in a full join could only appear in
+		 * below_outer_join eclasses, which aren't considered redundant.
+		 *
+		 * This case *can* happen for left/right join clauses: the outer-side
+		 * variable could be equated to a constant.  Because we will propagate
+		 * that constant across the join clause, the loss of ability to do a
+		 * mergejoin is not really all that big a deal, and so it's not clear
+		 * that improving this is important.
+		 */
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+			EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* can't handle redundant eclasses */
+		}
+
+		result_list = lappend(result_list, restrictinfo);
+	}
+
+	/*
+	 * Report whether mergejoin is allowed (see comment at top of function).
+	 */
+	switch (jointype)
+	{
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			*mergejoin_allowed = !have_nonmergeable_joinclause;
+			break;
+		default:
+			*mergejoin_allowed = true;
+			break;
+	}
+
+	return result_list;
+}
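
Aside on the sortkey-truncation loop in match_unsorted_outer() above: the
rule "as we shorten the sortkey list, consider only paths that are strictly
cheaper than any path found in an earlier iteration" can be shown with a
small standalone sketch. Everything here is hypothetical (ToyPath,
cheapest_for_keys and consider_truncations are not PostgreSQL names), cost
is reduced to a single number, and the case where inner_cheapest_total is
already sorted is left out, so please read it as an illustration under
those assumptions rather than as the planner's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a planner Path; not a PostgreSQL type. */
typedef struct ToyPath
{
	int			nsorted;		/* leading merge keys this path is sorted on */
	double		total_cost;		/* simplified single-number cost */
} ToyPath;

/* Return the cheapest path sorted on at least 'nkeys' leading keys, or NULL. */
static const ToyPath *
cheapest_for_keys(const ToyPath *paths, size_t npaths, int nkeys)
{
	const ToyPath *best = NULL;
	size_t		i;

	for (i = 0; i < npaths; i++)
	{
		if (paths[i].nsorted >= nkeys &&
			(best == NULL || paths[i].total_cost < best->total_cost))
			best = &paths[i];
	}
	return best;
}

/*
 * Walk the key count from num_keys down to 1.  A candidate is accepted only
 * if it is strictly cheaper than the cheapest candidate accepted so far, so
 * the same path is never reused with fewer merge keys than it supports.
 * The key count of each accepted candidate is stored in accepted_keys[],
 * which must have room for num_keys entries.  Returns the number accepted.
 */
static size_t
consider_truncations(const ToyPath *paths, size_t npaths,
					 int num_keys, int use_all_clauses,
					 int *accepted_keys)
{
	const ToyPath *cheapest_so_far = NULL;
	size_t		naccepted = 0;
	int			keycnt;

	for (keycnt = num_keys; keycnt > 0; keycnt--)
	{
		const ToyPath *cand = cheapest_for_keys(paths, npaths, keycnt);

		if (cand != NULL &&
			(cheapest_so_far == NULL ||
			 cand->total_cost < cheapest_so_far->total_cost))
		{
			accepted_keys[naccepted++] = keycnt;
			cheapest_so_far = cand;
		}

		/* Right/full joins must use every clause: no truncation allowed. */
		if (use_all_clauses)
			break;
	}
	return naccepted;
}
```

With two merge keys and candidate paths costing 100 (sorted on both keys)
and 40 (sorted on the first key only), both key counts are accepted. If the
two-key path is itself the cheapest, the one-key iteration finds the same
path again and skips it.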
diff --git a/contrib/custmj/nodeMergejoin.c b/contrib/custmj/nodeMergejoin.c
new file mode 100644
index 0000000..62dd8c0
--- /dev/null
+++ b/contrib/custmj/nodeMergejoin.c
@@ -0,0 +1,1694 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeMergejoin.c
+ *	  routines supporting merge joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/nodeMergejoin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *		ExecMergeJoin			mergejoin outer and inner relations.
+ *		ExecInitMergeJoin		creates and initializes run time states
+ *		ExecEndMergeJoin		cleans up the node.
+ *
+ * NOTES
+ *
+ *		Merge-join is done by joining the inner and outer tuples satisfying
+ *		join clauses of the form ((= outerKey innerKey) ...).
+ *		The join clause list is provided by the query planner and may contain
+ *		more than one (= outerKey innerKey) clause (for composite sort key).
+ *
+ *		However, the query executor needs to know whether an outer
+ *		tuple is "greater/smaller" than an inner tuple so that it can
+ *		"synchronize" the two relations. For example, consider the following
+ *		relations:
+ *
+ *				outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
+ *				inner: (1 ^3 5 5 5 5 6)			current tuple: 3
+ *
+ *		To continue the merge-join, the executor needs to scan both inner
+ *		and outer relations till the matching tuples 5. It needs to know
+ *		that currently inner tuple 3 is "greater" than outer tuple 1 and
+ *		therefore it should scan the outer relation first to find a
+ *		matching tuple and so on.
+ *
+ *		Therefore, rather than directly executing the merge join clauses,
+ *		we evaluate the left and right key expressions separately and then
+ *		compare the columns one at a time (see MJCompare).  The planner
+ *		passes us enough information about the sort ordering of the inputs
+ *		to allow us to determine how to make the comparison.  We may use the
+ *		appropriate btree comparison function, since Postgres' only notion
+ *		of ordering is specified by btree opfamilies.
+ *
+ *
+ *		Consider the above relations and suppose that the executor has
+ *		just joined the first outer "5" with the last inner "5". The
+ *		next step is of course to join the second outer "5" with all
+ *		the inner "5's". This requires repositioning the inner "cursor"
+ *		to point at the first inner "5". This is done by "marking" the
+ *		first inner 5 so we can restore the "cursor" to it before joining
+ *		with the second outer 5. The access method interface provides
+ *		routines to mark and restore to a tuple.
+ *
+ *
+ *		Essential operation of the merge join algorithm is as follows:
+ *
+ *		Join {
+ *			get initial outer and inner tuples				INITIALIZE
+ *			do forever {
+ *				while (outer != inner) {					SKIP_TEST
+ *					if (outer < inner)
+ *						advance outer						SKIPOUTER_ADVANCE
+ *					else
+ *						advance inner						SKIPINNER_ADVANCE
+ *				}
+ *				mark inner position							SKIP_TEST
+ *				do forever {
+ *					while (outer == inner) {
+ *						join tuples							JOINTUPLES
+ *						advance inner position				NEXTINNER
+ *					}
+ *					advance outer position					NEXTOUTER
+ *					if (outer == mark)						TESTOUTER
+ *						restore inner position to mark		TESTOUTER
+ *					else
+ *						break	// return to top of outer loop
+ *				}
+ *			}
+ *		}
+ *
+ *		The merge join operation is coded in the fashion
+ *		of a state machine.  At each state, we do something and then
+ *		proceed to another state.  This state is stored in the node's
+ *		execution state information and is preserved across calls to
+ *		ExecMergeJoin. -cim 10/31/89
+ */
+#include "postgres.h"
+
+#include "access/nbtree.h"
+#include "executor/execdebug.h"
+/* #include "executor/nodeMergejoin.h" */
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+/*
+ * States of the ExecMergeJoin state machine
+ */
+#define EXEC_MJ_INITIALIZE_OUTER		1
+#define EXEC_MJ_INITIALIZE_INNER		2
+#define EXEC_MJ_JOINTUPLES				3
+#define EXEC_MJ_NEXTOUTER				4
+#define EXEC_MJ_TESTOUTER				5
+#define EXEC_MJ_NEXTINNER				6
+#define EXEC_MJ_SKIP_TEST				7
+#define EXEC_MJ_SKIPOUTER_ADVANCE		8
+#define EXEC_MJ_SKIPINNER_ADVANCE		9
+#define EXEC_MJ_ENDOUTER				10
+#define EXEC_MJ_ENDINNER				11
+
+/*
+ * Runtime data for each mergejoin clause
+ */
+typedef struct MergeJoinClauseData
+{
+	/* Executable expression trees */
+	ExprState  *lexpr;			/* left-hand (outer) input expression */
+	ExprState  *rexpr;			/* right-hand (inner) input expression */
+
+	/*
+	 * If we have a current left or right input tuple, the values of the
+	 * expressions are loaded into these fields:
+	 */
+	Datum		ldatum;			/* current left-hand value */
+	Datum		rdatum;			/* current right-hand value */
+	bool		lisnull;		/* and their isnull flags */
+	bool		risnull;
+
+	/*
+	 * Everything we need to know to compare the left and right values is
+	 * stored here.
+	 */
+	SortSupportData ssup;
+}	MergeJoinClauseData;
+
+/* Result type for MJEvalOuterValues and MJEvalInnerValues */
+typedef enum
+{
+	MJEVAL_MATCHABLE,			/* normal, potentially matchable tuple */
+	MJEVAL_NONMATCHABLE,		/* tuple cannot join because it has a null */
+	MJEVAL_ENDOFJOIN			/* end of input (physical or effective) */
+} MJEvalResult;
+
+
+#define MarkInnerTuple(innerTupleSlot, mergestate) \
+	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))
+
+
+/*
+ * MJExamineQuals
+ *
+ * This deconstructs the list of mergejoinable expressions, which is given
+ * to us by the planner in the form of a list of "leftexpr = rightexpr"
+ * expression trees in the order matching the sort columns of the inputs.
+ * We build an array of MergeJoinClause structs containing the information
+ * we will need at runtime.  Each struct essentially tells us how to compare
+ * the two expressions from the original clause.
+ *
+ * In addition to the expressions themselves, the planner passes the btree
+ * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or
+ * BTGreaterStrategyNumber), and nulls-first flag that identify the intended
+ * sort ordering for each merge key.  The mergejoinable operator is an
+ * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * ordered in either increasing or decreasing (respectively) order according
+ * to the opfamily and collation, with nulls at the indicated end of the range.
+ * This allows us to obtain the needed comparison function from the opfamily.
+ */
+static MergeJoinClause
+MJExamineQuals(List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   PlanState *parent)
+{
+	MergeJoinClause clauses;
+	int			nClauses = list_length(mergeclauses);
+	int			iClause;
+	ListCell   *cl;
+
+	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+
+	iClause = 0;
+	foreach(cl, mergeclauses)
+	{
+		OpExpr	   *qual = (OpExpr *) lfirst(cl);
+		MergeJoinClause clause = &clauses[iClause];
+		Oid			opfamily = mergefamilies[iClause];
+		Oid			collation = mergecollations[iClause];
+		StrategyNumber opstrategy = mergestrategies[iClause];
+		bool		nulls_first = mergenullsfirst[iClause];
+		int			op_strategy;
+		Oid			op_lefttype;
+		Oid			op_righttype;
+		Oid			sortfunc;
+
+		if (!IsA(qual, OpExpr))
+			elog(ERROR, "mergejoin clause is not an OpExpr");
+
+		/*
+		 * Prepare the input expressions for execution.
+		 */
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+
+		/* Set up sort support data */
+		clause->ssup.ssup_cxt = CurrentMemoryContext;
+		clause->ssup.ssup_collation = collation;
+		if (opstrategy == BTLessStrategyNumber)
+			clause->ssup.ssup_reverse = false;
+		else if (opstrategy == BTGreaterStrategyNumber)
+			clause->ssup.ssup_reverse = true;
+		else	/* planner screwed up */
+			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+		clause->ssup.ssup_nulls_first = nulls_first;
+
+		/* Extract the operator's declared left/right datatypes */
+		get_op_opfamily_properties(qual->opno, opfamily, false,
+								   &op_strategy,
+								   &op_lefttype,
+								   &op_righttype);
+		if (op_strategy != BTEqualStrategyNumber)		/* should not happen */
+			elog(ERROR, "cannot merge using non-equality operator %u",
+				 qual->opno);
+
+		/* And get the matching support or comparison function */
+		sortfunc = get_opfamily_proc(opfamily,
+									 op_lefttype,
+									 op_righttype,
+									 BTSORTSUPPORT_PROC);
+		if (OidIsValid(sortfunc))
+		{
+			/* The sort support function should provide a comparator */
+			OidFunctionCall1(sortfunc, PointerGetDatum(&clause->ssup));
+			Assert(clause->ssup.comparator != NULL);
+		}
+		else
+		{
+			/* opfamily doesn't provide sort support, get comparison func */
+			sortfunc = get_opfamily_proc(opfamily,
+										 op_lefttype,
+										 op_righttype,
+										 BTORDER_PROC);
+			if (!OidIsValid(sortfunc))	/* should not happen */
+				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+					 BTORDER_PROC, op_lefttype, op_righttype, opfamily);
+			/* We'll use a shim to call the old-style btree comparator */
+			PrepareSortSupportComparisonShim(sortfunc, &clause->ssup);
+		}
+
+		iClause++;
+	}
+
+	return clauses;
+}
+
+/*
+ * MJEvalOuterValues
+ *
+ * Compute the values of the mergejoined expressions for the current
+ * outer tuple.  We also detect whether it's impossible for the current
+ * outer tuple to match anything --- this is true if it yields a NULL
+ * input, since we assume mergejoin operators are strict.  If the NULL
+ * is in the first join column, and that column sorts nulls last, then
+ * we can further conclude that no following tuple can match anything
+ * either, since they must all have nulls in the first column.  However,
+ * that case is only interesting if we're not in FillOuter mode, else
+ * we have to visit all the tuples anyway.
+ *
+ * For the convenience of callers, we also make this routine responsible
+ * for testing for end-of-input (null outer tuple), and returning
+ * MJEVAL_ENDOFJOIN when that's seen.  This allows the same code to be used
+ * for both real end-of-input and the effective end-of-input represented by
+ * a first-column NULL.
+ *
+ * We evaluate the values in OuterEContext, which can be reset each
+ * time we move to a new tuple.
+ */
+static MJEvalResult
+MJEvalOuterValues(CustomMergeJoinState *mergestate)
+{
+	ExprContext *econtext = mergestate->mj_OuterEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of outer subplan */
+	if (TupIsNull(mergestate->mj_OuterTupleSlot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_outertuple = mergestate->mj_OuterTupleSlot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->ldatum = ExecEvalExpr(clause->lexpr, econtext,
+									  &clause->lisnull, NULL);
+		if (clause->lisnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillOuter)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJEvalInnerValues
+ *
+ * Same as above, but for the inner tuple.  Here, we have to be prepared
+ * to load data from either the true current inner, or the marked inner,
+ * so caller must tell us which slot to load from.
+ */
+static MJEvalResult
+MJEvalInnerValues(CustomMergeJoinState *mergestate, TupleTableSlot *innerslot)
+{
+	ExprContext *econtext = mergestate->mj_InnerEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of inner subplan */
+	if (TupIsNull(innerslot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_innertuple = innerslot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->rdatum = ExecEvalExpr(clause->rexpr, econtext,
+									  &clause->risnull, NULL);
+		if (clause->risnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillInner)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJCompare
+ *
+ * Compare the mergejoinable values of the current two input tuples
+ * and return 0 if they are equal (ie, the mergejoin equalities all
+ * succeed), >0 if outer > inner, <0 if outer < inner.
+ *
+ * MJEvalOuterValues and MJEvalInnerValues must already have been called
+ * for the current outer and inner tuples, respectively.
+ */
+static int
+MJCompare(CustomMergeJoinState *mergestate)
+{
+	int			result = 0;
+	bool		nulleqnull = false;
+	ExprContext *econtext = mergestate->cps.ps.ps_ExprContext;
+	int			i;
+	MemoryContext oldContext;
+
+	/*
+	 * Call the comparison functions in short-lived context, in case they leak
+	 * memory.
+	 */
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		/*
+		 * Special case for NULL-vs-NULL, else use standard comparison.
+		 */
+		if (clause->lisnull && clause->risnull)
+		{
+			nulleqnull = true;	/* NULL "=" NULL */
+			continue;
+		}
+
+		result = ApplySortComparator(clause->ldatum, clause->lisnull,
+									 clause->rdatum, clause->risnull,
+									 &clause->ssup);
+
+		if (result != 0)
+			break;
+	}
+
+	/*
+	 * If we had any NULL-vs-NULL inputs, we do not want to report that the
+	 * tuples are equal.  Instead, if result is still 0, change it to +1. This
+	 * will result in advancing the inner side of the join.
+	 *
+	 * Likewise, if there was a constant-false joinqual, do not report
+	 * equality.  We have to check this as part of the mergequals, else the
+	 * rescan logic will do the wrong thing.
+	 */
+	if (result == 0 &&
+		(nulleqnull || mergestate->mj_ConstFalseJoin))
+		result = 1;
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+
+/*
+ * Generate a fake join tuple with nulls for the inner tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillOuter(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_OuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_NullInnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * qualification succeeded.  now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning outer fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+/*
+ * Generate a fake join tuple with nulls for the outer tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillInner(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_NullOuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_InnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * qualification succeeded.  now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning inner fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+
+/*
+ * Check that a qual condition is constant true or constant false.
+ * If it is constant false (or null), set *is_const_false to TRUE.
+ *
+ * Constant true would normally be represented by a NIL list, but we allow an
+ * actual bool Const as well.  We do expect that the planner will have thrown
+ * away any non-constant terms that have been ANDed with a constant false.
+ */
+static bool
+check_constant_qual(List *qual, bool *is_const_false)
+{
+	ListCell   *lc;
+
+	foreach(lc, qual)
+	{
+		Const	   *con = (Const *) lfirst(lc);
+
+		if (!con || !IsA(con, Const))
+			return false;
+		if (con->constisnull || !DatumGetBool(con->constvalue))
+			*is_const_false = true;
+	}
+	return true;
+}
+
+
+/* ----------------------------------------------------------------
+ *		ExecMergeTupleDump
+ *
+ *		This function is called through the MJ_dump() macro
+ *		when EXEC_MERGEJOINDEBUG is defined
+ * ----------------------------------------------------------------
+ */
+#ifdef EXEC_MERGEJOINDEBUG
+
+static void
+ExecMergeTupleDumpOuter(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *outerSlot = mergestate->mj_OuterTupleSlot;
+
+	printf("==== outer tuple ====\n");
+	if (TupIsNull(outerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(outerSlot);
+}
+
+static void
+ExecMergeTupleDumpInner(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *innerSlot = mergestate->mj_InnerTupleSlot;
+
+	printf("==== inner tuple ====\n");
+	if (TupIsNull(innerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(innerSlot);
+}
+
+static void
+ExecMergeTupleDumpMarked(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *markedSlot = mergestate->mj_MarkedTupleSlot;
+
+	printf("==== marked tuple ====\n");
+	if (TupIsNull(markedSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(markedSlot);
+}
+
+static void
+ExecMergeTupleDump(CustomMergeJoinState *mergestate)
+{
+	printf("******** ExecMergeTupleDump ********\n");
+
+	ExecMergeTupleDumpOuter(mergestate);
+	ExecMergeTupleDumpInner(mergestate);
+	ExecMergeTupleDumpMarked(mergestate);
+
+	printf("******** \n");
+}
+#endif
+
+/* ----------------------------------------------------------------
+ *		ExecMergeJoin
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+_ExecMergeJoin(CustomMergeJoinState *node)
+{
+	List	   *joinqual;
+	List	   *otherqual;
+	bool		qualResult;
+	int			compareResult;
+	PlanState  *innerPlan;
+	TupleTableSlot *innerTupleSlot;
+	PlanState  *outerPlan;
+	TupleTableSlot *outerTupleSlot;
+	ExprContext *econtext;
+	bool		doFillOuter;
+	bool		doFillInner;
+
+	/*
+	 * get information from node
+	 */
+	innerPlan = innerPlanState(node);
+	outerPlan = outerPlanState(node);
+	econtext = node->cps.ps.ps_ExprContext;
+	joinqual = node->joinqual;
+	otherqual = node->cps.ps.qual;
+	doFillOuter = node->mj_FillOuter;
+	doFillInner = node->mj_FillInner;
+
+	/*
+	 * Check to see if we're still projecting out tuples from a previous join
+	 * tuple (because there is a function-returning-set in the projection
+	 * expressions).  If so, try to project another one.
+	 */
+	if (node->cps.ps.ps_TupFromTlist)
+	{
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+		if (isDone == ExprMultipleResult)
+			return result;
+		/* Done with that source tuple... */
+		node->cps.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.  Note this can't happen
+	 * until we're done projecting out tuples from a join tuple.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * ok, everything is set up; let's go to work
+	 */
+	for (;;)
+	{
+		MJ_dump(node);
+
+		/*
+		 * get the current state of the join and do things accordingly.
+		 */
+		switch (node->mj_JoinState)
+		{
+				/*
+				 * EXEC_MJ_INITIALIZE_OUTER means that this is the first time
+				 * ExecMergeJoin() has been called and so we have to fetch the
+				 * first matchable tuple for both outer and inner subplans. We
+				 * do the outer side in INITIALIZE_OUTER state, then advance
+				 * to INITIALIZE_INNER state for the inner subplan.
+				 */
+			case EXEC_MJ_INITIALIZE_OUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_OUTER\n");
+
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* OK to go get the first inner tuple */
+						node->mj_JoinState = EXEC_MJ_INITIALIZE_INNER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Stay in same state to fetch next outer tuple */
+						if (doFillOuter)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * inner tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillOuter(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: nothing in outer subplan\n");
+						if (doFillInner)
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples. We set MatchedInner = true to
+							 * force the ENDOUTER state to advance inner.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							node->mj_MatchedInner = true;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+			case EXEC_MJ_INITIALIZE_INNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_INNER\n");
+
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * OK, we have the initial tuples.  Begin by skipping
+						 * non-matching tuples.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Mark before advancing, if wanted */
+						if (node->mj_ExtraMarks)
+							ExecMarkPos(innerPlan);
+						/* Stay in same state to fetch next inner tuple */
+						if (doFillInner)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * outer tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillInner(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: nothing in inner subplan\n");
+						if (doFillOuter)
+						{
+							/*
+							 * Need to emit left-join tuples for all outer
+							 * tuples, including the one we just fetched.  We
+							 * set MatchedOuter = false to force the ENDINNER
+							 * state to emit first tuple before advancing
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							node->mj_MatchedOuter = false;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_JOINTUPLES means we have two tuples which satisfied
+				 * the merge clause so we join them and then proceed to get
+				 * the next inner tuple (EXEC_MJ_NEXTINNER).
+				 */
+			case EXEC_MJ_JOINTUPLES:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_JOINTUPLES\n");
+
+				/*
+				 * Set the next state machine state.  The right things will
+				 * happen whether we return this join tuple or just fall
+				 * through to continue the state machine execution.
+				 */
+				node->mj_JoinState = EXEC_MJ_NEXTINNER;
+
+				/*
+				 * Check the extra qual conditions to see if we actually want
+				 * to return this join tuple.  If not, can proceed with merge.
+				 * We must distinguish the additional joinquals (which must
+				 * pass to consider the tuples "matched" for outer-join logic)
+				 * from the otherquals (which must pass before we actually
+				 * return the tuple).
+				 *
+				 * We don't bother with a ResetExprContext here, on the
+				 * assumption that we just did one while checking the merge
+				 * qual.  One per tuple should be sufficient.  We do have to
+				 * set up the econtext links to the tuples for ExecQual to
+				 * use.
+				 */
+				outerTupleSlot = node->mj_OuterTupleSlot;
+				econtext->ecxt_outertuple = outerTupleSlot;
+				innerTupleSlot = node->mj_InnerTupleSlot;
+				econtext->ecxt_innertuple = innerTupleSlot;
+
+				qualResult = (joinqual == NIL ||
+							  ExecQual(joinqual, econtext, false));
+				MJ_DEBUG_QUAL(joinqual, qualResult);
+
+				if (qualResult)
+				{
+					node->mj_MatchedOuter = true;
+					node->mj_MatchedInner = true;
+
+					/* In an antijoin, we never return a matched tuple */
+					if (node->jointype == JOIN_ANTI)
+					{
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					}
+
+					/*
+					 * In a semijoin, we'll consider returning the first
+					 * match, but after that we're done with this outer tuple.
+					 */
+					if (node->jointype == JOIN_SEMI)
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+
+					qualResult = (otherqual == NIL ||
+								  ExecQual(otherqual, econtext, false));
+					MJ_DEBUG_QUAL(otherqual, qualResult);
+
+					if (qualResult)
+					{
+						/*
+						 * qualification succeeded.  now form the desired
+						 * projection tuple and return the slot containing it.
+						 */
+						TupleTableSlot *result;
+						ExprDoneCond isDone;
+
+						MJ_printf("ExecMergeJoin: returning tuple\n");
+
+						result = ExecProject(node->cps.ps.ps_ProjInfo,
+											 &isDone);
+
+						if (isDone != ExprEndResult)
+						{
+							node->cps.ps.ps_TupFromTlist =
+								(isDone == ExprMultipleResult);
+							return result;
+						}
+					}
+					else
+						InstrCountFiltered2(node, 1);
+				}
+				else
+					InstrCountFiltered1(node, 1);
+				break;
+
+				/*
+				 * EXEC_MJ_NEXTINNER means advance the inner scan to the next
+				 * tuple. If the tuple is not nil, we then proceed to test it
+				 * against the join qualification.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_NEXTINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTINNER\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next inner tuple, if any.  If there's none,
+				 * advance to next outer tuple (which may be able to join to
+				 * previously marked tuples).
+				 *
+				 * NB: must NOT do "extraMarks" here, since we may need to
+				 * return to previously marked tuples.
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * Test the new inner tuple to see if it matches
+						 * outer.
+						 *
+						 * If they do match, then we join them and move on to
+						 * the next inner tuple (EXEC_MJ_JOINTUPLES).
+						 *
+						 * If they do not match then advance to next outer
+						 * tuple.
+						 */
+						compareResult = MJCompare(node);
+						MJ_DEBUG_COMPARE(compareResult);
+
+						if (compareResult == 0)
+							node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+						else
+						{
+							Assert(compareResult < 0);
+							node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						}
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * It contains a NULL and hence can't match any outer
+						 * tuple, so we can skip the comparison and assume the
+						 * new tuple is greater than current outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+
+						/*
+						 * No more inner tuples.  However, this might be only
+						 * effective and not physical end of inner plan, so
+						 * force mj_InnerTupleSlot to null to make sure we
+						 * don't fetch more inner tuples.  (We need this hack
+						 * because we are not transiting to a state where the
+						 * inner plan is assumed to be exhausted.)
+						 */
+						node->mj_InnerTupleSlot = NULL;
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+				}
+				break;
+
+				/*-------------------------------------------
+				 * EXEC_MJ_NEXTOUTER means
+				 *
+				 *				outer inner
+				 * outer tuple -  5		5  - marked tuple
+				 *				  5		5
+				 *				  6		6  - inner tuple
+				 *				  7		7
+				 *
+				 * we know we just bumped into the
+				 * first inner tuple > current outer tuple (or possibly
+				 * the end of the inner stream)
+				 * so get a new outer tuple and then
+				 * proceed to test it against the marked tuple
+				 * (EXEC_MJ_TESTOUTER)
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 *------------------------------------------------
+				 */
+			case EXEC_MJ_NEXTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTOUTER\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the marked tuple */
+						node->mj_JoinState = EXEC_MJ_TESTOUTER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*--------------------------------------------------------
+				 * EXEC_MJ_TESTOUTER If the new outer tuple and the marked
+				 * tuple satisfy the merge clause then we know we have
+				 * duplicates in the outer scan so we have to restore the
+				 * inner scan to the marked tuple and proceed to join the
+				 * new outer tuple with the inner tuples.
+				 *
+				 * This is the case when
+				 *						  outer inner
+				 *							4	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	5	  5
+				 *							6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple == marked tuple
+				 *
+				 * If the outer tuple fails the test, then we are done
+				 * with the marked tuples, and we have to look for a
+				 * match to the current inner tuple.  So we will
+				 * proceed to skip outer tuples until outer >= inner
+				 * (EXEC_MJ_SKIP_TEST).
+				 *
+				 *		This is the case when
+				 *
+				 *						  outer inner
+				 *							5	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple > marked tuple
+				 *
+				 *---------------------------------------------------------
+				 */
+			case EXEC_MJ_TESTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_TESTOUTER\n");
+
+				/*
+				 * Here we must compare the outer tuple with the marked inner
+				 * tuple.  (We can ignore the result of MJEvalInnerValues,
+				 * since the marked inner tuple is certainly matchable.)
+				 */
+				innerTupleSlot = node->mj_MarkedTupleSlot;
+				(void) MJEvalInnerValues(node, innerTupleSlot);
+
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					/*
+					 * the merge clause matched so now we restore the inner
+					 * scan position to the first mark, and go join that tuple
+					 * (and any following ones) to the new outer.
+					 *
+					 * NOTE: we do not need to worry about the MatchedInner
+					 * state for the rescanned inner tuples.  We know all of
+					 * them will match this new outer tuple and therefore
+					 * won't be emitted as fill tuples.  This works *only*
+					 * because we require the extra joinquals to be constant
+					 * when doing a right or full join --- otherwise some of
+					 * the rescanned tuples might fail the extra joinquals.
+					 * This obviously won't happen for a constant-true extra
+					 * joinqual, while the constant-false case is handled by
+					 * forcing the merge clause to never match, so we never
+					 * get here.
+					 */
+					ExecRestrPos(innerPlan);
+
+					/*
+					 * ExecRestrPos probably should give us back a new Slot,
+					 * but since it doesn't, use the marked slot.  (The
+					 * previously returned mj_InnerTupleSlot cannot be assumed
+					 * to hold the required tuple.)
+					 */
+					node->mj_InnerTupleSlot = innerTupleSlot;
+					/* we need not do MJEvalInnerValues again */
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else
+				{
+					/* ----------------
+					 *	if the new outer tuple didn't match the marked inner
+					 *	tuple then we have a case like:
+					 *
+					 *			 outer inner
+					 *			   4	 4	- marked tuple
+					 * new outer - 5	 4
+					 *			   6	 5	- inner tuple
+					 *			   7
+					 *
+					 *	which means that all subsequent outer tuples will be
+					 *	larger than our marked inner tuples.  So we need not
+					 *	revisit any of the marked tuples but can proceed to
+					 *	look for a match to the current inner.  If there's
+					 *	no more inners, no more matches are possible.
+					 * ----------------
+					 */
+					Assert(compareResult > 0);
+					innerTupleSlot = node->mj_InnerTupleSlot;
+
+					/* reload comparison data for current inner */
+					switch (MJEvalInnerValues(node, innerTupleSlot))
+					{
+						case MJEVAL_MATCHABLE:
+							/* proceed to compare it to the current outer */
+							node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+							break;
+						case MJEVAL_NONMATCHABLE:
+
+							/*
+							 * current inner can't possibly match any outer;
+							 * better to advance the inner scan than the
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+							break;
+						case MJEVAL_ENDOFJOIN:
+							/* No more inner tuples */
+							if (doFillOuter)
+							{
+								/*
+								 * Need to emit left-join tuples for remaining
+								 * outer tuples.
+								 */
+								node->mj_JoinState = EXEC_MJ_ENDINNER;
+								break;
+							}
+							/* Otherwise we're done. */
+							return NULL;
+					}
+				}
+				break;
+
+				/*----------------------------------------------------------
+				 * EXEC_MJ_SKIP_TEST means compare tuples and if they do not
+				 * match, skip whichever is lesser.
+				 *
+				 * For example:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple -  6		8  - inner tuple
+				 *				  7    12
+				 *				  8    14
+				 *
+				 * we have to advance the outer scan
+				 * until we find the outer 8.
+				 *
+				 * On the other hand:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple - 12		8  - inner tuple
+				 *				 14    10
+				 *				 17    12
+				 *
+				 * we have to advance the inner scan
+				 * until we find the inner 12.
+				 *----------------------------------------------------------
+				 */
+			case EXEC_MJ_SKIP_TEST:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIP_TEST\n");
+
+				/*
+				 * before we advance, make sure the current tuples do not
+				 * satisfy the mergeclauses.  If they do, then we update the
+				 * marked tuple position and go join them.
+				 */
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					ExecMarkPos(innerPlan);
+
+					MarkInnerTuple(node->mj_InnerTupleSlot, node);
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else if (compareResult < 0)
+					node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+				else
+					/* compareResult > 0 */
+					node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+				break;
+
+				/*
+				 * SKIPOUTER_ADVANCE: advance over an outer tuple that is
+				 * known not to join to any inner tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 */
+			case EXEC_MJ_SKIPOUTER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPOUTER_ADVANCE\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the current inner */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * SKIPINNER_ADVANCE: advance over an inner tuple that is
+				 * known not to join to any outer tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_SKIPINNER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPINNER_ADVANCE\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+						/* proceed to compare it to the current outer */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * current inner can't possibly match any outer;
+						 * better to advance the inner scan than the outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: end of inner subplan\n");
+						outerTupleSlot = node->mj_OuterTupleSlot;
+						if (doFillOuter && !TupIsNull(outerTupleSlot))
+						{
+							/*
+							 * Need to emit left-join tuples for remaining
+							 * outer tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_ENDOUTER means we have run out of outer tuples, but
+				 * are doing a right/full join and therefore must null-fill
+				 * any remaining unmatched inner tuples.
+				 */
+			case EXEC_MJ_ENDOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDOUTER\n");
+
+				Assert(doFillInner);
+
+				if (!node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				if (TupIsNull(innerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of inner subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDOUTER state and process next tuple. */
+				break;
+
+				/*
+				 * EXEC_MJ_ENDINNER means we have run out of inner tuples, but
+				 * are doing a left/full join and therefore must null-fill
+				 * any remaining unmatched outer tuples.
+				 */
+			case EXEC_MJ_ENDINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDINNER\n");
+
+				Assert(doFillOuter);
+
+				if (!node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				if (TupIsNull(outerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of outer subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDINNER state and process next tuple. */
+				break;
+
+				/*
+				 * broken state value?
+				 */
+			default:
+				elog(ERROR, "unrecognized mergejoin state: %d",
+					 (int) node->mj_JoinState);
+		}
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitMergeJoin
+ * ----------------------------------------------------------------
+ */
+MergeJoinState *
+_ExecInitMergeJoin(CustomMergeJoin *node, EState *estate, int eflags)
+{
+	MergeJoinState *mergestate;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "initializing node");
+
+	/*
+	 * create state structure
+	 */
+	mergestate = makeNode(MergeJoinState);
+	mergestate->js.ps.plan = (Plan *) node;
+	mergestate->js.ps.state = estate;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &mergestate->js.ps);
+
+	/*
+	 * we need two additional econtexts in which we can compute the join
+	 * expressions from the left and right input tuples.  The node's regular
+	 * econtext won't do because it gets reset too often.
+	 */
+	mergestate->mj_OuterEContext = CreateExprContext(estate);
+	mergestate->mj_InnerEContext = CreateExprContext(estate);
+
+	/*
+	 * initialize child expressions
+	 */
+	mergestate->js.ps.targetlist = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.targetlist,
+					 (PlanState *) mergestate);
+	mergestate->js.ps.qual = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.qual,
+					 (PlanState *) mergestate);
+	mergestate->js.jointype = node->jointype;
+	mergestate->js.joinqual = (List *)
+		ExecInitExpr((Expr *) node->joinqual,
+					 (PlanState *) mergestate);
+	mergestate->mj_ConstFalseJoin = false;
+	/* mergeclauses are handled below */
+
+	/*
+	 * initialize child nodes
+	 *
+	 * inner child must support MARK/RESTORE.
+	 */
+	outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+	innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
+											  eflags | EXEC_FLAG_MARK);
+
+	/*
+	 * For certain types of inner child nodes, it is advantageous to issue
+	 * MARK every time we advance past an inner tuple we will never return to.
+	 * For other types, MARK on a tuple we cannot return to is a waste of
+	 * cycles.	Detect which case applies and set mj_ExtraMarks if we want to
+	 * issue "unnecessary" MARK calls.
+	 *
+	 * Currently, only Material wants the extra MARKs, and it will be helpful
+	 * only if eflags doesn't specify REWIND.
+	 */
+	if (IsA(innerPlan(node), Material) &&
+		(eflags & EXEC_FLAG_REWIND) == 0)
+		mergestate->mj_ExtraMarks = true;
+	else
+		mergestate->mj_ExtraMarks = false;
+
+	/*
+	 * tuple table initialization
+	 */
+	ExecInitResultTupleSlot(estate, &mergestate->js.ps);
+
+	mergestate->mj_MarkedTupleSlot = ExecInitExtraTupleSlot(estate);
+	ExecSetSlotDescriptor(mergestate->mj_MarkedTupleSlot,
+						  ExecGetResultType(innerPlanState(mergestate)));
+
+	switch (node->jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_SEMI:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = false;
+			break;
+		case JOIN_LEFT:
+		case JOIN_ANTI:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = false;
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+			break;
+		case JOIN_RIGHT:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("RIGHT JOIN is only supported with merge-joinable join conditions")));
+			break;
+		case JOIN_FULL:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("FULL JOIN is only supported with merge-joinable join conditions")));
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) node->jointype);
+	}
+
+	/*
+	 * initialize tuple type and projection info
+	 */
+	ExecAssignResultTypeFromTL(&mergestate->js.ps);
+	ExecAssignProjectionInfo(&mergestate->js.ps, NULL);
+
+	/*
+	 * preprocess the merge clauses
+	 */
+	mergestate->mj_NumClauses = list_length(node->mergeclauses);
+	mergestate->mj_Clauses = MJExamineQuals(node->mergeclauses,
+											node->mergeFamilies,
+											node->mergeCollations,
+											node->mergeStrategies,
+											node->mergeNullsFirst,
+											(PlanState *) mergestate);
+
+	/*
+	 * initialize join state
+	 */
+	mergestate->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	mergestate->js.ps.ps_TupFromTlist = false;
+	mergestate->mj_MatchedOuter = false;
+	mergestate->mj_MatchedInner = false;
+	mergestate->mj_OuterTupleSlot = NULL;
+	mergestate->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * initialization successful
+	 */
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "node initialized");
+
+	return mergestate;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndMergeJoin
+ *
+ * old comments
+ *		frees storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+_ExecEndMergeJoin(CustomMergeJoinState *node)
+{
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "ending node processing");
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->cps.ps);
+
+	/*
+	 * clean out the tuple table
+	 */
+	ExecClearTuple(node->cps.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	/*
+	 * shut down the subplans
+	 */
+	ExecEndNode(innerPlanState(node));
+	ExecEndNode(outerPlanState(node));
+
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "node processing ended");
+}
+
+void
+_ExecReScanMergeJoin(CustomMergeJoinState *node)
+{
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	node->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	node->cps.ps.ps_TupFromTlist = false;
+	node->mj_MatchedOuter = false;
+	node->mj_MatchedInner = false;
+	node->mj_OuterTupleSlot = NULL;
+	node->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * if chgParam of subnodes is not null then plans will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (node->cps.ps.lefttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.lefttree);
+	if (node->cps.ps.righttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.righttree);
+
+}
diff --git a/contrib/custmj/setrefs.c b/contrib/custmj/setrefs.c
new file mode 100644
index 0000000..9eb0b14
--- /dev/null
+++ b/contrib/custmj/setrefs.c
@@ -0,0 +1,326 @@
+/*-------------------------------------------------------------------------
+ *
+ * setrefs.c
+ *	  Post-processing of a completed plan tree: fix references to subplan
+ *	  vars, compute regproc values for operators, etc
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/setrefs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/transam.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/tlist.h"
+#include "tcop/utility.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "custmj.h"
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *outer_itlist;
+	indexed_tlist *inner_itlist;
+	Index		acceptable_rel;
+	int			rtoffset;
+} fix_join_expr_context;
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *subplan_itlist;
+	Index		newvarno;
+	int			rtoffset;
+} fix_upper_expr_context;
+
+static Var *search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist,
+								 Index newvarno);
+static Node *fix_join_expr_mutator(Node *node,
+					  fix_join_expr_context *context);
+/*
+ * copyVar
+ *		Copy a Var node.
+ *
+ * fix_scan_expr and friends do this enough times that it's worth having
+ * a bespoke routine instead of using the generic copyObject() function.
+ */
+static inline Var *
+copyVar(Var *var)
+{
+	Var		   *newvar = (Var *) palloc(sizeof(Var));
+
+	*newvar = *var;
+	return newvar;
+}
+
+/*
+ * build_tlist_index --- build an index data structure for a child tlist
+ *
+ * In most cases, subplan tlists will be "flat" tlists with only Vars,
+ * so we try to optimize that case by extracting information about Vars
+ * in advance.	Matching a parent tlist to a child is still an O(N^2)
+ * operation, but at least with a much smaller constant factor than plain
+ * tlist_member() searches.
+ *
+ * The result of this function is an indexed_tlist struct to pass to
+ * search_indexed_tlist_for_var() or search_indexed_tlist_for_non_var().
+ * When done, the indexed_tlist may be freed with a single pfree().
+ */
+indexed_tlist *
+build_tlist_index(List *tlist)
+{
+	indexed_tlist *itlist;
+	tlist_vinfo *vinfo;
+	ListCell   *l;
+
+	/* Create data structure with enough slots for all tlist entries */
+	itlist = (indexed_tlist *)
+		palloc(offsetof(indexed_tlist, vars) +
+			   list_length(tlist) * sizeof(tlist_vinfo));
+
+	itlist->tlist = tlist;
+	itlist->has_ph_vars = false;
+	itlist->has_non_vars = false;
+
+	/* Find the Vars and fill in the index array */
+	vinfo = itlist->vars;
+	foreach(l, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+		if (tle->expr && IsA(tle->expr, Var))
+		{
+			Var		   *var = (Var *) tle->expr;
+
+			vinfo->varno = var->varno;
+			vinfo->varattno = var->varattno;
+			vinfo->resno = tle->resno;
+			vinfo++;
+		}
+		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
+			itlist->has_ph_vars = true;
+		else
+			itlist->has_non_vars = true;
+	}
+
+	itlist->num_vars = (vinfo - itlist->vars);
+
+	return itlist;
+}
+
+/*
+ * search_indexed_tlist_for_var --- find a Var in an indexed tlist
+ *
+ * If a match is found, return a copy of the given Var with suitably
+ * modified varno/varattno (to wit, newvarno and the resno of the TLE entry).
+ * Also ensure that varnoold is incremented by rtoffset.
+ * If no match, return NULL.
+ */
+static Var *
+search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist,
+							 Index newvarno, int rtoffset)
+{
+	Index		varno = var->varno;
+	AttrNumber	varattno = var->varattno;
+	tlist_vinfo *vinfo;
+	int			i;
+
+	vinfo = itlist->vars;
+	i = itlist->num_vars;
+	while (i-- > 0)
+	{
+		if (vinfo->varno == varno && vinfo->varattno == varattno)
+		{
+			/* Found a match */
+			Var		   *newvar = copyVar(var);
+
+			newvar->varno = newvarno;
+			newvar->varattno = vinfo->resno;
+			if (newvar->varnoold > 0)
+				newvar->varnoold += rtoffset;
+			return newvar;
+		}
+		vinfo++;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * search_indexed_tlist_for_non_var --- find a non-Var in an indexed tlist
+ *
+ * If a match is found, return a Var constructed to reference the tlist item.
+ * If no match, return NULL.
+ *
+ * NOTE: it is a waste of time to call this unless itlist->has_ph_vars or
+ * itlist->has_non_vars
+ */
+static Var *
+search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist, Index newvarno)
+{
+	TargetEntry *tle;
+
+	tle = tlist_member(node, itlist->tlist);
+	if (tle)
+	{
+		/* Found a matching subplan output expression */
+		Var		   *newvar;
+
+		newvar = makeVarFromTargetEntry(newvarno, tle);
+		newvar->varnoold = 0;	/* wasn't ever a plain Var */
+		newvar->varoattno = 0;
+		return newvar;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * fix_join_expr
+ *	   Create a new set of targetlist entries or join qual clauses by
+ *	   changing the varno/varattno values of variables in the clauses
+ *	   to reference target list values from the outer and inner join
+ *	   relation target lists.  Also perform opcode lookup and add
+ *	   regclass OIDs to root->glob->relationOids.
+ *
+ * This is used in two different scenarios: a normal join clause, where all
+ * the Vars in the clause *must* be replaced by OUTER_VAR or INNER_VAR
+ * references; and a RETURNING clause, which may contain both Vars of the
+ * target relation and Vars of other relations.  In the latter case we want
+ * to replace the other-relation Vars by OUTER_VAR references, while leaving
+ * target Vars alone.
+ *
+ * For a normal join, acceptable_rel should be zero so that any failure to
+ * match a Var will be reported as an error.  For the RETURNING case, pass
+ * inner_itlist = NULL and acceptable_rel = the ID of the target relation.
+ *
+ * 'clauses' is the targetlist or list of join clauses
+ * 'outer_itlist' is the indexed target list of the outer join relation
+ * 'inner_itlist' is the indexed target list of the inner join relation,
+ *		or NULL
+ * 'acceptable_rel' is either zero or the rangetable index of a relation
+ *		whose Vars may appear in the clause without provoking an error
+ * 'rtoffset': how much to increment varnoold by
+ *
+ * Returns the new expression tree.  The original clause structure is
+ * not modified.
+ */
+List *
+fix_join_expr(PlannerInfo *root,
+			  List *clauses,
+			  indexed_tlist *outer_itlist,
+			  indexed_tlist *inner_itlist,
+			  Index acceptable_rel,
+			  int rtoffset)
+{
+	fix_join_expr_context context;
+
+	context.root = root;
+	context.outer_itlist = outer_itlist;
+	context.inner_itlist = inner_itlist;
+	context.acceptable_rel = acceptable_rel;
+	context.rtoffset = rtoffset;
+	return (List *) fix_join_expr_mutator((Node *) clauses, &context);
+}
+
+static Node *
+fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
+{
+	Var		   *newvar;
+
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *var = (Var *) node;
+
+		/* First look for the var in the input tlists */
+		newvar = search_indexed_tlist_for_var(var,
+											  context->outer_itlist,
+											  OUTER_VAR,
+											  context->rtoffset);
+		if (newvar)
+			return (Node *) newvar;
+		if (context->inner_itlist)
+		{
+			newvar = search_indexed_tlist_for_var(var,
+												  context->inner_itlist,
+												  INNER_VAR,
+												  context->rtoffset);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If it's for acceptable_rel, adjust and return it */
+		if (var->varno == context->acceptable_rel)
+		{
+			var = copyVar(var);
+			var->varno += context->rtoffset;
+			if (var->varnoold > 0)
+				var->varnoold += context->rtoffset;
+			return (Node *) var;
+		}
+
+		/* No referent found for Var */
+		elog(ERROR, "variable not found in subplan target lists");
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		/* See if the PlaceHolderVar has bubbled up from a lower plan node */
+		if (context->outer_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->outer_itlist,
+													  OUTER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+		if (context->inner_itlist && context->inner_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->inner_itlist,
+													  INNER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If not supplied by input plans, evaluate the contained expr */
+		return fix_join_expr_mutator((Node *) phv->phexpr, context);
+	}
+	/* Try matching more complex expressions too, if tlists have any */
+	if (context->outer_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->outer_itlist,
+												  OUTER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	if (context->inner_itlist && context->inner_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->inner_itlist,
+												  INNER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node,
+								   fix_join_expr_mutator,
+								   (void *) context);
+}
diff --git a/contrib/custmj/sql/custmj.sql b/contrib/custmj/sql/custmj.sql
new file mode 100644
index 0000000..ffb6d9d
--- /dev/null
+++ b/contrib/custmj/sql/custmj.sql
@@ -0,0 +1,79 @@
+-- regression test for custmj extension
+
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;

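A note for readers following the setrefs.c copy in the custmj patch above: build_tlist_index() flattens the child target list into an array of (varno, varattno, resno) triples, and search_indexed_tlist_for_var() then rewrites a parent Var to point at the matching child output column. Below is a minimal standalone sketch of that idea. Everything in it is a simplified stand-in invented for illustration (SketchVar, SketchVinfo, SketchIndex, sketch_build_index, sketch_fix_var, and the SKETCH_*_VAR values are not PostgreSQL definitions), and it only handles plain Vars, not PlaceHolderVars or other expressions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Var and tlist_vinfo; NOT PostgreSQL types. */
typedef struct SketchVar
{
    unsigned    varno;          /* range-table index, or a special marker */
    int         varattno;       /* attribute number */
} SketchVar;

typedef struct SketchVinfo
{
    unsigned    varno;
    int         varattno;
    int         resno;          /* 1-based position in the child tlist */
} SketchVinfo;

typedef struct SketchIndex
{
    int         num_vars;
    SketchVinfo vars[];         /* flexible array, one slot per tlist entry */
} SketchIndex;

/* Arbitrary marker values chosen for this sketch only. */
enum { SKETCH_INNER_VAR = 1000, SKETCH_OUTER_VAR = 1001 };

/* Build the index with a single allocation, so one free() releases it. */
static SketchIndex *
sketch_build_index(const SketchVar *tlist, int ntlist)
{
    SketchIndex *idx;
    int          i;

    idx = malloc(sizeof(SketchIndex) + (size_t) ntlist * sizeof(SketchVinfo));
    if (idx == NULL)
        return NULL;
    for (i = 0; i < ntlist; i++)
    {
        idx->vars[i].varno = tlist[i].varno;
        idx->vars[i].varattno = tlist[i].varattno;
        idx->vars[i].resno = i + 1;
    }
    idx->num_vars = ntlist;
    return idx;
}

/*
 * Linear search of the index.  On a match, rewrite *var to reference the
 * child output column (newvarno, resno) and return 1; otherwise leave
 * *var untouched and return 0.
 */
static int
sketch_fix_var(SketchVar *var, const SketchIndex *idx, unsigned newvarno)
{
    int         i;

    assert(idx != NULL);
    for (i = 0; i < idx->num_vars; i++)
    {
        if (idx->vars[i].varno == var->varno &&
            idx->vars[i].varattno == var->varattno)
        {
            var->varno = newvarno;
            var->varattno = idx->vars[i].resno;
            return 1;
        }
    }
    return 0;
}
```

The real code copies the Var rather than modifying it in place, and falls back to tlist_member() for non-Var expressions; the sketch leaves both out to keep the lookup itself visible.
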
pgsql-v9.4-custom-scan.part-1.v10.patch (application/octet-stream)

 doc/src/sgml/custom-plan.sgml           | 315 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  45 ++++-
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  23 +++
 src/backend/executor/execProcnode.c     |  15 ++
 src/backend/executor/nodeCustom.c       |  73 ++++++++
 src/backend/nodes/copyfuncs.c           |  42 +++++
 src/backend/nodes/outfuncs.c            |  40 ++++
 src/backend/optimizer/path/allpaths.c   |  34 ++--
 src/backend/optimizer/path/joinpath.c   |  16 ++
 src/backend/optimizer/plan/createplan.c |  55 ++++--
 src/backend/optimizer/plan/setrefs.c    |  22 ++-
 src/backend/optimizer/plan/subselect.c  | 128 +++++++------
 src/backend/utils/adt/ruleutils.c       |  56 ++++++
 src/include/commands/explain.h          |   1 +
 src/include/executor/nodeCustom.h       |  30 +++
 src/include/nodes/execnodes.h           |  12 ++
 src/include/nodes/nodes.h               |   6 +
 src/include/nodes/plannodes.h           |  77 ++++++++
 src/include/nodes/relation.h            |  29 +++
 src/include/optimizer/paths.h           |  17 ++
 src/include/optimizer/planmain.h        |  11 ++
 src/include/optimizer/subselect.h       |   7 +
 25 files changed, 956 insertions(+), 102 deletions(-)

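The documentation chapter in this patch describes CustomPlan as a base node plus a table of function pointers, with providers embedding the base struct as the first member of their own type. For anyone unfamiliar with that C idiom, here is a minimal standalone sketch. It does not use the PostgreSQL headers: SketchPlan, SketchPlanMethods, CountdownPlan and the sketch_* / countdown_* functions are all hypothetical names made up for this illustration, and the "optimizer" is reduced to a single cost comparison.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef struct SketchPlan SketchPlan;

typedef struct SketchPlanMethods
{
    const char *name;                       /* provider name */
    int       (*exec) (SketchPlan *plan);   /* next value, or -1 at end */
    void      (*rescan) (SketchPlan *plan); /* restart from the beginning */
} SketchPlanMethods;

struct SketchPlan
{
    double      total_cost;                 /* what the optimizer compares */
    const SketchPlanMethods *methods;       /* "virtual method" table */
};

/* A provider extends the base node by embedding it as the first member. */
typedef struct CountdownPlan
{
    SketchPlan  base;           /* must be first so the casts below are valid */
    int         start;          /* private field */
    int         current;        /* private field */
} CountdownPlan;

static int
countdown_exec(SketchPlan *plan)
{
    CountdownPlan *cp = (CountdownPlan *) plan;

    assert(strcmp(plan->methods->name, "countdown") == 0);
    if (cp->current <= 0)
        return -1;
    return cp->current--;
}

static void
countdown_rescan(SketchPlan *plan)
{
    CountdownPlan *cp = (CountdownPlan *) plan;

    cp->current = cp->start;
}

static const SketchPlanMethods countdown_methods = {
    "countdown", countdown_exec, countdown_rescan
};

static void
countdown_init(CountdownPlan *cp, int start, double cost)
{
    cp->base.total_cost = cost;
    cp->base.methods = &countdown_methods;
    cp->start = start;
    cp->current = start;
}

/* Core-side code sees only the base type and dispatches via the table. */
static int
sketch_drain(SketchPlan *plan)
{
    int         sum = 0;
    int         v;

    while ((v = plan->methods->exec(plan)) != -1)
        sum += v;
    return sum;
}

/* Pick the cheaper of two candidates, as a cost-based optimizer would. */
static SketchPlan *
sketch_cheapest(SketchPlan *a, SketchPlan *b)
{
    return (b->total_cost < a->total_cost) ? b : a;
}
```

A caller would initialize two CountdownPlan nodes with different costs, pass their base members to sketch_cheapest(), and drive the winner through sketch_drain() without ever knowing the concrete type. That is the same division of labor the chapter asks for between the core backend and a custom-plan provider.
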
diff --git a/doc/src/sgml/custom-plan.sgml b/doc/src/sgml/custom-plan.sgml
new file mode 100644
index 0000000..8d456f9
--- /dev/null
+++ b/doc/src/sgml/custom-plan.sgml
@@ -0,0 +1,315 @@
+<!-- doc/src/sgml/custom-plan.sgml -->
+
+<chapter id="custom-plan">
+ <title>Writing A Custom Plan Provider</title>
+
+ <indexterm zone="custom-plan">
+  <primary>custom plan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-plan interface lets an extension supply its own plan node
+  in place of a built-in one, with the choice between them made by the
+  cost-based optimizer.
+  Its key component is the <literal>CustomPlan</> node, which has the usual
+  <literal>Plan</> field and a table of function pointers. These act
+  like the methods of a base class in an object-oriented language,
+  so a <literal>CustomPlan</> node works as a polymorphic plan / execution
+  node.
+  The core backend assumes nothing about the behavior of this node
+  type, so it is the responsibility of the custom-plan provider
+  to make its custom node behave like the built-in plan / execution node
+  it replaces.
+ </para>
+ <para>
+  The overall steps for using the custom-plan interface are as follows.
+ </para>
+ <para>
+  A custom-plan provider can add a <literal>CustomPath</> for a particular
+  relation scan using <literal>add_scan_path_hook</>, or for a particular
+  join of relations using <literal>add_join_path_hook</>.
+  The planner then chooses the cheapest path for that
+  scan or join from among the built-in and custom paths.
+  The <literal>CustomPath</> node therefore needs a realistic cost
+  estimate, or the planner cannot select the right plan.
+ </para>
+ <para>
+  Usually, a custom-plan provider extends the <literal>CustomPath</> type
+  with its own private fields, like:
+<programlisting>
+typedef struct {
+    CustomPath    cpath;
+        :
+    List         *something_private;
+        :
+} YourOwnCustomPath;
+</programlisting>
+  You can extend the <literal>CustomPlan</> and <literal>CustomPlanState</>
+  types in a similar manner.
+ </para>
+ <para>
+  <literal>CustomPathMethods</> is the table of function pointers
+  for <literal>CustomPath</>, and <literal>CustomPlanMethods</> is
+  the table of function pointers for <literal>CustomPlan</> and
+  <literal>CustomPlanState</>.
+  An extension has to implement these functions according to the
+  specification in the next section.
+ </para>
+
+ <sect1 id="custom-plan-spec">
+  <title>Specification of Custom Plan Interface</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom-plan path</title>
+   <para>
+    The first task of a custom-plan provider is to add a <literal>CustomPath</>
+    for a particular relation scan or join of relations.
+    Right now, the planner supports custom paths only for scans and joins,
+    where cost-based optimization applies; other kinds of nodes (such as
+    sort and aggregate) are not supported.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *baserel,
+                                        RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+</programlisting>
+    A custom-plan provider can add its custom path using
+    <literal>add_scan_path_hook</> to provide an alternative way to scan
+    the specified relation.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        RelOptInfo *outerrel,
+                                        RelOptInfo *innerrel,
+                                        JoinType jointype,
+                                        SpecialJoinInfo *sjinfo,
+                                        List *restrictlist,
+                                        Relids param_source_rels,
+                                        Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
+</programlisting>
+    Likewise, a custom-plan provider can add its custom path using
+    <literal>add_join_path_hook</> to provide an alternative way to join
+    the two specified relations (note that either or both of them may
+    themselves be joined relations, not only base relations).
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-methods">
+   <title>Methods of CustomPath</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPath</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CreateCustomPlan(PlannerInfo *root,
+                 CustomPath *custom_path);
+</programlisting>
+    This method populates a node object that (at least) extends the
+    <literal>CustomPlan</> data type, according to the supplied
+    <literal>CustomPath</>.
+    If this custom plan supports mark-and-restore of its position, its
+    node tag should be <literal>CustomPlanMarkPos</> instead of
+    <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPath(StringInfo str, Node *node);
+</programlisting>
+    This method is needed to support <literal>nodeToString</> for your
+    custom path type, so that its private fields are dumped as well.
+    The output format has to follow the conventions in <filename>outfuncs.c</>.
+   </para>
+  </sect2>
+  <sect2 id="custom-plan-methods">
+   <title>Methods of CustomPlan</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+SetCustomPlanRef(PlannerInfo *root,
+                 CustomPlan *custom_plan,
+                 int rtoffset);
+</programlisting>
+    This method requires the custom-plan provider to adjust <literal>Var</> node
+    references in the supplied <literal>CustomPlan</> node.
+    Usually, each <literal>varno</> is shifted by <literal>rtoffset</>, or replaced by
+    <literal>OUTER_VAR</> or <literal>INNER_VAR</> if the Var references the
+    left or right subplan, respectively.
+   </para>
+   <para>
+<programlisting>
+bool
+SupportBackwardScan(CustomPlan *custom_plan);
+</programlisting>
+    This optional method informs the core backend whether this custom-plan
+    node supports backward scans.
+    If this method is implemented and returns <literal>true</>, the
+    custom-plan node supports backward scans. Otherwise, backward scans
+    are not available.
+   </para>
+   <para>
+<programlisting>
+void
+FinalizeCustomPlan(PlannerInfo *root,
+                   CustomPlan *custom_plan,
+                   Bitmapset **paramids,
+                   Bitmapset **valid_params,
+                   Bitmapset **scan_params);
+</programlisting>
+    This optional method informs the core backend which query parameters
+    are referenced in this custom-plan node, in addition to the ones
+    considered in the <literal>targetlist</> and <literal>qual</> fields
+    of the base <literal>Plan</> node.
+    If parameters are found in private data fields managed by the
+    custom-plan provider, it needs to update the supplied bitmapsets as
+    <literal>finalize_plan()</> expects.
+   </para>
+   <para>
+<programlisting>
+CustomPlanState *
+BeginCustomPlan(CustomPlan *custom_plan,
+                EState *estate,
+                int eflags);
+</programlisting>
+    This method populates a <literal>CustomPlanState</> object according to
+    the supplied <literal>CustomPlan</>, and performs the initial setup
+    for executing this custom-plan node.
+   </para>
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It fetches one tuple from this custom-plan node. If a tuple is available,
+    the node has to store it in <literal>ps_ResultTupleSlot</> and
+    return that slot; otherwise it returns <literal>NULL</> to inform the
+    upper node that the end of the scan has been reached.
+   </para>
+   <para>
+<programlisting>
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    Unlike <literal>ExecCustomPlan</>, it allows the upper node to fetch
+    multiple tuples at once. You need to pay attention to the data format
+    and the way it is returned, because both depend entirely on the type
+    of the upper node.
+   </para>
+   <para>
+<programlisting>
+void
+EndCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It ends the execution of this custom-plan node and releases the
+    resources it allocated. Memory in the per-execution memory context
+    usually need not be released explicitly, but the custom-plan provider
+    is responsible for releasing any other resources it holds.
+   </para>
+   <para>
+<programlisting>
+void
+ReScanCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that parameters the scan depends on may have changed value,
+    so the rescan need not return exactly the same tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if the node tag is
+    <literal>CustomPlanMarkPos</> instead of <literal>CustomPlan</>.
+    It saves the current position of the custom plan in private
+    state, so that the position can be restored later.
+   </para>
+   <para>
+<programlisting>
+void
+RestrPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if the node tag is
+    <literal>CustomPlanMarkPos</> instead of <literal>CustomPlan</>.
+    It restores the current position of the custom plan from the private
+    state saved by <literal>MarkPosCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlanTargetRel(CustomPlanState *cpstate,
+                           ExplainState *es);
+</programlisting>
+    It shows the target relation, if this custom-plan node replaced
+    a particular relation scan. For implementation reasons, this
+    method is separate from <literal>ExplainCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlan(CustomPlanState *cpstate,
+                  List *ancestors,
+                  ExplainState *es);
+</programlisting>
+    It puts the properties of this custom-plan node into the supplied
+    <literal>ExplainState</>, following the usual <command>EXPLAIN</>
+    conventions.
+   </para>
+   <para>
+<programlisting>
+Bitmapset *
+GetRelidsCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It returns the set of range-table indexes scanned by this
+    custom-plan node. When the node covers multiple relations, the
+    result is not necessarily a singleton bitmap.
+   </para>
+   <para>
+<programlisting>
+Node *
+GetSpecialCustomVar(CustomPlanState *cpstate,
+                    Var *varnode);
+</programlisting>
+    This optional method returns the expression node referenced by
+    the supplied <literal>varnode</>, which has a special <literal>varno</>
+    (<literal>INNER_VAR</>, <literal>OUTER_VAR</> or <literal>INDEX_VAR</>).
+    The <command>EXPLAIN</> command shows the column names referenced in the
+    targetlist and qualifiers of plan nodes. If a Var node has a special
+    <literal>varno</>, it recursively walks down the underlying subplan to
+    find the actual expression referenced by this special varno.
+    When a custom-plan node has replaced a join node but has no
+    underlying subplans in its left and right trees, the usual logic
+    cannot be used, so the custom-plan provider has to implement this method
+    to tell the core backend which expression node is referenced by
+    the supplied <literal>varnode</>.
+    If this method is not implemented or returns <literal>NULL</>,
+    the core backend resolves the special varno reference as usual.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPlan(StringInfo str, const CustomPlan *node);
+</programlisting>
+    This method is needed to support <literal>nodeToString</> for your
+    custom plan type, so that its private fields are dumped as well.
+    The output format has to follow the conventions of <filename>outfuncs.c</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CopyCustomPlan(const CustomPlan *from);
+</programlisting>
+    This method is needed to support <literal>copyObject</> for your
+    custom plan type, so that its private fields are copied as well.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 0e863ee..33f964e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-plan  SYSTEM "custom-plan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..45e9d32 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-plan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 08f3167..ff9fc7b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -51,7 +52,6 @@ static void ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
 static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
 				ExplainState *es);
 static double elapsed_time(instr_time *starttime);
-static void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 static void ExplainPreScanMemberNodes(List *plans, PlanState **planstates,
 						  Bitmapset **rels_used);
 static void ExplainPreScanSubPlans(List *plans, Bitmapset **rels_used);
@@ -700,7 +700,7 @@ elapsed_time(instr_time *starttime)
  * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
  * that never appear in the EXPLAIN output (such as inheritance parents).
  */
-static void
+void
 ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 {
 	Plan	   *plan = planstate->plan;
@@ -721,6 +721,16 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlanState	   *cpstate = (CustomPlanState *)planstate;
+				Bitmapset		   *temp
+					= cpstate->methods->GetRelidsCustomPlan(cpstate);
+
+				*rels_used = bms_add_members(*rels_used, temp);
+			}
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -847,6 +857,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -935,6 +946,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomPlan:
+			sname = "Custom";
+			custom_name = ((CustomPlan *) plan)->methods->CustomName;
+			if (custom_name != NULL)
+				pname = psprintf("Custom (%s)", custom_name);
+			else
+				pname = sname;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1036,6 +1055,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1051,6 +1072,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomPlan:
+			{
+				CustomPlanState	*cps = (CustomPlanState *)planstate;
+
+				if (cps->methods->ExplainCustomPlanTargetRel)
+					cps->methods->ExplainCustomPlanTargetRel(cps, es);
+			}
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1347,6 +1376,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomPlan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			if (((CustomPlanState *) planstate)->methods->ExplainCustomPlan)
+			{
+				CustomPlanState *cpstate = (CustomPlanState *) planstate;
+
+				cpstate->methods->ExplainCustomPlan(cpstate, ancestors, es);
+			}
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..47e7a3c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecReScanCustomPlan((CustomPlanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomMarkPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomRestrPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -390,6 +403,7 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_ValuesScan:
 		case T_Material:
 		case T_Sort:
+		case T_CustomPlanMarkPos:
 			return true;
 
 		case T_Result:
@@ -465,6 +479,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) node;
+
+				if (cplan->methods->SupportBackwardScan)
+					return cplan->methods->SupportBackwardScan(cplan);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..5aa117b 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			result = (PlanState *) ExecInitCustomPlan((CustomPlan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +449,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			result = ExecCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +689,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecEndCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..e3c8f58
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,73 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "nodes/execnodes.h"
+#include "nodes/plannodes.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+CustomPlanState *
+ExecInitCustomPlan(CustomPlan *custom_plan, EState *estate, int eflags)
+{
+	CustomPlanState	   *cpstate
+		= custom_plan->methods->BeginCustomPlan(custom_plan, estate, eflags);
+
+	Assert(IsA(cpstate, CustomPlanState));
+
+	return cpstate;
+}
+
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ExecCustomPlan != NULL);
+	return cpstate->methods->ExecCustomPlan(cpstate);
+}
+
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MultiExecCustomPlan != NULL);
+	return cpstate->methods->MultiExecCustomPlan(cpstate);
+}
+
+void
+ExecEndCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->EndCustomPlan != NULL);
+	cpstate->methods->EndCustomPlan(cpstate);
+}
+
+void
+ExecReScanCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ReScanCustomPlan != NULL);
+	cpstate->methods->ReScanCustomPlan(cpstate);
+}
+
+void
+ExecCustomMarkPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MarkPosCustomPlan != NULL);
+	cpstate->methods->MarkPosCustomPlan(cpstate);
+}
+
+void
+ExecCustomRestrPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->RestrPosCustomPlan != NULL);
+	cpstate->methods->RestrPosCustomPlan(cpstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c89d808..18505cd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,42 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+	CustomPlan *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlan));
+	return newnode;
+}
+
+/*
+ * _copyCustomPlanMarkPos
+ */
+static CustomPlanMarkPos *
+_copyCustomPlanMarkPos(const CustomPlanMarkPos *from)
+{
+	CustomPlanMarkPos *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlanMarkPos));
+	return newnode;
+}
+
+/* copy common part of CustomPlan */
+void
+CopyCustomPlanCommon(const Node *__from, Node *__newnode)
+{
+	CustomPlan *from = (CustomPlan *) __from;
+	CustomPlan *newnode = (CustomPlan *) __newnode;
+
+	((Node *) newnode)->type = nodeTag(from);
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+	COPY_SCALAR_FIELD(methods);
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3983,6 +4019,12 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomPlan:
+			retval = _copyCustomPlan(from);
+			break;
+		case T_CustomPlanMarkPos:
+			retval = _copyCustomPlanMarkPos(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..8a93bc5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -563,6 +563,27 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
+/* dump common part of CustomPlan structure */
+static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLAN");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
+static void
+_outCustomPlanMarkPos(StringInfo str, const CustomPlanMarkPos *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLANMARKPOS");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
 static void
 _outJoin(StringInfo str, const Join *node)
 {
@@ -1581,6 +1602,16 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 }
 
 static void
+_outCustomPath(StringInfo str, const CustomPath *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPATH");
+	_outPathInfo(str, (const Path *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPath(str, (Node *)node);
+}
+
+static void
 _outAppendPath(StringInfo str, const AppendPath *node)
 {
 	WRITE_NODE_TYPE("APPENDPATH");
@@ -2828,6 +2859,12 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomPlan:
+				_outCustomPlan(str, obj);
+				break;
+			case T_CustomPlanMarkPos:
+				_outCustomPlanMarkPos(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
@@ -3036,6 +3073,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignPath:
 				_outForeignPath(str, obj);
 				break;
+			case T_CustomPath:
+				_outCustomPath(str, obj);
+				break;
 			case T_AppendPath:
 				_outAppendPath(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..6c1ea7e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -323,7 +325,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				}
 				break;
 			case RTE_SUBQUERY:
-				/* Subquery --- fully handled during set_rel_size */
+				/* Subquery --- path was added during set_rel_size */
 				break;
 			case RTE_FUNCTION:
 				/* RangeFunction */
@@ -334,12 +336,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				set_values_pathlist(root, rel, rte);
 				break;
 			case RTE_CTE:
-				/* CTE reference --- fully handled during set_rel_size */
+				/* CTE reference --- path was added during set_rel_size */
 				break;
 			default:
 				elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
 				break;
 		}
+
+		/* Also, consider custom plans */
+		if (add_scan_path_hook)
+			(*add_scan_path_hook)(root, rel, rte);
+
+		/* Select cheapest path */
+		set_cheapest(rel);
 	}
 
 #ifdef OPTIMIZER_DEBUG
@@ -388,9 +397,6 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
-
-	/* Now find the cheapest of the paths for this rel */
-	set_cheapest(rel);
 }
 
 /*
@@ -416,9 +422,6 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
-
-	/* Select cheapest path */
-	set_cheapest(rel);
 }
 
 /*
@@ -1235,9 +1238,6 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1306,9 +1306,6 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1329,9 +1326,6 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1398,9 +1392,6 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1451,9 +1442,6 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..2fb6678 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,20 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..055a818 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -42,11 +42,7 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
-static List *build_path_tlist(PlannerInfo *root, Path *path);
-static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
-static void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 static Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
 static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
 static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
@@ -77,23 +73,20 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomPlan *create_custom_plan(PlannerInfo *root,
+									  CustomPath *best_path);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
 					  Plan *outer_plan, Plan *inner_plan);
 static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
-static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
 static void process_subquery_nestloop_params(PlannerInfo *root,
 								 List *subplan_params);
 static List *fix_indexqual_references(PlannerInfo *root, IndexPath *index_path);
 static List *fix_indexorderby_references(PlannerInfo *root, IndexPath *index_path);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
-static List *get_switched_clauses(List *clauses, Relids outerrelids);
-static List *order_qual_clauses(PlannerInfo *root, List *clauses);
-static void copy_path_costsize(Plan *dest, Path *src);
-static void copy_plan_costsize(Plan *dest, Plan *src);
 static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 			   Oid indexid, List *indexqual, List *indexqualorig,
@@ -215,7 +208,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -261,6 +254,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 			plan = create_unique_plan(root,
 									  (UniquePath *) best_path);
 			break;
+		case T_CustomPlan:
+			plan = (Plan *) create_custom_plan(root, (CustomPath *) best_path);
+			break;
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -430,7 +426,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 /*
  * Build a target list (ie, a list of TargetEntry) for the Path's output.
  */
-static List *
+List *
 build_path_tlist(PlannerInfo *root, Path *path)
 {
 	RelOptInfo *rel = path->parent;
@@ -466,7 +462,7 @@ build_path_tlist(PlannerInfo *root, Path *path)
  *		Decide whether to use a tlist matching relation structure,
  *		rather than only those Vars actually referenced.
  */
-static bool
+bool
 use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			i;
@@ -526,7 +522,7 @@ use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
  * undo the decision made by use_physical_tlist().	Currently, Hash, Sort,
  * and Material nodes want this, so they don't have to store useless columns.
  */
-static void
+void
 disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
 {
 	/* Only need to undo it for path types handled by create_scan_plan() */
@@ -569,7 +565,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
  * in most cases we have only a very bad idea of the probability of the gating
  * qual being true.
  */
-static Plan *
+Plan *
 create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
 {
 	List	   *pseudoconstants;
@@ -1072,6 +1068,26 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
 	return plan;
 }
 
+/*
+ * create_custom_plan
+ *   Returns a custom plan node built from 'best_path' by the provider's
+ *   CreateCustomPlan callback.
+ */
+static CustomPlan *
+create_custom_plan(PlannerInfo *root, CustomPath *best_path)
+{
+	CustomPlan	   *cplan;
+
+	/* Populate CustomPlan according to the CustomPath */
+	Assert(best_path->methods->CreateCustomPlan != NULL);
+	cplan = best_path->methods->CreateCustomPlan(root, best_path);
+	Assert(IsA(cplan, CustomPlan) || IsA(cplan, CustomPlanMarkPos));
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&cplan->plan, &best_path->path);
+
+	return cplan;
+}
 
 /*****************************************************************************
  *
@@ -2006,7 +2022,6 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
-
 /*****************************************************************************
  *
  *	JOIN METHODS
@@ -2540,7 +2555,7 @@ create_hashjoin_plan(PlannerInfo *root,
  * root->curOuterRels are replaced by Params, and entries are added to
  * root->curOuterParams if not already present.
  */
-static Node *
+Node *
 replace_nestloop_params(PlannerInfo *root, Node *expr)
 {
 	/* No setup needed for tree walk, so away we go */
@@ -3023,7 +3038,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol)
  *	  touched; a modified list is returned.  We do, however, set the transient
  *	  outer_is_left field in each RestrictInfo to show which side was which.
  */
-static List *
+List *
 get_switched_clauses(List *clauses, Relids outerrelids)
 {
 	List	   *t_list = NIL;
@@ -3089,7 +3104,7 @@ get_switched_clauses(List *clauses, Relids outerrelids)
  * instead of bare clauses.  It's OK because we only sort by cost, but
  * a cost/selectivity combination would likely do the wrong thing.
  */
-static List *
+List *
 order_qual_clauses(PlannerInfo *root, List *clauses)
 {
 	typedef struct
@@ -3156,7 +3171,7 @@ order_qual_clauses(PlannerInfo *root, List *clauses)
  * Copy cost and size info from a Path node to the Plan node created from it.
  * The executor usually won't use this info, but it's needed by EXPLAIN.
  */
-static void
+void
 copy_path_costsize(Plan *dest, Path *src)
 {
 	if (src)
@@ -3179,7 +3194,7 @@ copy_path_costsize(Plan *dest, Path *src)
  * Copy cost and size info from a lower plan node to an inserted node.
  * (Most callers alter the info after copying it.)
  */
-static void
+void
 copy_plan_costsize(Plan *dest, Plan *src)
 {
 	if (src)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..738d47c 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -86,7 +87,6 @@ static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
 static bool flatten_rtes_walker(Node *node, PlannerGlobal *glob);
 static void add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte);
-static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
 static Plan *set_indexonlyscan_references(PlannerInfo *root,
 							 IndexOnlyScan *plan,
 							 int rtoffset);
@@ -419,7 +419,7 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
 /*
  * set_plan_refs: recurse through the Plan nodes of a single subquery level
  */
-static Plan *
+Plan *
 set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 {
 	ListCell   *l;
@@ -576,6 +576,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlan	   *cplan = (CustomPlan *) plan;
+
+				/*
+				 * The extension is responsible for fixing up
+				 * references correctly.
+				 */
+				Assert(cplan->methods->SetCustomPlanRef != NULL);
+				cplan->methods->SetCustomPlanRef(root,
+												 cplan,
+												 rtoffset);
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1057,7 +1073,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..6b0c762 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -75,12 +75,8 @@ static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 static Node *replace_correlation_vars_mutator(Node *node, PlannerInfo *root);
 static Node *process_sublinks_mutator(Node *node,
 						 process_sublinks_context *context);
-static Bitmapset *finalize_plan(PlannerInfo *root,
-			  Plan *plan,
-			  Bitmapset *valid_params,
-			  Bitmapset *scan_params);
-static bool finalize_primnode(Node *node, finalize_primnode_context *context);
-
+static bool finalize_primnode_walker(Node *node,
+									 finalize_primnode_context *context);
 
 /*
  * Select a PARAM_EXEC number to identify the given Var as a parameter for
@@ -2045,7 +2041,7 @@ SS_finalize_plan(PlannerInfo *root, Plan *plan, bool attach_initplans)
  * The return value is the computed allParam set for the given Plan node.
  * This is just an internal notational convenience.
  */
-static Bitmapset *
+Bitmapset *
 finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			  Bitmapset *scan_params)
 {
@@ -2070,15 +2066,15 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 	 */
 
 	/* Find params in targetlist and qual */
-	finalize_primnode((Node *) plan->targetlist, &context);
-	finalize_primnode((Node *) plan->qual, &context);
+	finalize_primnode_walker((Node *) plan->targetlist, &context);
+	finalize_primnode_walker((Node *) plan->qual, &context);
 
 	/* Check additional node-type-specific fields */
 	switch (nodeTag(plan))
 	{
 		case T_Result:
-			finalize_primnode(((Result *) plan)->resconstantqual,
-							  &context);
+			finalize_primnode_walker(((Result *) plan)->resconstantqual,
+									 &context);
 			break;
 
 		case T_SeqScan:
@@ -2086,10 +2082,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexScan:
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2100,10 +2096,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexOnlyScan:
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexOnlyScan *) plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((IndexOnlyScan *) plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indextlist, since it cannot contain Params.
@@ -2112,8 +2108,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapIndexScan:
-			finalize_primnode((Node *) ((BitmapIndexScan *) plan)->indexqual,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapIndexScan *) plan)->indexqual,
+									&context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2122,14 +2118,14 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapHeapScan:
-			finalize_primnode((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
 		case T_TidScan:
-			finalize_primnode((Node *) ((TidScan *) plan)->tidquals,
-							  &context);
+			finalize_primnode_walker((Node *) ((TidScan *) plan)->tidquals,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2167,7 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 					funccontext = context;
 					funccontext.paramids = NULL;
 
-					finalize_primnode(rtfunc->funcexpr, &funccontext);
+					finalize_primnode_walker(rtfunc->funcexpr, &funccontext);
 
 					/* remember results for execution */
 					rtfunc->funcparams = funccontext.paramids;
@@ -2183,8 +2179,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ValuesScan:
-			finalize_primnode((Node *) ((ValuesScan *) plan)->values_lists,
-							  &context);
+			finalize_primnode_walker((Node *) ((ValuesScan *) plan)->values_lists,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2231,11 +2227,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ForeignScan:
-			finalize_primnode((Node *) ((ForeignScan *) plan)->fdw_exprs,
-							  &context);
+			finalize_primnode_walker((Node *)((ForeignScan *) plan)->fdw_exprs,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) plan;
+
+				if (cplan->methods->FinalizeCustomPlan)
+					cplan->methods->FinalizeCustomPlan(root,
+													   cplan,
+													   &context.paramids,
+													   &valid_params,
+													   &scan_params);
+			}
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
@@ -2247,8 +2256,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 											  locally_added_param);
 				scan_params = bms_add_member(bms_copy(scan_params),
 											 locally_added_param);
-				finalize_primnode((Node *) mtplan->returningLists,
-								  &context);
+				finalize_primnode_walker((Node *) mtplan->returningLists,
+										 &context);
 				foreach(l, mtplan->plans)
 				{
 					context.paramids =
@@ -2329,8 +2338,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			{
 				ListCell   *l;
 
-				finalize_primnode((Node *) ((Join *) plan)->joinqual,
-								  &context);
+				finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+										 &context);
 				/* collect set of params that will be passed to right child */
 				foreach(l, ((NestLoop *) plan)->nestParams)
 				{
@@ -2343,24 +2352,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_MergeJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((MergeJoin *) plan)->mergeclauses,
-							  &context);
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((MergeJoin *) plan)->mergeclauses,
+									 &context);
 			break;
 
 		case T_HashJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((HashJoin *) plan)->hashclauses,
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((HashJoin *) plan)->hashclauses,
 							  &context);
 			break;
 
 		case T_Limit:
-			finalize_primnode(((Limit *) plan)->limitOffset,
-							  &context);
-			finalize_primnode(((Limit *) plan)->limitCount,
-							  &context);
+			finalize_primnode_walker(((Limit *) plan)->limitOffset,
+									 &context);
+			finalize_primnode_walker(((Limit *) plan)->limitCount,
+									 &context);
 			break;
 
 		case T_RecursiveUnion:
@@ -2381,10 +2390,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_WindowAgg:
-			finalize_primnode(((WindowAgg *) plan)->startOffset,
-							  &context);
-			finalize_primnode(((WindowAgg *) plan)->endOffset,
-							  &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->startOffset,
+									 &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->endOffset,
+									 &context);
 			break;
 
 		case T_Hash:
@@ -2473,8 +2482,21 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
  * finalize_primnode: add IDs of all PARAM_EXEC params appearing in the given
  * expression tree to the result set.
  */
+Bitmapset *
+finalize_primnode(PlannerInfo *root, Node *node, Bitmapset *paramids)
+{
+	finalize_primnode_context	context;
+
+	context.root = root;
+	context.paramids = paramids;
+
+	finalize_primnode_walker(node, &context);
+
+	return context.paramids;
+}
+
 static bool
-finalize_primnode(Node *node, finalize_primnode_context *context)
+finalize_primnode_walker(Node *node, finalize_primnode_context *context)
 {
 	if (node == NULL)
 		return false;
@@ -2496,7 +2518,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		Bitmapset  *subparamids;
 
 		/* Recurse into the testexpr, but not into the Plan */
-		finalize_primnode(subplan->testexpr, context);
+		finalize_primnode_walker(subplan->testexpr, context);
 
 		/*
 		 * Remove any param IDs of output parameters of the subplan that were
@@ -2513,7 +2535,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		}
 
 		/* Also examine args list */
-		finalize_primnode((Node *) subplan->args, context);
+		finalize_primnode_walker((Node *) subplan->args, context);
 
 		/*
 		 * Add params needed by the subplan to paramids, but excluding those
@@ -2528,7 +2550,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 
 		return false;			/* no more to do here */
 	}
-	return expression_tree_walker(node, finalize_primnode,
+	return expression_tree_walker(node, finalize_primnode_walker,
 								  (void *) context);
 }
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 566b4c9..1c57352 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5292,6 +5292,25 @@ get_utility_query_def(Query *query, deparse_context *context)
 	}
 }
 
+/*
+ * GetSpecialCustomVar
+ *
+ * Utility routine to call optional GetSpecialCustomVar method of
+ * CustomPlanState
+ */
+static Node *
+GetSpecialCustomVar(PlanState *ps, Var *varnode)
+{
+	CustomPlanState *cps = (CustomPlanState *) ps;
+
+	Assert(IsA(ps, CustomPlanState));
+	Assert(IS_SPECIAL_VARNO(varnode->varno));
+
+	if (cps->methods->GetSpecialCustomVar)
+		return (Node *)cps->methods->GetSpecialCustomVar(cps, varnode);
+
+	return NULL;
+}
 
 /*
  * Display a Var appropriately.
@@ -5323,6 +5342,7 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 	deparse_columns *colinfo;
 	char	   *refname;
 	char	   *attname;
+	Node	   *expr;
 
 	/* Find appropriate nesting depth */
 	netlevelsup = var->varlevelsup + levelsup;
@@ -5345,6 +5365,22 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 		colinfo = deparse_columns_fetch(var->varno, dpns);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		/*
+		 * Force parentheses because our caller probably assumed a Var is a
+		 * simple expression.
+		 */
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, ')');
+
+		return NULL;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
@@ -5633,6 +5669,26 @@ get_name_for_var_field(Var *var, int fieldno,
 		rte = rt_fetch(var->varno, dpns->rtable);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		StringInfo		saved = context->buf;
+		StringInfoData	temp;
+		
+		initStringInfo(&temp);
+		context->buf = &temp;
+
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, ')');
+
+		context->buf = saved;
+
+		return temp.data;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3488be3..f914696 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
 typedef const char *(*explain_get_index_name_hook_type) (Oid indexId);
 extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
 
+extern void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 
 extern void ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			 ParamListInfo params, DestReceiver *dest);
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..e6e049e
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,30 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "nodes/plannodes.h"
+#include "nodes/execnodes.h"
+
+/*
+ * General executor code
+ */
+extern CustomPlanState *ExecInitCustomPlan(CustomPlan *custom_plan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomPlan(CustomPlanState *cpstate);
+extern Node *MultiExecCustomPlan(CustomPlanState *cpstate);
+extern void ExecEndCustomPlan(CustomPlanState *cpstate);
+
+extern void ExecReScanCustomPlan(CustomPlanState *cpstate);
+extern void ExecCustomMarkPos(CustomPlanState *cpstate);
+extern void ExecCustomRestrPos(CustomPlanState *cpstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..8af5bf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,18 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomPlanState information
+ *
+ *		CustomPlan nodes are used to execute custom code within executor.
+ * ----------------
+ */
+typedef struct CustomPlanState
+{
+	PlanState	ps;
+	const CustomPlanMethods	   *methods;
+} CustomPlanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..f4a1246 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,8 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomPlan,
+	T_CustomPlanMarkPos,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +109,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomPlanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +227,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
@@ -513,6 +517,8 @@ extern void *stringToNode(char *str);
  */
 extern void *copyObject(const void *obj);
 
+extern void CopyCustomPlanCommon(const Node *from, Node *newnode);
+
 /*
  * nodes/equalfuncs.c
  */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..7468d4c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -15,8 +15,10 @@
 #define PLANNODES_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/primnodes.h"
+#include "nodes/relation.h"
 
 
 /* ----------------------------------------------------------------
@@ -479,6 +481,81 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomPlan node
+ * ----------------
+ */
+struct CustomPlanMethods;
+
+typedef struct CustomPlan
+{
+	Plan		plan;
+	const struct CustomPlanMethods *methods;
+} CustomPlan;
+
+/* almost same to CustomPlan, but support MarkPos/RestorePos */
+typedef CustomPlan CustomPlanMarkPos;
+
+/* not to include execnodes.h here */
+typedef struct CustomPlanState CustomPlanState;
+typedef struct EState EState;
+typedef struct ExplainState	ExplainState;
+typedef struct TupleTableSlot TupleTableSlot;
+
+typedef void (*SetCustomPlanRef_function)(PlannerInfo *root,
+										  CustomPlan *custom_plan,
+										  int rtoffset);
+typedef bool (*SupportCustomBackwardScan_function)(CustomPlan *custom_plan);
+typedef void (*FinalizeCustomPlan_function)(PlannerInfo *root,
+											CustomPlan *custom_plan,
+											Bitmapset **paramids,
+											Bitmapset **valid_params,
+											Bitmapset **scan_params);
+typedef CustomPlanState *(*BeginCustomPlan_function)(CustomPlan *custom_plan,
+													 EState *estate,
+													 int eflags);
+typedef TupleTableSlot *(*ExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*MultiExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*EndCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ReScanCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*MarkPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*RestrPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ExplainCustomPlanTargetRel_function)(CustomPlanState *cpstate,
+													ExplainState *es);
+typedef void (*ExplainCustomPlan_function)(CustomPlanState *cpstate,
+										   List *ancestors,
+										   ExplainState *es);
+typedef Bitmapset *(*GetRelidsCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*GetSpecialCustomVar_function)(CustomPlanState *cpstate,
+											  Var *varnode);
+typedef void (*TextOutCustomPlan_function)(StringInfo str,
+										   const CustomPlan *node);
+typedef CustomPlan *(*CopyCustomPlan_function)(const CustomPlan *from);
+
+typedef struct CustomPlanMethods
+{
+	const char						   *CustomName;
+	/* callbacks for the planner stage */
+	SetCustomPlanRef_function			SetCustomPlanRef;
+	SupportCustomBackwardScan_function	SupportBackwardScan;
+	FinalizeCustomPlan_function			FinalizeCustomPlan;
+	/* callbacks for the executor stage */
+	BeginCustomPlan_function			BeginCustomPlan;
+	ExecCustomPlan_function				ExecCustomPlan;
+	MultiExecCustomPlan_function		MultiExecCustomPlan;
+	EndCustomPlan_function				EndCustomPlan;
+	ReScanCustomPlan_function			ReScanCustomPlan;
+	MarkPosCustomPlan_function			MarkPosCustomPlan;
+	RestrPosCustomPlan_function			RestrPosCustomPlan;
+	/* callbacks for EXPLAIN */
+	ExplainCustomPlanTargetRel_function	ExplainCustomPlanTargetRel;
+	ExplainCustomPlan_function			ExplainCustomPlan;
+	GetRelidsCustomPlan_function		GetRelidsCustomPlan;
+	GetSpecialCustomVar_function		GetSpecialCustomVar;
+	/* callbacks for general node management */
+	TextOutCustomPlan_function			TextOutCustomPlan;
+	CopyCustomPlan_function				CopyCustomPlan;
+} CustomPlanMethods;
 
 /*
  * ==========
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c607b36..cbbf1e0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/params.h"
 #include "nodes/parsenodes.h"
 #include "storage/block.h"
@@ -878,6 +879,34 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_flags is a set of CUSTOM_* bits to control its behavior.
+ * custom_methods is a set of function pointers that are declared in
+ * CustomPathMethods structure; extension has to set up correctly.
+ */
+struct CustomPathMethods;
+
+typedef struct CustomPath
+{
+	Path		path;
+	const struct CustomPathMethods   *methods;
+} CustomPath;
+
+typedef struct CustomPlan CustomPlan;
+
+typedef CustomPlan *(*CreateCustomPlan_function)(PlannerInfo *root,
+												 CustomPath *custom_path);
+typedef void (*TextOutCustomPath_function)(StringInfo str, Node *node);
+
+typedef struct CustomPathMethods
+{
+	const char				   *CustomName;
+	CreateCustomPlan_function	CreateCustomPlan;
+	TextOutCustomPath_function	TextOutCustomPath;
+} CustomPathMethods;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..3047d3d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,23 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..bc3ca63 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,10 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
+extern List *build_path_tlist(PlannerInfo *root, Path *path);
+extern bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
@@ -86,6 +90,11 @@ extern ModifyTable *make_modifytable(PlannerInfo *root,
 				 List *withCheckOptionLists, List *returningLists,
 				 List *rowMarks, int epqParam);
 extern bool is_projection_capable_plan(Plan *plan);
+extern List *order_qual_clauses(PlannerInfo *root, List *clauses);
+extern List *get_switched_clauses(List *clauses, Relids outerrelids);
+extern void copy_path_costsize(Plan *dest, Path *src);
+extern void copy_plan_costsize(Plan *dest, Plan *src);
+extern Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 
 /*
  * prototypes for plan/initsplan.c
@@ -127,6 +136,8 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index 5607e98..138b60b 100644
--- a/src/include/optimizer/subselect.h
+++ b/src/include/optimizer/subselect.h
@@ -29,6 +29,13 @@ extern void SS_finalize_plan(PlannerInfo *root, Plan *plan,
 				 bool attach_initplans);
 extern Param *SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan,
 					Oid resulttype, int32 resulttypmod, Oid resultcollation);
+extern Bitmapset *finalize_plan(PlannerInfo *root,
+								Plan *plan,
+								Bitmapset *valid_params,
+								Bitmapset *scan_params);
+extern Bitmapset *finalize_primnode(PlannerInfo *root,
+									Node *node,
+									Bitmapset *paramids);
 extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var);
 extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root,
 									 PlaceHolderVar *phv);

#95

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Kouhei Kaigai (#94)

2 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

I adjusted the custom-plan interface patch a little for the cache-only
scan patch, which is a demonstration module for the vacuum-page hook built
on top of the custom-plan interface.

fix_scan_expr() looks useful to me for custom-plan providers that want to
implement their own relation scan logic, even though they could implement
it using fix_expr_common(), which is already exposed.

Also, I removed the hardcoded portion from nodeCustom.c, although it may
make sense to provide a few template functions for custom-plan providers
to call that perform the usual common jobs, such as constructing the
expr-context, assigning the result slot, opening relations, and so on.
I thought of this idea while implementing the BeginCustomPlan handler.
(These template functions are not in the attached patch yet.)
What is your opinion?

Most of this patch is unchanged from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Wednesday, March 12, 2014 1:55 PM
To: Tom Lane
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

The two attached patches are the revised custom-plan interface and an
example usage that implements the existing MergeJoin on top of this
interface.

Following last week's discussion, I revised the parts where a custom node
was expected to perform a particular kind of task, such as scanning a
relation, and replaced them with polymorphism through a set of callbacks
supplied by the custom-plan provider.
So the core backend can handle a custom-plan node as just an abstract
plan node, with no assumptions about what it does.
Even though the subject of this message says "custom-scan", I'd like to
name the interface "custom-plan" instead, because whether it scans a
particular relation is now entirely up to the extension.

The definitions of the CustomXXXX data types were simplified:

typedef struct CustomPath
{
Path path;
const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomPlan
{
Plan plan;
const struct CustomPlanMethods *methods;
} CustomPlan;

typedef struct CustomPlanState
{
PlanState ps;
const CustomPlanMethods *methods;
} CustomPlanState;

Each type has a base class and a pointer to a set of function pointers
that characterizes the behavior of the custom-plan node.
In typical use cases, an extension is expected to extend these classes
with the private data fields it needs to implement its own functionality.
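
As a minimal sketch of that "extend the base class" pattern, assuming
stand-in types rather than the real PostgreSQL headers: MyScanPlan, its
two fields and my_scan_block_count() are hypothetical names used only for
illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the PostgreSQL types; the real ones live in plannodes.h. */
typedef struct Plan { int type; double total_cost; } Plan;
struct CustomPlanMethods;               /* left opaque in this sketch */

typedef struct CustomPlan
{
    Plan        plan;                   /* base class */
    const struct CustomPlanMethods *methods;
} CustomPlan;

/* Hypothetical extension-defined node: CustomPlan plus private fields. */
typedef struct MyScanPlan
{
    CustomPlan  cplan;                  /* must be the first member */
    int         start_block;            /* private: first block to scan */
    int         end_block;              /* private: last block to scan */
} MyScanPlan;

/*
 * The core backend only ever sees a CustomPlan pointer; the extension's
 * callbacks cast it back to the extended type to reach the private fields.
 */
static int
my_scan_block_count(const CustomPlan *node)
{
    const MyScanPlan *my = (const MyScanPlan *) node;

    assert(node != NULL);
    return my->end_block - my->start_block + 1;
}
```

The cast is only valid because the CustomPlan is the first member of the
extended struct, which is the same convention the built-in node types use.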

Most of the methods are designed as a thin layer over the existing
planner / executor functions, so a custom-plan provider is responsible for
implementing its methods so that they communicate with the core backend
the same way the built-in nodes do.

Regarding the topics we discussed last week:

* CUSTOM_VAR is gone.
CUSTOM_VAR was needed because we have to handle EXPLAIN command output
(including the column names being referenced) even when a custom-plan node
replaces a join but has no underlying subplans in its left/right subtrees.
A typical case is the remote-join implementation with which I tried to
extend postgres_fdw on top of the previous interface.
It retrieves a flat result set from the remote join execution, and thus
has no local subplan. On the other hand, EXPLAIN tries to find the
"actual" Var node in the underlying subplan if a Var node has a special
varno (INNER/OUTER/INDEX).
I added a special method to solve this problem. The GetSpecialCustomVar
method is called when a Var node of a custom-plan has a special varno; the
custom-plan provider can then tell the core backend which expression node
this Var node refers to.
This allows the column name to be resolved without recursively walking
the subtrees, so it enables a custom-plan node that replaces part of the
plan tree.
This method is optional, so the existing approach still applies if the
custom-plan provider does nothing special.
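
The optional-callback behavior can be sketched roughly as follows. This
uses stand-in types instead of the real headers, and a method table cut
down to two fields; resolve_special_var(), my_get_special_var() and
remote_expr are hypothetical names, loosely modeled on the helper the
patch adds to ruleutils.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the PostgreSQL node types used by ruleutils.c. */
typedef struct Node { int tag; } Node;
typedef struct Var  { Node node; int varno; int varattno; } Var;

struct CustomPlanState;
typedef Node *(*GetSpecialCustomVar_function)(struct CustomPlanState *cps,
                                              Var *varnode);

typedef struct CustomPlanMethods
{
    const char *CustomName;
    GetSpecialCustomVar_function GetSpecialCustomVar;   /* may be NULL */
} CustomPlanMethods;

typedef struct CustomPlanState
{
    const CustomPlanMethods *methods;
} CustomPlanState;

/*
 * Ask the provider which expression a special-varno Var refers to.
 * A NULL result means "no callback, or nothing special", and the caller
 * falls back to looking into the subplans as it did before.
 */
static Node *
resolve_special_var(CustomPlanState *cps, Var *varnode)
{
    assert(cps != NULL && cps->methods != NULL);

    if (cps->methods->GetSpecialCustomVar)
        return cps->methods->GetSpecialCustomVar(cps, varnode);
    return NULL;
}

/* Hypothetical provider callback: maps every Var to one fixed expression. */
static Node remote_expr = { 42 };

static Node *
my_get_special_var(CustomPlanState *cps, Var *varnode)
{
    (void) cps;
    (void) varnode;
    return &remote_expr;
}
```

A provider that leaves the field NULL gets exactly the pre-existing
behavior, which is why the method can stay optional.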

* Functions to be exposed, from static declaration

Right now, static functions are exposed ad hoc, on demand.
So we need to investigate further which functions are needed and which
are not.
Based on my trial with the part-2 patch, which is MergeJoin on top of the
custom-plan interface, the class of functions that recursively walk the
subplan tree has to be exposed: for example, ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common and finalize_plan.
When a custom-plan behaves like the built-in Append node, it keeps a
list of sub-plans in its private field, so the core backend cannot know
that the sub-plans exist, and therefore cannot build the sub-plans, cannot
output EXPLAIN for them, and so on.
It does not make sense to reimplement these on the extension side.
Also, createplan.c has many useful functions for constructing plan nodes;
however, most of them are static because all the built-in plan nodes are
constructed by routines in this file, so we did not need to expose them
to others. I think the functions in createplan.c that are called by the
create_xxxx_plan() functions to construct plan nodes should be exposed for
extensions' convenience.

* Definition of add_join_path_hook

I had no idea how to improve the definition and location of this hook,
so it is still at the tail of add_paths_to_joinrel().
Its definition was adjusted a bit according to the feedback on
pgsql-hackers. I omitted "mergeclause_list" and "semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.

* Hook location of add_scan_path_hook

I moved add_scan_path_hook and set_cheapest() into
set_base_rel_pathlists() from their various caller locations, typically
the set_xxxx_pathlist() functions.
This consolidates the location where custom paths are added for base
relations.
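
For illustration, a provider would presumably install this hook with the
usual save-and-chain idiom. The sketch below uses stand-in planner types
and defines the hook variable locally (in the patch it belongs to the core
backend); my_add_scan_path(), my_module_init(), call_scan_path_hook() and
the npaths counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for planner types; only what the sketch needs. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RangeTblEntry RangeTblEntry;
typedef struct RelOptInfo { int npaths; } RelOptInfo;

typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *baserel,
                                        RangeTblEntry *rte);

/* Defined here only so the sketch is self-contained. */
add_scan_path_hook_type add_scan_path_hook = NULL;

static add_scan_path_hook_type prev_add_scan_path_hook = NULL;

static void
my_add_scan_path(PlannerInfo *root, RelOptInfo *baserel, RangeTblEntry *rte)
{
    assert(baserel != NULL);

    /* Give previously loaded modules their turn first. */
    if (prev_add_scan_path_hook)
        prev_add_scan_path_hook(root, baserel, rte);

    /* A real provider would build a CustomPath and add it here. */
    baserel->npaths++;
}

/* Roughly what an extension's _PG_init() would do. */
static void
my_module_init(void)
{
    prev_add_scan_path_hook = add_scan_path_hook;
    add_scan_path_hook = my_add_scan_path;
}

/* Roughly what the single call site does for each base relation. */
static void
call_scan_path_hook(PlannerInfo *root, RelOptInfo *baserel,
                    RangeTblEntry *rte)
{
    if (add_scan_path_hook)
        add_scan_path_hook(root, baserel, rte);
}
```

With a single call site, every provider sees every base relation in one
place and has to decide for itself whether it has anything to offer.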

* CustomMergeJoin as a proof-of-concept

The contrib module in the part-2 portion is a merge-join implementation
on top of the custom-plan interface, even though 99% of its implementation
is identical to the built-in one.
Its purpose is to demonstrate that custom join logic can be implemented
using the custom-plan interface, even when the custom-plan node has
underlying sub-plans, unlike my previous examples.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, March 07, 2014 3:09 AM
To: Kaigai Kouhei(海外浩平)
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
Robert Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I expected to include simple function pointers for copying and
text-output as follows:

typedef struct {
Plan plan;
:
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
} Custom;

I was thinking more like

typedef struct CustomPathFuncs {
const char *name; /* used for debugging purposes only */
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
Path path;
const CustomPathFuncs *funcs;
... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will
be) the common case where an extension has a fixed set of support
functions for its custom paths and plans. It just declares a static
constant CustomPathFuncs struct, and puts a pointer to that into its paths.

If an extension really needs to set the support functions on a
per-object basis, it can do this:

typedef struct MyCustomPath {
CustomPath cpath;
CustomPathFuncs funcs;
... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

mypath->cpath.funcs = &mypath->funcs;
mypath->funcs.node_copy = MyCustomPathCopy;
... etc etc ...

In this case we're arguably wasting one pointer worth of space in the
path, but considering the number of function pointers such a path will
be carrying, I don't think that's much of an objection.
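
Both variants described above can be put side by side in a small sketch.
It assumes stand-in types and a made-up signature for
NodeTextOut_function (the thread does not spell one out); the function
names shared_textout, special_textout, init_shared_path and init_my_path
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Path { double total_cost; } Path;    /* stand-in */
typedef struct CustomPath CustomPath;

/* Assumed signature, for illustration only. */
typedef const char *(*NodeTextOut_function)(const CustomPath *path);

typedef struct CustomPathFuncs
{
    const char *name;               /* used for debugging purposes only */
    NodeTextOut_function node_textout;
} CustomPathFuncs;

struct CustomPath
{
    Path        path;
    const CustomPathFuncs *funcs;
};

static const char *
shared_textout(const CustomPath *path)
{
    (void) path;
    return "shared";
}

static const char *
special_textout(const CustomPath *path)
{
    (void) path;
    return "special";
}

/* Common case: one static constant table shared by every path. */
static const CustomPathFuncs shared_funcs = { "demo", shared_textout };

static void
init_shared_path(CustomPath *cpath)
{
    assert(cpath != NULL);
    cpath->funcs = &shared_funcs;
}

/* Per-object case: the table is embedded in the extension's own struct. */
typedef struct MyCustomPath
{
    CustomPath      cpath;
    CustomPathFuncs funcs;
} MyCustomPath;

static void
init_my_path(MyCustomPath *mypath)
{
    assert(mypath != NULL);
    mypath->funcs.name = "demo-per-object";
    mypath->funcs.node_textout = special_textout;
    mypath->cpath.funcs = &mypath->funcs;
}
```

One caveat of the per-object variant: copying a MyCustomPath by value
leaves cpath.funcs pointing into the original object, so any copy routine
has to re-point it at the copy's own table.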

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all;
if it is needed, then there would also be a need for an additional
tuple slot in executor contexts, which you haven't provided.

For example, the enhanced postgres_fdw fetches the result set of
remote join query, thus a tuple contains the fields come from both side.
In this case, what varno shall be suitable to put?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
solution.
Consider for example a join where both tables supply columns named "id"
--- if you put them both in one tupledesc then there's no non-kluge
way to identify them.
Possibly the route to a solution involves adding another plan-node
callback function that ruleutils.c would use for printing Vars in custom
join nodes.

Or maybe we could let the Vars keep their original RTE numbers, though
that would complicate life at execution time.

Anyway, if we're going to punt on add_join_path_hook for the time
being, this problem can probably be left to solve later. It won't
arise for simple table-scan cases, nor for single-input plan nodes such
as sorts.

regards, tom lane

Attachments:

pgsql-v9.4-custom-scan.part-2.v11.patch (application/octet-stream)

 contrib/custmj/Makefile            |   17 +
 contrib/custmj/createplan.c        |  435 +++++++++
 contrib/custmj/custmj.c            |  691 +++++++++++++++
 contrib/custmj/custmj.h            |  148 ++++
 contrib/custmj/expected/custmj.out |  378 ++++++++
 contrib/custmj/joinpath.c          |  988 +++++++++++++++++++++
 contrib/custmj/nodeMergejoin.c     | 1694 ++++++++++++++++++++++++++++++++++++
 contrib/custmj/setrefs.c           |  326 +++++++
 contrib/custmj/sql/custmj.sql      |   79 ++
 9 files changed, 4756 insertions(+)

diff --git a/contrib/custmj/Makefile b/contrib/custmj/Makefile
new file mode 100644
index 0000000..9b264d4
--- /dev/null
+++ b/contrib/custmj/Makefile
@@ -0,0 +1,17 @@
+# contrib/custmj/Makefile
+
+MODULE_big = custmj
+OBJS = custmj.o joinpath.o createplan.o setrefs.o nodeMergejoin.o
+
+REGRESS = custmj
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/custmj
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/custmj/createplan.c b/contrib/custmj/createplan.c
new file mode 100644
index 0000000..e522d73
--- /dev/null
+++ b/contrib/custmj/createplan.c
@@ -0,0 +1,435 @@
+/*-------------------------------------------------------------------------
+ *
+ * createplan.c
+ *	  Routines to create the desired plan for processing a query.
+ *	  Planning is complete, we just need to convert the selected
+ *	  Path into a Plan.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/createplan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <limits.h>
+#include <math.h>
+
+#include "access/skey.h"
+#include "catalog/pg_class.h"
+#include "foreign/fdwapi.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
+#include "optimizer/var.h"
+#include "parser/parse_clause.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "custmj.h"
+
+static MergeJoin *make_mergejoin(List *tlist,
+			   List *joinclauses, List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree, Plan *righttree,
+			   JoinType jointype);
+static Material *make_material(Plan *lefttree);
+
+/*
+ * create_gating_plan
+ *	  Deal with pseudoconstant qual clauses
+ *
+ * If the node's quals list includes any pseudoconstant quals, put them
+ * into a gating Result node atop the already-built plan.  Otherwise,
+ * return the plan as-is.
+ *
+ * Note that we don't change cost or size estimates when doing gating.
+ * The costs of qual eval were already folded into the plan's startup cost.
+ * Leaving the size alone amounts to assuming that the gating qual will
+ * succeed, which is the conservative estimate for planning upper queries.
+ * We certainly don't want to assume the output size is zero (unless the
+ * gating qual is actually constant FALSE, and that case is dealt with in
+ * clausesel.c).  Interpolating between the two cases is silly, because
+ * it doesn't reflect what will really happen at runtime, and besides which
+ * in most cases we have only a very bad idea of the probability of the gating
+ * qual being true.
+ */
+Plan *
+create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
+{
+	List	   *pseudoconstants;
+
+	/* Sort into desirable execution order while still in RestrictInfo form */
+	quals = order_qual_clauses(root, quals);
+
+	/* Pull out any pseudoconstant quals from the RestrictInfo list */
+	pseudoconstants = extract_actual_clauses(quals, true);
+
+	if (!pseudoconstants)
+		return plan;
+
+	return (Plan *) make_result(root,
+								plan->targetlist,
+								(Node *) pseudoconstants,
+								plan);
+}
+
+MergeJoin *
+create_mergejoin_plan(PlannerInfo *root,
+					  CustomMergePath *best_path,
+					  Plan *outer_plan,
+					  Plan *inner_plan)
+{
+	List	   *tlist = build_path_tlist(root, &best_path->cpath.path);
+	List	   *joinclauses;
+	List	   *otherclauses;
+	List	   *mergeclauses;
+	List	   *outerpathkeys;
+	List	   *innerpathkeys;
+	int			nClauses;
+	Oid		   *mergefamilies;
+	Oid		   *mergecollations;
+	int		   *mergestrategies;
+	bool	   *mergenullsfirst;
+	MergeJoin  *join_plan;
+	int			i;
+	ListCell   *lc;
+	ListCell   *lop;
+	ListCell   *lip;
+
+	/* Sort join qual clauses into best execution order */
+	/* NB: do NOT reorder the mergeclauses */
+	joinclauses = order_qual_clauses(root, best_path->joinrestrictinfo);
+
+	/* Get the join qual clauses (in plain expression form) */
+	/* Any pseudoconstant clauses are ignored here */
+	if (IS_OUTER_JOIN(best_path->jointype))
+	{
+		extract_actual_join_clauses(joinclauses,
+									&joinclauses, &otherclauses);
+	}
+	else
+	{
+		/* We can treat all clauses alike for an inner join */
+		joinclauses = extract_actual_clauses(joinclauses, false);
+		otherclauses = NIL;
+	}
+
+	/*
+	 * Remove the mergeclauses from the list of join qual clauses, leaving the
+	 * list of quals that must be checked as qpquals.
+	 */
+	mergeclauses = get_actual_clauses(best_path->path_mergeclauses);
+	joinclauses = list_difference(joinclauses, mergeclauses);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params.  There
+	 * should not be any in the mergeclauses.
+	 */
+	if (best_path->cpath.path.param_info)
+	{
+		joinclauses = (List *)
+			replace_nestloop_params(root, (Node *) joinclauses);
+		otherclauses = (List *)
+			replace_nestloop_params(root, (Node *) otherclauses);
+	}
+
+	/*
+	 * Rearrange mergeclauses, if needed, so that the outer variable is always
+	 * on the left; mark the mergeclause restrictinfos with correct
+	 * outer_is_left status.
+	 */
+	mergeclauses = get_switched_clauses(best_path->path_mergeclauses,
+							 best_path->outerjoinpath->parent->relids);
+
+	/*
+	 * Create explicit sort nodes for the outer and inner paths if necessary.
+	 * Make sure there are no excess columns in the inputs if sorting.
+	 */
+	if (best_path->outersortkeys)
+	{
+		disuse_physical_tlist(root, outer_plan, best_path->outerjoinpath);
+		outer_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									outer_plan,
+									best_path->outersortkeys,
+									-1.0);
+		outerpathkeys = best_path->outersortkeys;
+	}
+	else
+		outerpathkeys = best_path->outerjoinpath->pathkeys;
+
+	if (best_path->innersortkeys)
+	{
+		disuse_physical_tlist(root, inner_plan, best_path->innerjoinpath);
+		inner_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									inner_plan,
+									best_path->innersortkeys,
+									-1.0);
+		innerpathkeys = best_path->innersortkeys;
+	}
+	else
+		innerpathkeys = best_path->innerjoinpath->pathkeys;
+
+	/*
+	 * If specified, add a materialize node to shield the inner plan from the
+	 * need to handle mark/restore.
+	 */
+	if (best_path->materialize_inner)
+	{
+		Plan	   *matplan = (Plan *) make_material(inner_plan);
+
+		/*
+		 * We assume the materialize will not spill to disk, and therefore
+		 * charge just cpu_operator_cost per tuple.  (Keep this estimate in
+		 * sync with final_cost_mergejoin.)
+		 */
+		copy_plan_costsize(matplan, inner_plan);
+		matplan->total_cost += cpu_operator_cost * matplan->plan_rows;
+
+		inner_plan = matplan;
+	}
+
+	/*
+	 * Compute the opfamily/collation/strategy/nullsfirst arrays needed by the
+	 * executor.  The information is in the pathkeys for the two inputs, but
+	 * we need to be careful about the possibility of mergeclauses sharing a
+	 * pathkey (compare find_mergeclauses_for_pathkeys()).
+	 */
+	nClauses = list_length(mergeclauses);
+	Assert(nClauses == list_length(best_path->path_mergeclauses));
+	mergefamilies = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergecollations = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergestrategies = (int *) palloc(nClauses * sizeof(int));
+	mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));
+
+	lop = list_head(outerpathkeys);
+	lip = list_head(innerpathkeys);
+	i = 0;
+	foreach(lc, best_path->path_mergeclauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		EquivalenceClass *oeclass;
+		EquivalenceClass *ieclass;
+		PathKey    *opathkey;
+		PathKey    *ipathkey;
+		EquivalenceClass *opeclass;
+		EquivalenceClass *ipeclass;
+		ListCell   *l2;
+
+		/* fetch outer/inner eclass from mergeclause */
+		Assert(IsA(rinfo, RestrictInfo));
+		if (rinfo->outer_is_left)
+		{
+			oeclass = rinfo->left_ec;
+			ieclass = rinfo->right_ec;
+		}
+		else
+		{
+			oeclass = rinfo->right_ec;
+			ieclass = rinfo->left_ec;
+		}
+		Assert(oeclass != NULL);
+		Assert(ieclass != NULL);
+
+		/*
+		 * For debugging purposes, we check that the eclasses match the paths'
+		 * pathkeys.  In typical cases the merge clauses are one-to-one with
+		 * the pathkeys, but when dealing with partially redundant query
+		 * conditions, we might have clauses that re-reference earlier path
+		 * keys.  The case that we need to reject is where a pathkey is
+		 * entirely skipped over.
+		 *
+		 * lop and lip reference the first as-yet-unused pathkey elements;
+		 * it's okay to match them, or any element before them.  If they're
+		 * NULL then we have found all pathkey elements to be used.
+		 */
+		if (lop)
+		{
+			opathkey = (PathKey *) lfirst(lop);
+			opeclass = opathkey->pk_eclass;
+			if (oeclass == opeclass)
+			{
+				/* fast path for typical case */
+				lop = lnext(lop);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lop */
+				foreach(l2, outerpathkeys)
+				{
+					if (l2 == lop)
+						break;
+					opathkey = (PathKey *) lfirst(l2);
+					opeclass = opathkey->pk_eclass;
+					if (oeclass == opeclass)
+						break;
+				}
+				if (oeclass != opeclass)
+					elog(ERROR, "outer pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			opathkey = NULL;
+			opeclass = NULL;
+			foreach(l2, outerpathkeys)
+			{
+				opathkey = (PathKey *) lfirst(l2);
+				opeclass = opathkey->pk_eclass;
+				if (oeclass == opeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "outer pathkeys do not match mergeclauses");
+		}
+
+		if (lip)
+		{
+			ipathkey = (PathKey *) lfirst(lip);
+			ipeclass = ipathkey->pk_eclass;
+			if (ieclass == ipeclass)
+			{
+				/* fast path for typical case */
+				lip = lnext(lip);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lip */
+				foreach(l2, innerpathkeys)
+				{
+					if (l2 == lip)
+						break;
+					ipathkey = (PathKey *) lfirst(l2);
+					ipeclass = ipathkey->pk_eclass;
+					if (ieclass == ipeclass)
+						break;
+				}
+				if (ieclass != ipeclass)
+					elog(ERROR, "inner pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			ipathkey = NULL;
+			ipeclass = NULL;
+			foreach(l2, innerpathkeys)
+			{
+				ipathkey = (PathKey *) lfirst(l2);
+				ipeclass = ipathkey->pk_eclass;
+				if (ieclass == ipeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "inner pathkeys do not match mergeclauses");
+		}
+
+		/* pathkeys should match each other too (more debugging) */
+		if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
+			opathkey->pk_eclass->ec_collation != ipathkey->pk_eclass->ec_collation ||
+			opathkey->pk_strategy != ipathkey->pk_strategy ||
+			opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
+			elog(ERROR, "left and right pathkeys do not match in mergejoin");
+
+		/* OK, save info for executor */
+		mergefamilies[i] = opathkey->pk_opfamily;
+		mergecollations[i] = opathkey->pk_eclass->ec_collation;
+		mergestrategies[i] = opathkey->pk_strategy;
+		mergenullsfirst[i] = opathkey->pk_nulls_first;
+		i++;
+	}
+
+	/*
+	 * Note: it is not an error if we have additional pathkey elements (i.e.,
+	 * lop or lip isn't NULL here).  The input paths might be better-sorted
+	 * than we need for the current mergejoin.
+	 */
+
+	/*
+	 * Now we can build the mergejoin node.
+	 */
+	join_plan = make_mergejoin(tlist,
+							   joinclauses,
+							   otherclauses,
+							   mergeclauses,
+							   mergefamilies,
+							   mergecollations,
+							   mergestrategies,
+							   mergenullsfirst,
+							   outer_plan,
+							   inner_plan,
+							   best_path->jointype);
+
+	/* Costs of sort and material steps are included in path cost already */
+	copy_path_costsize(&join_plan->join.plan, &best_path->cpath.path);
+
+	return join_plan;
+}
+
+static MergeJoin *
+make_mergejoin(List *tlist,
+			   List *joinclauses,
+			   List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree,
+			   Plan *righttree,
+			   JoinType jointype)
+{
+	MergeJoin  *node = makeNode(MergeJoin);
+	Plan	   *plan = &node->join.plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = tlist;
+	plan->qual = otherclauses;
+	plan->lefttree = lefttree;
+	plan->righttree = righttree;
+	node->mergeclauses = mergeclauses;
+	node->mergeFamilies = mergefamilies;
+	node->mergeCollations = mergecollations;
+	node->mergeStrategies = mergestrategies;
+	node->mergeNullsFirst = mergenullsfirst;
+	node->join.jointype = jointype;
+	node->join.joinqual = joinclauses;
+
+	return node;
+}
+
+static Material *
+make_material(Plan *lefttree)
+{
+	Material   *node = makeNode(Material);
+	Plan	   *plan = &node->plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	return node;
+}
diff --git a/contrib/custmj/custmj.c b/contrib/custmj/custmj.c
new file mode 100644
index 0000000..ef64857
--- /dev/null
+++ b/contrib/custmj/custmj.c
@@ -0,0 +1,691 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/custmj/custmj.c
+ *
+ * Custom version of MergeJoin - an example implementation of MergeJoin
+ * logic on top of Custom-Plan interface, to demonstrate how to use this
+ * interface for joining relations.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "commands/explain.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/nodeFuncs.h"
+#include "executor/executor.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+PG_MODULE_MAGIC;
+
+/* declaration of local variables */
+static add_join_path_hook_type	add_join_path_orig = NULL;
+bool		enable_custom_mergejoin;
+
+/* callback table of custom merge join */
+CustomPathMethods			custmj_path_methods;
+CustomPlanMethods			custmj_plan_methods;
+
+/*
+ * custmjAddJoinPath
+ *
+ * A callback function that adds custom merge-join paths for the join of
+ * the supplied relations.
+ */
+static void
+custmjAddJoinPath(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  RelOptInfo *outerrel,
+				  RelOptInfo *innerrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  List *restrictlist,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels)
+{
+	List	   *mergeclause_list = NIL;
+	bool		mergejoin_allowed = true;
+	SemiAntiJoinFactors semifactors;
+
+	if (add_join_path_orig)
+		(*add_join_path_orig)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
+	/* nothing more to do if custom merge join is disabled */
+	if (!enable_custom_mergejoin)
+		return;
+
+	/*
+	 * Find potential mergejoin clauses.
+	 */
+	mergeclause_list = select_mergejoin_clauses(root,
+												joinrel,
+												outerrel,
+												innerrel,
+												restrictlist,
+												jointype,
+												&mergejoin_allowed);
+	if (!mergejoin_allowed)
+		return;
+
+	/*
+	 * If it's SEMI or ANTI join, compute correction factors for cost
+	 * estimation.  These will be the same for all paths.
+	 */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+		compute_semi_anti_join_factors(root, outerrel, innerrel,
+									   jointype, sjinfo, restrictlist,
+									   &semifactors);
+
+	/*
+	 * 1. Consider mergejoin paths where both relations must be explicitly
+	 * sorted.  Skip this if we can't mergejoin.
+	 */
+	sort_inner_and_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo,
+						 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 2. Consider paths where the outer relation need not be explicitly
+	 * sorted. This includes both nestloops and mergejoins where the outer
+	 * path is already ordered.  Again, skip this if we can't mergejoin.
+	 * (That's okay because we know that nestloop can't handle right/full
+	 * joins at all, so it wouldn't work in the prohibited cases either.)
+	 */
+	match_unsorted_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo, &semifactors,
+						 param_source_rels, extra_lateral_rels);
+}
+
+/*
+ * CreateCustomMergeJoinPlan
+ *
+ * A method to populate a CustomPlan node according to the supplied
+ * CustomPath node chosen by the planner.
+ */
+static CustomPlan *
+CreateCustomMergeJoinPlan(PlannerInfo *root, CustomPath *custom_path)
+{
+	CustomMergePath	   *cmpath = (CustomMergePath *) custom_path;
+	CustomMergeJoin	   *cmjoin;
+	MergeJoin		   *mjplan;
+	Plan			   *outer_plan;
+	Plan			   *inner_plan;
+
+	/* build plans for the underlying relations */
+	outer_plan = create_plan_recurse(root, cmpath->outerjoinpath);
+	inner_plan = create_plan_recurse(root, cmpath->innerjoinpath);
+
+	mjplan = create_mergejoin_plan(root, cmpath, outer_plan, inner_plan);
+
+	/*
+	 * If there are any pseudoconstant clauses attached to this node, insert a
+	 * gating Result node that evaluates the pseudoconstants as one-time
+	 * quals.
+	 */
+	if (root->hasPseudoConstantQuals)
+		mjplan = (MergeJoin *)
+			create_gating_plan(root, &mjplan->join.plan,
+							   cmpath->joinrestrictinfo);
+
+	/* construct a CustomMergeJoin plan */
+	cmjoin = palloc0(sizeof(CustomMergeJoin));
+	cmjoin->cplan.plan = mjplan->join.plan;
+	cmjoin->cplan.plan.type = T_CustomPlan;
+	cmjoin->cplan.methods = &custmj_plan_methods;
+	cmjoin->jointype = mjplan->join.jointype;
+	cmjoin->joinqual = mjplan->join.joinqual;
+	cmjoin->mergeclauses = mjplan->mergeclauses;
+	cmjoin->mergeFamilies = mjplan->mergeFamilies;
+	cmjoin->mergeCollations = mjplan->mergeCollations;
+	cmjoin->mergeStrategies = mjplan->mergeStrategies;
+	cmjoin->mergeNullsFirst = mjplan->mergeNullsFirst;
+	pfree(mjplan);
+
+	return &cmjoin->cplan;
+}
+
+/*
+ * TextOutCustomMergeJoinPath
+ *
+ * A method to support nodeToString for CustomPath node
+ */
+static void
+TextOutCustomMergeJoinPath(StringInfo str, Node *node)
+{
+	CustomMergePath	*cmpath = (CustomMergePath *) node;
+	char			*temp;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmpath->cpath.methods == &custmj_path_methods);
+	appendStringInfo(str, " :jointype %d", cmpath->jointype);
+	temp = nodeToString(cmpath->outerjoinpath);
+	appendStringInfo(str, " :outerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innerjoinpath);
+	appendStringInfo(str, " :innerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->joinrestrictinfo);
+	appendStringInfo(str, " :joinrestrictinfo %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->path_mergeclauses);
+	appendStringInfo(str, " :path_mergeclauses %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->outersortkeys);
+	appendStringInfo(str, " :outersortkeys %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innersortkeys);
+	appendStringInfo(str, " :innersortkeys %s", temp);
+	pfree(temp);
+	appendStringInfo(str, " :materialize_inner %s",
+					 cmpath->materialize_inner ? "true" : "false");
+}
+
+/*
+ * SetCustomMergeJoinRef
+ *
+ * A method to adjust varno/varattno in the expression clauses.
+ */
+static void
+SetCustomMergeJoinRef(PlannerInfo *root,
+					  CustomPlan *custom_plan,
+					  int rtoffset)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) custom_plan;
+	/* overall logic copied from set_join_references() */
+	Plan			*outer_plan = cmjoin->cplan.plan.lefttree;
+	Plan			*inner_plan = cmjoin->cplan.plan.righttree;
+	indexed_tlist	*outer_itlist;
+	indexed_tlist	*inner_itlist;
+
+	outer_itlist = build_tlist_index(outer_plan->targetlist);
+	inner_itlist = build_tlist_index(inner_plan->targetlist);
+
+	/* All join plans have tlist, qual, and joinqual */
+	cmjoin->cplan.plan.targetlist
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.targetlist,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->cplan.plan.qual
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.qual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->joinqual
+		= fix_join_expr(root,
+						cmjoin->joinqual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/* Now do join-type-specific stuff */
+	cmjoin->mergeclauses
+		= fix_join_expr(root,
+						cmjoin->mergeclauses,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/*
+	 * outer_itlist is kept to exercise the GetSpecialCustomVar method,
+	 * which lets EXPLAIN show the actual Var node referenced through a
+	 * special varno.
+	 */
+	cmjoin->outer_itlist = outer_itlist;
+
+	pfree(inner_itlist);
+}
+
+/*
+ * FinalizeCustomMergePlan
+ *
+ * A method to collect param IDs referenced by joinqual and mergeclauses.
+ */
+static void
+FinalizeCustomMergePlan(PlannerInfo *root,
+						CustomPlan *custom_plan,
+						Bitmapset **p_paramids,
+						Bitmapset **p_valid_params,
+						Bitmapset **p_scan_params)
+{
+	CustomMergeJoin	   *cmjoin = (CustomMergeJoin *) custom_plan;
+	Bitmapset  *paramids = *p_paramids;
+
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->joinqual,
+								 paramids);
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->mergeclauses,
+								 paramids);
+	*p_paramids = paramids;
+}
+
+/*
+ * BeginCustomMergeJoin
+ *
+ * A method to populate CustomPlanState node according to the supplied
+ * CustomPlan node, and initialize this execution node itself.
+ */
+static CustomPlanState *
+BeginCustomMergeJoin(CustomPlan *cplan, EState *estate, int eflags)
+{
+	CustomMergeJoin		   *cmplan = (CustomMergeJoin *) cplan;
+	CustomMergeJoinState   *cmjs = palloc0(sizeof(CustomMergeJoinState));
+	MergeJoinState		   *mjs;
+
+	mjs = _ExecInitMergeJoin(cmplan, estate, eflags);
+	cmjs->cps.ps = mjs->js.ps;
+	cmjs->cps.ps.type = T_CustomPlanState;
+	cmjs->cps.methods = &custmj_plan_methods;
+	cmjs->jointype = mjs->js.jointype;
+	cmjs->joinqual = mjs->js.joinqual;
+	cmjs->mj_NumClauses = mjs->mj_NumClauses;
+	cmjs->mj_Clauses = mjs->mj_Clauses;
+	cmjs->mj_JoinState = mjs->mj_JoinState;
+	cmjs->mj_ExtraMarks = mjs->mj_ExtraMarks;
+	cmjs->mj_ConstFalseJoin = mjs->mj_ConstFalseJoin;
+	cmjs->mj_FillOuter = mjs->mj_FillOuter;
+	cmjs->mj_FillInner = mjs->mj_FillInner;
+	cmjs->mj_MatchedOuter = mjs->mj_MatchedOuter;
+	cmjs->mj_MatchedInner = mjs->mj_MatchedInner;
+	cmjs->mj_OuterTupleSlot = mjs->mj_OuterTupleSlot;
+	cmjs->mj_InnerTupleSlot = mjs->mj_InnerTupleSlot;
+	cmjs->mj_MarkedTupleSlot = mjs->mj_MarkedTupleSlot;
+	cmjs->mj_NullOuterTupleSlot = mjs->mj_NullOuterTupleSlot;
+	cmjs->mj_NullInnerTupleSlot = mjs->mj_NullInnerTupleSlot;
+	cmjs->mj_OuterEContext = mjs->mj_OuterEContext;
+	cmjs->mj_InnerEContext = mjs->mj_InnerEContext;
+	pfree(mjs);
+
+	/*
+	 * MEMO: When a custom-plan node replaces a join with a scan, as a
+	 * remote-join implementation does when it receives an already-joined
+	 * relation and scans it, the extension should adjust the varno /
+	 * varattno of Var nodes in the targetlist of the PlanState rather
+	 * than the Plan.
+	 * The executor evaluates expression nodes in the targetlist of the
+	 * PlanState, but EXPLAIN prints Var names according to the targetlist
+	 * of the Plan, so EXPLAIN will not work if the Plan's targetlist was
+	 * adjusted to reference the ecxt_scantuple of the ExprContext.
+	 */
+
+	return &cmjs->cps;
+}
+
+/*
+ * ExecCustomMergeJoin
+ *
+ * A method to run this execution node
+ */
+static TupleTableSlot *
+ExecCustomMergeJoin(CustomPlanState *node)
+{
+	return _ExecMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * EndCustomMergeJoin
+ *
+ * A method to end this execution node
+ */
+static void
+EndCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecEndMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ReScanCustomMergeJoin
+ *
+ * A method to rescan this execution node
+ */
+static void
+ReScanCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecReScanMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ExplainCustomMergeJoinTargetRel
+ *
+ * A method to show target relation in EXPLAIN command.
+ */
+static void
+ExplainCustomMergeJoinTargetRel(CustomPlanState *node,
+								ExplainState *es)
+{
+	CustomMergeJoinState *cmjs = (CustomMergeJoinState *) node;
+	const char *jointype;
+
+	switch (cmjs->jointype)
+	{
+		case JOIN_INNER:
+			jointype = "Inner";
+			break;
+		case JOIN_LEFT:
+			jointype = "Left";
+			break;
+		case JOIN_FULL:
+			jointype = "Full";
+			break;
+		case JOIN_RIGHT:
+			jointype = "Right";
+			break;
+		case JOIN_SEMI:
+			jointype = "Semi";
+			break;
+		case JOIN_ANTI:
+			jointype = "Anti";
+			break;
+		default:
+			jointype = "???";
+			break;
+	}
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (cmjs->jointype != JOIN_INNER)
+			appendStringInfo(es->str, " %s Join", jointype);
+		else
+			appendStringInfoString(es->str, " Join");
+	}
+	else
+		ExplainPropertyText("Join Type", jointype, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_upper_qual(List *qual, const char *qlabel,
+				PlanState *planstate, List *ancestors,
+				ExplainState *es)
+{
+	bool	useprefix = (list_length(es->rtable) > 1 || es->verbose);
+	Node   *node;
+	List   *context;
+    char   *exprstr;
+
+	/* No work if empty qual */
+	if (qual == NIL)
+		return;
+
+	/* Convert AND list to explicit AND */
+	node = (Node *) make_ands_explicit(qual);
+
+	/* And show it */
+	context = deparse_context_for_planstate((Node *) planstate,
+                                            ancestors,
+                                            es->rtable,
+                                            es->rtable_names);
+	exprstr = deparse_expression(node, context, useprefix, false);
+
+	ExplainPropertyText(qlabel, exprstr, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_instrumentation_count(const char *qlabel, int which,
+                           PlanState *planstate, ExplainState *es)
+{
+	double		nfiltered;
+	double		nloops;
+
+	if (!es->analyze || !planstate->instrument)
+		return;
+
+	if (which == 2)
+		nfiltered = planstate->instrument->nfiltered2;
+	else
+		nfiltered = planstate->instrument->nfiltered1;
+	nloops = planstate->instrument->nloops;
+
+	/* In text mode, suppress zero counts; they're not interesting enough */
+	if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		if (nloops > 0)
+			ExplainPropertyFloat(qlabel, nfiltered / nloops, 0, es);
+		else
+			ExplainPropertyFloat(qlabel, 0.0, 0, es);
+	}
+}
+
+/*
+ * ExplainCustomMergeJoin
+ *
+ * A method to construct EXPLAIN output.
+ */
+static void
+ExplainCustomMergeJoin(CustomPlanState *node,
+					   List *ancestors,
+					   ExplainState *es)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)node->ps.plan;
+
+	show_upper_qual(cmjoin->mergeclauses,
+					"Merge Cond", &node->ps, ancestors, es);
+	show_upper_qual(cmjoin->joinqual,
+					"Join Filter", &node->ps, ancestors, es);
+	if (cmjoin->joinqual)
+		show_instrumentation_count("Rows Removed by Join Filter", 1,
+								   &node->ps, es);
+	show_upper_qual(cmjoin->cplan.plan.qual,
+					"Filter", &node->ps, ancestors, es);
+	if (cmjoin->cplan.plan.qual)
+		show_instrumentation_count("Rows Removed by Filter", 2,
+								   &node->ps, es);
+}
+
+/*
+ * GetRelidsCustomMergeJoin
+ *
+ * A method to report the range-table indexes of the underlying relations.
+ */
+static Bitmapset *
+GetRelidsCustomMergeJoin(CustomPlanState *node)
+{
+	Bitmapset  *result = NULL;
+
+	if (outerPlanState(&node->ps))
+		ExplainPreScanNode(outerPlanState(&node->ps), &result);
+	if (innerPlanState(&node->ps))
+		ExplainPreScanNode(innerPlanState(&node->ps), &result);
+
+	return result;
+}
+
+/*
+ * GetSpecialCustomMergeVar
+ *
+ * Test handler for the GetSpecialCustomVar method.
+ * When a custom-plan node replaces a join node but does not have two
+ * underlying sub-plans, like a remote-join feature that retrieves one
+ * flat result set, EXPLAIN cannot resolve the names of columns referenced
+ * through a special varno (INNER_VAR, OUTER_VAR or INDEX_VAR), because it
+ * tries to walk down to the underlying sub-plans that it expects to be
+ * there. Such a custom-plan node has none, because it replaces a part of
+ * the plan sub-tree with a single custom-plan node. In this case, the
+ * custom-plan provider has to return the expression node that is
+ * referenced by the Var node with the special varno.
+ */
+static Node *
+GetSpecialCustomMergeVar(CustomPlanState *cpstate, Var *varnode)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)cpstate->ps.plan;
+	indexed_tlist	*itlist;
+	int		i;
+
+	if (varnode->varno != OUTER_VAR)
+		return NULL;
+
+	itlist = cmjoin->outer_itlist;
+	for (i=0; i < itlist->num_vars; i++)
+	{
+		if (itlist->vars[i].resno == varnode->varattno)
+		{
+			Var	   *newnode = copyObject(varnode);
+
+			newnode->varno = itlist->vars[i].varno;
+			newnode->varattno = itlist->vars[i].varattno;
+
+			elog(DEBUG2, "%s: (OUTER_VAR,%d) is reference to (%d,%d)",
+				 __FUNCTION__,
+				 varnode->varattno, newnode->varno, newnode->varattno);
+
+			return (Node *) newnode;
+		}
+	}
+	elog(ERROR, "outer_itlist has no entry for Var: %s",
+		 nodeToString(varnode));
+	return NULL;
+}
+
+/*
+ * TextOutCustomMergeJoin
+ *		nodeToString() support in CustomMergeJoin
+ */
+static void
+TextOutCustomMergeJoin(StringInfo str, const CustomPlan *node)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) node;
+	char   *temp;
+	int		i, num;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmjoin->cplan.methods == &custmj_plan_methods);
+	appendStringInfo(str, " :jointype %d", cmjoin->jointype);
+	temp = nodeToString(cmjoin->joinqual);
+	appendStringInfo(str, " :joinqual %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmjoin->mergeclauses);
+	appendStringInfo(str, " :mergeclauses %s", temp);
+	pfree(temp);
+
+	num = list_length(cmjoin->mergeclauses);
+	appendStringInfoString(str, " :mergeFamilies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeFamilies[i]);
+	appendStringInfoString(str, " :mergeCollations");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeCollations[i]);
+	appendStringInfoString(str, " :mergeStrategies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", cmjoin->mergeStrategies[i]);
+	appendStringInfoString(str, " :mergeNullsFirst");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", (int) cmjoin->mergeNullsFirst[i]);
+}
+
+/*
+ * CopyCustomMergeJoin
+ *		copyObject() support in CustomMergeJoin
+ */
+static CustomPlan *
+CopyCustomMergeJoin(const CustomPlan *from)
+{
+	const CustomMergeJoin *oldnode = (const CustomMergeJoin *) from;
+	CustomMergeJoin *newnode  = palloc(sizeof(CustomMergeJoin));
+	int		num;
+
+	/* copying the common fields */
+	CopyCustomPlanCommon((const Node *) oldnode, (Node *) newnode);
+
+	newnode->jointype = oldnode->jointype;
+	newnode->joinqual = copyObject(oldnode->joinqual);
+	newnode->mergeclauses = copyObject(oldnode->mergeclauses);
+	num = list_length(oldnode->mergeclauses);
+	newnode->mergeFamilies = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeFamilies,
+		   oldnode->mergeFamilies,
+		   sizeof(Oid) * num);
+	newnode->mergeCollations = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeCollations,
+		   oldnode->mergeCollations,
+		   sizeof(Oid) * num);
+	newnode->mergeStrategies = palloc(sizeof(int) * num);
+	memcpy(newnode->mergeStrategies,
+		   oldnode->mergeStrategies,
+		   sizeof(int) * num);
+	newnode->mergeNullsFirst = palloc(sizeof(bool) * num);
+	memcpy(newnode->mergeNullsFirst,
+		   oldnode->mergeNullsFirst,
+		   sizeof(bool) * num);
+	num = oldnode->outer_itlist->num_vars;
+	newnode->outer_itlist = palloc(offsetof(indexed_tlist, vars[num]));
+	memcpy(newnode->outer_itlist,
+		   oldnode->outer_itlist,
+		   offsetof(indexed_tlist, vars[num]));
+
+	return &newnode->cplan;
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	/* "enable_custom_mergejoin" to control availability of this module */
+	DefineCustomBoolVariable("enable_custom_mergejoin",
+							 "enables the planner's use of custom merge join",
+							 NULL,
+							 &enable_custom_mergejoin,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* methods of CustomMergePath */
+	memset(&custmj_path_methods, 0, sizeof(CustomPathMethods));
+	custmj_path_methods.CustomName = "CustomMergeJoin";
+	custmj_path_methods.CreateCustomPlan = CreateCustomMergeJoinPlan;
+	custmj_path_methods.TextOutCustomPath = TextOutCustomMergeJoinPath;
+
+	/* methods of CustomMergeJoin (the plan node) */
+	memset(&custmj_plan_methods, 0, sizeof(CustomPlanMethods));
+	custmj_plan_methods.CustomName = "CustomMergeJoin";
+	custmj_plan_methods.SetCustomPlanRef = SetCustomMergeJoinRef;
+	custmj_plan_methods.SupportBackwardScan = NULL;
+	custmj_plan_methods.FinalizeCustomPlan = FinalizeCustomMergePlan;
+	custmj_plan_methods.BeginCustomPlan = BeginCustomMergeJoin;
+	custmj_plan_methods.ExecCustomPlan = ExecCustomMergeJoin;
+	custmj_plan_methods.EndCustomPlan = EndCustomMergeJoin;
+	custmj_plan_methods.ReScanCustomPlan = ReScanCustomMergeJoin;
+	custmj_plan_methods.ExplainCustomPlanTargetRel
+		= ExplainCustomMergeJoinTargetRel;
+	custmj_plan_methods.ExplainCustomPlan = ExplainCustomMergeJoin;
+	custmj_plan_methods.GetRelidsCustomPlan = GetRelidsCustomMergeJoin;
+	custmj_plan_methods.GetSpecialCustomVar = GetSpecialCustomMergeVar;
+	custmj_plan_methods.TextOutCustomPlan = TextOutCustomMergeJoin;
+	custmj_plan_methods.CopyCustomPlan = CopyCustomMergeJoin;
+
+	/* hook registration */
+	add_join_path_orig = add_join_path_hook;
+	add_join_path_hook = custmjAddJoinPath;
+
+	elog(INFO, "MergeJoin logic on top of CustomPlan interface");
+}
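
Note on the two helpers above: GetSpecialCustomMergeVar resolves an OUTER_VAR reference by a linear search of outer_itlist on resno, and CopyCustomMergeJoin duplicates that variable-length struct with a single memcpy sized by offsetof. The following is a minimal standalone sketch of both steps, not the patch's code. The demo_* names are hypothetical, malloc stands in for palloc, and a C99 flexible array member replaces the vars[1] idiom used in custmj.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tlist_vinfo */
typedef struct
{
	int			varno;		/* RT index of the underlying Var */
	int			varattno;	/* attribute number of the underlying Var */
	int			resno;		/* position in the target list */
} demo_vinfo;

/* Hypothetical stand-in for indexed_tlist (C99 flexible array member) */
typedef struct
{
	int			num_vars;
	demo_vinfo	vars[];		/* num_vars entries follow the header */
} demo_itlist;

/* Bytes needed for a list holding num_vars entries */
static size_t
demo_itlist_size(int num_vars)
{
	return offsetof(demo_itlist, vars) + sizeof(demo_vinfo) * (size_t) num_vars;
}

/* Flat copy: header and entries are contiguous, so one memcpy is enough */
static demo_itlist *
demo_itlist_copy(const demo_itlist *from)
{
	size_t		len;
	demo_itlist *to;

	assert(from != NULL);
	len = demo_itlist_size(from->num_vars);
	to = malloc(len);
	if (to != NULL)
		memcpy(to, from, len);
	return to;
}

/* Linear search on resno; returns 1 and fills the outputs on a match */
static int
demo_itlist_lookup(const demo_itlist *itlist, int resno,
				   int *varno, int *varattno)
{
	for (int i = 0; i < itlist->num_vars; i++)
	{
		if (itlist->vars[i].resno == resno)
		{
			*varno = itlist->vars[i].varno;
			*varattno = itlist->vars[i].varattno;
			return 1;
		}
	}
	return 0;				/* caller decides how to report a miss */
}
```

The flat copy is only valid because the entries hold plain values; the real indexed_tlist also carries a tlist pointer, which a memcpy shares rather than deep-copies.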
diff --git a/contrib/custmj/custmj.h b/contrib/custmj/custmj.h
new file mode 100644
index 0000000..732bbff
--- /dev/null
+++ b/contrib/custmj/custmj.h
@@ -0,0 +1,148 @@
+/*
+ * definitions related to custom version of merge join
+ */
+#ifndef CUSTMJ_H
+#define CUSTMJ_H
+#include "nodes/nodes.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+typedef struct
+{
+	CustomPath	cpath;
+	/* fields come from JoinPath */
+	JoinType    jointype;
+    Path       *outerjoinpath;  /* path for the outer side of the join */
+    Path       *innerjoinpath;  /* path for the inner side of the join */
+    List       *joinrestrictinfo;       /* RestrictInfos to apply to join */
+	/* fields come from MergePath */
+	List       *path_mergeclauses;      /* join clauses to be used for merge */
+	List       *outersortkeys;  /* keys for explicit sort, if any */
+	List       *innersortkeys;  /* keys for explicit sort, if any */
+	bool        materialize_inner;      /* add Materialize to inner? */
+} CustomMergePath;
+
+struct indexed_tlist;
+
+typedef struct
+{
+	CustomPlan	cplan;
+	/* fields come from Join */
+	JoinType	jointype;
+	List	   *joinqual;
+	/* fields come from MergeJoin */
+	List	   *mergeclauses;   /* mergeclauses as expression trees */
+	/* these are arrays, but have the same length as the mergeclauses list: */
+	Oid		   *mergeFamilies;  /* per-clause OIDs of btree opfamilies */
+	Oid		   *mergeCollations;    /* per-clause OIDs of collations */
+	int		   *mergeStrategies;    /* per-clause ordering (ASC or DESC) */
+	bool	   *mergeNullsFirst;    /* per-clause nulls ordering */
+	/* index of the outer tlist; GetSpecialCustomVar uses it to resolve OUTER_VAR */
+	struct indexed_tlist *outer_itlist;
+} CustomMergeJoin;
+
+typedef struct
+{
+	CustomPlanState	cps;
+	/* fields come from JoinState */
+	JoinType	jointype;
+	List	   *joinqual;		/* JOIN quals (in addition to ps.qual) */
+	/* fields come from MergeJoinState */
+	int			mj_NumClauses;
+	MergeJoinClause mj_Clauses; /* array of length mj_NumClauses */
+	int			mj_JoinState;
+	bool		mj_ExtraMarks;
+	bool		mj_ConstFalseJoin;
+	bool		mj_FillOuter;
+	bool		mj_FillInner;
+	bool		mj_MatchedOuter;
+	bool		mj_MatchedInner;
+	TupleTableSlot *mj_OuterTupleSlot;
+	TupleTableSlot *mj_InnerTupleSlot;
+	TupleTableSlot *mj_MarkedTupleSlot;
+	TupleTableSlot *mj_NullOuterTupleSlot;
+	TupleTableSlot *mj_NullInnerTupleSlot;
+	ExprContext *mj_OuterEContext;
+	ExprContext *mj_InnerEContext;
+} CustomMergeJoinState;
+
+/* custmj.c */
+extern bool						enable_custom_mergejoin;
+extern CustomPathMethods		custmj_path_methods;
+extern CustomPlanMethods		custmj_plan_methods;
+
+extern void	_PG_init(void);
+
+/* joinpath.c */
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+extern void sort_inner_and_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+extern void match_unsorted_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 SemiAntiJoinFactors *semifactors,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+/* createplan.c */
+extern MergeJoin *create_mergejoin_plan(PlannerInfo *root,
+										CustomMergePath *best_path,
+										Plan *outer_plan,
+										Plan *inner_plan);
+extern Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
+
+/* setrefs.c */
+typedef struct tlist_vinfo
+{
+	Index		varno;			/* RT index of Var */
+	AttrNumber	varattno;		/* attr number of Var */
+	AttrNumber	resno;			/* TLE position of Var */
+} tlist_vinfo;
+
+typedef struct indexed_tlist
+{
+	List	   *tlist;			/* underlying target list */
+	int			num_vars;		/* number of plain Var tlist entries */
+	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
+	bool		has_non_vars;	/* are there other entries? */
+	/* array of num_vars entries: */
+	tlist_vinfo vars[1];		/* VARIABLE LENGTH ARRAY */
+} indexed_tlist;				/* VARIABLE LENGTH STRUCT */
+
+extern indexed_tlist *build_tlist_index(List *tlist);
+extern List *fix_join_expr(PlannerInfo *root,
+						   List *clauses,
+						   indexed_tlist *outer_itlist,
+						   indexed_tlist *inner_itlist,
+						   Index acceptable_rel,
+						   int rtoffset);
+/* nodeMergejoin.c */
+extern MergeJoinState *_ExecInitMergeJoin(CustomMergeJoin *node,
+										  EState *estate,
+										  int eflags);
+extern TupleTableSlot *_ExecMergeJoin(CustomMergeJoinState *node);
+extern void _ExecEndMergeJoin(CustomMergeJoinState *node);
+extern void _ExecReScanMergeJoin(CustomMergeJoinState *node);
+
+#endif	/* CUSTMJ_H */
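
Note on the struct layout in this header: CustomMergePath, CustomMergeJoin and CustomMergeJoinState each place the generic node (cpath, cplan, cps) as their first member, so a pointer to the generic part can be cast back to the extension's own struct inside a callback. A minimal sketch of that pattern follows. All Demo* names are hypothetical and the method table is far smaller than CustomPathMethods; it is only meant to illustrate the casting convention, not the actual API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoPath DemoPath;

/* Hypothetical method table, loosely analogous to CustomPathMethods */
typedef struct
{
	const char *name;
	int			(*num_clauses) (const DemoPath *path);
} DemoPathMethods;

/* Generic part that the core code would see */
struct DemoPath
{
	double		total_cost;
	const DemoPathMethods *methods;
};

/* Extension-specific struct; the generic part must be the first member */
typedef struct
{
	DemoPath	base;
	int			num_mergeclauses;
} DemoMergePath;

_Static_assert(offsetof(DemoMergePath, base) == 0,
			   "generic part must be the first member");

/* Callback: receives the generic pointer and casts back to its own type */
static int
demo_merge_num_clauses(const DemoPath *path)
{
	const DemoMergePath *mp = (const DemoMergePath *) path;

	return mp->num_mergeclauses;
}

static const DemoPathMethods demo_merge_methods = {
	"DemoMergeJoin",
	demo_merge_num_clauses
};

/* Fill in both the generic and the extension-specific fields */
static void
demo_merge_init(DemoMergePath *mp, double total_cost, int num_mergeclauses)
{
	mp->base.total_cost = total_cost;
	mp->base.methods = &demo_merge_methods;
	mp->num_mergeclauses = num_mergeclauses;
}
```

Code that only knows about DemoPath can then dispatch through path->methods without knowing the concrete type, which is the role custmj_path_methods and custmj_plan_methods play for this module.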
diff --git a/contrib/custmj/expected/custmj.out b/contrib/custmj/expected/custmj.out
new file mode 100644
index 0000000..19ba188
--- /dev/null
+++ b/contrib/custmj/expected/custmj.out
@@ -0,0 +1,378 @@
+-- regression test for custmj extension
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+INFO:  MergeJoin logic on top of CustomPlan interface
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Nested Loop
+   Output: (((t1.a + t2.x) + t3.n) % t4.s), md5((((t1.b || t2.y) || t3.m) || t4.t))
+   Join Filter: (t2.x = t1.a)
+   ->  Nested Loop
+         Output: t2.x, t2.y, t3.n, t3.m, t4.s, t4.t
+         Join Filter: (t3.m = t2.y)
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+               Filter: (t2.y ~~ '%cd%'::text)
+         ->  Materialize
+               Output: t3.n, t3.m, t4.s, t4.t
+               ->  Custom (CustomMergeJoin) Join
+                     Output: t3.n, t3.m, t4.s, t4.t
+                     Merge Cond: (t3.n = t4.s)
+                     Join Filter: (t3.m ~~ t4.t)
+                     ->  Index Scan using t3_pkey on public.t3
+                           Output: t3.n, t3.m
+                     ->  Sort
+                           Output: t4.s, t4.t
+                           Sort Key: t4.s
+                           ->  Seq Scan on public.t4
+                                 Output: t4.s, t4.t
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+         Filter: (t1.b ~~ '%ab%'::text)
+(25 rows)
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Nested Loop
+   Output: t1.a, t1.b, t3.n, t3.m
+   Join Filter: (t1.a = t3.n)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 100))
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+(8 rows)
+
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t3.n, t3.m
+   Merge Cond: (t3.n = t1.a)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 1000))
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(11 rows)
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+                   QUERY PLAN                   
+------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t2.x = t1.a)
+   ->  Custom (CustomMergeJoin) Semi Join
+         Output: t2.x, t2.y, t3.n
+         Merge Cond: (t2.x = ((t3.n % 100)))
+         ->  Sort
+               Output: t2.x, t2.y
+               Sort Key: t2.x
+               ->  Seq Scan on public.t2
+                     Output: t2.x, t2.y
+         ->  Sort
+               Output: t3.n, ((t3.n % 100))
+               Sort Key: ((t3.n % 100))
+               ->  Seq Scan on public.t3
+                     Output: t3.n, (t3.n % 100)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(21 rows)
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,2) is reference to (1,2)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
diff --git a/contrib/custmj/joinpath.c b/contrib/custmj/joinpath.c
new file mode 100644
index 0000000..9ef940b
--- /dev/null
+++ b/contrib/custmj/joinpath.c
@@ -0,0 +1,988 @@
+/*-------------------------------------------------------------------------
+ *
+ * joinpath.c
+ *	  Routines to find all possible paths for processing a set of joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/joinpath.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "executor/executor.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "custmj.h"
+
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
+
+#define PATH_PARAM_BY_REL(path, rel)  \
+	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+
+/*
+ * try_nestloop_path
+ *	  Consider a nestloop join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_nestloop_path(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  SemiAntiJoinFactors *semifactors,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels,
+				  Path *outer_path,
+				  Path *inner_path,
+				  List *restrict_clauses,
+				  List *pathkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_nestloop_required_outer(outer_path,
+												  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * Do a precheck to quickly eliminate obviously-inferior paths.  We
+	 * calculate a cheap lower bound on the path's cost and then use
+	 * add_path_precheck() to see if the path is clearly going to be dominated
+	 * by some existing path for the joinrel.  If not, do the full pushup with
+	 * creating a fully valid path structure and submitting it to add_path().
+	 * The latter two steps are expensive enough to make this two-phase
+	 * methodology worthwhile.
+	 */
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_path,
+						  sjinfo, semifactors);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  sjinfo,
+									  semifactors,
+									  outer_path,
+									  inner_path,
+									  restrict_clauses,
+									  pathkeys,
+									  required_outer));
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * try_mergejoin_path
+ *	  Consider a merge join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_mergejoin_path(PlannerInfo *root,
+				   RelOptInfo *joinrel,
+				   JoinType jointype,
+				   SpecialJoinInfo *sjinfo,
+				   Relids param_source_rels,
+				   Relids extra_lateral_rels,
+				   Path *outer_path,
+				   Path *inner_path,
+				   List *restrict_clauses,
+				   List *pathkeys,
+				   List *mergeclauses,
+				   List *outersortkeys,
+				   List *innersortkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_non_nestloop_required_outer(outer_path,
+													  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * If the given paths are already well enough ordered, we can skip doing
+	 * an explicit sort.
+	 */
+	if (outersortkeys &&
+		pathkeys_contained_in(outersortkeys, outer_path->pathkeys))
+		outersortkeys = NIL;
+	if (innersortkeys &&
+		pathkeys_contained_in(innersortkeys, inner_path->pathkeys))
+		innersortkeys = NIL;
+
+	/*
+	 * See comments in try_nestloop_path().
+	 */
+	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
+						   outer_path, inner_path,
+						   outersortkeys, innersortkeys,
+						   sjinfo);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/* KG: adjust to create CustomMergePath, instead of MergePath */
+		CustomMergePath	   *cmpath;
+		MergePath		   *mpath
+			= create_mergejoin_path(root,
+									joinrel,
+									jointype,
+									&workspace,
+									sjinfo,
+									outer_path,
+									inner_path,
+									restrict_clauses,
+									pathkeys,
+									required_outer,
+									mergeclauses,
+									outersortkeys,
+									innersortkeys);
+
+		/* adjust cost according to enable_(custom)_mergejoin GUCs */
+		if (!enable_mergejoin && enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost -= disable_cost;
+			mpath->jpath.path.total_cost -= disable_cost;
+		}
+		else if (enable_mergejoin && !enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost += disable_cost;
+			mpath->jpath.path.total_cost += disable_cost;
+		}
+
+		/* construct CustomMergePath object */
+		cmpath = palloc0(sizeof(CustomMergePath));
+		cmpath->cpath.path = mpath->jpath.path;
+		cmpath->cpath.path.type = T_CustomPath;
+		cmpath->cpath.path.pathtype = T_CustomPlan;
+		cmpath->cpath.methods = &custmj_path_methods;
+		cmpath->jointype = mpath->jpath.jointype;
+		cmpath->outerjoinpath = mpath->jpath.outerjoinpath;
+		cmpath->innerjoinpath = mpath->jpath.innerjoinpath;
+		cmpath->joinrestrictinfo = mpath->jpath.joinrestrictinfo;
+		cmpath->path_mergeclauses = mpath->path_mergeclauses;
+		cmpath->outersortkeys = mpath->outersortkeys;
+		cmpath->innersortkeys = mpath->innersortkeys;
+		cmpath->materialize_inner = mpath->materialize_inner;
+
+		add_path(joinrel, &cmpath->cpath.path);
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.	If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static inline bool
+clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
+						RelOptInfo *innerrel)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrel->relids) &&
+		bms_is_subset(rinfo->right_relids, innerrel->relids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrel->relids) &&
+			 bms_is_subset(rinfo->right_relids, outerrel->relids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+/*
+ * sort_inner_and_outer
+ *	  Create mergejoin join paths by explicitly sorting both the outer and
+ *	  inner join relations on each available merge ordering.
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+sort_inner_and_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Path	   *outer_path;
+	Path	   *inner_path;
+	List	   *all_pathkeys;
+	ListCell   *l;
+
+	/*
+	 * We only consider the cheapest-total-cost input paths, since we are
+	 * assuming here that a sort is required.  We will consider
+	 * cheapest-startup-cost input paths later, and only if they don't need a
+	 * sort.
+	 *
+	 * This function intentionally does not consider parameterized input
+	 * paths, except when the cheapest-total is parameterized.	If we did so,
+	 * we'd have a combinatorial explosion of mergejoin paths of dubious
+	 * value.  This interacts with decisions elsewhere that also discriminate
+	 * against mergejoins with parameterized inputs; see comments in
+	 * src/backend/optimizer/README.
+	 */
+	outer_path = outerrel->cheapest_total_path;
+	inner_path = innerrel->cheapest_total_path;
+
+	/*
+	 * If either cheapest-total path is parameterized by the other rel, we
+	 * can't use a mergejoin.  (There's no use looking for alternative input
+	 * paths, since these should already be the least-parameterized available
+	 * paths.)
+	 */
+	if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
+		PATH_PARAM_BY_REL(inner_path, outerrel))
+		return;
+
+	/*
+	 * If unique-ification is requested, do it and then handle as a plain
+	 * inner join.
+	 */
+	if (jointype == JOIN_UNIQUE_OUTER)
+	{
+		outer_path = (Path *) create_unique_path(root, outerrel,
+												 outer_path, sjinfo);
+		Assert(outer_path);
+		jointype = JOIN_INNER;
+	}
+	else if (jointype == JOIN_UNIQUE_INNER)
+	{
+		inner_path = (Path *) create_unique_path(root, innerrel,
+												 inner_path, sjinfo);
+		Assert(inner_path);
+		jointype = JOIN_INNER;
+	}
+
+	/*
+	 * Each possible ordering of the available mergejoin clauses will generate
+	 * a differently-sorted result path at essentially the same cost.  We have
+	 * no basis for choosing one over another at this level of joining, but
+	 * some sort orders may be more useful than others for higher-level
+	 * mergejoins, so it's worth considering multiple orderings.
+	 *
+	 * Actually, it's not quite true that every mergeclause ordering will
+	 * generate a different path order, because some of the clauses may be
+	 * partially redundant (refer to the same EquivalenceClasses).	Therefore,
+	 * what we do is convert the mergeclause list to a list of canonical
+	 * pathkeys, and then consider different orderings of the pathkeys.
+	 *
+	 * Generating a path for *every* permutation of the pathkeys doesn't seem
+	 * like a winning strategy; the cost in planning time is too high. For
+	 * now, we generate one path for each pathkey, listing that pathkey first
+	 * and the rest in random order.  This should allow at least a one-clause
+	 * mergejoin without re-sorting against any other possible mergejoin
+	 * partner path.  But if we've not guessed the right ordering of secondary
+	 * keys, we may end up evaluating clauses as qpquals when they could have
+	 * been done as mergeclauses.  (In practice, it's rare that there's more
+	 * than two or three mergeclauses, so expending a huge amount of thought
+	 * on that is probably not worth it.)
+	 *
+	 * The pathkey order returned by select_outer_pathkeys_for_merge() has
+	 * some heuristics behind it (see that function), so be sure to try it
+	 * exactly as-is as well as making variants.
+	 */
+	all_pathkeys = select_outer_pathkeys_for_merge(root,
+												   mergeclause_list,
+												   joinrel);
+
+	foreach(l, all_pathkeys)
+	{
+		List	   *front_pathkey = (List *) lfirst(l);
+		List	   *cur_mergeclauses;
+		List	   *outerkeys;
+		List	   *innerkeys;
+		List	   *merge_pathkeys;
+
+		/* Make a pathkey list with this guy first */
+		if (l != list_head(all_pathkeys))
+			outerkeys = lcons(front_pathkey,
+							  list_delete_ptr(list_copy(all_pathkeys),
+											  front_pathkey));
+		else
+			outerkeys = all_pathkeys;	/* no work at first one... */
+
+		/* Sort the mergeclauses into the corresponding ordering */
+		cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
+														  outerkeys,
+														  true,
+														  mergeclause_list);
+
+		/* Should have used them all... */
+		Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
+
+		/* Build sort pathkeys for the inner side */
+		innerkeys = make_inner_pathkeys_for_merge(root,
+												  cur_mergeclauses,
+												  outerkeys);
+
+		/* Build pathkeys representing output sort order */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerkeys);
+
+		/*
+		 * And now we can make the path.
+		 *
+		 * Note: it's possible that the cheapest paths will already be sorted
+		 * properly.  try_mergejoin_path will detect that case and suppress an
+		 * explicit sort step, so we needn't do so here.
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outer_path,
+						   inner_path,
+						   restrictlist,
+						   merge_pathkeys,
+						   cur_mergeclauses,
+						   outerkeys,
+						   innerkeys);
+	}
+}
+
+/*
+ * match_unsorted_outer
+ *	  Creates possible join paths for processing a single join relation
+ *	  'joinrel' by employing either iterative substitution or
+ *	  mergejoining on each of its possible outer paths (considering
+ *	  only outer paths that are already ordered well enough for merging).
+ *
+ * When the join type permits a nestloop, we generate nestloop paths for
+ * each available outer path: one on the cheapest-total inner path for each
+ * available parameterization of the inner relation (including none), and
+ * one on the materialized form of the cheapest-total inner path, if
+ * enable_material is on and that path does not materialize its output
+ * anyway.  A unique-ified inner is tried only in its cheapest-total form.
+ *
+ * We also consider mergejoins if mergejoin clauses are available.	We have
+ * two ways to generate the inner path for a mergejoin: sort the cheapest
+ * inner path, or use an inner path that is already suitably ordered for the
+ * merge.  If we have several mergeclauses, it could be that there is no inner
+ * path (or only a very expensive one) for the full list of mergeclauses, but
+ * better paths exist if we truncate the mergeclause list (thereby discarding
+ * some sort key requirements).  So, we consider truncations of the
+ * mergeclause list as well as the full list.  (Ideally we'd consider all
+ * subsets of the mergeclause list, but that seems way too expensive.)
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+match_unsorted_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	JoinType	save_jointype = jointype;
+	bool		nestjoinOK;
+	bool		useallclauses;
+	Path	   *inner_cheapest_total = innerrel->cheapest_total_path;
+	Path	   *matpath = NULL;
+	ListCell   *lc1;
+
+	/*
+	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
+	 * are doing a right or full mergejoin, we must use *all* the mergeclauses
+	 * as join clauses, else we will not have a valid plan.  (Although these
+	 * two flags are currently inverses, keep them separate for clarity and
+	 * possible future changes.)
+	 */
+	switch (jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_LEFT:
+		case JOIN_SEMI:
+		case JOIN_ANTI:
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			nestjoinOK = false;
+			useallclauses = true;
+			break;
+		case JOIN_UNIQUE_OUTER:
+		case JOIN_UNIQUE_INNER:
+			jointype = JOIN_INNER;
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) jointype);
+			nestjoinOK = false; /* keep compiler quiet */
+			useallclauses = false;
+			break;
+	}
+
+	/*
+	 * If inner_cheapest_total is parameterized by the outer rel, ignore it;
+	 * we will consider it below as a member of cheapest_parameterized_paths,
+	 * but the other possibilities considered in this routine aren't usable.
+	 */
+	if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+		inner_cheapest_total = NULL;
+
+	/*
+	 * If we need to unique-ify the inner path, we will consider only the
+	 * cheapest-total inner.
+	 */
+	if (save_jointype == JOIN_UNIQUE_INNER)
+	{
+		/* No way to do this with an inner path parameterized by outer rel */
+		if (inner_cheapest_total == NULL)
+			return;
+		inner_cheapest_total = (Path *)
+			create_unique_path(root, innerrel, inner_cheapest_total, sjinfo);
+		Assert(inner_cheapest_total);
+	}
+	else if (nestjoinOK)
+	{
+		/*
+		 * Consider materializing the cheapest inner path, unless
+		 * enable_material is off or the path in question materializes its
+		 * output anyway.
+		 */
+		if (enable_material && inner_cheapest_total != NULL &&
+			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
+			matpath = (Path *)
+				create_material_path(innerrel, inner_cheapest_total);
+	}
+
+	foreach(lc1, outerrel->pathlist)
+	{
+		Path	   *outerpath = (Path *) lfirst(lc1);
+		List	   *merge_pathkeys;
+		List	   *mergeclauses;
+		List	   *innersortkeys;
+		List	   *trialsortkeys;
+		Path	   *cheapest_startup_inner;
+		Path	   *cheapest_total_inner;
+		int			num_sortkeys;
+		int			sortkeycnt;
+
+		/*
+		 * We cannot use an outer path that is parameterized by the inner rel.
+		 */
+		if (PATH_PARAM_BY_REL(outerpath, innerrel))
+			continue;
+
+		/*
+		 * If we need to unique-ify the outer path, it's pointless to consider
+		 * any but the cheapest outer.	(XXX we don't consider parameterized
+		 * outers, nor inners, for unique-ified cases.	Should we?)
+		 */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+		{
+			if (outerpath != outerrel->cheapest_total_path)
+				continue;
+			outerpath = (Path *) create_unique_path(root, outerrel,
+													outerpath, sjinfo);
+			Assert(outerpath);
+		}
+
+		/*
+		 * The result will have this sort order (even if it is implemented as
+		 * a nestloop, and even if some of the mergeclauses are implemented by
+		 * qpquals rather than as true mergeclauses):
+		 */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerpath->pathkeys);
+
+		if (save_jointype == JOIN_UNIQUE_INNER)
+		{
+			/*
+			 * Consider nestloop join, but only with the unique-ified cheapest
+			 * inner path
+			 */
+			try_nestloop_path(root,
+							  joinrel,
+							  jointype,
+							  sjinfo,
+							  semifactors,
+							  param_source_rels,
+							  extra_lateral_rels,
+							  outerpath,
+							  inner_cheapest_total,
+							  restrictlist,
+							  merge_pathkeys);
+		}
+		else if (nestjoinOK)
+		{
+			/*
+			 * Consider nestloop joins using this outer path and various
+			 * available paths for the inner relation.	We consider the
+			 * cheapest-total paths for each available parameterization of the
+			 * inner relation, including the unparameterized case.
+			 */
+			ListCell   *lc2;
+
+			foreach(lc2, innerrel->cheapest_parameterized_paths)
+			{
+				Path	   *innerpath = (Path *) lfirst(lc2);
+
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  innerpath,
+								  restrictlist,
+								  merge_pathkeys);
+			}
+
+			/* Also consider materialized form of the cheapest inner path */
+			if (matpath != NULL)
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  matpath,
+								  restrictlist,
+								  merge_pathkeys);
+		}
+
+		/* Can't do anything else if outer path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+			continue;
+
+		/* Can't do anything else if inner rel is parameterized by outer */
+		if (inner_cheapest_total == NULL)
+			continue;
+
+		/* Look for useful mergeclauses (if any) */
+		mergeclauses = find_mergeclauses_for_pathkeys(root,
+													  outerpath->pathkeys,
+													  true,
+													  mergeclause_list);
+
+		/*
+		 * Done with this outer path if no chance for a mergejoin.
+		 *
+		 * Special corner case: for "x FULL JOIN y ON true", there will be no
+		 * join clauses at all.  Ordinarily we'd generate a clauseless
+		 * nestloop path, but since mergejoin is our only join type that
+		 * supports FULL JOIN without any join clauses, it's necessary to
+		 * generate a clauseless mergejoin path instead.
+		 */
+		if (mergeclauses == NIL)
+		{
+			if (jointype == JOIN_FULL)
+				 /* okay to try for mergejoin */ ;
+			else
+				continue;
+		}
+		if (useallclauses && list_length(mergeclauses) != list_length(mergeclause_list))
+			continue;
+
+		/* Compute the required ordering of the inner path */
+		innersortkeys = make_inner_pathkeys_for_merge(root,
+													  mergeclauses,
+													  outerpath->pathkeys);
+
+		/*
+		 * Generate a mergejoin on the basis of sorting the cheapest inner.
+		 * Since a sort will be needed, only cheapest total cost matters. (But
+		 * try_mergejoin_path will do the right thing if inner_cheapest_total
+		 * is already correctly sorted.)
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outerpath,
+						   inner_cheapest_total,
+						   restrictlist,
+						   merge_pathkeys,
+						   mergeclauses,
+						   NIL,
+						   innersortkeys);
+
+		/* Can't do anything else if inner path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_INNER)
+			continue;
+
+		/*
+		 * Look for presorted inner paths that satisfy the innersortkey list
+		 * --- or any truncation thereof, if we are allowed to build a
+		 * mergejoin using a subset of the merge clauses.  Here, we consider
+		 * both cheap startup cost and cheap total cost.
+		 *
+		 * Currently we do not consider parameterized inner paths here. This
+		 * interacts with decisions elsewhere that also discriminate against
+		 * mergejoins with parameterized inputs; see comments in
+		 * src/backend/optimizer/README.
+		 *
+		 * As we shorten the sortkey list, we should consider only paths that
+		 * are strictly cheaper than (in particular, not the same as) any path
+		 * found in an earlier iteration.  Otherwise we'd be intentionally
+		 * using fewer merge keys than a given path allows (treating the rest
+		 * as plain joinquals), which is unlikely to be a good idea.  Also,
+		 * eliminating paths here on the basis of compare_path_costs is a lot
+		 * cheaper than building the mergejoin path only to throw it away.
+		 *
+		 * If inner_cheapest_total is well enough sorted to have not required
+		 * a sort in the path made above, we shouldn't make a duplicate path
+		 * with it, either.  We handle that case with the same logic that
+		 * handles the previous consideration, by initializing the variables
+		 * that track cheapest-so-far properly.  Note that we do NOT reject
+		 * inner_cheapest_total if we find it matches some shorter set of
+		 * pathkeys.  That case corresponds to using fewer mergekeys to avoid
+		 * sorting inner_cheapest_total, whereas we did sort it above, so the
+		 * plans being considered are different.
+		 */
+		if (pathkeys_contained_in(innersortkeys,
+								  inner_cheapest_total->pathkeys))
+		{
+			/* inner_cheapest_total didn't require a sort */
+			cheapest_startup_inner = inner_cheapest_total;
+			cheapest_total_inner = inner_cheapest_total;
+		}
+		else
+		{
+			/* it did require a sort, at least for the full set of keys */
+			cheapest_startup_inner = NULL;
+			cheapest_total_inner = NULL;
+		}
+		num_sortkeys = list_length(innersortkeys);
+		if (num_sortkeys > 1 && !useallclauses)
+			trialsortkeys = list_copy(innersortkeys);	/* need modifiable copy */
+		else
+			trialsortkeys = innersortkeys;		/* won't really truncate */
+
+		for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
+		{
+			Path	   *innerpath;
+			List	   *newclauses = NIL;
+
+			/*
+			 * Look for an inner path ordered well enough for the first
+			 * 'sortkeycnt' innersortkeys.	NB: trialsortkeys list is modified
+			 * destructively, which is why we made a copy...
+			 */
+			trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   TOTAL_COST);
+			if (innerpath != NULL &&
+				(cheapest_total_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_total_inner,
+									TOTAL_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				/* Select the right mergeclauses, if we didn't already */
+				if (sortkeycnt < num_sortkeys)
+				{
+					newclauses =
+						find_mergeclauses_for_pathkeys(root,
+													   trialsortkeys,
+													   false,
+													   mergeclauses);
+					Assert(newclauses != NIL);
+				}
+				else
+					newclauses = mergeclauses;
+				try_mergejoin_path(root,
+								   joinrel,
+								   jointype,
+								   sjinfo,
+								   param_source_rels,
+								   extra_lateral_rels,
+								   outerpath,
+								   innerpath,
+								   restrictlist,
+								   merge_pathkeys,
+								   newclauses,
+								   NIL,
+								   NIL);
+				cheapest_total_inner = innerpath;
+			}
+			/* Same on the basis of cheapest startup cost ... */
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   STARTUP_COST);
+			if (innerpath != NULL &&
+				(cheapest_startup_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_startup_inner,
+									STARTUP_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				if (innerpath != cheapest_total_inner)
+				{
+					/*
+					 * Avoid rebuilding clause list if we already made one;
+					 * saves memory in big join trees...
+					 */
+					if (newclauses == NIL)
+					{
+						if (sortkeycnt < num_sortkeys)
+						{
+							newclauses =
+								find_mergeclauses_for_pathkeys(root,
+															   trialsortkeys,
+															   false,
+															   mergeclauses);
+							Assert(newclauses != NIL);
+						}
+						else
+							newclauses = mergeclauses;
+					}
+					try_mergejoin_path(root,
+									   joinrel,
+									   jointype,
+									   sjinfo,
+									   param_source_rels,
+									   extra_lateral_rels,
+									   outerpath,
+									   innerpath,
+									   restrictlist,
+									   merge_pathkeys,
+									   newclauses,
+									   NIL,
+									   NIL);
+				}
+				cheapest_startup_inner = innerpath;
+			}
+
+			/*
+			 * Don't consider truncated sortkeys if we need all clauses.
+			 */
+			if (useallclauses)
+				break;
+		}
+	}
+}
+
+/*
+ * select_mergejoin_clauses
+ *	  Select mergejoin clauses that are usable for a particular join.
+ *	  Returns a list of RestrictInfo nodes for those clauses.
+ *
+ * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
+ * this is a right/full join and there are nonmergejoinable join clauses.
+ * The executor's mergejoin machinery cannot handle such cases, so we have
+ * to avoid generating a mergejoin plan.  (Note that this flag does NOT
+ * consider whether there are actually any mergejoinable clauses.  This is
+ * correct because in some cases we need to build a clauseless mergejoin.
+ * Simply returning NIL is therefore not enough to distinguish safe from
+ * unsafe cases.)
+ *
+ * We also mark each selected RestrictInfo to show which side is currently
+ * being considered as outer.  These are transient markings that are only
+ * good for the duration of the current add_paths_to_joinrel() call!
+ *
+ * We examine each restrictinfo clause known for the join to see
+ * if it is mergejoinable and involves vars from the two sub-relations
+ * currently of interest.
+ */
+List *
+select_mergejoin_clauses(PlannerInfo *root,
+						 RelOptInfo *joinrel,
+						 RelOptInfo *outerrel,
+						 RelOptInfo *innerrel,
+						 List *restrictlist,
+						 JoinType jointype,
+						 bool *mergejoin_allowed)
+{
+	List	   *result_list = NIL;
+	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+	bool		have_nonmergeable_joinclause = false;
+	ListCell   *l;
+
+	foreach(l, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+
+		/*
+		 * If processing an outer join, only use its own join clauses in the
+		 * merge.  For inner joins we can use pushed-down clauses too. (Note:
+		 * we don't set have_nonmergeable_joinclause here because pushed-down
+		 * clauses will become otherquals not joinquals.)
+		 */
+		if (isouterjoin && restrictinfo->is_pushed_down)
+			continue;
+
+		/* Check that clause is a mergeable operator clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+		{
+			/*
+			 * The executor can handle extra joinquals that are constants, but
+			 * not anything else, when doing right/full merge join.  (The
+			 * reason to support constants is so we can do FULL JOIN ON
+			 * FALSE.)
+			 */
+			if (!restrictinfo->clause || !IsA(restrictinfo->clause, Const))
+				have_nonmergeable_joinclause = true;
+			continue;			/* not mergejoinable */
+		}
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer".
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* no good for these input relations */
+		}
+
+		/*
+		 * Insist that each side have a non-redundant eclass.  This
+		 * restriction is needed because various bits of the planner expect
+		 * that each clause in a merge be associatable with some pathkey in a
+		 * canonical pathkey list, but redundant eclasses can't appear in
+		 * canonical sort orderings.  (XXX it might be worth relaxing this,
+		 * but not enough time to address it for 8.3.)
+		 *
+		 * Note: it would be bad if this condition failed for an otherwise
+		 * mergejoinable FULL JOIN clause, since that would result in
+		 * undesirable planner failure.  I believe that is not possible
+		 * however; a variable involved in a full join could only appear in
+		 * below_outer_join eclasses, which aren't considered redundant.
+		 *
+		 * This case *can* happen for left/right join clauses: the outer-side
+		 * variable could be equated to a constant.  Because we will propagate
+		 * that constant across the join clause, the loss of ability to do a
+		 * mergejoin is not really all that big a deal, and so it's not clear
+		 * that improving this is important.
+		 */
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+			EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* can't handle redundant eclasses */
+		}
+
+		result_list = lappend(result_list, restrictinfo);
+	}
+
+	/*
+	 * Report whether mergejoin is allowed (see comment at top of function).
+	 */
+	switch (jointype)
+	{
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			*mergejoin_allowed = !have_nonmergeable_joinclause;
+			break;
+		default:
+			*mergejoin_allowed = true;
+			break;
+	}
+
+	return result_list;
+}
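
Reviewer's note on select_mergejoin_clauses() above: the rule that decides *mergejoin_allowed can be restated as a small standalone sketch. This is only an illustration under simplifying assumptions, not code from the patch or from PostgreSQL. The names SketchJoinType, SketchClause and sketch_mergejoin_allowed are hypothetical. The three rejection tests in the real function (non-mergeable operator, clause sides not matching the two relations, redundant eclass) are folded into a single "mergeable" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not PostgreSQL types. */
typedef enum
{
	SK_JOIN_INNER,
	SK_JOIN_LEFT,
	SK_JOIN_RIGHT,
	SK_JOIN_FULL
} SketchJoinType;

typedef struct
{
	bool		pushed_down;	/* clause came from above this join */
	bool		mergeable;		/* passes all mergejoinability tests */
	bool		is_constant;	/* bare constant, as in FULL JOIN ON FALSE */
} SketchClause;

/*
 * Count the clauses that would be selected as merge clauses and report
 * whether a mergejoin plan is permitted at all.
 */
static bool
sketch_mergejoin_allowed(SketchJoinType jointype,
						 const SketchClause *clauses, size_t nclauses,
						 size_t *nselected)
{
	bool		isouterjoin = (jointype != SK_JOIN_INNER);
	bool		have_nonmergeable = false;
	size_t		selected = 0;
	size_t		i;

	assert(clauses != NULL || nclauses == 0);

	for (i = 0; i < nclauses; i++)
	{
		/* Outer joins use only their own join clauses in the merge. */
		if (isouterjoin && clauses[i].pushed_down)
			continue;

		if (!clauses[i].mergeable)
		{
			/* Constant quals are tolerated; anything else is remembered. */
			if (!clauses[i].is_constant)
				have_nonmergeable = true;
			continue;
		}
		selected++;
	}

	if (nselected != NULL)
		*nselected = selected;

	/* Only right/full joins are blocked by a leftover non-mergeable qual. */
	if (jointype == SK_JOIN_RIGHT || jointype == SK_JOIN_FULL)
		return !have_nonmergeable;
	return true;
}
```

The sketch shows why an empty result list is not enough to tell safe from unsafe cases: a FULL JOIN ON FALSE selects no clauses yet still allows a (clauseless) mergejoin, while a full join with one non-mergeable qual forbids it even if other clauses were selected.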
diff --git a/contrib/custmj/nodeMergejoin.c b/contrib/custmj/nodeMergejoin.c
new file mode 100644
index 0000000..62dd8c0
--- /dev/null
+++ b/contrib/custmj/nodeMergejoin.c
@@ -0,0 +1,1694 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeMergejoin.c
+ *	  routines supporting merge joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/nodeMergejoin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *		ExecMergeJoin			mergejoin outer and inner relations.
+ *		ExecInitMergeJoin		creates and initializes run time states
+ *		ExecEndMergeJoin		cleans up the node.
+ *
+ * NOTES
+ *
+ *		Merge-join is done by joining the inner and outer tuples satisfying
+ *		join clauses of the form ((= outerKey innerKey) ...).
+ *		The join clause list is provided by the query planner and may contain
+ *		more than one (= outerKey innerKey) clause (for composite sort key).
+ *
+ *		However, the query executor needs to know whether an outer
+ *		tuple is "greater/smaller" than an inner tuple so that it can
+ *		"synchronize" the two relations. For example, consider the following
+ *		relations:
+ *
+ *				outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
+ *				inner: (1 ^3 5 5 5 5 6)			current tuple: 3
+ *
+ *		To continue the merge-join, the executor needs to scan both inner
+ *		and outer relations up to the matching tuples 5. It needs to know
+ *		that currently inner tuple 3 is "greater" than outer tuple 1 and
+ *		therefore it should scan the outer relation first to find a
+ *		matching tuple and so on.
+ *
+ *		Therefore, rather than directly executing the merge join clauses,
+ *		we evaluate the left and right key expressions separately and then
+ *		compare the columns one at a time (see MJCompare).	The planner
+ *		passes us enough information about the sort ordering of the inputs
+ *		to allow us to determine how to make the comparison.  We may use the
+ *		appropriate btree comparison function, since Postgres' only notion
+ *		of ordering is specified by btree opfamilies.
+ *
+ *
+ *		Consider the above relations and suppose that the executor has
+ *		just joined the first outer "5" with the last inner "5". The
+ *		next step is of course to join the second outer "5" with all
+ *		the inner "5's". This requires repositioning the inner "cursor"
+ *		to point at the first inner "5". This is done by "marking" the
+ *		first inner 5 so we can restore the "cursor" to it before joining
+ *		with the second outer 5. The access method interface provides
+ *		routines to mark and restore to a tuple.
+ *
+ *
+ *		Essential operation of the merge join algorithm is as follows:
+ *
+ *		Join {
+ *			get initial outer and inner tuples				INITIALIZE
+ *			do forever {
+ *				while (outer != inner) {					SKIP_TEST
+ *					if (outer < inner)
+ *						advance outer						SKIPOUTER_ADVANCE
+ *					else
+ *						advance inner						SKIPINNER_ADVANCE
+ *				}
+ *				mark inner position							SKIP_TEST
+ *				do forever {
+ *					while (outer == inner) {
+ *						join tuples							JOINTUPLES
+ *						advance inner position				NEXTINNER
+ *					}
+ *					advance outer position					NEXTOUTER
+ *					if (outer == mark)						TESTOUTER
+ *						restore inner position to mark		TESTOUTER
+ *					else
+ *						break	// return to top of outer loop
+ *				}
+ *			}
+ *		}
+ *
+ *		The merge join operation is coded in the fashion
+ *		of a state machine.  At each state, we do something and then
+ *		proceed to another state.  This state is stored in the node's
+ *		execution state information and is preserved across calls to
+ *		ExecMergeJoin. -cim 10/31/89
+ */
+#include "postgres.h"
+
+#include "access/nbtree.h"
+#include "executor/execdebug.h"
+/* #include "executor/nodeMergejoin.h" */
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+/*
+ * States of the ExecMergeJoin state machine
+ */
+#define EXEC_MJ_INITIALIZE_OUTER		1
+#define EXEC_MJ_INITIALIZE_INNER		2
+#define EXEC_MJ_JOINTUPLES				3
+#define EXEC_MJ_NEXTOUTER				4
+#define EXEC_MJ_TESTOUTER				5
+#define EXEC_MJ_NEXTINNER				6
+#define EXEC_MJ_SKIP_TEST				7
+#define EXEC_MJ_SKIPOUTER_ADVANCE		8
+#define EXEC_MJ_SKIPINNER_ADVANCE		9
+#define EXEC_MJ_ENDOUTER				10
+#define EXEC_MJ_ENDINNER				11
+
+/*
+ * Runtime data for each mergejoin clause
+ */
+typedef struct MergeJoinClauseData
+{
+	/* Executable expression trees */
+	ExprState  *lexpr;			/* left-hand (outer) input expression */
+	ExprState  *rexpr;			/* right-hand (inner) input expression */
+
+	/*
+	 * If we have a current left or right input tuple, the values of the
+	 * expressions are loaded into these fields:
+	 */
+	Datum		ldatum;			/* current left-hand value */
+	Datum		rdatum;			/* current right-hand value */
+	bool		lisnull;		/* and their isnull flags */
+	bool		risnull;
+
+	/*
+	 * Everything we need to know to compare the left and right values is
+	 * stored here.
+	 */
+	SortSupportData ssup;
+}	MergeJoinClauseData;
+
+/* Result type for MJEvalOuterValues and MJEvalInnerValues */
+typedef enum
+{
+	MJEVAL_MATCHABLE,			/* normal, potentially matchable tuple */
+	MJEVAL_NONMATCHABLE,		/* tuple cannot join because it has a null */
+	MJEVAL_ENDOFJOIN			/* end of input (physical or effective) */
+} MJEvalResult;
+
+
+#define MarkInnerTuple(innerTupleSlot, mergestate) \
+	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))
+
+
+/*
+ * MJExamineQuals
+ *
+ * This deconstructs the list of mergejoinable expressions, which is given
+ * to us by the planner in the form of a list of "leftexpr = rightexpr"
+ * expression trees in the order matching the sort columns of the inputs.
+ * We build an array of MergeJoinClause structs containing the information
+ * we will need at runtime.  Each struct essentially tells us how to compare
+ * the two expressions from the original clause.
+ *
+ * In addition to the expressions themselves, the planner passes the btree
+ * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or
+ * BTGreaterStrategyNumber), and nulls-first flag that identify the intended
+ * sort ordering for each merge key.  The mergejoinable operator is an
+ * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * ordered in either increasing or decreasing (respectively) order according
+ * to the opfamily and collation, with nulls at the indicated end of the range.
+ * This allows us to obtain the needed comparison function from the opfamily.
+ */
+static MergeJoinClause
+MJExamineQuals(List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   PlanState *parent)
+{
+	MergeJoinClause clauses;
+	int			nClauses = list_length(mergeclauses);
+	int			iClause;
+	ListCell   *cl;
+
+	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+
+	iClause = 0;
+	foreach(cl, mergeclauses)
+	{
+		OpExpr	   *qual = (OpExpr *) lfirst(cl);
+		MergeJoinClause clause = &clauses[iClause];
+		Oid			opfamily = mergefamilies[iClause];
+		Oid			collation = mergecollations[iClause];
+		StrategyNumber opstrategy = mergestrategies[iClause];
+		bool		nulls_first = mergenullsfirst[iClause];
+		int			op_strategy;
+		Oid			op_lefttype;
+		Oid			op_righttype;
+		Oid			sortfunc;
+
+		if (!IsA(qual, OpExpr))
+			elog(ERROR, "mergejoin clause is not an OpExpr");
+
+		/*
+		 * Prepare the input expressions for execution.
+		 */
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+
+		/* Set up sort support data */
+		clause->ssup.ssup_cxt = CurrentMemoryContext;
+		clause->ssup.ssup_collation = collation;
+		if (opstrategy == BTLessStrategyNumber)
+			clause->ssup.ssup_reverse = false;
+		else if (opstrategy == BTGreaterStrategyNumber)
+			clause->ssup.ssup_reverse = true;
+		else	/* planner screwed up */
+			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+		clause->ssup.ssup_nulls_first = nulls_first;
+
+		/* Extract the operator's declared left/right datatypes */
+		get_op_opfamily_properties(qual->opno, opfamily, false,
+								   &op_strategy,
+								   &op_lefttype,
+								   &op_righttype);
+		if (op_strategy != BTEqualStrategyNumber)		/* should not happen */
+			elog(ERROR, "cannot merge using non-equality operator %u",
+				 qual->opno);
+
+		/* And get the matching support or comparison function */
+		sortfunc = get_opfamily_proc(opfamily,
+									 op_lefttype,
+									 op_righttype,
+									 BTSORTSUPPORT_PROC);
+		if (OidIsValid(sortfunc))
+		{
+			/* The sort support function should provide a comparator */
+			OidFunctionCall1(sortfunc, PointerGetDatum(&clause->ssup));
+			Assert(clause->ssup.comparator != NULL);
+		}
+		else
+		{
+			/* opfamily doesn't provide sort support, get comparison func */
+			sortfunc = get_opfamily_proc(opfamily,
+										 op_lefttype,
+										 op_righttype,
+										 BTORDER_PROC);
+			if (!OidIsValid(sortfunc))	/* should not happen */
+				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+					 BTORDER_PROC, op_lefttype, op_righttype, opfamily);
+			/* We'll use a shim to call the old-style btree comparator */
+			PrepareSortSupportComparisonShim(sortfunc, &clause->ssup);
+		}
+
+		iClause++;
+	}
+
+	return clauses;
+}
+
+/*
+ * MJEvalOuterValues
+ *
+ * Compute the values of the mergejoined expressions for the current
+ * outer tuple.  We also detect whether it's impossible for the current
+ * outer tuple to match anything --- this is true if it yields a NULL
+ * input, since we assume mergejoin operators are strict.  If the NULL
+ * is in the first join column, and that column sorts nulls last, then
+ * we can further conclude that no following tuple can match anything
+ * either, since they must all have nulls in the first column.	However,
+ * that case is only interesting if we're not in FillOuter mode, else
+ * we have to visit all the tuples anyway.
+ *
+ * For the convenience of callers, we also make this routine responsible
+ * for testing for end-of-input (null outer tuple), and returning
+ * MJEVAL_ENDOFJOIN when that's seen.  This allows the same code to be used
+ * for both real end-of-input and the effective end-of-input represented by
+ * a first-column NULL.
+ *
+ * We evaluate the values in OuterEContext, which can be reset each
+ * time we move to a new tuple.
+ */
+static MJEvalResult
+MJEvalOuterValues(CustomMergeJoinState *mergestate)
+{
+	ExprContext *econtext = mergestate->mj_OuterEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of outer subplan */
+	if (TupIsNull(mergestate->mj_OuterTupleSlot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_outertuple = mergestate->mj_OuterTupleSlot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->ldatum = ExecEvalExpr(clause->lexpr, econtext,
+									  &clause->lisnull, NULL);
+		if (clause->lisnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillOuter)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJEvalInnerValues
+ *
+ * Same as above, but for the inner tuple.  Here, we have to be prepared
+ * to load data from either the true current inner, or the marked inner,
+ * so caller must tell us which slot to load from.
+ */
+static MJEvalResult
+MJEvalInnerValues(CustomMergeJoinState *mergestate, TupleTableSlot *innerslot)
+{
+	ExprContext *econtext = mergestate->mj_InnerEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of inner subplan */
+	if (TupIsNull(innerslot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_innertuple = innerslot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->rdatum = ExecEvalExpr(clause->rexpr, econtext,
+									  &clause->risnull, NULL);
+		if (clause->risnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillInner)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJCompare
+ *
+ * Compare the mergejoinable values of the current two input tuples
+ * and return 0 if they are equal (ie, the mergejoin equalities all
+ * succeed), >0 if outer > inner, <0 if outer < inner.
+ *
+ * MJEvalOuterValues and MJEvalInnerValues must already have been called
+ * for the current outer and inner tuples, respectively.
+ */
+static int
+MJCompare(CustomMergeJoinState *mergestate)
+{
+	int			result = 0;
+	bool		nulleqnull = false;
+	ExprContext *econtext = mergestate->cps.ps.ps_ExprContext;
+	int			i;
+	MemoryContext oldContext;
+
+	/*
+	 * Call the comparison functions in short-lived context, in case they leak
+	 * memory.
+	 */
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		/*
+		 * Special case for NULL-vs-NULL, else use standard comparison.
+		 */
+		if (clause->lisnull && clause->risnull)
+		{
+			nulleqnull = true;	/* NULL "=" NULL */
+			continue;
+		}
+
+		result = ApplySortComparator(clause->ldatum, clause->lisnull,
+									 clause->rdatum, clause->risnull,
+									 &clause->ssup);
+
+		if (result != 0)
+			break;
+	}
+
+	/*
+	 * If we had any NULL-vs-NULL inputs, we do not want to report that the
+	 * tuples are equal.  Instead, if result is still 0, change it to +1. This
+	 * will result in advancing the inner side of the join.
+	 *
+	 * Likewise, if there was a constant-false joinqual, do not report
+	 * equality.  We have to check this as part of the mergequals, else the
+	 * rescan logic will do the wrong thing.
+	 */
+	if (result == 0 &&
+		(nulleqnull || mergestate->mj_ConstFalseJoin))
+		result = 1;
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+
+/*
+ * Generate a fake join tuple with nulls for the inner tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillOuter(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_OuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_NullInnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * qualification succeeded.  now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning outer fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+/*
+ * Generate a fake join tuple with nulls for the outer tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillInner(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_NullOuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_InnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * qualification succeeded.  now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning inner fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+
+/*
+ * Check that a qual condition is constant true or constant false.
+ * If it is constant false (or null), set *is_const_false to TRUE.
+ *
+ * Constant true would normally be represented by a NIL list, but we allow an
+ * actual bool Const as well.  We do expect that the planner will have thrown
+ * away any non-constant terms that have been ANDed with a constant false.
+ */
+static bool
+check_constant_qual(List *qual, bool *is_const_false)
+{
+	ListCell   *lc;
+
+	foreach(lc, qual)
+	{
+		Const	   *con = (Const *) lfirst(lc);
+
+		if (!con || !IsA(con, Const))
+			return false;
+		if (con->constisnull || !DatumGetBool(con->constvalue))
+			*is_const_false = true;
+	}
+	return true;
+}
+
+
+/* ----------------------------------------------------------------
+ *		ExecMergeTupleDump
+ *
+ *		This function is called through the MJ_dump() macro
+ *		when EXEC_MERGEJOINDEBUG is defined
+ * ----------------------------------------------------------------
+ */
+#ifdef EXEC_MERGEJOINDEBUG
+
+static void
+ExecMergeTupleDumpOuter(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *outerSlot = mergestate->mj_OuterTupleSlot;
+
+	printf("==== outer tuple ====\n");
+	if (TupIsNull(outerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(outerSlot);
+}
+
+static void
+ExecMergeTupleDumpInner(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *innerSlot = mergestate->mj_InnerTupleSlot;
+
+	printf("==== inner tuple ====\n");
+	if (TupIsNull(innerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(innerSlot);
+}
+
+static void
+ExecMergeTupleDumpMarked(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *markedSlot = mergestate->mj_MarkedTupleSlot;
+
+	printf("==== marked tuple ====\n");
+	if (TupIsNull(markedSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(markedSlot);
+}
+
+static void
+ExecMergeTupleDump(CustomMergeJoinState *mergestate)
+{
+	printf("******** ExecMergeTupleDump ********\n");
+
+	ExecMergeTupleDumpOuter(mergestate);
+	ExecMergeTupleDumpInner(mergestate);
+	ExecMergeTupleDumpMarked(mergestate);
+
+	printf("******** \n");
+}
+#endif
+
+/* ----------------------------------------------------------------
+ *		ExecMergeJoin
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+_ExecMergeJoin(CustomMergeJoinState *node)
+{
+	List	   *joinqual;
+	List	   *otherqual;
+	bool		qualResult;
+	int			compareResult;
+	PlanState  *innerPlan;
+	TupleTableSlot *innerTupleSlot;
+	PlanState  *outerPlan;
+	TupleTableSlot *outerTupleSlot;
+	ExprContext *econtext;
+	bool		doFillOuter;
+	bool		doFillInner;
+
+	/*
+	 * get information from node
+	 */
+	innerPlan = innerPlanState(node);
+	outerPlan = outerPlanState(node);
+	econtext = node->cps.ps.ps_ExprContext;
+	joinqual = node->joinqual;
+	otherqual = node->cps.ps.qual;
+	doFillOuter = node->mj_FillOuter;
+	doFillInner = node->mj_FillInner;
+
+	/*
+	 * Check to see if we're still projecting out tuples from a previous join
+	 * tuple (because there is a function-returning-set in the projection
+	 * expressions).  If so, try to project another one.
+	 */
+	if (node->cps.ps.ps_TupFromTlist)
+	{
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+		if (isDone == ExprMultipleResult)
+			return result;
+		/* Done with that source tuple... */
+		node->cps.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.  Note this can't happen
+	 * until we're done projecting out tuples from a join tuple.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * ok, everything is set up; let's go to work
+	 */
+	for (;;)
+	{
+		MJ_dump(node);
+
+		/*
+		 * get the current state of the join and do things accordingly.
+		 */
+		switch (node->mj_JoinState)
+		{
+				/*
+				 * EXEC_MJ_INITIALIZE_OUTER means that this is the first time
+				 * ExecMergeJoin() has been called and so we have to fetch the
+				 * first matchable tuple for both outer and inner subplans. We
+				 * do the outer side in INITIALIZE_OUTER state, then advance
+				 * to INITIALIZE_INNER state for the inner subplan.
+				 */
+			case EXEC_MJ_INITIALIZE_OUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_OUTER\n");
+
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* OK to go get the first inner tuple */
+						node->mj_JoinState = EXEC_MJ_INITIALIZE_INNER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Stay in same state to fetch next outer tuple */
+						if (doFillOuter)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * inner tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillOuter(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: nothing in outer subplan\n");
+						if (doFillInner)
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples. We set MatchedInner = true to
+							 * force the ENDOUTER state to advance inner.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							node->mj_MatchedInner = true;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+			case EXEC_MJ_INITIALIZE_INNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_INNER\n");
+
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * OK, we have the initial tuples.  Begin by skipping
+						 * non-matching tuples.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Mark before advancing, if wanted */
+						if (node->mj_ExtraMarks)
+							ExecMarkPos(innerPlan);
+						/* Stay in same state to fetch next inner tuple */
+						if (doFillInner)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * outer tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillInner(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: nothing in inner subplan\n");
+						if (doFillOuter)
+						{
+							/*
+							 * Need to emit left-join tuples for all outer
+							 * tuples, including the one we just fetched.  We
+							 * set MatchedOuter = false to force the ENDINNER
+							 * state to emit first tuple before advancing
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							node->mj_MatchedOuter = false;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_JOINTUPLES means we have two tuples which satisfied
+				 * the merge clause so we join them and then proceed to get
+				 * the next inner tuple (EXEC_MJ_NEXTINNER).
+				 */
+			case EXEC_MJ_JOINTUPLES:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_JOINTUPLES\n");
+
+				/*
+				 * Set the next state machine state.  The right things will
+				 * happen whether we return this join tuple or just fall
+				 * through to continue the state machine execution.
+				 */
+				node->mj_JoinState = EXEC_MJ_NEXTINNER;
+
+				/*
+				 * Check the extra qual conditions to see if we actually want
+				 * to return this join tuple.  If not, can proceed with merge.
+				 * We must distinguish the additional joinquals (which must
+				 * pass to consider the tuples "matched" for outer-join logic)
+				 * from the otherquals (which must pass before we actually
+				 * return the tuple).
+				 *
+				 * We don't bother with a ResetExprContext here, on the
+				 * assumption that we just did one while checking the merge
+				 * qual.  One per tuple should be sufficient.  We do have to
+				 * set up the econtext links to the tuples for ExecQual to
+				 * use.
+				 */
+				outerTupleSlot = node->mj_OuterTupleSlot;
+				econtext->ecxt_outertuple = outerTupleSlot;
+				innerTupleSlot = node->mj_InnerTupleSlot;
+				econtext->ecxt_innertuple = innerTupleSlot;
+
+				qualResult = (joinqual == NIL ||
+							  ExecQual(joinqual, econtext, false));
+				MJ_DEBUG_QUAL(joinqual, qualResult);
+
+				if (qualResult)
+				{
+					node->mj_MatchedOuter = true;
+					node->mj_MatchedInner = true;
+
+					/* In an antijoin, we never return a matched tuple */
+					if (node->jointype == JOIN_ANTI)
+					{
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					}
+
+					/*
+					 * In a semijoin, we'll consider returning the first
+					 * match, but after that we're done with this outer tuple.
+					 */
+					if (node->jointype == JOIN_SEMI)
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+
+					qualResult = (otherqual == NIL ||
+								  ExecQual(otherqual, econtext, false));
+					MJ_DEBUG_QUAL(otherqual, qualResult);
+
+					if (qualResult)
+					{
+						/*
+						 * qualification succeeded.  now form the desired
+						 * projection tuple and return the slot containing it.
+						 */
+						TupleTableSlot *result;
+						ExprDoneCond isDone;
+
+						MJ_printf("ExecMergeJoin: returning tuple\n");
+
+						result = ExecProject(node->cps.ps.ps_ProjInfo,
+											 &isDone);
+
+						if (isDone != ExprEndResult)
+						{
+							node->cps.ps.ps_TupFromTlist =
+								(isDone == ExprMultipleResult);
+							return result;
+						}
+					}
+					else
+						InstrCountFiltered2(node, 1);
+				}
+				else
+					InstrCountFiltered1(node, 1);
+				break;
+
+				/*
+				 * EXEC_MJ_NEXTINNER means advance the inner scan to the next
+				 * tuple. If the tuple is not nil, we then proceed to test it
+				 * against the join qualification.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_NEXTINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTINNER\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next inner tuple, if any.  If there's none,
+				 * advance to next outer tuple (which may be able to join to
+				 * previously marked tuples).
+				 *
+				 * NB: must NOT do "extraMarks" here, since we may need to
+				 * return to previously marked tuples.
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * Test the new inner tuple to see if it matches
+						 * outer.
+						 *
+						 * If they do match, then we join them and move on to
+						 * the next inner tuple (EXEC_MJ_JOINTUPLES).
+						 *
+						 * If they do not match then advance to next outer
+						 * tuple.
+						 */
+						compareResult = MJCompare(node);
+						MJ_DEBUG_COMPARE(compareResult);
+
+						if (compareResult == 0)
+							node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+						else
+						{
+							Assert(compareResult < 0);
+							node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						}
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * It contains a NULL and hence can't match any outer
+						 * tuple, so we can skip the comparison and assume the
+						 * new tuple is greater than current outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+
+						/*
+						 * No more inner tuples.  However, this might be only
+						 * effective and not physical end of inner plan, so
+						 * force mj_InnerTupleSlot to null to make sure we
+						 * don't fetch more inner tuples.  (We need this hack
+						 * because we are not transiting to a state where the
+						 * inner plan is assumed to be exhausted.)
+						 */
+						node->mj_InnerTupleSlot = NULL;
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+				}
+				break;
+
+				/*-------------------------------------------
+				 * EXEC_MJ_NEXTOUTER means
+				 *
+				 *				outer inner
+				 * outer tuple -  5		5  - marked tuple
+				 *				  5		5
+				 *				  6		6  - inner tuple
+				 *				  7		7
+				 *
+				 * we know we just bumped into the
+				 * first inner tuple > current outer tuple (or possibly
+				 * the end of the inner stream)
+				 * so get a new outer tuple and then
+				 * proceed to test it against the marked tuple
+				 * (EXEC_MJ_TESTOUTER)
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 *------------------------------------------------
+				 */
+			case EXEC_MJ_NEXTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTOUTER\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the marked tuple */
+						node->mj_JoinState = EXEC_MJ_TESTOUTER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*--------------------------------------------------------
+				 * EXEC_MJ_TESTOUTER If the new outer tuple and the marked
+				 * tuple satisfy the merge clause then we know we have
+				 * duplicates in the outer scan so we have to restore the
+				 * inner scan to the marked tuple and proceed to join the
+				 * new outer tuple with the inner tuples.
+				 *
+				 * This is the case when
+				 *						  outer inner
+				 *							4	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	5	  5
+				 *							6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple == marked tuple
+				 *
+				 * If the outer tuple fails the test, then we are done
+				 * with the marked tuples, and we have to look for a
+				 * match to the current inner tuple.  So we will
+				 * proceed to skip outer tuples until outer >= inner
+				 * (EXEC_MJ_SKIP_TEST).
+				 *
+				 *		This is the case when
+				 *
+				 *						  outer inner
+				 *							5	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple > marked tuple
+				 *
+				 *---------------------------------------------------------
+				 */
+			case EXEC_MJ_TESTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_TESTOUTER\n");
+
+				/*
+				 * Here we must compare the outer tuple with the marked inner
+				 * tuple.  (We can ignore the result of MJEvalInnerValues,
+				 * since the marked inner tuple is certainly matchable.)
+				 */
+				innerTupleSlot = node->mj_MarkedTupleSlot;
+				(void) MJEvalInnerValues(node, innerTupleSlot);
+
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					/*
+					 * the merge clause matched so now we restore the inner
+					 * scan position to the first mark, and go join that tuple
+					 * (and any following ones) to the new outer.
+					 *
+					 * NOTE: we do not need to worry about the MatchedInner
+					 * state for the rescanned inner tuples.  We know all of
+					 * them will match this new outer tuple and therefore
+					 * won't be emitted as fill tuples.  This works *only*
+					 * because we require the extra joinquals to be constant
+					 * when doing a right or full join --- otherwise some of
+					 * the rescanned tuples might fail the extra joinquals.
+					 * This obviously won't happen for a constant-true extra
+					 * joinqual, while the constant-false case is handled by
+					 * forcing the merge clause to never match, so we never
+					 * get here.
+					 */
+					ExecRestrPos(innerPlan);
+
+					/*
+					 * ExecRestrPos probably should give us back a new Slot,
+					 * but since it doesn't, use the marked slot.  (The
+					 * previously returned mj_InnerTupleSlot cannot be assumed
+					 * to hold the required tuple.)
+					 */
+					node->mj_InnerTupleSlot = innerTupleSlot;
+					/* we need not do MJEvalInnerValues again */
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else
+				{
+					/* ----------------
+					 *	if the new outer tuple didn't match the marked inner
+					 *	tuple then we have a case like:
+					 *
+					 *			 outer inner
+					 *			   4	 4	- marked tuple
+					 * new outer - 5	 4
+					 *			   6	 5	- inner tuple
+					 *			   7
+					 *
+					 *	which means that all subsequent outer tuples will be
+					 *	larger than our marked inner tuples.  So we need not
+					 *	revisit any of the marked tuples but can proceed to
+					 *	look for a match to the current inner.  If there's
+					 *	no more inners, no more matches are possible.
+					 * ----------------
+					 */
+					Assert(compareResult > 0);
+					innerTupleSlot = node->mj_InnerTupleSlot;
+
+					/* reload comparison data for current inner */
+					switch (MJEvalInnerValues(node, innerTupleSlot))
+					{
+						case MJEVAL_MATCHABLE:
+							/* proceed to compare it to the current outer */
+							node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+							break;
+						case MJEVAL_NONMATCHABLE:
+
+							/*
+							 * current inner can't possibly match any outer;
+							 * better to advance the inner scan than the
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+							break;
+						case MJEVAL_ENDOFJOIN:
+							/* No more inner tuples */
+							if (doFillOuter)
+							{
+								/*
+								 * Need to emit left-join tuples for remaining
+								 * outer tuples.
+								 */
+								node->mj_JoinState = EXEC_MJ_ENDINNER;
+								break;
+							}
+							/* Otherwise we're done. */
+							return NULL;
+					}
+				}
+				break;
+
+				/*----------------------------------------------------------
+				 * EXEC_MJ_SKIP_TEST means compare tuples and if they do not
+				 * match, skip whichever is lesser.
+				 *
+				 * For example:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple -  6		8  - inner tuple
+				 *				  7    12
+				 *				  8    14
+				 *
+				 * we have to advance the outer scan
+				 * until we find the outer 8.
+				 *
+				 * On the other hand:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple - 12		8  - inner tuple
+				 *				 14    10
+				 *				 17    12
+				 *
+				 * we have to advance the inner scan
+				 * until we find the inner 12.
+				 *----------------------------------------------------------
+				 */
+			case EXEC_MJ_SKIP_TEST:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIP_TEST\n");
+
+				/*
+				 * before we advance, make sure the current tuples do not
+				 * satisfy the mergeclauses.  If they do, then we update the
+				 * marked tuple position and go join them.
+				 */
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					ExecMarkPos(innerPlan);
+
+					MarkInnerTuple(node->mj_InnerTupleSlot, node);
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else if (compareResult < 0)
+					node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+				else
+					/* compareResult > 0 */
+					node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+				break;
+
+				/*
+				 * SKIPOUTER_ADVANCE: advance over an outer tuple that is
+				 * known not to join to any inner tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 */
+			case EXEC_MJ_SKIPOUTER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPOUTER_ADVANCE\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the current inner */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * SKIPINNER_ADVANCE: advance over an inner tuple that is
+				 * known not to join to any outer tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_SKIPINNER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPINNER_ADVANCE\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+						/* proceed to compare it to the current outer */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * current inner can't possibly match any outer;
+						 * better to advance the inner scan than the outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: end of inner subplan\n");
+						outerTupleSlot = node->mj_OuterTupleSlot;
+						if (doFillOuter && !TupIsNull(outerTupleSlot))
+						{
+							/*
+							 * Need to emit left-join tuples for remaining
+							 * outer tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_ENDOUTER means we have run out of outer tuples, but
+				 * are doing a right/full join and therefore must null-fill
+				 * any remaining unmatched inner tuples.
+				 */
+			case EXEC_MJ_ENDOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDOUTER\n");
+
+				Assert(doFillInner);
+
+				if (!node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				if (TupIsNull(innerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of inner subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDOUTER state and process next tuple. */
+				break;
+
+				/*
+				 * EXEC_MJ_ENDINNER means we have run out of inner tuples, but
+				 * are doing a left/full join and therefore must null-fill
+				 * any remaining unmatched outer tuples.
+				 */
+			case EXEC_MJ_ENDINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDINNER\n");
+
+				Assert(doFillOuter);
+
+				if (!node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				if (TupIsNull(outerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of outer subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDINNER state and process next tuple. */
+				break;
+
+				/*
+				 * broken state value?
+				 */
+			default:
+				elog(ERROR, "unrecognized mergejoin state: %d",
+					 (int) node->mj_JoinState);
+		}
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitMergeJoin
+ * ----------------------------------------------------------------
+ */
+MergeJoinState *
+_ExecInitMergeJoin(CustomMergeJoin *node, EState *estate, int eflags)
+{
+	MergeJoinState *mergestate;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "initializing node");
+
+	/*
+	 * create state structure
+	 */
+	mergestate = makeNode(MergeJoinState);
+	mergestate->js.ps.plan = (Plan *) node;
+	mergestate->js.ps.state = estate;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &mergestate->js.ps);
+
+	/*
+	 * we need two additional econtexts in which we can compute the join
+	 * expressions from the left and right input tuples.  The node's regular
+	 * econtext won't do because it gets reset too often.
+	 */
+	mergestate->mj_OuterEContext = CreateExprContext(estate);
+	mergestate->mj_InnerEContext = CreateExprContext(estate);
+
+	/*
+	 * initialize child expressions
+	 */
+	mergestate->js.ps.targetlist = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.targetlist,
+					 (PlanState *) mergestate);
+	mergestate->js.ps.qual = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.qual,
+					 (PlanState *) mergestate);
+	mergestate->js.jointype = node->jointype;
+	mergestate->js.joinqual = (List *)
+		ExecInitExpr((Expr *) node->joinqual,
+					 (PlanState *) mergestate);
+	mergestate->mj_ConstFalseJoin = false;
+	/* mergeclauses are handled below */
+
+	/*
+	 * initialize child nodes
+	 *
+	 * inner child must support MARK/RESTORE.
+	 */
+	outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+	innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
+											  eflags | EXEC_FLAG_MARK);
+
+	/*
+	 * For certain types of inner child nodes, it is advantageous to issue
+	 * MARK every time we advance past an inner tuple we will never return to.
+	 * For other types, MARK on a tuple we cannot return to is a waste of
+	 * cycles.	Detect which case applies and set mj_ExtraMarks if we want to
+	 * issue "unnecessary" MARK calls.
+	 *
+	 * Currently, only Material wants the extra MARKs, and it will be helpful
+	 * only if eflags doesn't specify REWIND.
+	 */
+	if (IsA(innerPlan(node), Material) &&
+		(eflags & EXEC_FLAG_REWIND) == 0)
+		mergestate->mj_ExtraMarks = true;
+	else
+		mergestate->mj_ExtraMarks = false;
+
+	/*
+	 * tuple table initialization
+	 */
+	ExecInitResultTupleSlot(estate, &mergestate->js.ps);
+
+	mergestate->mj_MarkedTupleSlot = ExecInitExtraTupleSlot(estate);
+	ExecSetSlotDescriptor(mergestate->mj_MarkedTupleSlot,
+						  ExecGetResultType(innerPlanState(mergestate)));
+
+	switch (node->jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_SEMI:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = false;
+			break;
+		case JOIN_LEFT:
+		case JOIN_ANTI:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = false;
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+			break;
+		case JOIN_RIGHT:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("RIGHT JOIN is only supported with merge-joinable join conditions")));
+			break;
+		case JOIN_FULL:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("FULL JOIN is only supported with merge-joinable join conditions")));
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) node->jointype);
+	}
+
+	/*
+	 * initialize tuple type and projection info
+	 */
+	ExecAssignResultTypeFromTL(&mergestate->js.ps);
+	ExecAssignProjectionInfo(&mergestate->js.ps, NULL);
+
+	/*
+	 * preprocess the merge clauses
+	 */
+	mergestate->mj_NumClauses = list_length(node->mergeclauses);
+	mergestate->mj_Clauses = MJExamineQuals(node->mergeclauses,
+											node->mergeFamilies,
+											node->mergeCollations,
+											node->mergeStrategies,
+											node->mergeNullsFirst,
+											(PlanState *) mergestate);
+
+	/*
+	 * initialize join state
+	 */
+	mergestate->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	mergestate->js.ps.ps_TupFromTlist = false;
+	mergestate->mj_MatchedOuter = false;
+	mergestate->mj_MatchedInner = false;
+	mergestate->mj_OuterTupleSlot = NULL;
+	mergestate->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * initialization successful
+	 */
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "node initialized");
+
+	return mergestate;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndMergeJoin
+ *
+ * old comments
+ *		frees storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+_ExecEndMergeJoin(CustomMergeJoinState *node)
+{
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "ending node processing");
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->cps.ps);
+
+	/*
+	 * clean out the tuple table
+	 */
+	ExecClearTuple(node->cps.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	/*
+	 * shut down the subplans
+	 */
+	ExecEndNode(innerPlanState(node));
+	ExecEndNode(outerPlanState(node));
+
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "node processing ended");
+}
+
+void
+_ExecReScanMergeJoin(CustomMergeJoinState *node)
+{
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	node->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	node->cps.ps.ps_TupFromTlist = false;
+	node->mj_MatchedOuter = false;
+	node->mj_MatchedInner = false;
+	node->mj_OuterTupleSlot = NULL;
+	node->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * if chgParam of subnodes is not null then plans will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (node->cps.ps.lefttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.lefttree);
+	if (node->cps.ps.righttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.righttree);
+
+}
diff --git a/contrib/custmj/setrefs.c b/contrib/custmj/setrefs.c
new file mode 100644
index 0000000..9eb0b14
--- /dev/null
+++ b/contrib/custmj/setrefs.c
@@ -0,0 +1,326 @@
+/*-------------------------------------------------------------------------
+ *
+ * setrefs.c
+ *	  Post-processing of a completed plan tree: fix references to subplan
+ *	  vars, compute regproc values for operators, etc
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/setrefs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/transam.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/tlist.h"
+#include "tcop/utility.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "custmj.h"
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *outer_itlist;
+	indexed_tlist *inner_itlist;
+	Index		acceptable_rel;
+	int			rtoffset;
+} fix_join_expr_context;
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *subplan_itlist;
+	Index		newvarno;
+	int			rtoffset;
+} fix_upper_expr_context;
+
+static Var *search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist,
+								 Index newvarno);
+static Node *fix_join_expr_mutator(Node *node,
+					  fix_join_expr_context *context);
+/*
+ * copyVar
+ *		Copy a Var node.
+ *
+ * fix_scan_expr and friends do this enough times that it's worth having
+ * a bespoke routine instead of using the generic copyObject() function.
+ */
+static inline Var *
+copyVar(Var *var)
+{
+	Var		   *newvar = (Var *) palloc(sizeof(Var));
+
+	*newvar = *var;
+	return newvar;
+}
+
+/*
+ * build_tlist_index --- build an index data structure for a child tlist
+ *
+ * In most cases, subplan tlists will be "flat" tlists with only Vars,
+ * so we try to optimize that case by extracting information about Vars
+ * in advance.	Matching a parent tlist to a child is still an O(N^2)
+ * operation, but at least with a much smaller constant factor than plain
+ * tlist_member() searches.
+ *
+ * The result of this function is an indexed_tlist struct to pass to
+ * search_indexed_tlist_for_var() or search_indexed_tlist_for_non_var().
+ * When done, the indexed_tlist may be freed with a single pfree().
+ */
+indexed_tlist *
+build_tlist_index(List *tlist)
+{
+	indexed_tlist *itlist;
+	tlist_vinfo *vinfo;
+	ListCell   *l;
+
+	/* Create data structure with enough slots for all tlist entries */
+	itlist = (indexed_tlist *)
+		palloc(offsetof(indexed_tlist, vars) +
+			   list_length(tlist) * sizeof(tlist_vinfo));
+
+	itlist->tlist = tlist;
+	itlist->has_ph_vars = false;
+	itlist->has_non_vars = false;
+
+	/* Find the Vars and fill in the index array */
+	vinfo = itlist->vars;
+	foreach(l, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+		if (tle->expr && IsA(tle->expr, Var))
+		{
+			Var		   *var = (Var *) tle->expr;
+
+			vinfo->varno = var->varno;
+			vinfo->varattno = var->varattno;
+			vinfo->resno = tle->resno;
+			vinfo++;
+		}
+		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
+			itlist->has_ph_vars = true;
+		else
+			itlist->has_non_vars = true;
+	}
+
+	itlist->num_vars = (vinfo - itlist->vars);
+
+	return itlist;
+}
+
+/*
+ * search_indexed_tlist_for_var --- find a Var in an indexed tlist
+ *
+ * If a match is found, return a copy of the given Var with suitably
+ * modified varno/varattno (to wit, newvarno and the resno of the TLE entry).
+ * Also ensure that varnoold is incremented by rtoffset.
+ * If no match, return NULL.
+ */
+static Var *
+search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist,
+							 Index newvarno, int rtoffset)
+{
+	Index		varno = var->varno;
+	AttrNumber	varattno = var->varattno;
+	tlist_vinfo *vinfo;
+	int			i;
+
+	vinfo = itlist->vars;
+	i = itlist->num_vars;
+	while (i-- > 0)
+	{
+		if (vinfo->varno == varno && vinfo->varattno == varattno)
+		{
+			/* Found a match */
+			Var		   *newvar = copyVar(var);
+
+			newvar->varno = newvarno;
+			newvar->varattno = vinfo->resno;
+			if (newvar->varnoold > 0)
+				newvar->varnoold += rtoffset;
+			return newvar;
+		}
+		vinfo++;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * search_indexed_tlist_for_non_var --- find a non-Var in an indexed tlist
+ *
+ * If a match is found, return a Var constructed to reference the tlist item.
+ * If no match, return NULL.
+ *
+ * NOTE: it is a waste of time to call this unless itlist->has_ph_vars or
+ * itlist->has_non_vars
+ */
+static Var *
+search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist, Index newvarno)
+{
+	TargetEntry *tle;
+
+	tle = tlist_member(node, itlist->tlist);
+	if (tle)
+	{
+		/* Found a matching subplan output expression */
+		Var		   *newvar;
+
+		newvar = makeVarFromTargetEntry(newvarno, tle);
+		newvar->varnoold = 0;	/* wasn't ever a plain Var */
+		newvar->varoattno = 0;
+		return newvar;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * fix_join_expr
+ *	   Create a new set of targetlist entries or join qual clauses by
+ *	   changing the varno/varattno values of variables in the clauses
+ *	   to reference target list values from the outer and inner join
+ *	   relation target lists.  Also perform opcode lookup and add
+ *	   regclass OIDs to root->glob->relationOids.
+ *
+ * This is used in two different scenarios: a normal join clause, where all
+ * the Vars in the clause *must* be replaced by OUTER_VAR or INNER_VAR
+ * references; and a RETURNING clause, which may contain both Vars of the
+ * target relation and Vars of other relations.  In the latter case we want
+ * to replace the other-relation Vars by OUTER_VAR references, while leaving
+ * target Vars alone.
+ *
+ * For a normal join, acceptable_rel should be zero so that any failure to
+ * match a Var will be reported as an error.  For the RETURNING case, pass
+ * inner_itlist = NULL and acceptable_rel = the ID of the target relation.
+ *
+ * 'clauses' is the targetlist or list of join clauses
+ * 'outer_itlist' is the indexed target list of the outer join relation
+ * 'inner_itlist' is the indexed target list of the inner join relation,
+ *		or NULL
+ * 'acceptable_rel' is either zero or the rangetable index of a relation
+ *		whose Vars may appear in the clause without provoking an error
+ * 'rtoffset': how much to increment varnoold by
+ *
+ * Returns the new expression tree.  The original clause structure is
+ * not modified.
+ */
+List *
+fix_join_expr(PlannerInfo *root,
+			  List *clauses,
+			  indexed_tlist *outer_itlist,
+			  indexed_tlist *inner_itlist,
+			  Index acceptable_rel,
+			  int rtoffset)
+{
+	fix_join_expr_context context;
+
+	context.root = root;
+	context.outer_itlist = outer_itlist;
+	context.inner_itlist = inner_itlist;
+	context.acceptable_rel = acceptable_rel;
+	context.rtoffset = rtoffset;
+	return (List *) fix_join_expr_mutator((Node *) clauses, &context);
+}
+
+static Node *
+fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
+{
+	Var		   *newvar;
+
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *var = (Var *) node;
+
+		/* First look for the var in the input tlists */
+		newvar = search_indexed_tlist_for_var(var,
+											  context->outer_itlist,
+											  OUTER_VAR,
+											  context->rtoffset);
+		if (newvar)
+			return (Node *) newvar;
+		if (context->inner_itlist)
+		{
+			newvar = search_indexed_tlist_for_var(var,
+												  context->inner_itlist,
+												  INNER_VAR,
+												  context->rtoffset);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If it's for acceptable_rel, adjust and return it */
+		if (var->varno == context->acceptable_rel)
+		{
+			var = copyVar(var);
+			var->varno += context->rtoffset;
+			if (var->varnoold > 0)
+				var->varnoold += context->rtoffset;
+			return (Node *) var;
+		}
+
+		/* No referent found for Var */
+		elog(ERROR, "variable not found in subplan target lists");
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		/* See if the PlaceHolderVar has bubbled up from a lower plan node */
+		if (context->outer_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->outer_itlist,
+													  OUTER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+		if (context->inner_itlist && context->inner_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->inner_itlist,
+													  INNER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If not supplied by input plans, evaluate the contained expr */
+		return fix_join_expr_mutator((Node *) phv->phexpr, context);
+	}
+	/* Try matching more complex expressions too, if tlists have any */
+	if (context->outer_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->outer_itlist,
+												  OUTER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	if (context->inner_itlist && context->inner_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->inner_itlist,
+												  INNER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node,
+								   fix_join_expr_mutator,
+								   (void *) context);
+}
diff --git a/contrib/custmj/sql/custmj.sql b/contrib/custmj/sql/custmj.sql
new file mode 100644
index 0000000..ffb6d9d
--- /dev/null
+++ b/contrib/custmj/sql/custmj.sql
@@ -0,0 +1,79 @@
+-- regression test for custmj extension
+
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;

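For readers skimming the copied setrefs.c above: build_tlist_index() and search_indexed_tlist_for_var() pre-extract the plain Var entries of a child target list into a flat array, then resolve each parent Var with a linear scan over that array. The following is only a rough standalone sketch of that idea, not PostgreSQL code. The Demo* types and demo_* functions are invented for illustration, it uses malloc() rather than palloc(), and it assumes resno values start at 1 so that 0 can signal "no match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a target-list entry. */
typedef struct
{
	bool		is_var;		/* true if the entry is a plain column reference */
	int			varno;		/* range-table index of the referenced relation */
	int			varattno;	/* column number within that relation */
	int			resno;		/* position of this entry in the child's output */
} DemoTle;

/* One index slot per plain-Var entry. */
typedef struct
{
	int			varno;
	int			varattno;
	int			resno;
} DemoVinfo;

/* Index over a child target list; freed with a single free(). */
typedef struct
{
	bool		has_non_vars;	/* any entry that is not a plain Var? */
	int			num_vars;		/* number of used slots in vars[] */
	DemoVinfo	vars[];			/* flexible array member */
} DemoIndex;

/* Build the index; returns NULL only if allocation fails. */
DemoIndex *
demo_build_index(const DemoTle *tlist, int ntlist)
{
	DemoIndex  *idx;
	int			i;

	assert(ntlist >= 0);

	/* Allocate enough slots for the case where every entry is a Var. */
	idx = malloc(sizeof(DemoIndex) + (size_t) ntlist * sizeof(DemoVinfo));
	if (idx == NULL)
		return NULL;

	idx->has_non_vars = false;
	idx->num_vars = 0;

	for (i = 0; i < ntlist; i++)
	{
		if (tlist[i].is_var)
		{
			DemoVinfo  *vinfo = &idx->vars[idx->num_vars++];

			vinfo->varno = tlist[i].varno;
			vinfo->varattno = tlist[i].varattno;
			vinfo->resno = tlist[i].resno;
		}
		else
			idx->has_non_vars = true;
	}
	return idx;
}

/*
 * Find the child output column that supplies (varno, varattno).
 * Returns its resno, or 0 if the child does not emit that column.
 */
int
demo_lookup_resno(const DemoIndex *idx, int varno, int varattno)
{
	int			i;

	for (i = 0; i < idx->num_vars; i++)
	{
		if (idx->vars[i].varno == varno &&
			idx->vars[i].varattno == varattno)
			return idx->vars[i].resno;
	}
	return 0;					/* no match */
}
```

In the real code a successful lookup returns a copy of the Var with varno set to OUTER_VAR or INNER_VAR and varattno set to the matched resno; the sketch returns only the resno to stay small.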
pgsql-v9.4-custom-scan.part-1.v11.patch (application/octet-stream)

 doc/src/sgml/custom-plan.sgml           | 315 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  45 ++++-
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  23 +++
 src/backend/executor/execProcnode.c     |  15 ++
 src/backend/executor/nodeCustom.c       |  73 ++++++++
 src/backend/nodes/copyfuncs.c           |  42 +++++
 src/backend/nodes/outfuncs.c            |  40 ++++
 src/backend/optimizer/path/allpaths.c   |  34 ++--
 src/backend/optimizer/path/joinpath.c   |  16 ++
 src/backend/optimizer/plan/createplan.c |  55 ++++--
 src/backend/optimizer/plan/setrefs.c    |  25 ++-
 src/backend/optimizer/plan/subselect.c  | 128 +++++++------
 src/backend/utils/adt/ruleutils.c       |  56 ++++++
 src/include/commands/explain.h          |   1 +
 src/include/executor/nodeCustom.h       |  30 +++
 src/include/nodes/execnodes.h           |  12 ++
 src/include/nodes/nodes.h               |   6 +
 src/include/nodes/plannodes.h           |  77 ++++++++
 src/include/nodes/relation.h            |  29 +++
 src/include/optimizer/paths.h           |  17 ++
 src/include/optimizer/planmain.h        |  12 ++
 src/include/optimizer/subselect.h       |   7 +
 25 files changed, 958 insertions(+), 104 deletions(-)

diff --git a/doc/src/sgml/custom-plan.sgml b/doc/src/sgml/custom-plan.sgml
new file mode 100644
index 0000000..8d456f9
--- /dev/null
+++ b/doc/src/sgml/custom-plan.sgml
@@ -0,0 +1,315 @@
+<!-- doc/src/sgml/custom-plan.sgml -->
+
+<chapter id="custom-plan">
+ <title>Writing A Custom Plan Provider</title>
+
+ <indexterm zone="custom-plan">
+  <primary>custom plan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-plan interface lets an extension supply its own plan node
+  in place of a built-in one, chosen by the planner through ordinary
+  cost-based optimization.
+  Its key component is the <literal>CustomPlan</> node, which has the usual
+  <literal>Plan</> field plus a table of function pointers. The table acts
+  like the methods of a base class in an object-oriented language,
+  so a <literal>CustomPlan</> node works as a polymorphic plan / execution
+  node.
+  The core backend assumes nothing about the behavior of this node
+  type, so it is the responsibility of the custom-plan provider
+  to make its node behave like the built-in plan / execution node it
+  replaces.
+ </para>
+ <para>
+  The overall steps for using the custom-plan interface are as follows.
+ </para>
+ <para>
+  A custom-plan provider can add a <literal>CustomPath</> for a particular
+  relation scan using <literal>add_scan_path_hook</>, or for a particular
+  join of relations using <literal>add_join_path_hook</>.
+  The planner then chooses the cheapest path for that
+  scan or join from among the built-in and custom paths.
+  A <literal>CustomPath</> node therefore needs a realistic cost estimate,
+  or the planner cannot select the right plan.
+ </para>
+ <para>
+  Usually, a custom-plan provider extends the <literal>CustomPath</> type
+  with its own private fields, like:
+<programlisting>
+typedef struct {
+    CustomPath    cpath;
+        :
+    List         *something_private;
+        :
+} YourOwnCustomPath;
+</programlisting>
+  You can also extend the <literal>CustomPlan</> and <literal>CustomPlanState</>
+  types in a similar manner.
+ </para>
+ <para>
+  <literal>CustomPathMethods</> is the table of function pointers
+  for <literal>CustomPath</>, and <literal>CustomPlanMethods</> is
+  the table of function pointers for <literal>CustomPlan</> and
+  <literal>CustomPlanState</>.
+  An extension has to implement these functions according to the
+  specification in the next section.
+ </para>
+
+ <sect1 id="custom-plan-spec">
+  <title>Specification of Custom Plan Interface</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom-plan path</title>
+   <para>
+    The first task of a custom-plan provider is to add a <literal>CustomPath</>
+    for a particular relation scan or join of relations.
+    Right now the planner supports custom paths only for scans and joins,
+    where cost-based optimization applies; other kinds of nodes (such as
+    sort or aggregate) are not supported.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *baserel,
+                                        RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+</programlisting>
+    A custom-plan provider can add its custom path using
+    <literal>add_scan_path_hook</> to provide an alternative way to scan
+    the specified relation.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        RelOptInfo *outerrel,
+                                        RelOptInfo *innerrel,
+                                        JoinType jointype,
+                                        SpecialJoinInfo *sjinfo,
+                                        List *restrictlist,
+                                        Relids param_source_rels,
+                                        Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
+</programlisting>
+    A custom-plan provider can likewise add its custom path using
+    <literal>add_join_path_hook</> to provide an alternative way to join
+    the two specified relations (note that either or both of them may
+    themselves be joined relations, not only base relations).
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-methods">
+   <title>Methods of CustomPath</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPath</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CreateCustomPlan(PlannerInfo *root,
+                 CustomPath *custom_path);
+</programlisting>
+    This method populates a node object that (at least) extends the
+    <literal>CustomPlan</> data type, according to the supplied
+    <literal>CustomPath</>.
+    If this custom plan supports mark and restore of its position, its
+    node tag should be <literal>CustomPlanMarkPos</> instead of
+    <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPath(StringInfo str, Node *node);
+</programlisting>
+    This method is needed so that <literal>nodeToString</> can dump
+    your custom path type, including its private fields.
+    The output format has to follow the conventions of <filename>outfuncs.c</>.
+   </para>
+  </sect2>
+  <sect2 id="custom-plan-methods">
+   <title>Methods of CustomPlan</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+SetCustomPlanRef(PlannerInfo *root,
+                 CustomPlan *custom_plan,
+                 int rtoffset);
+</programlisting>
+    This method requires the custom-plan provider to adjust <literal>Var</> node
+    references in the supplied <literal>CustomPlan</> node.
+    Usually, each reference is shifted by <literal>rtoffset</>, or replaced by
+    <literal>OUTER_VAR</> or <literal>INNER_VAR</> if it references the
+    left or right subplan, respectively.
+   </para>
+   <para>
+<programlisting>
+bool
+SupportBackwardScan(CustomPlan *custom_plan);
+</programlisting>
+    This optional method informs the core backend whether this custom-plan
+    node supports backward scans.
+    If this method is implemented and returns <literal>true</>, the
+    custom-plan node supports backward scan. Otherwise, backward scan
+    is treated as unavailable.
+   </para>
+   <para>
+<programlisting>
+void
+FinalizeCustomPlan(PlannerInfo *root,
+                   CustomPlan *custom_plan,
+                   Bitmapset **paramids,
+                   Bitmapset **valid_params,
+                   Bitmapset **scan_params);
+</programlisting>
+    This optional method informs the core backend which query parameters
+    are referenced in this custom-plan node, in addition to the ones
+    considered in the <literal>targetlist</> and <literal>qual</> fields
+    of the base <literal>Plan</> node.
+    If the private data fields managed by the custom-plan provider contain
+    parameters, the method needs to update the supplied bitmapsets in the
+    way <literal>finalize_plan()</> expects.
+   </para>
+   <para>
+<programlisting>
+CustomPlanState *
+BeginCustomPlan(CustomPlan *custom_plan,
+                EState *estate,
+                int eflags);
+</programlisting>
+    This method populates a <literal>CustomPlanState</> object according to
+    the supplied <literal>CustomPlan</>, and initializes execution of this
+    custom-plan node before any tuples are fetched.
+   </para>
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It fetches one tuple from this custom-plan node. If a tuple is available,
+    the node has to store it in <literal>ps_ResultTupleSlot</> and
+    return that slot; otherwise it returns <literal>NULL</> to tell the
+    upper node that the scan has reached its end.
+   </para>
+   <para>
+<programlisting>
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    Unlike <literal>ExecCustomPlan</>, it allows the upper node to fetch
+    multiple tuples at once. Pay attention to the data format
+    and the way it is returned, because both depend entirely on the type of
+    the upper node.
+   </para>
+   <para>
+<programlisting>
+void
+EndCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It ends the execution of this custom-plan node and releases the
+    resources it allocated. Memory in the per-execution memory context
+    usually need not be released explicitly, but the custom-plan provider
+    is responsible for any resources it holds outside the framework.
+   </para>
+   <para>
+<programlisting>
+void
+ReScanCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed value,
+    so the rewound scan need not return exactly the same tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if <literal>CustomPlanMarkPos</>
+    is used instead of <literal>CustomPlan</>.
+    It saves the current position of the custom-plan node in its private
+    state so that the position can be restored later.
+   </para>
+   <para>
+<programlisting>
+void
+RestrPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if <literal>CustomPlanMarkPos</>
+    is used instead of <literal>CustomPlan</>.
+    It restores the position of the custom-plan node from the private
+    information saved by <literal>MarkPosCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlanTargetRel(CustomPlanState *cpstate,
+                           ExplainState *es);
+</programlisting>
+    It shows the target relation, if this custom-plan node replaced
+    a particular relation scan. For implementation reasons, this
+    method is separate from <literal>ExplainCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlan(CustomPlanState *cpstate,
+                  List *ancestors,
+                  ExplainState *es);
+</programlisting>
+    It adds properties of this custom-plan node to the supplied
+    <literal>ExplainState</> in the usual <command>EXPLAIN</>
+    manner.
+   </para>
+   <para>
+<programlisting>
+Bitmapset *
+GetRelidsCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It returns the set of range-table indexes scanned by this
+    custom-plan node. If the node scans multiple underlying relations,
+    the result is not necessarily a singleton bitmap.
+   </para>
+   <para>
+<programlisting>
+Node *
+GetSpecialCustomVar(CustomPlanState *cpstate,
+                    Var *varnode);
+</programlisting>
+    This optional method returns the expression node referenced by
+    the supplied <literal>varnode</>, which has a special <literal>varno</>
+    (<literal>INNER_VAR</>, <literal>OUTER_VAR</> or <literal>INDEX_VAR</>).
+    The <command>EXPLAIN</> command shows the column names referenced in
+    the targetlist or qualifiers of plan nodes. If a var node has a special
+    <literal>varno</>, it recursively walks down the underlying subplan to
+    find the actual expression referenced by this special varno.
+    When a custom-plan node replaces a join node but has no underlying
+    sub-plans in its left and right trees, the usual logic cannot be
+    applied, so the custom-plan provider has to implement this method
+    to tell the core backend which expression node is referenced by
+    the supplied <literal>varnode</>.
+    If this method is not implemented or returns <literal>NULL</>,
+    the core backend resolves the special varnode reference as usual.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPlan(StringInfo str, const CustomPlan *node);
+</programlisting>
+    This method is needed to support <literal>nodeToString</> for your
+    custom plan type, so that its private fields are dumped as well.
+    The output format has to follow the conventions of <filename>outfuncs.c</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CopyCustomPlan(const CustomPlan *from);
+</programlisting>
+    This method is needed to support <literal>copyObject</> for your
+    custom plan type, so that its private fields are copied as well.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 0e863ee..33f964e 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -91,6 +91,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-plan  SYSTEM "custom-plan.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
 <!ENTITY storage    SYSTEM "storage.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index b47bf52..45e9d32 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -241,6 +241,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-plan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 08f3167..ff9fc7b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -51,7 +52,6 @@ static void ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
 static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
 				ExplainState *es);
 static double elapsed_time(instr_time *starttime);
-static void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 static void ExplainPreScanMemberNodes(List *plans, PlanState **planstates,
 						  Bitmapset **rels_used);
 static void ExplainPreScanSubPlans(List *plans, Bitmapset **rels_used);
@@ -700,7 +700,7 @@ elapsed_time(instr_time *starttime)
  * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
  * that never appear in the EXPLAIN output (such as inheritance parents).
  */
-static void
+void
 ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 {
 	Plan	   *plan = planstate->plan;
@@ -721,6 +721,16 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlanState	   *cpstate = (CustomPlanState *)planstate;
+				Bitmapset		   *temp
+					= cpstate->methods->GetRelidsCustomPlan(cpstate);
+
+				*rels_used = bms_union(*rels_used, temp);
+			}
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -847,6 +857,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -935,6 +946,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomPlan:
+			sname = "Custom";
+			custom_name = ((CustomPlan *) plan)->methods->CustomName;
+			if (custom_name != NULL)
+				pname = psprintf("Custom (%s)", custom_name);
+			else
+				pname = sname;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1036,6 +1055,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1051,6 +1072,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomPlan:
+			{
+				CustomPlanState	*cps = (CustomPlanState *)planstate;
+
+				if (cps->methods->ExplainCustomPlanTargetRel)
+					cps->methods->ExplainCustomPlanTargetRel(cps, es);
+			}
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1347,6 +1376,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomPlan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			if (((CustomPlanState *) planstate)->methods->ExplainCustomPlan)
+			{
+				CustomPlanState *cpstate = (CustomPlanState *) planstate;
+
+				cpstate->methods->ExplainCustomPlan(cpstate, ancestors, es);
+			}
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..47e7a3c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecReScanCustomPlan((CustomPlanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomMarkPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomRestrPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -390,6 +403,7 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_ValuesScan:
 		case T_Material:
 		case T_Sort:
+		case T_CustomPlanMarkPos:
 			return true;
 
 		case T_Result:
@@ -465,6 +479,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) node;
+
+				if (cplan->methods->SupportBackwardScan)
+					return cplan->methods->SupportBackwardScan(cplan);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..5aa117b 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			result = (PlanState *) ExecInitCustomPlan((CustomPlan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +449,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			result = ExecCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +689,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecEndCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..e3c8f58
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,73 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "nodes/execnodes.h"
+#include "nodes/plannodes.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+CustomPlanState *
+ExecInitCustomPlan(CustomPlan *custom_plan, EState *estate, int eflags)
+{
+	CustomPlanState	   *cpstate
+		= custom_plan->methods->BeginCustomPlan(custom_plan, estate, eflags);
+
+	Assert(IsA(cpstate, CustomPlanState));
+
+	return cpstate;
+}
+
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ExecCustomPlan != NULL);
+	return cpstate->methods->ExecCustomPlan(cpstate);
+}
+
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MultiExecCustomPlan != NULL);
+	return cpstate->methods->MultiExecCustomPlan(cpstate);
+}
+
+void
+ExecEndCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->EndCustomPlan != NULL);
+	cpstate->methods->EndCustomPlan(cpstate);
+}
+
+void
+ExecReScanCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ReScanCustomPlan != NULL);
+	cpstate->methods->ReScanCustomPlan(cpstate);
+}
+
+void
+ExecCustomMarkPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MarkPosCustomPlan != NULL);
+	cpstate->methods->MarkPosCustomPlan(cpstate);
+}
+
+void
+ExecCustomRestrPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->RestrPosCustomPlan != NULL);
+	cpstate->methods->RestrPosCustomPlan(cpstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c89d808..18505cd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,42 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+	CustomPlan *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlan));
+	return newnode;
+}
+
+/*
+ * _copyCustomPlanMarkPos
+ */
+static CustomPlanMarkPos *
+_copyCustomPlanMarkPos(const CustomPlanMarkPos *from)
+{
+	CustomPlanMarkPos *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlanMarkPos));
+	return newnode;
+}
+
+/* copy common part of CustomPlan */
+void
+CopyCustomPlanCommon(const Node *__from, Node *__newnode)
+{
+	CustomPlan *from = (CustomPlan *) __from;
+	CustomPlan *newnode = (CustomPlan *) __newnode;
+
+	((Node *) newnode)->type = nodeTag(from);
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+	COPY_SCALAR_FIELD(methods);
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3983,6 +4019,12 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomPlan:
+			retval = _copyCustomPlan(from);
+			break;
+		case T_CustomPlanMarkPos:
+			retval = _copyCustomPlanMarkPos(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..8a93bc5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -563,6 +563,27 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
+/* dump common part of CustomPlan structure */
+static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLAN");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
+static void
+_outCustomPlanMarkPos(StringInfo str, const CustomPlanMarkPos *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLANMARKPOS");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
 static void
 _outJoin(StringInfo str, const Join *node)
 {
@@ -1581,6 +1602,16 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 }
 
 static void
+_outCustomPath(StringInfo str, const CustomPath *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPATH");
+	_outPathInfo(str, (const Path *) node);
+	appendStringInfo(str, " :methods");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPath(str, (Node *)node);
+}
+
+static void
 _outAppendPath(StringInfo str, const AppendPath *node)
 {
 	WRITE_NODE_TYPE("APPENDPATH");
@@ -2828,6 +2859,12 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomPlan:
+				_outCustomPlan(str, obj);
+				break;
+			case T_CustomPlanMarkPos:
+				_outCustomPlanMarkPos(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
@@ -3036,6 +3073,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignPath:
 				_outForeignPath(str, obj);
 				break;
+			case T_CustomPath:
+				_outCustomPath(str, obj);
+				break;
 			case T_AppendPath:
 				_outAppendPath(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..6c1ea7e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -323,7 +325,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				}
 				break;
 			case RTE_SUBQUERY:
-				/* Subquery --- fully handled during set_rel_size */
+				/* Subquery --- path was added during set_rel_size */
 				break;
 			case RTE_FUNCTION:
 				/* RangeFunction */
@@ -334,12 +336,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				set_values_pathlist(root, rel, rte);
 				break;
 			case RTE_CTE:
-				/* CTE reference --- fully handled during set_rel_size */
+				/* CTE reference --- path was added during set_rel_size */
 				break;
 			default:
 				elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
 				break;
 		}
+
+		/* Also, consider custom plans */
+		if (add_scan_path_hook)
+			(*add_scan_path_hook)(root, rel, rte);
+
+		/* Select cheapest path */
+		set_cheapest(rel);
 	}
 
 #ifdef OPTIMIZER_DEBUG
@@ -388,9 +397,6 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
-
-	/* Now find the cheapest of the paths for this rel */
-	set_cheapest(rel);
 }
 
 /*
@@ -416,9 +422,6 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
-
-	/* Select cheapest path */
-	set_cheapest(rel);
 }
 
 /*
@@ -1235,9 +1238,6 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1306,9 +1306,6 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1329,9 +1326,6 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1398,9 +1392,6 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1451,9 +1442,6 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..2fb6678 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,20 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by a custom-plan provider.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..055a818 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -42,11 +42,7 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
-static List *build_path_tlist(PlannerInfo *root, Path *path);
-static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
-static void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 static Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
 static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
 static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
@@ -77,23 +73,20 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomPlan *create_custom_plan(PlannerInfo *root,
+									  CustomPath *best_path);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
 					  Plan *outer_plan, Plan *inner_plan);
 static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
-static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
 static void process_subquery_nestloop_params(PlannerInfo *root,
 								 List *subplan_params);
 static List *fix_indexqual_references(PlannerInfo *root, IndexPath *index_path);
 static List *fix_indexorderby_references(PlannerInfo *root, IndexPath *index_path);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
-static List *get_switched_clauses(List *clauses, Relids outerrelids);
-static List *order_qual_clauses(PlannerInfo *root, List *clauses);
-static void copy_path_costsize(Plan *dest, Path *src);
-static void copy_plan_costsize(Plan *dest, Plan *src);
 static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 			   Oid indexid, List *indexqual, List *indexqualorig,
@@ -215,7 +208,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -261,6 +254,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 			plan = create_unique_plan(root,
 									  (UniquePath *) best_path);
 			break;
+		case T_CustomPlan:
+			plan = (Plan *) create_custom_plan(root, (CustomPath *) best_path);
+			break;
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -430,7 +426,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 /*
  * Build a target list (ie, a list of TargetEntry) for the Path's output.
  */
-static List *
+List *
 build_path_tlist(PlannerInfo *root, Path *path)
 {
 	RelOptInfo *rel = path->parent;
@@ -466,7 +462,7 @@ build_path_tlist(PlannerInfo *root, Path *path)
  *		Decide whether to use a tlist matching relation structure,
  *		rather than only those Vars actually referenced.
  */
-static bool
+bool
 use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			i;
@@ -526,7 +522,7 @@ use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
  * undo the decision made by use_physical_tlist().	Currently, Hash, Sort,
  * and Material nodes want this, so they don't have to store useless columns.
  */
-static void
+void
 disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
 {
 	/* Only need to undo it for path types handled by create_scan_plan() */
@@ -569,7 +565,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
  * in most cases we have only a very bad idea of the probability of the gating
  * qual being true.
  */
-static Plan *
+Plan *
 create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
 {
 	List	   *pseudoconstants;
@@ -1072,6 +1068,26 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
 	return plan;
 }
 
+/*
+ * create_custom_plan
+ *   Returns a custom plan node built from 'best_path' by the provider's
+ *   CreateCustomPlan callback; cost data is copied from the path.
+ */
+static CustomPlan *
+create_custom_plan(PlannerInfo *root, CustomPath *best_path)
+{
+	CustomPlan	   *cplan;
+
+	/* Populate CustomPlan according to the CustomPath */
+	Assert(best_path->methods->CreateCustomPlan != NULL);
+	cplan = best_path->methods->CreateCustomPlan(root, best_path);
+	Assert(IsA(cplan, CustomPlan) || IsA(cplan, CustomPlanMarkPos));
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&cplan->plan, &best_path->path);
+
+	return cplan;
+}
 
 /*****************************************************************************
  *
@@ -2006,7 +2022,6 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
-
 /*****************************************************************************
  *
  *	JOIN METHODS
@@ -2540,7 +2555,7 @@ create_hashjoin_plan(PlannerInfo *root,
  * root->curOuterRels are replaced by Params, and entries are added to
  * root->curOuterParams if not already present.
  */
-static Node *
+Node *
 replace_nestloop_params(PlannerInfo *root, Node *expr)
 {
 	/* No setup needed for tree walk, so away we go */
@@ -3023,7 +3038,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol)
  *	  touched; a modified list is returned.  We do, however, set the transient
  *	  outer_is_left field in each RestrictInfo to show which side was which.
  */
-static List *
+List *
 get_switched_clauses(List *clauses, Relids outerrelids)
 {
 	List	   *t_list = NIL;
@@ -3089,7 +3104,7 @@ get_switched_clauses(List *clauses, Relids outerrelids)
  * instead of bare clauses.  It's OK because we only sort by cost, but
  * a cost/selectivity combination would likely do the wrong thing.
  */
-static List *
+List *
 order_qual_clauses(PlannerInfo *root, List *clauses)
 {
 	typedef struct
@@ -3156,7 +3171,7 @@ order_qual_clauses(PlannerInfo *root, List *clauses)
  * Copy cost and size info from a Path node to the Plan node created from it.
  * The executor usually won't use this info, but it's needed by EXPLAIN.
  */
-static void
+void
 copy_path_costsize(Plan *dest, Path *src)
 {
 	if (src)
@@ -3179,7 +3194,7 @@ copy_path_costsize(Plan *dest, Path *src)
  * Copy cost and size info from a lower plan node to an inserted node.
  * (Most callers alter the info after copying it.)
  */
-static void
+void
 copy_plan_costsize(Plan *dest, Plan *src)
 {
 	if (src)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..e0fd9a2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -86,7 +87,6 @@ static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
 static bool flatten_rtes_walker(Node *node, PlannerGlobal *glob);
 static void add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte);
-static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
 static Plan *set_indexonlyscan_references(PlannerInfo *root,
 							 IndexOnlyScan *plan,
 							 int rtoffset);
@@ -94,7 +94,6 @@ static Plan *set_subqueryscan_references(PlannerInfo *root,
 							SubqueryScan *plan,
 							int rtoffset);
 static bool trivial_subqueryscan(SubqueryScan *plan);
-static Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset);
 static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
 static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
 static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
@@ -419,7 +418,7 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
 /*
  * set_plan_refs: recurse through the Plan nodes of a single subquery level
  */
-static Plan *
+Plan *
 set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 {
 	ListCell   *l;
@@ -576,6 +575,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlan	   *cplan = (CustomPlan *) plan;
+
+				/*
+				 * Extension is responsible to handle set-reference
+				 * correctly.
+				 */
+				Assert(cplan->methods->SetCustomPlanRef != NULL);
+				cplan->methods->SetCustomPlanRef(root,
+												 cplan,
+												 rtoffset);
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1057,7 +1072,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
@@ -1126,7 +1141,7 @@ fix_expr_common(PlannerInfo *root, Node *node)
  * looking up operator opcode info for OpExpr and related nodes,
  * and adding OIDs from regclass Const nodes into root->glob->relationOids.
  */
-static Node *
+Node *
 fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset)
 {
 	fix_scan_expr_context context;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..6b0c762 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -75,12 +75,8 @@ static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 static Node *replace_correlation_vars_mutator(Node *node, PlannerInfo *root);
 static Node *process_sublinks_mutator(Node *node,
 						 process_sublinks_context *context);
-static Bitmapset *finalize_plan(PlannerInfo *root,
-			  Plan *plan,
-			  Bitmapset *valid_params,
-			  Bitmapset *scan_params);
-static bool finalize_primnode(Node *node, finalize_primnode_context *context);
-
+static bool finalize_primnode_walker(Node *node,
+									 finalize_primnode_context *context);
 
 /*
  * Select a PARAM_EXEC number to identify the given Var as a parameter for
@@ -2045,7 +2041,7 @@ SS_finalize_plan(PlannerInfo *root, Plan *plan, bool attach_initplans)
  * The return value is the computed allParam set for the given Plan node.
  * This is just an internal notational convenience.
  */
-static Bitmapset *
+Bitmapset *
 finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			  Bitmapset *scan_params)
 {
@@ -2070,15 +2066,15 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 	 */
 
 	/* Find params in targetlist and qual */
-	finalize_primnode((Node *) plan->targetlist, &context);
-	finalize_primnode((Node *) plan->qual, &context);
+	finalize_primnode_walker((Node *) plan->targetlist, &context);
+	finalize_primnode_walker((Node *) plan->qual, &context);
 
 	/* Check additional node-type-specific fields */
 	switch (nodeTag(plan))
 	{
 		case T_Result:
-			finalize_primnode(((Result *) plan)->resconstantqual,
-							  &context);
+			finalize_primnode_walker(((Result *) plan)->resconstantqual,
+									 &context);
 			break;
 
 		case T_SeqScan:
@@ -2086,10 +2082,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexScan:
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2100,10 +2096,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexOnlyScan:
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexOnlyScan *) plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((IndexOnlyScan *) plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indextlist, since it cannot contain Params.
@@ -2112,8 +2108,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapIndexScan:
-			finalize_primnode((Node *) ((BitmapIndexScan *) plan)->indexqual,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapIndexScan *) plan)->indexqual,
+									&context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2122,14 +2118,14 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapHeapScan:
-			finalize_primnode((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
 		case T_TidScan:
-			finalize_primnode((Node *) ((TidScan *) plan)->tidquals,
-							  &context);
+			finalize_primnode_walker((Node *) ((TidScan *) plan)->tidquals,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2167,7 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 					funccontext = context;
 					funccontext.paramids = NULL;
 
-					finalize_primnode(rtfunc->funcexpr, &funccontext);
+					finalize_primnode_walker(rtfunc->funcexpr, &funccontext);
 
 					/* remember results for execution */
 					rtfunc->funcparams = funccontext.paramids;
@@ -2183,8 +2179,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ValuesScan:
-			finalize_primnode((Node *) ((ValuesScan *) plan)->values_lists,
-							  &context);
+			finalize_primnode_walker((Node *) ((ValuesScan *) plan)->values_lists,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2231,11 +2227,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ForeignScan:
-			finalize_primnode((Node *) ((ForeignScan *) plan)->fdw_exprs,
-							  &context);
+			finalize_primnode_walker((Node *)((ForeignScan *) plan)->fdw_exprs,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) plan;
+
+				if (cplan->methods->FinalizeCustomPlan)
+					cplan->methods->FinalizeCustomPlan(root,
+													   cplan,
+													   &context.paramids,
+													   &valid_params,
+													   &scan_params);
+			}
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
@@ -2247,8 +2256,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 											  locally_added_param);
 				scan_params = bms_add_member(bms_copy(scan_params),
 											 locally_added_param);
-				finalize_primnode((Node *) mtplan->returningLists,
-								  &context);
+				finalize_primnode_walker((Node *) mtplan->returningLists,
+										 &context);
 				foreach(l, mtplan->plans)
 				{
 					context.paramids =
@@ -2329,8 +2338,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			{
 				ListCell   *l;
 
-				finalize_primnode((Node *) ((Join *) plan)->joinqual,
-								  &context);
+				finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+										 &context);
 				/* collect set of params that will be passed to right child */
 				foreach(l, ((NestLoop *) plan)->nestParams)
 				{
@@ -2343,24 +2352,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_MergeJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((MergeJoin *) plan)->mergeclauses,
-							  &context);
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((MergeJoin *) plan)->mergeclauses,
+									 &context);
 			break;
 
 		case T_HashJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((HashJoin *) plan)->hashclauses,
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((HashJoin *) plan)->hashclauses,
 							  &context);
 			break;
 
 		case T_Limit:
-			finalize_primnode(((Limit *) plan)->limitOffset,
-							  &context);
-			finalize_primnode(((Limit *) plan)->limitCount,
-							  &context);
+			finalize_primnode_walker(((Limit *) plan)->limitOffset,
+									 &context);
+			finalize_primnode_walker(((Limit *) plan)->limitCount,
+									 &context);
 			break;
 
 		case T_RecursiveUnion:
@@ -2381,10 +2390,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_WindowAgg:
-			finalize_primnode(((WindowAgg *) plan)->startOffset,
-							  &context);
-			finalize_primnode(((WindowAgg *) plan)->endOffset,
-							  &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->startOffset,
+									 &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->endOffset,
+									 &context);
 			break;
 
 		case T_Hash:
@@ -2473,8 +2482,21 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
  * finalize_primnode: add IDs of all PARAM_EXEC params appearing in the given
  * expression tree to the result set.
  */
+Bitmapset *
+finalize_primnode(PlannerInfo *root, Node *node, Bitmapset *paramids)
+{
+	finalize_primnode_context	context;
+
+	context.root = root;
+	context.paramids = paramids;
+
+	finalize_primnode_walker(node, &context);
+
+	return context.paramids;
+}
+
 static bool
-finalize_primnode(Node *node, finalize_primnode_context *context)
+finalize_primnode_walker(Node *node, finalize_primnode_context *context)
 {
 	if (node == NULL)
 		return false;
@@ -2496,7 +2518,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		Bitmapset  *subparamids;
 
 		/* Recurse into the testexpr, but not into the Plan */
-		finalize_primnode(subplan->testexpr, context);
+		finalize_primnode_walker(subplan->testexpr, context);
 
 		/*
 		 * Remove any param IDs of output parameters of the subplan that were
@@ -2513,7 +2535,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		}
 
 		/* Also examine args list */
-		finalize_primnode((Node *) subplan->args, context);
+		finalize_primnode_walker((Node *) subplan->args, context);
 
 		/*
 		 * Add params needed by the subplan to paramids, but excluding those
@@ -2528,7 +2550,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 
 		return false;			/* no more to do here */
 	}
-	return expression_tree_walker(node, finalize_primnode,
+	return expression_tree_walker(node, finalize_primnode_walker,
 								  (void *) context);
 }
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 566b4c9..934d796 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5292,6 +5292,25 @@ get_utility_query_def(Query *query, deparse_context *context)
 	}
 }
 
+/*
+ * GetSpecialCustomVar
+ *
+ * Utility routine to call optional GetSpecialCustomVar method of
+ * CustomPlanState
+ */
+static Node *
+GetSpecialCustomVar(PlanState *ps, Var *varnode)
+{
+	CustomPlanState *cps = (CustomPlanState *) ps;
+
+	Assert(IsA(ps, CustomPlanState));
+	Assert(IS_SPECIAL_VARNO(varnode->varno));
+
+	if (cps->methods->GetSpecialCustomVar)
+		return (Node *)cps->methods->GetSpecialCustomVar(cps, varnode);
+
+	return NULL;
+}
 
 /*
  * Display a Var appropriately.
@@ -5323,6 +5342,7 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 	deparse_columns *colinfo;
 	char	   *refname;
 	char	   *attname;
+	Node	   *expr;
 
 	/* Find appropriate nesting depth */
 	netlevelsup = var->varlevelsup + levelsup;
@@ -5345,6 +5365,22 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 		colinfo = deparse_columns_fetch(var->varno, dpns);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		/*
+		 * Force parentheses because our caller probably assumed a Var is a
+		 * simple expression.
+		 */
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, ')');
+
+		return NULL;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
@@ -5633,6 +5669,26 @@ get_name_for_var_field(Var *var, int fieldno,
 		rte = rt_fetch(var->varno, dpns->rtable);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		StringInfo		saved = context->buf;
+		StringInfoData	temp;
+
+		initStringInfo(&temp);
+		context->buf = &temp;
+
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, ')');
+
+		context->buf = saved;
+
+		return temp.data;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3488be3..f914696 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
 typedef const char *(*explain_get_index_name_hook_type) (Oid indexId);
 extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
 
+extern void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 
 extern void ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			 ParamListInfo params, DestReceiver *dest);
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..e6e049e
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,30 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomPlan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "nodes/plannodes.h"
+#include "nodes/execnodes.h"
+
+/*
+ * General executor code
+ */
+extern CustomPlanState *ExecInitCustomPlan(CustomPlan *custom_plan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomPlan(CustomPlanState *cpstate);
+extern Node *MultiExecCustomPlan(CustomPlanState *cpstate);
+extern void ExecEndCustomPlan(CustomPlanState *cpstate);
+
+extern void ExecReScanCustomPlan(CustomPlanState *cpstate);
+extern void ExecCustomMarkPos(CustomPlanState *cpstate);
+extern void ExecCustomRestrPos(CustomPlanState *cpstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..8af5bf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,18 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomPlanState information
+ *
+ *		CustomPlan nodes are used to execute custom code within executor.
+ * ----------------
+ */
+typedef struct CustomPlanState
+{
+	PlanState	ps;
+	const CustomPlanMethods	   *methods;
+} CustomPlanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..f4a1246 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,8 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomPlan,
+	T_CustomPlanMarkPos,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +109,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomPlanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +227,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
@@ -513,6 +517,8 @@ extern void *stringToNode(char *str);
  */
 extern void *copyObject(const void *obj);
 
+extern void CopyCustomPlanCommon(const Node *from, Node *newnode);
+
 /*
  * nodes/equalfuncs.c
  */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..7468d4c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -15,8 +15,10 @@
 #define PLANNODES_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/primnodes.h"
+#include "nodes/relation.h"
 
 
 /* ----------------------------------------------------------------
@@ -479,6 +481,81 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomPlan node
+ * ----------------
+ */
+struct CustomPlanMethods;
+
+typedef struct CustomPlan
+{
+	Plan		plan;
+	const struct CustomPlanMethods *methods;
+} CustomPlan;
+
+/* almost the same as CustomPlan, but supports MarkPos/RestrPos */
+typedef CustomPlan CustomPlanMarkPos;
+
+/* forward declarations, to avoid including execnodes.h here */
+typedef struct CustomPlanState CustomPlanState;
+typedef struct EState EState;
+typedef struct ExplainState	ExplainState;
+typedef struct TupleTableSlot TupleTableSlot;
+
+typedef void (*SetCustomPlanRef_function)(PlannerInfo *root,
+										  CustomPlan *custom_plan,
+										  int rtoffset);
+typedef bool (*SupportCustomBackwardScan_function)(CustomPlan *custom_plan);
+typedef void (*FinalizeCustomPlan_function)(PlannerInfo *root,
+											CustomPlan *custom_plan,
+											Bitmapset **paramids,
+											Bitmapset **valid_params,
+											Bitmapset **scan_params);
+typedef CustomPlanState *(*BeginCustomPlan_function)(CustomPlan *custom_plan,
+													 EState *estate,
+													 int eflags);
+typedef TupleTableSlot *(*ExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*MultiExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*EndCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ReScanCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*MarkPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*RestrPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ExplainCustomPlanTargetRel_function)(CustomPlanState *cpstate,
+													ExplainState *es);
+typedef void (*ExplainCustomPlan_function)(CustomPlanState *cpstate,
+										   List *ancestors,
+										   ExplainState *es);
+typedef Bitmapset *(*GetRelidsCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*GetSpecialCustomVar_function)(CustomPlanState *cpstate,
+											  Var *varnode);
+typedef void (*TextOutCustomPlan_function)(StringInfo str,
+										   const CustomPlan *node);
+typedef CustomPlan *(*CopyCustomPlan_function)(const CustomPlan *from);
+
+typedef struct CustomPlanMethods
+{
+	const char						   *CustomName;
+	/* callbacks for the planner stage */
+	SetCustomPlanRef_function			SetCustomPlanRef;
+	SupportCustomBackwardScan_function	SupportBackwardScan;
+	FinalizeCustomPlan_function			FinalizeCustomPlan;
+	/* callbacks for the executor stage */
+	BeginCustomPlan_function			BeginCustomPlan;
+	ExecCustomPlan_function				ExecCustomPlan;
+	MultiExecCustomPlan_function		MultiExecCustomPlan;
+	EndCustomPlan_function				EndCustomPlan;
+	ReScanCustomPlan_function			ReScanCustomPlan;
+	MarkPosCustomPlan_function			MarkPosCustomPlan;
+	RestrPosCustomPlan_function			RestrPosCustomPlan;
+	/* callbacks for EXPLAIN */
+	ExplainCustomPlanTargetRel_function	ExplainCustomPlanTargetRel;
+	ExplainCustomPlan_function			ExplainCustomPlan;
+	GetRelidsCustomPlan_function		GetRelidsCustomPlan;
+	GetSpecialCustomVar_function		GetSpecialCustomVar;
+	/* callbacks for general node management */
+	TextOutCustomPlan_function			TextOutCustomPlan;
+	CopyCustomPlan_function				CopyCustomPlan;
+} CustomPlanMethods;
 
 /*
  * ==========
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c607b36..cbbf1e0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/params.h"
 #include "nodes/parsenodes.h"
 #include "storage/block.h"
@@ -878,6 +879,34 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * methods is a set of function pointers declared in the
+ * CustomPathMethods structure; the extension has to set them up
+ * correctly.
+ */
+struct CustomPathMethods;
+
+typedef struct CustomPath
+{
+	Path		path;
+	const struct CustomPathMethods   *methods;
+} CustomPath;
+
+typedef struct CustomPlan CustomPlan;
+
+typedef CustomPlan *(*CreateCustomPlan_function)(PlannerInfo *root,
+												 CustomPath *custom_path);
+typedef void (*TextOutCustomPath_function)(StringInfo str, Node *node);
+
+typedef struct CustomPathMethods
+{
+	const char				   *CustomName;
+	CreateCustomPlan_function	CreateCustomPlan;
+	TextOutCustomPath_function	TextOutCustomPath;
+} CustomPathMethods;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..3047d3d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,23 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan paths, in addition to the default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+/* Hook for plugins to add custom join paths, in addition to the default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..28b89d9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,10 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
+extern List *build_path_tlist(PlannerInfo *root, Path *path);
+extern bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
@@ -86,6 +90,11 @@ extern ModifyTable *make_modifytable(PlannerInfo *root,
 				 List *withCheckOptionLists, List *returningLists,
 				 List *rowMarks, int epqParam);
 extern bool is_projection_capable_plan(Plan *plan);
+extern List *order_qual_clauses(PlannerInfo *root, List *clauses);
+extern List *get_switched_clauses(List *clauses, Relids outerrelids);
+extern void copy_path_costsize(Plan *dest, Path *src);
+extern void copy_plan_costsize(Plan *dest, Plan *src);
+extern Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 
 /*
  * prototypes for plan/initsplan.c
@@ -127,6 +136,9 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
+extern Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index 5607e98..138b60b 100644
--- a/src/include/optimizer/subselect.h
+++ b/src/include/optimizer/subselect.h
@@ -29,6 +29,13 @@ extern void SS_finalize_plan(PlannerInfo *root, Plan *plan,
 				 bool attach_initplans);
 extern Param *SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan,
 					Oid resulttype, int32 resulttypmod, Oid resultcollation);
+extern Bitmapset *finalize_plan(PlannerInfo *root,
+								Plan *plan,
+								Bitmapset *valid_params,
+								Bitmapset *scan_params);
+extern Bitmapset *finalize_primnode(PlannerInfo *root,
+									Node *node,
+									Bitmapset *paramids);
 extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var);
 extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root,
 									 PlaceHolderVar *phv);

#96

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Kouhei Kaigai (#95)

Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

* Definition of add_join_path_hook

I had no idea how to improve the definition and location of this
hook, so it is still at the tail of add_paths_to_joinrel().
Its definition was adjusted slightly according to the feedback on
pgsql-hackers. I omitted "mergeclause_list" and "semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.

Since the submission, I have still been looking for a better place to
put a hook that adds custom join paths.

Regarding the comment from Tom:
| * The API for add_join_path_hook seems overcomplex, as well as too full
| of implementation details that should remain private to joinpath.c.
| I particularly object to passing the mergeclause lists, which seem unlikely
| to be of interest for non-mergejoin plan types anyway.
| More generally, it seems likely that this hook is at the wrong level of
| abstraction; it forces the hook function to concern itself with a lot of
| stuff about join legality and parameterization (which I note your patch3
| code fails to do; but that's not an optional thing).
|
The first half has already been addressed. My trouble is the latter portion.

The overall jobs of add_paths_to_joinrel(), where add_join_path_hook
is called, are as follows:
1. Construct the parameterized path information, which is saved in
param_source_rels and extra_lateral_rels.
2. Try to add mergejoin paths with an underlying Sort node
3. Try to add mergejoin/nestloop paths without an underlying Sort node
4. Try to add hashjoin paths
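
A minimal, self-contained sketch of this ordering, with the hook called
last. All names ending in "_sketch" or "Sketch", the reduced argument
list, and the cost numbers are assumptions made for illustration; they
are not taken from the patch, and the real add_path() also prunes
dominated paths, which this stand-in does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo: it only counts candidate paths. */
typedef struct JoinRelSketch
{
	int		npaths;			/* number of candidate paths added so far */
	double	cheapest_cost;	/* cost of the cheapest path seen so far */
} JoinRelSketch;

/* Same idea as add_join_path_hook_type, with a reduced argument list. */
typedef void (*add_join_path_hook_sketch_type) (JoinRelSketch *joinrel);

/* Stays NULL unless an extension installs a callback. */
add_join_path_hook_sketch_type add_join_path_hook_sketch = NULL;

/* Stand-in for add_path(): remember the cheapest cost, count the path. */
static void
add_path_sketch(JoinRelSketch *joinrel, double cost)
{
	if (joinrel->npaths == 0 || cost < joinrel->cheapest_cost)
		joinrel->cheapest_cost = cost;
	joinrel->npaths++;
}

/* Stand-in for add_paths_to_joinrel(): built-in paths first, hook last. */
void
add_paths_to_joinrel_sketch(JoinRelSketch *joinrel)
{
	assert(joinrel != NULL);

	add_path_sketch(joinrel, 300.0);	/* mergejoin with explicit Sort */
	add_path_sketch(joinrel, 250.0);	/* mergejoin/nestloop, no Sort */
	add_path_sketch(joinrel, 200.0);	/* hashjoin */

	/* The extension gets its chance only after the built-in logic. */
	if (add_join_path_hook_sketch)
		add_join_path_hook_sketch(joinrel);
}

/* Example extension callback that offers one cheaper custom join path. */
void
my_join_path_provider(JoinRelSketch *joinrel)
{
	add_path_sketch(joinrel, 120.0);
}
```

With the hook unset, only the three built-in candidates are considered;
once my_join_path_provider is installed, a fourth and cheaper candidate
competes with them on cost.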

It seems to me that the checks for join legality and parameterization
are built into the individual routines for each join algorithm.
(What does the "join legality check" actually mean?)

It probably makes sense to provide a common utility function for
extensions to call, if we can identify a part that is common to all
the join logic. However, I don't have a clear idea of how to carve out
that common portion. Which jobs can be done independently of the join
algorithm?

I'd like to hear ideas on this issue. Of course, I also think that
leaving it to the extension is still an option for the initial version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Monday, March 17, 2014 9:29 AM
To: Kaigai Kouhei(海外浩平); Tom Lane
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

I adjusted the custom-plan interface patch a little for the cache-only
scan patch, which is a demonstration module for the vacuum-page hook on
top of the custom-plan interface.

fix_scan_expr() looks useful to me for custom-plan providers that want to
implement their own relation scan logic, even though they could implement
it using fix_expr_common(), which is already exposed.

Also, I removed the hardcoded portion from nodeCustom.c, although it may
make sense to provide a few template functions for custom-plan providers
to call that perform the usual common jobs, such as constructing the
expr-context, assigning the result slot, opening relations, and so on.
I came up with this idea while implementing the BeginCustomPlan handler.
(These template functions are not in the attached patch yet.) What is
your opinion?

The major portion of this patch is unchanged from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Wednesday, March 12, 2014 1:55 PM
To: Tom Lane
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
Robert Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

The two attached patches are the revised custom-plan interface and an
example usage that implements the existing MergeJoin on top of this
interface.

Following the discussion last week, I revised the portion where a
custom node was expected to perform a particular kind of task, such as
scanning a relation, by introducing polymorphism through a set of
callbacks supplied by the custom-plan provider.
So the core backend can handle this custom-plan node as just an
abstract plan node, with no assumptions about what it does.
Even though the subject of this message says "custom-scan", I'd like
to name the interface "custom-plan" instead, because whether the node
scans a particular relation is now entirely up to the extension.

The definitions of the CustomXXXX data types were simplified:

typedef struct CustomPath
{
    Path path;
    const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomPlan
{
    Plan plan;
    const struct CustomPlanMethods *methods;
} CustomPlan;

typedef struct CustomPlanState
{
    PlanState ps;
    const CustomPlanMethods *methods;
} CustomPlanState;

Each type has a base class and a set of function pointers that
characterize the behavior of the custom-plan node.
In the usual use case, an extension is expected to extend these classes
to hold the private data fields it needs to implement its own
functionality.

Most of the methods are designed as a thin layer over the existing
planner / executor functions, so the custom-plan provider is
responsible for implementing its methods so that they interact with the
core backend just as the built-in nodes do.
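
A minimal sketch of this "base class plus method table" pattern,
assuming simplified stand-in types. Every name ending in "Sketch", the
first_block field, and the reduced method table are hypothetical; the
real Plan and CustomPlanMethods in the patch carry many more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Plan; the real struct is far larger. */
typedef struct PlanSketch
{
	int		tag;
	double	total_cost;
} PlanSketch;

struct CustomPlanMethodsSketch;

typedef struct CustomPlanSketch
{
	PlanSketch	plan;		/* base class: must be the first member */
	const struct CustomPlanMethodsSketch *methods;
} CustomPlanSketch;

typedef struct CustomPlanMethodsSketch
{
	const char *CustomName;
	int		  (*ExecCustomPlan) (CustomPlanSketch *cplan);
} CustomPlanMethodsSketch;

/* Extension-side subclass: base struct first, private fields after it. */
typedef struct CtidScanPlanSketch
{
	CustomPlanSketch cplan;
	int		first_block;	/* private to the extension */
} CtidScanPlanSketch;

static int
ctidscan_exec(CustomPlanSketch *cplan)
{
	/* The downcast is valid because cplan is the subclass's first member. */
	CtidScanPlanSketch *ctid = (CtidScanPlanSketch *) cplan;

	return ctid->first_block;
}

static const CustomPlanMethodsSketch ctidscan_methods = {
	"ctidscan",
	ctidscan_exec
};

/* Fill in an extension node; the core side only sees CustomPlanSketch. */
void
init_ctidscan_plan(CtidScanPlanSketch *node, int first_block)
{
	node->cplan.plan.tag = 1;
	node->cplan.plan.total_cost = 100.0;
	node->cplan.methods = &ctidscan_methods;
	node->first_block = first_block;
}

/* Core-side dispatch: it knows nothing about the private fields. */
int
exec_custom_plan_sketch(CustomPlanSketch *cplan)
{
	assert(cplan->methods != NULL && cplan->methods->ExecCustomPlan != NULL);
	return cplan->methods->ExecCustomPlan(cplan);
}
```

The core only ever dereferences the base struct and the method table,
so the extension is free to append whatever private state it needs.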

Regarding the topics we discussed last week:

* CUSTOM_VAR is gone.
CUSTOM_VAR was needed because we have to produce EXPLAIN output
(including the names of referenced columns) even when a custom-plan
node replaces a join but has no underlying subplans in its left/right
subtrees.
A typical case is the remote-join implementation with which I tried to
extend postgres_fdw on top of the previous interface.
It retrieves a flat result set from the remote join, and thus has no
local subplan. EXPLAIN, on the other hand, tries to find the "actual"
Var node in the underlying subplan whenever a Var node has a special
varno (INNER/OUTER/INDEX).
I added a method to solve this problem. The GetSpecialCustomVar method
is called when a Var node of a custom-plan has a special varno, so that
the custom-plan provider can tell the core backend which expression
node this Var refers to.
This lets the column name be resolved without recursively walking the
subtrees, which makes it possible for a custom-plan node to replace a
part of the plan tree.
This method is optional; if the custom-plan provider does nothing
special, the existing lookup is used.
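
The optional-callback behaviour described above can be sketched as
follows. This is a simplified illustration, not the patch's code: the
callback here returns a column label instead of an expression node, its
signature is an assumption, the varno values are stand-ins, and
RemoteJoinState, resolve_var_label and demo_label_is are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the special varnos mentioned above. */
#define INNER_VAR 65000
#define OUTER_VAR 65001

typedef struct Var
{
    int         varno;
    int         varattno;       /* 1-based column number */
} Var;

struct CustomPlanState;

typedef struct CustomPlanMethods
{
    /*
     * Optional; may be NULL.  Simplified to return a column label,
     * whereas the method described above hands back an expression node.
     */
    const char *(*GetSpecialCustomVar) (const struct CustomPlanState *cps,
                                        const Var *var);
} CustomPlanMethods;

typedef struct CustomPlanState
{
    const CustomPlanMethods *methods;
} CustomPlanState;

/* Core side: how EXPLAIN-like code might resolve a Var's label. */
const char *
resolve_var_label(const CustomPlanState *cps, const Var *var)
{
    if (var->varno != INNER_VAR && var->varno != OUTER_VAR)
        return "(ordinary Var)";
    if (cps->methods->GetSpecialCustomVar != NULL)
    {
        const char *label = cps->methods->GetSpecialCustomVar(cps, var);

        if (label != NULL)
            return label;
    }
    return "(walk the subplan)";    /* existing behaviour */
}

/* Extension side: a remote join with a flat result set, no subplans. */
typedef struct RemoteJoinState
{
    CustomPlanState cps;        /* base class must come first */
    const char *const *colnames;
    int         ncols;
} RemoteJoinState;

static const char *
remote_join_special_var(const CustomPlanState *cps, const Var *var)
{
    const RemoteJoinState *rjs = (const RemoteJoinState *) cps;

    assert(var != NULL);
    if (var->varattno < 1 || var->varattno > rjs->ncols)
        return NULL;            /* decline; core falls back */
    return rjs->colnames[var->varattno - 1];
}

/* Usage: returns 1 when the resolved label equals 'expected'. */
int
demo_label_is(int varno, int varattno, int with_callback,
              const char *expected)
{
    static const char *const colnames[] = {"t1.id", "t2.id"};
    static const CustomPlanMethods with_cb = {remote_join_special_var};
    static const CustomPlanMethods without_cb = {NULL};
    RemoteJoinState rjs;
    Var         var;

    rjs.cps.methods = with_callback ? &with_cb : &without_cb;
    rjs.colnames = colnames;
    rjs.ncols = 2;
    var.varno = varno;
    var.varattno = varattno;
    return strcmp(resolve_var_label(&rjs.cps, &var), expected) == 0;
}
```

Note that two columns both named "id" stay distinguishable here only
because the provider, not a shared tuple descriptor, supplies the label.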

* Functions to be exposed that are currently static

Right now, static functions are exposed ad hoc, on demand, so we need
to investigate further which functions are needed and which are not.
Judging from my trial with the part-2 patch (MergeJoin on top of the
custom-plan interface), the class of functions that recursively walk
the subplan tree has to be exposed, such as ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common and finalize_plan.
When a custom-plan behaves like the built-in Append node, it keeps a
list of sub-plans in a private field, so the core backend cannot know
that the sub-plans exist and therefore cannot build them, cannot
output them in EXPLAIN, and so on.
It does not make sense to reimplement this logic on the extension side.
Also, createplan.c has many useful functions for constructing plan
nodes, but most of them are static: all the built-in plan nodes are
constructed by routines in that file, so there was no need to expose
them. I think the functions in createplan.c that the
create_xxxx_plan() functions call to construct plan nodes should be
exposed for the convenience of extensions.
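
To illustrate why the recursive walkers need to be callable from an
extension, here is a small sketch under simplified assumptions. The
node layout, the count_private_children callback, MyAppendPlan and
demo_count_nodes are hypothetical; count_plan_nodes merely stands in
for a core walker such as the ones listed above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum PlanKind { PLAN_BUILTIN, PLAN_CUSTOM } PlanKind;

/* Stand-in plan node: the core can only follow lefttree/righttree. */
typedef struct Plan
{
    PlanKind    kind;
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

struct CustomPlan;

typedef struct CustomPlanMethods
{
    /* Hypothetical callback: visit children the core cannot see. */
    int         (*count_private_children) (struct CustomPlan *cplan);
} CustomPlanMethods;

typedef struct CustomPlan
{
    Plan        plan;
    const CustomPlanMethods *methods;
} CustomPlan;

/* Stand-in for an exposed core walker that recurses over a plan tree. */
int
count_plan_nodes(Plan *plan)
{
    int         n;

    if (plan == NULL)
        return 0;
    n = 1 + count_plan_nodes(plan->lefttree)
        + count_plan_nodes(plan->righttree);
    if (plan->kind == PLAN_CUSTOM)
    {
        CustomPlan *cplan = (CustomPlan *) plan;

        if (cplan->methods->count_private_children != NULL)
            n += cplan->methods->count_private_children(cplan);
    }
    return n;
}

#define MAX_CHILDREN 4

/* Append-like custom node: sub-plans live in a private array. */
typedef struct MyAppendPlan
{
    CustomPlan  cplan;
    Plan       *children[MAX_CHILDREN];
    int         nchildren;
} MyAppendPlan;

static int
my_append_count_children(CustomPlan *cplan)
{
    MyAppendPlan *my = (MyAppendPlan *) cplan;
    int         i;
    int         n = 0;

    assert(my->nchildren <= MAX_CHILDREN);
    /* Only possible because the core walker is not static. */
    for (i = 0; i < my->nchildren; i++)
        n += count_plan_nodes(my->children[i]);
    return n;
}

static const CustomPlanMethods my_append_methods = {my_append_count_children};

/* Usage: one custom node, three private children, one grandchild. */
int
demo_count_nodes(void)
{
    Plan        leaf = {PLAN_BUILTIN, NULL, NULL};
    Plan        a = {PLAN_BUILTIN, NULL, NULL};
    Plan        b = {PLAN_BUILTIN, NULL, NULL};
    Plan        c = {PLAN_BUILTIN, NULL, NULL};
    MyAppendPlan app;

    a.lefttree = &leaf;
    app.cplan.plan.kind = PLAN_CUSTOM;
    app.cplan.plan.lefttree = NULL;
    app.cplan.plan.righttree = NULL;
    app.cplan.methods = &my_append_methods;
    app.children[0] = &a;
    app.children[1] = &b;
    app.children[2] = &c;
    app.nchildren = 3;
    return count_plan_nodes(&app.cplan.plan);
}
```

If count_plan_nodes were static to the core, the extension would have
to duplicate the recursion for its private children.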

* Definition of add_join_path_hook

I had no idea how to improve the definition and location of this
hook, so it is still at the tail of add_paths_to_joinrel().
Its definition was adjusted a bit according to the feedback on
pgsql-hackers: I omitted "mergeclause_list" and "semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.
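
For readers unfamiliar with how such a hook is used, the following is
a rough sketch of the usual save-and-chain hook idiom, with the hook
called at the tail of the path-adding routine. The three-argument
signature is an assumption made for brevity (the real hook takes more
planner arguments), and RelOptInfo, core_add_paths_to_joinrel,
my_module_init and demo_join_path_count are simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in: only counts how many candidate paths were added. */
typedef struct RelOptInfo
{
    int         npaths;
} RelOptInfo;

/* Hypothetical, shortened signature for the hook. */
typedef void (*add_join_path_hook_type) (RelOptInfo *joinrel,
                                         RelOptInfo *outerrel,
                                         RelOptInfo *innerrel);

add_join_path_hook_type add_join_path_hook = NULL;

/* Core side: built-in paths first, then the hook at the tail. */
void
core_add_paths_to_joinrel(RelOptInfo *joinrel,
                          RelOptInfo *outerrel,
                          RelOptInfo *innerrel)
{
    joinrel->npaths += 3;       /* stand-in for merge/nestloop/hash */
    if (add_join_path_hook != NULL)
        (*add_join_path_hook) (joinrel, outerrel, innerrel);
}

/* Extension side: remember any previously installed hook. */
static add_join_path_hook_type prev_add_join_path_hook = NULL;

static void
my_add_join_paths(RelOptInfo *joinrel,
                  RelOptInfo *outerrel,
                  RelOptInfo *innerrel)
{
    assert(joinrel != NULL);
    if (prev_add_join_path_hook != NULL)
        (*prev_add_join_path_hook) (joinrel, outerrel, innerrel);
    joinrel->npaths += 1;       /* our custom join path */
}

void
my_module_init(void)
{
    prev_add_join_path_hook = add_join_path_hook;
    add_join_path_hook = my_add_join_paths;
}

/* Usage: number of candidate paths with and without the extension. */
int
demo_join_path_count(int load_extension)
{
    RelOptInfo  outer = {0};
    RelOptInfo  inner = {0};
    RelOptInfo  join = {0};

    add_join_path_hook = NULL;
    prev_add_join_path_hook = NULL;
    if (load_extension)
        my_module_init();
    core_add_paths_to_joinrel(&join, &outer, &inner);
    return join.npaths;
}
```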

* Hook location of add_scan_path_hook

I moved add_scan_path_hook and set_cheapest() into
set_base_rel_pathlists() from their various previous call sites,
typically the set_xxxx_pathlist() functions.
This consolidates the place where custom paths are added for base
relations.

* CustomMergeJoin as a proof-of-concept

The contrib module in the part-2 patch is a merge-join
implementation on top of the custom-plan interface, even though 99% of
its implementation is identical to the built-in one.
Its purpose is to demonstrate that custom join logic can be implemented
using the custom-plan interface, even when the custom-plan node has
underlying sub-plans, unlike my previous examples.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, March 07, 2014 3:09 AM
To: Kaigai Kouhei(海外浩平)
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
Robert Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:

I expected to include simple function pointers for copying and
text-output as follows:

typedef struct {
    Plan plan;
    :
    NodeCopy_function node_copy;
    NodeTextOut_function node_textout;
} Custom;

I was thinking more like

typedef struct CustomPathFuncs {
    const char *name; /* used for debugging purposes only */
    NodeCopy_function node_copy;
    NodeTextOut_function node_textout;
    ... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
    Path path;
    const CustomPathFuncs *funcs;
    ... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will
be) the common case where an extension has a fixed set of support
functions for its custom paths and plans. It just declares a static
constant CustomPathFuncs struct, and puts a pointer to that into its
paths.

If an extension really needs to set the support functions on a
per-object basis, it can do this:

typedef struct MyCustomPath {
    CustomPath cpath;
    CustomPathFuncs funcs;
    ... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

    mypath->cpath.funcs = &mypath->funcs;
    mypath->funcs.node_copy = MyCustomPathCopy;
    ... etc etc ...

In this case we're arguably wasting one pointer worth of space in
the path, but considering the number of function pointers such a
path will be carrying, I don't think that's much of an objection.
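
The two variants above can be put side by side in a compilable sketch.
The struct and field names follow the outline above, but Path is a
stand-in, the NodeCopy_function signature is an assumption (the thread
does not spell it out), and SharedPathCopy and the demo_* helpers are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the core Path node. */
typedef struct Path
{
    double      total_cost;
} Path;

struct CustomPath;

/* Assumed signature; not given in the discussion. */
typedef void (*NodeCopy_function) (struct CustomPath *dst,
                                   const struct CustomPath *src);

typedef struct CustomPathFuncs
{
    const char *name;           /* used for debugging purposes only */
    NodeCopy_function node_copy;
} CustomPathFuncs;

typedef struct CustomPath
{
    Path        path;
    const CustomPathFuncs *funcs;
} CustomPath;

/* Common case: one static constant table shared by every path. */
static void
SharedPathCopy(CustomPath *dst, const CustomPath *src)
{
    dst->path = src->path;
    dst->funcs = src->funcs;
}

static const CustomPathFuncs shared_funcs = {"shared", SharedPathCopy};

void
init_shared_path(CustomPath *cpath, double cost)
{
    cpath->path.total_cost = cost;
    cpath->funcs = &shared_funcs;
}

/* Per-object case: the table is embedded in the extended path. */
typedef struct MyCustomPath
{
    CustomPath  cpath;
    CustomPathFuncs funcs;
} MyCustomPath;

static void
MyCustomPathCopy(CustomPath *dst, const CustomPath *src)
{
    MyCustomPath *d = (MyCustomPath *) dst;
    const MyCustomPath *s = (const MyCustomPath *) src;

    d->cpath.path = s->cpath.path;
    d->funcs = s->funcs;
    d->cpath.funcs = &d->funcs; /* point at the copy's own table */
}

void
init_my_path(MyCustomPath *mypath, double cost)
{
    mypath->cpath.path.total_cost = cost;
    mypath->cpath.funcs = &mypath->funcs;
    mypath->funcs.name = "per-object";
    mypath->funcs.node_copy = MyCustomPathCopy;
}

/* Usage: two shared paths point at the same table. */
int
demo_shared_table(void)
{
    CustomPath  a;
    CustomPath  b;

    init_shared_path(&a, 1.0);
    init_shared_path(&b, 2.0);
    return a.funcs == b.funcs;
}

/* Usage: a per-object copy ends up with its own table. */
int
demo_per_object_copy(void)
{
    MyCustomPath src;
    MyCustomPath dst;

    init_my_path(&src, 5.0);
    src.cpath.funcs->node_copy(&dst.cpath, &src.cpath);
    return dst.cpath.funcs == &dst.funcs
        && dst.cpath.funcs != src.cpath.funcs
        && dst.cpath.path.total_cost == 5.0;
}
```

One consequence visible in this sketch: with an embedded table, a copy
routine has to re-point funcs at the new object's own table, which the
shared static table never needs.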

So? If you did that, then you wouldn't have renumbered the Vars
as INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at
all; if it is needed, then there would also be a need for an
additional tuple slot in executor contexts, which you haven't
provided.

For example, the enhanced postgres_fdw fetches the result set of a
remote join query, so a tuple contains fields that come from both
sides.
In this case, what varno would be suitable?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
solution.
Consider for example a join where both tables supply columns named "id"
--- if you put them both in one tupledesc then there's no non-kluge
way to identify them.
Possibly the route to a solution involves adding another plan-node
callback function that ruleutils.c would use for printing Vars in
custom join nodes.

Or maybe we could let the Vars keep their original RTE numbers,
though that would complicate life at execution time.

Anyway, if we're going to punt on add_join_path_hook for the time
being, this problem can probably be left to solve later. It won't
arise for simple table-scan cases, nor for single-input plan nodes
such as sorts.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#97

Kouhei Kaigai

kaigai@ak.jp.nec.com

almost 12 years ago

In reply to: Kouhei Kaigai (#45)

2 attachment(s)

Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

Because the patches conflicted with the latest master branch, I rebased
the custom-plan interface patches; there is no functional difference
from v11.

Here is a brief summary of the current approach, which has been revised
from my original submission through the discussion on pgsql-hackers:

The plan node was renamed from CustomScan to CustomPlan, because I
dropped all the hardcoded portions that assumed the custom node acts as
an alternative scan or join method; that assumption prevented the
custom node from acting as something else, such as a sort or an append.
Following Tom's suggestion, I put a structure that contains several
function pointers on the new CustomPlan node, and an extension
allocates an object that extends the CustomPlan node.
This resembles polymorphism in object-oriented programming languages.
The core backend knows only the abstract set of methods defined in the
tables of function pointers, and an extension can implement its own
logic in the callbacks, using private state in the extended object.

Some issues are still under discussion:
* Design of add_join_path_hook
Tom suggested that the core backend could take care of the join
legality checks and parameterization; however, it looks to me as if
the existing code handles join legality checks within the functions of
the individual join methods.
So, I'm not certain whether we can have a common legality check for
all the (potential) alternative join implementations.
The part-2 patch is a demonstration module that implements the
existing merge-join logic, but on top of this interface.

* Functions to be exposed for extensions
Some utility functions that are useful for implementing plan nodes are
declared static, because most plan-node code is implemented within
createplan.c, setrefs.c and so on, so there was no need to expose them
elsewhere. However, with the custom-plan interface, extensions become
callers of these functions even though they are implemented outside
the core.
In my investigation, the class of functions that recursively walk the
subplan tree has to be exposed, such as ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common and finalize_plan.
Do we have other criteria for deciding which functions should be exposed?

Any help and feedback are welcome.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Kaigai Kouhei(海外浩平)
Sent: Thursday, March 20, 2014 10:46 AM
To: Kaigai Kouhei(海外浩平); Tom Lane
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
Haas; PgHacker; Peter Eisentraut
Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

* Definition of add_join_path_hook

I had no idea how to improve the definition and location of this
hook, so it is still at the tail of add_paths_to_joinrel().
Its definition was adjusted a bit according to the feedback on
pgsql-hackers: I omitted "mergeclause_list" and "semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.

Since the submission, I have been investigating a better way to place
a hook for adding custom join paths.

Regarding the comment from Tom:
| * The API for add_join_path_hook seems overcomplex, as well as too
| full of implementation details that should remain private to joinpath.c.
| I particularly object to passing the mergeclause lists, which seem
| unlikely to be of interest for non-mergejoin plan types anyway.
| More generally, it seems likely that this hook is at the wrong level
| of abstraction; it forces the hook function to concern itself with a
| lot of stuff about join legality and parameterization (which I note
| your patch3 code fails to do; but that's not an optional thing).
|
The first half has already been addressed. My trouble is the latter
portion.

The overall jobs of add_join_path_hook are as follows:
1. Construct the parameterized-path information, which is saved in
   param_source_rel and extra_lateral_rels.
2. Try to add mergejoin paths with an underlying Sort node.
3. Try to add mergejoin/nestloop paths without an underlying Sort node.
4. Try to add hashjoin paths.

It seems to me that the checks for join legality and parameterization
are built into the individual routines for each join algorithm.
(What does the "join legality check" actually mean?)

It probably makes sense to provide a common utility function that the
extension can call if we can identify a part common to all the join
logic; however, I don't have a clear idea of how to separate out that
common portion.
Which jobs can be done independently of the join algorithm?

I'd like to hear ideas on this issue. Of course, I also think leaving
it to the extension is still an option for the initial version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Monday, March 17, 2014 9:29 AM
To: Kaigai Kouhei(海外浩平); Tom Lane
Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
Robert Haas; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

I adjusted the custom-plan interface patch a little bit for the
cache-only scan patch, which is a demonstration module for the
vacuum-page hook on top of the custom-plan interface.

fix_scan_expr() looks useful to me for custom-plan providers that want
to implement their own relation scan logic, even though they could
implement it using fix_expr_common(), which is already exposed.

Also, I removed the hardcoded portion from nodeCustom.c, although it
may make sense to provide a few template functions for custom-plan
providers to call that perform the usual common jobs, such as
constructing the expr-context, assigning the result slot, opening
relations, and so on.

I thought of this idea while implementing the BeginCustomPlan handler.
(These template functions are not in the attached patch yet.) What is
your opinion?

The major portion of this patch is not changed from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-2.v12.patch (application/octet-stream)

 contrib/custmj/Makefile            |   17 +
 contrib/custmj/createplan.c        |  435 +++++++++
 contrib/custmj/custmj.c            |  691 +++++++++++++++
 contrib/custmj/custmj.h            |  148 ++++
 contrib/custmj/expected/custmj.out |  378 ++++++++
 contrib/custmj/joinpath.c          |  988 +++++++++++++++++++++
 contrib/custmj/nodeMergejoin.c     | 1694 ++++++++++++++++++++++++++++++++++++
 contrib/custmj/setrefs.c           |  326 +++++++
 contrib/custmj/sql/custmj.sql      |   79 ++
 9 files changed, 4756 insertions(+)

diff --git a/contrib/custmj/Makefile b/contrib/custmj/Makefile
new file mode 100644
index 0000000..9b264d4
--- /dev/null
+++ b/contrib/custmj/Makefile
@@ -0,0 +1,17 @@
+# contrib/custmj/Makefile
+
+MODULE_big = custmj
+OBJS = custmj.o joinpath.o createplan.o setrefs.o nodeMergejoin.o
+
+REGRESS = custmj
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/custmj
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/custmj/createplan.c b/contrib/custmj/createplan.c
new file mode 100644
index 0000000..e522d73
--- /dev/null
+++ b/contrib/custmj/createplan.c
@@ -0,0 +1,435 @@
+/*-------------------------------------------------------------------------
+ *
+ * createplan.c
+ *	  Routines to create the desired plan for processing a query.
+ *	  Planning is complete, we just need to convert the selected
+ *	  Path into a Plan.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/plan/createplan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <limits.h>
+#include <math.h>
+
+#include "access/skey.h"
+#include "catalog/pg_class.h"
+#include "foreign/fdwapi.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "optimizer/tlist.h"
+#include "optimizer/var.h"
+#include "parser/parse_clause.h"
+#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
+#include "custmj.h"
+
+static MergeJoin *make_mergejoin(List *tlist,
+			   List *joinclauses, List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree, Plan *righttree,
+			   JoinType jointype);
+static Material *make_material(Plan *lefttree);
+
+/*
+ * create_gating_plan
+ *	  Deal with pseudoconstant qual clauses
+ *
+ * If the node's quals list includes any pseudoconstant quals, put them
+ * into a gating Result node atop the already-built plan.  Otherwise,
+ * return the plan as-is.
+ *
+ * Note that we don't change cost or size estimates when doing gating.
+ * The costs of qual eval were already folded into the plan's startup cost.
+ * Leaving the size alone amounts to assuming that the gating qual will
+ * succeed, which is the conservative estimate for planning upper queries.
+ * We certainly don't want to assume the output size is zero (unless the
+ * gating qual is actually constant FALSE, and that case is dealt with in
+ * clausesel.c).  Interpolating between the two cases is silly, because
+ * it doesn't reflect what will really happen at runtime, and besides which
+ * in most cases we have only a very bad idea of the probability of the gating
+ * qual being true.
+ */
+Plan *
+create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
+{
+	List	   *pseudoconstants;
+
+	/* Sort into desirable execution order while still in RestrictInfo form */
+	quals = order_qual_clauses(root, quals);
+
+	/* Pull out any pseudoconstant quals from the RestrictInfo list */
+	pseudoconstants = extract_actual_clauses(quals, true);
+
+	if (!pseudoconstants)
+		return plan;
+
+	return (Plan *) make_result(root,
+								plan->targetlist,
+								(Node *) pseudoconstants,
+								plan);
+}
+
+MergeJoin *
+create_mergejoin_plan(PlannerInfo *root,
+					  CustomMergePath *best_path,
+					  Plan *outer_plan,
+					  Plan *inner_plan)
+{
+	List	   *tlist = build_path_tlist(root, &best_path->cpath.path);
+	List	   *joinclauses;
+	List	   *otherclauses;
+	List	   *mergeclauses;
+	List	   *outerpathkeys;
+	List	   *innerpathkeys;
+	int			nClauses;
+	Oid		   *mergefamilies;
+	Oid		   *mergecollations;
+	int		   *mergestrategies;
+	bool	   *mergenullsfirst;
+	MergeJoin  *join_plan;
+	int			i;
+	ListCell   *lc;
+	ListCell   *lop;
+	ListCell   *lip;
+
+	/* Sort join qual clauses into best execution order */
+	/* NB: do NOT reorder the mergeclauses */
+	joinclauses = order_qual_clauses(root, best_path->joinrestrictinfo);
+
+	/* Get the join qual clauses (in plain expression form) */
+	/* Any pseudoconstant clauses are ignored here */
+	if (IS_OUTER_JOIN(best_path->jointype))
+	{
+		extract_actual_join_clauses(joinclauses,
+									&joinclauses, &otherclauses);
+	}
+	else
+	{
+		/* We can treat all clauses alike for an inner join */
+		joinclauses = extract_actual_clauses(joinclauses, false);
+		otherclauses = NIL;
+	}
+
+	/*
+	 * Remove the mergeclauses from the list of join qual clauses, leaving the
+	 * list of quals that must be checked as qpquals.
+	 */
+	mergeclauses = get_actual_clauses(best_path->path_mergeclauses);
+	joinclauses = list_difference(joinclauses, mergeclauses);
+
+	/*
+	 * Replace any outer-relation variables with nestloop params.  There
+	 * should not be any in the mergeclauses.
+	 */
+	if (best_path->cpath.path.param_info)
+	{
+		joinclauses = (List *)
+			replace_nestloop_params(root, (Node *) joinclauses);
+		otherclauses = (List *)
+			replace_nestloop_params(root, (Node *) otherclauses);
+	}
+
+	/*
+	 * Rearrange mergeclauses, if needed, so that the outer variable is always
+	 * on the left; mark the mergeclause restrictinfos with correct
+	 * outer_is_left status.
+	 */
+	mergeclauses = get_switched_clauses(best_path->path_mergeclauses,
+							 best_path->outerjoinpath->parent->relids);
+
+	/*
+	 * Create explicit sort nodes for the outer and inner paths if necessary.
+	 * Make sure there are no excess columns in the inputs if sorting.
+	 */
+	if (best_path->outersortkeys)
+	{
+		disuse_physical_tlist(root, outer_plan, best_path->outerjoinpath);
+		outer_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									outer_plan,
+									best_path->outersortkeys,
+									-1.0);
+		outerpathkeys = best_path->outersortkeys;
+	}
+	else
+		outerpathkeys = best_path->outerjoinpath->pathkeys;
+
+	if (best_path->innersortkeys)
+	{
+		disuse_physical_tlist(root, inner_plan, best_path->innerjoinpath);
+		inner_plan = (Plan *)
+			make_sort_from_pathkeys(root,
+									inner_plan,
+									best_path->innersortkeys,
+									-1.0);
+		innerpathkeys = best_path->innersortkeys;
+	}
+	else
+		innerpathkeys = best_path->innerjoinpath->pathkeys;
+
+	/*
+	 * If specified, add a materialize node to shield the inner plan from the
+	 * need to handle mark/restore.
+	 */
+	if (best_path->materialize_inner)
+	{
+		Plan	   *matplan = (Plan *) make_material(inner_plan);
+
+		/*
+		 * We assume the materialize will not spill to disk, and therefore
+		 * charge just cpu_operator_cost per tuple.  (Keep this estimate in
+		 * sync with final_cost_mergejoin.)
+		 */
+		copy_plan_costsize(matplan, inner_plan);
+		matplan->total_cost += cpu_operator_cost * matplan->plan_rows;
+
+		inner_plan = matplan;
+	}
+
+	/*
+	 * Compute the opfamily/collation/strategy/nullsfirst arrays needed by the
+	 * executor.  The information is in the pathkeys for the two inputs, but
+	 * we need to be careful about the possibility of mergeclauses sharing a
+	 * pathkey (compare find_mergeclauses_for_pathkeys()).
+	 */
+	nClauses = list_length(mergeclauses);
+	Assert(nClauses == list_length(best_path->path_mergeclauses));
+	mergefamilies = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergecollations = (Oid *) palloc(nClauses * sizeof(Oid));
+	mergestrategies = (int *) palloc(nClauses * sizeof(int));
+	mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));
+
+	lop = list_head(outerpathkeys);
+	lip = list_head(innerpathkeys);
+	i = 0;
+	foreach(lc, best_path->path_mergeclauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		EquivalenceClass *oeclass;
+		EquivalenceClass *ieclass;
+		PathKey    *opathkey;
+		PathKey    *ipathkey;
+		EquivalenceClass *opeclass;
+		EquivalenceClass *ipeclass;
+		ListCell   *l2;
+
+		/* fetch outer/inner eclass from mergeclause */
+		Assert(IsA(rinfo, RestrictInfo));
+		if (rinfo->outer_is_left)
+		{
+			oeclass = rinfo->left_ec;
+			ieclass = rinfo->right_ec;
+		}
+		else
+		{
+			oeclass = rinfo->right_ec;
+			ieclass = rinfo->left_ec;
+		}
+		Assert(oeclass != NULL);
+		Assert(ieclass != NULL);
+
+		/*
+		 * For debugging purposes, we check that the eclasses match the paths'
+		 * pathkeys.  In typical cases the merge clauses are one-to-one with
+		 * the pathkeys, but when dealing with partially redundant query
+		 * conditions, we might have clauses that re-reference earlier path
+		 * keys.  The case that we need to reject is where a pathkey is
+		 * entirely skipped over.
+		 *
+		 * lop and lip reference the first as-yet-unused pathkey elements;
+		 * it's okay to match them, or any element before them.  If they're
+		 * NULL then we have found all pathkey elements to be used.
+		 */
+		if (lop)
+		{
+			opathkey = (PathKey *) lfirst(lop);
+			opeclass = opathkey->pk_eclass;
+			if (oeclass == opeclass)
+			{
+				/* fast path for typical case */
+				lop = lnext(lop);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lop */
+				foreach(l2, outerpathkeys)
+				{
+					if (l2 == lop)
+						break;
+					opathkey = (PathKey *) lfirst(l2);
+					opeclass = opathkey->pk_eclass;
+					if (oeclass == opeclass)
+						break;
+				}
+				if (oeclass != opeclass)
+					elog(ERROR, "outer pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			opathkey = NULL;
+			opeclass = NULL;
+			foreach(l2, outerpathkeys)
+			{
+				opathkey = (PathKey *) lfirst(l2);
+				opeclass = opathkey->pk_eclass;
+				if (oeclass == opeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "outer pathkeys do not match mergeclauses");
+		}
+
+		if (lip)
+		{
+			ipathkey = (PathKey *) lfirst(lip);
+			ipeclass = ipathkey->pk_eclass;
+			if (ieclass == ipeclass)
+			{
+				/* fast path for typical case */
+				lip = lnext(lip);
+			}
+			else
+			{
+				/* redundant clauses ... must match something before lip */
+				foreach(l2, innerpathkeys)
+				{
+					if (l2 == lip)
+						break;
+					ipathkey = (PathKey *) lfirst(l2);
+					ipeclass = ipathkey->pk_eclass;
+					if (ieclass == ipeclass)
+						break;
+				}
+				if (ieclass != ipeclass)
+					elog(ERROR, "inner pathkeys do not match mergeclauses");
+			}
+		}
+		else
+		{
+			/* redundant clauses ... must match some already-used pathkey */
+			ipathkey = NULL;
+			ipeclass = NULL;
+			foreach(l2, innerpathkeys)
+			{
+				ipathkey = (PathKey *) lfirst(l2);
+				ipeclass = ipathkey->pk_eclass;
+				if (ieclass == ipeclass)
+					break;
+			}
+			if (l2 == NULL)
+				elog(ERROR, "inner pathkeys do not match mergeclauses");
+		}
+
+		/* pathkeys should match each other too (more debugging) */
+		if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
+			opathkey->pk_eclass->ec_collation != ipathkey->pk_eclass->ec_collation ||
+			opathkey->pk_strategy != ipathkey->pk_strategy ||
+			opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
+			elog(ERROR, "left and right pathkeys do not match in mergejoin");
+
+		/* OK, save info for executor */
+		mergefamilies[i] = opathkey->pk_opfamily;
+		mergecollations[i] = opathkey->pk_eclass->ec_collation;
+		mergestrategies[i] = opathkey->pk_strategy;
+		mergenullsfirst[i] = opathkey->pk_nulls_first;
+		i++;
+	}
+
+	/*
+	 * Note: it is not an error if we have additional pathkey elements (i.e.,
+	 * lop or lip isn't NULL here).  The input paths might be better-sorted
+	 * than we need for the current mergejoin.
+	 */
+
+	/*
+	 * Now we can build the mergejoin node.
+	 */
+	join_plan = make_mergejoin(tlist,
+							   joinclauses,
+							   otherclauses,
+							   mergeclauses,
+							   mergefamilies,
+							   mergecollations,
+							   mergestrategies,
+							   mergenullsfirst,
+							   outer_plan,
+							   inner_plan,
+							   best_path->jointype);
+
+	/* Costs of sort and material steps are included in path cost already */
+	copy_path_costsize(&join_plan->join.plan, &best_path->cpath.path);
+
+	return join_plan;
+}
+
+static MergeJoin *
+make_mergejoin(List *tlist,
+			   List *joinclauses,
+			   List *otherclauses,
+			   List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   Plan *lefttree,
+			   Plan *righttree,
+			   JoinType jointype)
+{
+	MergeJoin  *node = makeNode(MergeJoin);
+	Plan	   *plan = &node->join.plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = tlist;
+	plan->qual = otherclauses;
+	plan->lefttree = lefttree;
+	plan->righttree = righttree;
+	node->mergeclauses = mergeclauses;
+	node->mergeFamilies = mergefamilies;
+	node->mergeCollations = mergecollations;
+	node->mergeStrategies = mergestrategies;
+	node->mergeNullsFirst = mergenullsfirst;
+	node->join.jointype = jointype;
+	node->join.joinqual = joinclauses;
+
+	return node;
+}
+
+static Material *
+make_material(Plan *lefttree)
+{
+	Material   *node = makeNode(Material);
+	Plan	   *plan = &node->plan;
+
+	/* cost should be inserted by caller */
+	plan->targetlist = lefttree->targetlist;
+	plan->qual = NIL;
+	plan->lefttree = lefttree;
+	plan->righttree = NULL;
+
+	return node;
+}
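
For reviewers, the debugging check in create_mergejoin_plan() above (each merge clause must match the first as-yet-unused pathkey or an earlier one, and a skipped pathkey is an error) can be sketched in isolation. This is only an illustration and not part of the patch: plain ints stand in for EquivalenceClass pointers, and the name check_merge_order is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone model of the pathkey check.
 * clause_ec[i]  : eclass id used by the i-th merge clause
 * pathkey_ec[j] : eclass id of the j-th pathkey of the input path
 * Returns 0 if the clauses are acceptable, -1 if a pathkey was skipped.
 */
int
check_merge_order(const int *clause_ec, size_t nclauses,
                  const int *pathkey_ec, size_t nkeys)
{
    size_t next = 0;            /* first as-yet-unused pathkey (like lop/lip) */
    size_t i;

    for (i = 0; i < nclauses; i++)
    {
        size_t j;

        /* fast path: clause matches the next unused pathkey */
        if (next < nkeys && clause_ec[i] == pathkey_ec[next])
        {
            next++;
            continue;
        }

        /*
         * Redundant clause: it must match some pathkey before "next".
         * Once every pathkey is used, next == nkeys, so this scans all.
         */
        for (j = 0; j < next; j++)
        {
            if (clause_ec[i] == pathkey_ec[j])
                break;
        }
        if (j == next)
            return -1;          /* a pathkey was skipped over */
    }

    /* leftover pathkeys are fine: the input may be better sorted than needed */
    return 0;
}
```

The sketch accepts clauses that re-reference an earlier pathkey and rejects only the case where a clause jumps past an unused one, which is the case the comment in the patch says must be rejected.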
diff --git a/contrib/custmj/custmj.c b/contrib/custmj/custmj.c
new file mode 100644
index 0000000..ef64857
--- /dev/null
+++ b/contrib/custmj/custmj.c
@@ -0,0 +1,691 @@
+/* -------------------------------------------------------------------------
+ *
+ * contrib/custmj/custmj.c
+ *
+ * Custom version of MergeJoin - an example implementation of MergeJoin
+ * logic on top of Custom-Plan interface, to demonstrate how to use this
+ * interface for joining relations.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "commands/explain.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/nodeFuncs.h"
+#include "executor/executor.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+PG_MODULE_MAGIC;
+
+/* declaration of local variables */
+static add_join_path_hook_type	add_join_path_orig = NULL;
+bool		enable_custom_mergejoin;
+
+/* callback table of custom merge join */
+CustomPathMethods			custmj_path_methods;
+CustomPlanMethods			custmj_plan_methods;
+
+/*
+ * custmjAddJoinPath
+ *
+ * A callback function to add custom version of merge-join logic towards
+ * the supplied relations join.
+ */
+static void
+custmjAddJoinPath(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  RelOptInfo *outerrel,
+				  RelOptInfo *innerrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  List *restrictlist,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels)
+{
+	List	   *mergeclause_list = NIL;
+	bool		mergejoin_allowed = true;
+	SemiAntiJoinFactors semifactors;
+
+	if (add_join_path_orig)
+		(*add_join_path_orig)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
+	/* nothing more to do if custom merge join is disabled */
+	if (!enable_custom_mergejoin)
+		return;
+
+	/*
+	 * Find potential mergejoin clauses.
+	 */
+	mergeclause_list = select_mergejoin_clauses(root,
+												joinrel,
+												outerrel,
+												innerrel,
+												restrictlist,
+												jointype,
+												&mergejoin_allowed);
+	if (!mergejoin_allowed)
+		return;
+
+	/*
+	 * If it's SEMI or ANTI join, compute correction factors for cost
+	 * estimation.  These will be the same for all paths.
+	 */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+		compute_semi_anti_join_factors(root, outerrel, innerrel,
+									   jointype, sjinfo, restrictlist,
+									   &semifactors);
+
+	/*
+	 * 1. Consider mergejoin paths where both relations must be explicitly
+	 * sorted.  Skip this if we can't mergejoin.
+	 */
+	sort_inner_and_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo,
+						 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 2. Consider paths where the outer relation need not be explicitly
+	 * sorted. This includes both nestloops and mergejoins where the outer
+	 * path is already ordered.  Again, skip this if we can't mergejoin.
+	 * (That's okay because we know that nestloop can't handle right/full
+	 * joins at all, so it wouldn't work in the prohibited cases either.)
+	 */
+	match_unsorted_outer(root, joinrel, outerrel, innerrel,
+						 restrictlist, mergeclause_list, jointype,
+						 sjinfo, &semifactors,
+						 param_source_rels, extra_lateral_rels);
+}
+
+/*
+ * CreateCustomMergeJoinPlan
+ *
+ * A method to populate a CustomPlan node according to the supplied
+ * CustomPath node, which was chosen by the planner.
+ */
+static CustomPlan *
+CreateCustomMergeJoinPlan(PlannerInfo *root, CustomPath *custom_path)
+{
+	CustomMergePath	   *cmpath = (CustomMergePath *) custom_path;
+	CustomMergeJoin	   *cmjoin;
+	MergeJoin		   *mjplan;
+	Plan			   *outer_plan;
+	Plan			   *inner_plan;
+
+	/* plan the underlying relations */
+	outer_plan = create_plan_recurse(root, cmpath->outerjoinpath);
+	inner_plan = create_plan_recurse(root, cmpath->innerjoinpath);
+
+	mjplan = create_mergejoin_plan(root, cmpath, outer_plan, inner_plan);
+
+	/*
+	 * If there are any pseudoconstant clauses attached to this node, insert a
+	 * gating Result node that evaluates the pseudoconstants as one-time
+	 * quals.
+	 */
+	if (root->hasPseudoConstantQuals)
+		mjplan = (MergeJoin *)
+			create_gating_plan(root, &mjplan->join.plan,
+							   cmpath->joinrestrictinfo);
+
+	/* construct a CustomMergeJoin plan */
+	cmjoin = palloc0(sizeof(CustomMergeJoin));
+	cmjoin->cplan.plan = mjplan->join.plan;
+	cmjoin->cplan.plan.type = T_CustomPlan;
+	cmjoin->cplan.methods = &custmj_plan_methods;
+	cmjoin->jointype = mjplan->join.jointype;
+	cmjoin->joinqual = mjplan->join.joinqual;
+	cmjoin->mergeclauses = mjplan->mergeclauses;
+	cmjoin->mergeFamilies = mjplan->mergeFamilies;
+	cmjoin->mergeCollations = mjplan->mergeCollations;
+	cmjoin->mergeStrategies = mjplan->mergeStrategies;
+	cmjoin->mergeNullsFirst = mjplan->mergeNullsFirst;
+	pfree(mjplan);
+
+	return &cmjoin->cplan;
+}
+
+/*
+ * TextOutCustomMergeJoinPath
+ *
+ * A method to support nodeToString for CustomPath node
+ */
+static void
+TextOutCustomMergeJoinPath(StringInfo str, Node *node)
+{
+	CustomMergePath	*cmpath = (CustomMergePath *) node;
+	char			*temp;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmpath->cpath.methods == &custmj_path_methods);
+	appendStringInfo(str, " :jointype %d", cmpath->jointype);
+	temp = nodeToString(cmpath->outerjoinpath);
+	appendStringInfo(str, " :outerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innerjoinpath);
+	appendStringInfo(str, " :innerjoinpath %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->joinrestrictinfo);
+	appendStringInfo(str, " :joinrestrictinfo %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->path_mergeclauses);
+	appendStringInfo(str, " :path_mergeclauses %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->outersortkeys);
+	appendStringInfo(str, " :outersortkeys %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmpath->innersortkeys);
+	appendStringInfo(str, " :innersortkeys %s", temp);
+	pfree(temp);
+	appendStringInfo(str, " :materialize_inner %s",
+					 cmpath->materialize_inner ? "true" : "false");
+}
+
+/*
+ * SetCustomMergeJoinRef
+ *
+ * A method to adjust varno/varattno in the expression clauses.
+ */
+static void
+SetCustomMergeJoinRef(PlannerInfo *root,
+					  CustomPlan *custom_plan,
+					  int rtoffset)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) custom_plan;
+	/* overall logic copied from set_join_references() */
+	Plan			*outer_plan = cmjoin->cplan.plan.lefttree;
+	Plan			*inner_plan = cmjoin->cplan.plan.righttree;
+	indexed_tlist	*outer_itlist;
+	indexed_tlist	*inner_itlist;
+
+	outer_itlist = build_tlist_index(outer_plan->targetlist);
+	inner_itlist = build_tlist_index(inner_plan->targetlist);
+
+	/* All join plans have tlist, qual, and joinqual */
+	cmjoin->cplan.plan.targetlist
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.targetlist,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->cplan.plan.qual
+		= fix_join_expr(root,
+						cmjoin->cplan.plan.qual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+	cmjoin->joinqual
+		= fix_join_expr(root,
+						cmjoin->joinqual,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/* Now do join-type-specific stuff */
+	cmjoin->mergeclauses
+		= fix_join_expr(root,
+						cmjoin->mergeclauses,
+						outer_itlist,
+						inner_itlist,
+						(Index) 0,
+						rtoffset);
+
+	/*
+	 * outer_itlist is kept in order to exercise the GetSpecialCustomVar
+	 * method, which lets EXPLAIN show the actual Var node referenced by
+	 * a special varno.
+	 */
+	cmjoin->outer_itlist = outer_itlist;
+
+	pfree(inner_itlist);
+}
+
+/*
+ * FinalizeCustomMergePlan
+ *
+ * A method to collect the parameter IDs referenced by this plan's clauses.
+ */
+static void
+FinalizeCustomMergePlan(PlannerInfo *root,
+						CustomPlan *custom_plan,
+						Bitmapset **p_paramids,
+						Bitmapset **p_valid_params,
+						Bitmapset **p_scan_params)
+{
+	CustomMergeJoin	   *cmjoin = (CustomMergeJoin *) custom_plan;
+	Bitmapset  *paramids = *p_paramids;
+
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->joinqual,
+								 paramids);
+	paramids = finalize_primnode(root,
+								 (Node *) cmjoin->mergeclauses,
+								 paramids);
+	*p_paramids = paramids;
+}
+
+/*
+ * BeginCustomMergeJoin
+ *
+ * A method to populate CustomPlanState node according to the supplied
+ * CustomPlan node, and initialize this execution node itself.
+ */
+static CustomPlanState *
+BeginCustomMergeJoin(CustomPlan *cplan, EState *estate, int eflags)
+{
+	CustomMergeJoin		   *cmplan = (CustomMergeJoin *) cplan;
+	CustomMergeJoinState   *cmjs = palloc0(sizeof(CustomMergeJoinState));
+	MergeJoinState		   *mjs;
+
+	mjs = _ExecInitMergeJoin(cmplan, estate, eflags);
+	cmjs->cps.ps = mjs->js.ps;
+	cmjs->cps.ps.type = T_CustomPlanState;
+	cmjs->cps.methods = &custmj_plan_methods;
+	cmjs->jointype = mjs->js.jointype;
+	cmjs->joinqual = mjs->js.joinqual;
+	cmjs->mj_NumClauses = mjs->mj_NumClauses;
+	cmjs->mj_Clauses = mjs->mj_Clauses;
+	cmjs->mj_JoinState = mjs->mj_JoinState;
+	cmjs->mj_ExtraMarks = mjs->mj_ExtraMarks;
+	cmjs->mj_ConstFalseJoin = mjs->mj_ConstFalseJoin;
+	cmjs->mj_FillOuter = mjs->mj_FillOuter;
+	cmjs->mj_FillInner = mjs->mj_FillInner;
+	cmjs->mj_MatchedOuter = mjs->mj_MatchedOuter;
+	cmjs->mj_MatchedInner = mjs->mj_MatchedInner;
+	cmjs->mj_OuterTupleSlot = mjs->mj_OuterTupleSlot;
+	cmjs->mj_InnerTupleSlot = mjs->mj_InnerTupleSlot;
+	cmjs->mj_MarkedTupleSlot = mjs->mj_MarkedTupleSlot;
+	cmjs->mj_NullOuterTupleSlot = mjs->mj_NullOuterTupleSlot;
+	cmjs->mj_NullInnerTupleSlot = mjs->mj_NullInnerTupleSlot;
+	cmjs->mj_OuterEContext = mjs->mj_OuterEContext;
+	cmjs->mj_InnerEContext = mjs->mj_InnerEContext;
+	pfree(mjs);
+
+	/*
+	 * MEMO: When a custom-plan node replaces a join with a scan, as in
+	 * a remote-join implementation that receives an already-joined
+	 * relation and scans it, the extension should adjust varno /
+	 * varattno of the Var nodes in the targetlist of the PlanState,
+	 * not the Plan.
+	 * The executor evaluates the expression nodes in the targetlist of
+	 * the PlanState, but the EXPLAIN command shows Var names according
+	 * to the targetlist of the Plan, so EXPLAIN will not work if that
+	 * targetlist was adjusted to reference the ecxt_scantuple of ExprContext.
+	 */
+
+	return &cmjs->cps;
+}
+
+/*
+ * ExecCustomMergeJoin
+ *
+ * A method to run this execution node
+ */
+static TupleTableSlot *
+ExecCustomMergeJoin(CustomPlanState *node)
+{
+	return _ExecMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * EndCustomMergeJoin
+ *
+ * A method to end this execution node
+ */
+static void
+EndCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecEndMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ReScanCustomMergeJoin
+ *
+ * A method to rescan this execution node
+ */
+static void
+ReScanCustomMergeJoin(CustomPlanState *node)
+{
+	_ExecReScanMergeJoin((CustomMergeJoinState *) node);
+}
+
+/*
+ * ExplainCustomMergeJoinTargetRel
+ *
+ * A method to show the join type where EXPLAIN prints the target relation.
+ */
+static void
+ExplainCustomMergeJoinTargetRel(CustomPlanState *node,
+								ExplainState *es)
+{
+	CustomMergeJoinState *cmjs = (CustomMergeJoinState *) node;
+	const char *jointype;
+
+	switch (cmjs->jointype)
+	{
+		case JOIN_INNER:
+			jointype = "Inner";
+			break;
+		case JOIN_LEFT:
+			jointype = "Left";
+			break;
+		case JOIN_FULL:
+			jointype = "Full";
+			break;
+		case JOIN_RIGHT:
+			jointype = "Right";
+			break;
+		case JOIN_SEMI:
+			jointype = "Semi";
+			break;
+		case JOIN_ANTI:
+			jointype = "Anti";
+			break;
+		default:
+			jointype = "???";
+			break;
+	}
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		if (cmjs->jointype != JOIN_INNER)
+			appendStringInfo(es->str, " %s Join", jointype);
+		else
+			appendStringInfoString(es->str, " Join");
+	}
+	else
+		ExplainPropertyText("Join Type", jointype, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_upper_qual(List *qual, const char *qlabel,
+				PlanState *planstate, List *ancestors,
+				ExplainState *es)
+{
+	bool	useprefix = (list_length(es->rtable) > 1 || es->verbose);
+	Node   *node;
+	List   *context;
+	char   *exprstr;
+
+	/* No work if empty qual */
+	if (qual == NIL)
+		return;
+
+	/* Convert AND list to explicit AND */
+	node = (Node *) make_ands_explicit(qual);
+
+	/* And show it */
+	context = deparse_context_for_planstate((Node *) planstate,
+											ancestors,
+											es->rtable,
+											es->rtable_names);
+	exprstr = deparse_expression(node, context, useprefix, false);
+
+	ExplainPropertyText(qlabel, exprstr, es);
+}
+
+/* a function copied from explain.c */
+static void
+show_instrumentation_count(const char *qlabel, int which,
+						   PlanState *planstate, ExplainState *es)
+{
+	double		nfiltered;
+	double		nloops;
+
+	if (!es->analyze || !planstate->instrument)
+		return;
+
+	if (which == 2)
+		nfiltered = planstate->instrument->nfiltered2;
+	else
+		nfiltered = planstate->instrument->nfiltered1;
+	nloops = planstate->instrument->nloops;
+
+	/* In text mode, suppress zero counts; they're not interesting enough */
+	if (nfiltered > 0 || es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		if (nloops > 0)
+			ExplainPropertyFloat(qlabel, nfiltered / nloops, 0, es);
+		else
+			ExplainPropertyFloat(qlabel, 0.0, 0, es);
+	}
+}
+
+/*
+ * ExplainCustomMergeJoin
+ *
+ * A method to construct EXPLAIN output.
+ */
+static void
+ExplainCustomMergeJoin(CustomPlanState *node,
+					   List *ancestors,
+					   ExplainState *es)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)node->ps.plan;
+
+	show_upper_qual(cmjoin->mergeclauses,
+					"Merge Cond", &node->ps, ancestors, es);
+	show_upper_qual(cmjoin->joinqual,
+					"Join Filter", &node->ps, ancestors, es);
+	if (cmjoin->joinqual)
+		show_instrumentation_count("Rows Removed by Join Filter", 1,
+								   &node->ps, es);
+	show_upper_qual(cmjoin->cplan.plan.qual,
+					"Filter", &node->ps, ancestors, es);
+	if (cmjoin->cplan.plan.qual)
+		show_instrumentation_count("Rows Removed by Filter", 2,
+								   &node->ps, es);
+}
+
+/*
+ * GetRelidsCustomMergeJoin
+ *
+ * A method to report the range-table indexes of the underlying sub-plans.
+ */
+static Bitmapset *
+GetRelidsCustomMergeJoin(CustomPlanState *node)
+{
+	Bitmapset  *result = NULL;
+
+	if (outerPlanState(&node->ps))
+		ExplainPreScanNode(outerPlanState(&node->ps), &result);
+	if (innerPlanState(&node->ps))
+		ExplainPreScanNode(innerPlanState(&node->ps), &result);
+
+	return result;
+}
+
+/*
+ * GetSpecialCustomMergeVar
+ *
+ * Test handler of the GetSpecialCustomVar method.
+ * When a custom-plan node replaces a join node but does not have two
+ * underlying sub-plans, like a remote join feature that retrieves one
+ * flat result set, the EXPLAIN command cannot resolve the names of columns
+ * referenced by a special varno (INNER_VAR, OUTER_VAR or INDEX_VAR),
+ * because it tries to walk down to sub-plans that it expects to be there.
+ * Such a custom-plan node has no sub-plans, because it replaces a part
+ * of the plan sub-tree with a single custom-plan node. In this case, the
+ * custom-plan provider has to return the expression node that is
+ * referenced by the Var node with the special varno.
+ */
+static Node *
+GetSpecialCustomMergeVar(CustomPlanState *cpstate, Var *varnode)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *)cpstate->ps.plan;
+	indexed_tlist	*itlist;
+	int		i;
+
+	if (varnode->varno != OUTER_VAR)
+		return NULL;
+
+	itlist = cmjoin->outer_itlist;
+	for (i=0; i < itlist->num_vars; i++)
+	{
+		if (itlist->vars[i].resno == varnode->varattno)
+		{
+			Var	   *newnode = copyObject(varnode);
+
+			newnode->varno = itlist->vars[i].varno;
+			newnode->varattno = itlist->vars[i].varattno;
+
+			elog(DEBUG2, "%s: (OUTER_VAR,%d) is reference to (%d,%d)",
+				 __FUNCTION__,
+				 varnode->varattno, newnode->varno, newnode->varattno);
+
+			return (Node *) newnode;
+		}
+	}
+	elog(ERROR, "outer_itlist has no entry for Var: %s",
+		 nodeToString(varnode));
+	return NULL;
+}
+
+/*
+ * TextOutCustomMergeJoin
+ *		nodeToString() support for CustomMergeJoin
+ */
+static void
+TextOutCustomMergeJoin(StringInfo str, const CustomPlan *node)
+{
+	CustomMergeJoin	*cmjoin = (CustomMergeJoin *) node;
+	char   *temp;
+	int		i, num;
+
+	/* common fields should be dumped by the core backend */
+	Assert(cmjoin->cplan.methods == &custmj_plan_methods);
+	appendStringInfo(str, " :jointype %d", cmjoin->jointype);
+	temp = nodeToString(cmjoin->joinqual);
+	appendStringInfo(str, " :joinqual %s", temp);
+	pfree(temp);
+	temp = nodeToString(cmjoin->mergeclauses);
+	appendStringInfo(str, " :mergeclauses %s", temp);
+	pfree(temp);
+
+	num = list_length(cmjoin->mergeclauses);
+	appendStringInfoString(str, " :mergeFamilies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeFamilies[i]);
+	appendStringInfoString(str, " :mergeCollations");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %u", cmjoin->mergeCollations[i]);
+	appendStringInfoString(str, " :mergeStrategies");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", cmjoin->mergeStrategies[i]);
+	appendStringInfoString(str, " :mergeNullsFirst");
+	for (i=0; i < num; i++)
+		appendStringInfo(str, " %d", (int) cmjoin->mergeNullsFirst[i]);
+}
+
+/*
+ * CopyCustomMergeJoin
+ *		copyObject() support for CustomMergeJoin
+ */
+static CustomPlan *
+CopyCustomMergeJoin(const CustomPlan *from)
+{
+	const CustomMergeJoin *oldnode = (const CustomMergeJoin *) from;
+	CustomMergeJoin *newnode  = palloc(sizeof(CustomMergeJoin));
+	int		num;
+
+	/* copying the common fields */
+	CopyCustomPlanCommon((const Node *) oldnode, (Node *) newnode);
+
+	newnode->jointype = oldnode->jointype;
+	newnode->joinqual = copyObject(oldnode->joinqual);
+	newnode->mergeclauses = copyObject(oldnode->mergeclauses);
+	num = list_length(oldnode->mergeclauses);
+	newnode->mergeFamilies = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeFamilies,
+		   oldnode->mergeFamilies,
+		   sizeof(Oid) * num);
+	newnode->mergeCollations = palloc(sizeof(Oid) * num);
+	memcpy(newnode->mergeCollations,
+		   oldnode->mergeCollations,
+		   sizeof(Oid) * num);
+	newnode->mergeStrategies = palloc(sizeof(int) * num);
+	memcpy(newnode->mergeStrategies,
+		   oldnode->mergeStrategies,
+		   sizeof(int) * num);
+	newnode->mergeNullsFirst = palloc(sizeof(bool) * num);
+	memcpy(newnode->mergeNullsFirst,
+		   oldnode->mergeNullsFirst,
+		   sizeof(bool) * num);
+	num = oldnode->outer_itlist->num_vars;
+	newnode->outer_itlist = palloc(offsetof(indexed_tlist, vars[num]));
+	memcpy(newnode->outer_itlist,
+		   oldnode->outer_itlist,
+		   offsetof(indexed_tlist, vars[num]));
+
+	return &newnode->cplan;
+}
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+	/* "enable_custom_mergejoin" to control availability of this module */
+	DefineCustomBoolVariable("enable_custom_mergejoin",
+							 "enables the planner's use of custom merge join",
+							 NULL,
+							 &enable_custom_mergejoin,
+							 true,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, NULL, NULL);
+
+	/* methods of CustomMergeJoinPath */
+	memset(&custmj_path_methods, 0, sizeof(CustomPathMethods));
+	custmj_path_methods.CustomName = "CustomMergeJoin";
+	custmj_path_methods.CreateCustomPlan = CreateCustomMergeJoinPlan;
+	custmj_path_methods.TextOutCustomPath = TextOutCustomMergeJoinPath;
+
+	/* methods of CustomMergeJoinPlan */
+	memset(&custmj_plan_methods, 0, sizeof(CustomPlanMethods));
+	custmj_plan_methods.CustomName = "CustomMergeJoin";
+	custmj_plan_methods.SetCustomPlanRef = SetCustomMergeJoinRef;
+	custmj_plan_methods.SupportBackwardScan = NULL;
+	custmj_plan_methods.FinalizeCustomPlan = FinalizeCustomMergePlan;
+	custmj_plan_methods.BeginCustomPlan = BeginCustomMergeJoin;
+	custmj_plan_methods.ExecCustomPlan = ExecCustomMergeJoin;
+	custmj_plan_methods.EndCustomPlan = EndCustomMergeJoin;
+	custmj_plan_methods.ReScanCustomPlan = ReScanCustomMergeJoin;
+	custmj_plan_methods.ExplainCustomPlanTargetRel
+		= ExplainCustomMergeJoinTargetRel;
+	custmj_plan_methods.ExplainCustomPlan = ExplainCustomMergeJoin;
+	custmj_plan_methods.GetRelidsCustomPlan = GetRelidsCustomMergeJoin;
+	custmj_plan_methods.GetSpecialCustomVar = GetSpecialCustomMergeVar;
+	custmj_plan_methods.TextOutCustomPlan = TextOutCustomMergeJoin;
+	custmj_plan_methods.CopyCustomPlan = CopyCustomMergeJoin;
+
+	/* hook registration */
+	add_join_path_orig = add_join_path_hook;
+	add_join_path_hook = custmjAddJoinPath;
+
+	elog(INFO, "MergeJoin logic on top of CustomPlan interface");
+}
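
The hook registration at the end of _PG_init() above follows the usual chaining convention: remember whatever hook was installed before, install your own, and call the saved one from inside your callback. Below is a minimal stand-alone sketch of that convention, not code from the patch. The names join_hook, install_a and install_b are hypothetical, and a small string log stands in for the planner arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical stand-in for add_join_path_hook_type */
typedef void (*join_hook_type) (char *log, size_t logsize);

/* hypothetical stand-in for the global add_join_path_hook */
join_hook_type join_hook = NULL;

/* each module remembers the hook that was installed before it */
static join_hook_type prev_hook_a = NULL;
static join_hook_type prev_hook_b = NULL;

/* append s to log without overflowing the buffer */
static void
log_append(char *log, size_t logsize, const char *s)
{
    strncat(log, s, logsize - strlen(log) - 1);
}

static void
hook_a(char *log, size_t logsize)
{
    if (prev_hook_a)            /* give earlier modules their turn first */
        prev_hook_a(log, logsize);
    log_append(log, logsize, "A");
}

static void
hook_b(char *log, size_t logsize)
{
    if (prev_hook_b)
        prev_hook_b(log, logsize);
    log_append(log, logsize, "B");
}

/* what each module would do in its _PG_init() */
void
install_a(void)
{
    prev_hook_a = join_hook;
    join_hook = hook_a;
}

void
install_b(void)
{
    prev_hook_b = join_hook;
    join_hook = hook_b;
}
```

With both modules installed, a single call through join_hook runs the earlier module's callback before the later one, the same order custmjAddJoinPath uses when it calls add_join_path_orig before adding its own paths.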
diff --git a/contrib/custmj/custmj.h b/contrib/custmj/custmj.h
new file mode 100644
index 0000000..732bbff
--- /dev/null
+++ b/contrib/custmj/custmj.h
@@ -0,0 +1,148 @@
+/*
+ * definitions related to custom version of merge join
+ */
+#ifndef CUSTMJ_H
+#define CUSTMJ_H
+#include "nodes/nodes.h"
+#include "nodes/plannodes.h"
+#include "nodes/relation.h"
+
+typedef struct
+{
+	CustomPath	cpath;
+	/* fields come from JoinPath */
+	JoinType    jointype;
+	Path	   *outerjoinpath;	/* path for the outer side of the join */
+	Path	   *innerjoinpath;	/* path for the inner side of the join */
+	List	   *joinrestrictinfo;		/* RestrictInfos to apply to join */
+	/* fields come from MergePath */
+	List       *path_mergeclauses;      /* join clauses to be used for merge */
+	List       *outersortkeys;  /* keys for explicit sort, if any */
+	List       *innersortkeys;  /* keys for explicit sort, if any */
+	bool        materialize_inner;      /* add Materialize to inner? */
+} CustomMergePath;
+
+struct indexed_tlist;
+
+typedef struct
+{
+	CustomPlan	cplan;
+	/* fields come from Join */
+	JoinType	jointype;
+	List	   *joinqual;
+	/* fields come from MergeJoin */
+	List	   *mergeclauses;   /* mergeclauses as expression trees */
+	/* these are arrays, but have the same length as the mergeclauses list: */
+	Oid		   *mergeFamilies;  /* per-clause OIDs of btree opfamilies */
+	Oid		   *mergeCollations;    /* per-clause OIDs of collations */
+	int		   *mergeStrategies;    /* per-clause ordering (ASC or DESC) */
+	bool	   *mergeNullsFirst;    /* per-clause nulls ordering */
+	/* saved for testing the GetSpecialCustomVar method */
+	struct indexed_tlist *outer_itlist;
+} CustomMergeJoin;
+
+typedef struct
+{
+	CustomPlanState	cps;
+	/* fields come from JoinState */
+	JoinType	jointype;
+	List	   *joinqual;		/* JOIN quals (in addition to ps.qual) */
+	/* fields come from MergeJoinState */
+	int			mj_NumClauses;
+	MergeJoinClause mj_Clauses; /* array of length mj_NumClauses */
+	int			mj_JoinState;
+	bool		mj_ExtraMarks;
+	bool		mj_ConstFalseJoin;
+	bool		mj_FillOuter;
+	bool		mj_FillInner;
+	bool		mj_MatchedOuter;
+	bool		mj_MatchedInner;
+	TupleTableSlot *mj_OuterTupleSlot;
+	TupleTableSlot *mj_InnerTupleSlot;
+	TupleTableSlot *mj_MarkedTupleSlot;
+	TupleTableSlot *mj_NullOuterTupleSlot;
+	TupleTableSlot *mj_NullInnerTupleSlot;
+	ExprContext *mj_OuterEContext;
+	ExprContext *mj_InnerEContext;
+} CustomMergeJoinState;
+
+/* custmj.c */
+extern bool						enable_custom_mergejoin;
+extern CustomPathMethods		custmj_path_methods;
+extern CustomPlanMethods		custmj_plan_methods;
+
+extern void	_PG_init(void);
+
+/* joinpath.c */
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+extern void sort_inner_and_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+extern void match_unsorted_outer(PlannerInfo *root,
+								 RelOptInfo *joinrel,
+								 RelOptInfo *outerrel,
+								 RelOptInfo *innerrel,
+								 List *restrictlist,
+								 List *mergeclause_list,
+								 JoinType jointype,
+								 SpecialJoinInfo *sjinfo,
+								 SemiAntiJoinFactors *semifactors,
+								 Relids param_source_rels,
+								 Relids extra_lateral_rels);
+
+/* createplan.c */
+extern MergeJoin *create_mergejoin_plan(PlannerInfo *root,
+										CustomMergePath *best_path,
+										Plan *outer_plan,
+										Plan *inner_plan);
+extern Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
+
+/* setrefs.c */
+typedef struct tlist_vinfo
+{
+	Index		varno;			/* RT index of Var */
+	AttrNumber	varattno;		/* attr number of Var */
+	AttrNumber	resno;			/* TLE position of Var */
+} tlist_vinfo;
+
+typedef struct indexed_tlist
+{
+	List	   *tlist;			/* underlying target list */
+	int			num_vars;		/* number of plain Var tlist entries */
+	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
+	bool		has_non_vars;	/* are there other entries? */
+	/* array of num_vars entries: */
+	tlist_vinfo vars[1];		/* VARIABLE LENGTH ARRAY */
+} indexed_tlist;				/* VARIABLE LENGTH STRUCT */
+
+extern indexed_tlist *build_tlist_index(List *tlist);
+extern List *fix_join_expr(PlannerInfo *root,
+						   List *clauses,
+						   indexed_tlist *outer_itlist,
+						   indexed_tlist *inner_itlist,
+						   Index acceptable_rel,
+						   int rtoffset);
+/* nodeMergejoin.c */
+extern MergeJoinState *_ExecInitMergeJoin(CustomMergeJoin *node,
+										  EState *estate,
+										  int eflags);
+extern TupleTableSlot *_ExecMergeJoin(CustomMergeJoinState *node);
+extern void _ExecEndMergeJoin(CustomMergeJoinState *node);
+extern void _ExecReScanMergeJoin(CustomMergeJoinState *node);
+
+#endif	/* CUSTMJ_H */
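
CopyCustomMergeJoin() copies outer_itlist by sizing the variable-length indexed_tlist struct with offsetof(). The same idea can be sketched outside the backend as shown below. This is an assumption-laden illustration rather than the patch's code: it uses a C99 flexible array member instead of the vars[1] idiom, malloc/calloc instead of palloc, and the names vinfo, var_table, var_table_new and var_table_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in for tlist_vinfo */
typedef struct
{
    int         varno;
    int         varattno;
    int         resno;
} vinfo;

/* hypothetical stand-in for indexed_tlist, with a flexible array member */
typedef struct
{
    int         num_vars;
    vinfo       vars[];
} var_table;

/* bytes needed for a table holding n entries */
size_t
var_table_size(int n)
{
    return offsetof(var_table, vars) + sizeof(vinfo) * (size_t) n;
}

/* allocate a zeroed table with room for n entries; NULL on failure */
var_table *
var_table_new(int n)
{
    var_table  *t = calloc(1, var_table_size(n));

    if (t)
        t->num_vars = n;
    return t;
}

/* flat copy: header and all entries in one memcpy */
var_table *
var_table_copy(const var_table *src)
{
    size_t      sz = var_table_size(src->num_vars);
    var_table  *dst = malloc(sz);

    if (dst)
        memcpy(dst, src, sz);
    return dst;
}
```

A single memcpy is enough here only because the entries hold plain values. The real indexed_tlist also carries a List pointer (tlist), which a flat copy shares with the original rather than duplicating.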
diff --git a/contrib/custmj/expected/custmj.out b/contrib/custmj/expected/custmj.out
new file mode 100644
index 0000000..19ba188
--- /dev/null
+++ b/contrib/custmj/expected/custmj.out
@@ -0,0 +1,378 @@
+-- regression test for custmj extension
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+INFO:  MergeJoin logic on top of CustomPlan interface
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+   ->  Hash
+         Output: t2.x, t2.y
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+            QUERY PLAN             
+-----------------------------------
+ Hash Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Hash Cond: (t3.n = t4.s)
+   ->  Seq Scan on public.t3
+         Output: t3.n, t3.m
+   ->  Hash
+         Output: t4.s, t4.t
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(9 rows)
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Merge Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+            QUERY PLAN             
+-----------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+                 QUERY PLAN                  
+---------------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t3.n, t3.m, t4.s, t4.t
+   Merge Cond: (t3.n = t4.s)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+   ->  Sort
+         Output: t4.s, t4.t
+         Sort Key: t4.s
+         ->  Seq Scan on public.t4
+               Output: t4.s, t4.t
+(10 rows)
+
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+ a | b | x | y 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+ n | m | s | t 
+---+---+---+---
+(0 rows)
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Nested Loop
+   Output: (((t1.a + t2.x) + t3.n) % t4.s), md5((((t1.b || t2.y) || t3.m) || t4.t))
+   Join Filter: (t2.x = t1.a)
+   ->  Nested Loop
+         Output: t2.x, t2.y, t3.n, t3.m, t4.s, t4.t
+         Join Filter: (t3.m = t2.y)
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+               Filter: (t2.y ~~ '%cd%'::text)
+         ->  Materialize
+               Output: t3.n, t3.m, t4.s, t4.t
+               ->  Custom (CustomMergeJoin) Join
+                     Output: t3.n, t3.m, t4.s, t4.t
+                     Merge Cond: (t3.n = t4.s)
+                     Join Filter: (t3.m ~~ t4.t)
+                     ->  Index Scan using t3_pkey on public.t3
+                           Output: t3.n, t3.m
+                     ->  Sort
+                           Output: t4.s, t4.t
+                           Sort Key: t4.s
+                           ->  Seq Scan on public.t4
+                                 Output: t4.s, t4.t
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+         Filter: (t1.b ~~ '%ab%'::text)
+(25 rows)
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Nested Loop
+   Output: t1.a, t1.b, t3.n, t3.m
+   Join Filter: (t1.a = t3.n)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 100))
+   ->  Seq Scan on public.t1
+         Output: t1.a, t1.b
+(8 rows)
+
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t3.n, t3.m
+   Merge Cond: (t3.n = t1.a)
+   ->  Index Scan using t3_pkey on public.t3
+         Output: t3.n, t3.m
+         Index Cond: ((t3.n >= 100) AND (t3.n <= 1000))
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(11 rows)
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+                   QUERY PLAN                   
+------------------------------------------------
+ Custom (CustomMergeJoin) Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t2.x = t1.a)
+   ->  Custom (CustomMergeJoin) Semi Join
+         Output: t2.x, t2.y, t3.n
+         Merge Cond: (t2.x = ((t3.n % 100)))
+         ->  Sort
+               Output: t2.x, t2.y
+               Sort Key: t2.x
+               ->  Seq Scan on public.t2
+                     Output: t2.x, t2.y
+         ->  Sort
+               Output: t3.n, ((t3.n % 100))
+               Sort Key: ((t3.n % 100))
+               ->  Seq Scan on public.t3
+                     Output: t3.n, (t3.n % 100)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+(21 rows)
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,2) is reference to (1,2)
+DEBUG:  GetSpecialCustomMergeVar: (OUTER_VAR,1) is reference to (1,1)
+             QUERY PLAN             
+------------------------------------
+ Custom (CustomMergeJoin) Full Join
+   Output: t1.a, t1.b, t2.x, t2.y
+   Merge Cond: (t1.a = t2.x)
+   ->  Sort
+         Output: t1.a, t1.b
+         Sort Key: t1.a
+         ->  Seq Scan on public.t1
+               Output: t1.a, t1.b
+   ->  Sort
+         Output: t2.x, t2.y
+         Sort Key: t2.x
+         ->  Seq Scan on public.t2
+               Output: t2.x, t2.y
+(13 rows)
+
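A note on the expected output above: once enable_mergejoin is turned off, the plans switch from the built-in Merge Join to Custom (CustomMergeJoin) Join. The sketch below is a rough, standalone model of the cost adjustment that try_mergejoin_path() in contrib/custmj/joinpath.c performs to get that effect. It is not part of the patch. It assumes the built-in costing has already added disable_cost (1.0e10 in PostgreSQL) to the path when enable_mergejoin is off, and the helper names builtin_mergejoin_cost and custom_mergejoin_cost are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same magnitude as PostgreSQL's disable_cost; everything else is a toy model. */
static const double disable_cost = 1.0e10;

/*
 * Hypothetical helper: cost of the built-in merge join path.
 * Assumption: the core costing penalizes the path when enable_mergejoin is off.
 */
static double
builtin_mergejoin_cost(double base_cost, bool enable_mergejoin)
{
	assert(base_cost >= 0.0);
	return enable_mergejoin ? base_cost : base_cost + disable_cost;
}

/*
 * Hypothetical helper: cost of the custom merge join path, which starts
 * from the built-in figure and is then corrected according to the
 * enable_custom_mergejoin setting, as in try_mergejoin_path().
 */
static double
custom_mergejoin_cost(double base_cost,
					  bool enable_mergejoin,
					  bool enable_custom_mergejoin)
{
	double		cost = builtin_mergejoin_cost(base_cost, enable_mergejoin);

	if (!enable_mergejoin && enable_custom_mergejoin)
		cost -= disable_cost;	/* undo the penalty meant for the built-in join */
	else if (enable_mergejoin && !enable_custom_mergejoin)
		cost += disable_cost;	/* penalize the custom join instead */

	return cost;
}
```

Under this model, custom_mergejoin_cost(100.0, false, true) gives back the unpenalized cost while the built-in path keeps the penalty, which is consistent with the CustomMergeJoin plans shown after SET enable_mergejoin = off.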
diff --git a/contrib/custmj/joinpath.c b/contrib/custmj/joinpath.c
new file mode 100644
index 0000000..9ef940b
--- /dev/null
+++ b/contrib/custmj/joinpath.c
@@ -0,0 +1,988 @@
+/*-------------------------------------------------------------------------
+ *
+ * joinpath.c
+ *	  Routines to find all possible paths for processing a set of joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/joinpath.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "executor/executor.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "custmj.h"
+
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
+
+#define PATH_PARAM_BY_REL(path, rel)  \
+	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+
+/*
+ * try_nestloop_path
+ *	  Consider a nestloop join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_nestloop_path(PlannerInfo *root,
+				  RelOptInfo *joinrel,
+				  JoinType jointype,
+				  SpecialJoinInfo *sjinfo,
+				  SemiAntiJoinFactors *semifactors,
+				  Relids param_source_rels,
+				  Relids extra_lateral_rels,
+				  Path *outer_path,
+				  Path *inner_path,
+				  List *restrict_clauses,
+				  List *pathkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_nestloop_required_outer(outer_path,
+												  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * Do a precheck to quickly eliminate obviously-inferior paths.  We
+	 * calculate a cheap lower bound on the path's cost and then use
+	 * add_path_precheck() to see if the path is clearly going to be dominated
+	 * by some existing path for the joinrel.  If not, do the full pushup with
+	 * creating a fully valid path structure and submitting it to add_path().
+	 * The latter two steps are expensive enough to make this two-phase
+	 * methodology worthwhile.
+	 */
+	initial_cost_nestloop(root, &workspace, jointype,
+						  outer_path, inner_path,
+						  sjinfo, semifactors);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		add_path(joinrel, (Path *)
+				 create_nestloop_path(root,
+									  joinrel,
+									  jointype,
+									  &workspace,
+									  sjinfo,
+									  semifactors,
+									  outer_path,
+									  inner_path,
+									  restrict_clauses,
+									  pathkeys,
+									  required_outer));
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * try_mergejoin_path
+ *	  Consider a merge join path; if it appears useful, push it into
+ *	  the joinrel's pathlist via add_path().
+ */
+static void
+try_mergejoin_path(PlannerInfo *root,
+				   RelOptInfo *joinrel,
+				   JoinType jointype,
+				   SpecialJoinInfo *sjinfo,
+				   Relids param_source_rels,
+				   Relids extra_lateral_rels,
+				   Path *outer_path,
+				   Path *inner_path,
+				   List *restrict_clauses,
+				   List *pathkeys,
+				   List *mergeclauses,
+				   List *outersortkeys,
+				   List *innersortkeys)
+{
+	Relids		required_outer;
+	JoinCostWorkspace workspace;
+
+	/*
+	 * Check to see if proposed path is still parameterized, and reject if the
+	 * parameterization wouldn't be sensible.
+	 */
+	required_outer = calc_non_nestloop_required_outer(outer_path,
+													  inner_path);
+	if (required_outer &&
+		!bms_overlap(required_outer, param_source_rels))
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+		return;
+	}
+
+	/*
+	 * Independently of that, add parameterization needed for any
+	 * PlaceHolderVars that need to be computed at the join.
+	 */
+	required_outer = bms_add_members(required_outer, extra_lateral_rels);
+
+	/*
+	 * If the given paths are already well enough ordered, we can skip doing
+	 * an explicit sort.
+	 */
+	if (outersortkeys &&
+		pathkeys_contained_in(outersortkeys, outer_path->pathkeys))
+		outersortkeys = NIL;
+	if (innersortkeys &&
+		pathkeys_contained_in(innersortkeys, inner_path->pathkeys))
+		innersortkeys = NIL;
+
+	/*
+	 * See comments in try_nestloop_path().
+	 */
+	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
+						   outer_path, inner_path,
+						   outersortkeys, innersortkeys,
+						   sjinfo);
+
+	if (add_path_precheck(joinrel,
+						  workspace.startup_cost, workspace.total_cost,
+						  pathkeys, required_outer))
+	{
+		/* KG: create a CustomMergePath instead of a plain MergePath */
+		CustomMergePath	   *cmpath;
+		MergePath		   *mpath
+			= create_mergejoin_path(root,
+									joinrel,
+									jointype,
+									&workspace,
+									sjinfo,
+									outer_path,
+									inner_path,
+									restrict_clauses,
+									pathkeys,
+									required_outer,
+									mergeclauses,
+									outersortkeys,
+									innersortkeys);
+
+		/* adjust cost per the enable_mergejoin / enable_custom_mergejoin GUCs */
+		if (!enable_mergejoin && enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost -= disable_cost;
+			mpath->jpath.path.total_cost -= disable_cost;
+		}
+		else if (enable_mergejoin && !enable_custom_mergejoin)
+		{
+			mpath->jpath.path.startup_cost += disable_cost;
+			mpath->jpath.path.total_cost += disable_cost;
+		}
+
+		/* construct CustomMergePath object */
+		cmpath = palloc0(sizeof(CustomMergePath));
+		cmpath->cpath.path = mpath->jpath.path;
+		cmpath->cpath.path.type = T_CustomPath;
+		cmpath->cpath.path.pathtype = T_CustomPlan;
+		cmpath->cpath.methods = &custmj_path_methods;
+		cmpath->jointype = mpath->jpath.jointype;
+		cmpath->outerjoinpath = mpath->jpath.outerjoinpath;
+		cmpath->innerjoinpath = mpath->jpath.innerjoinpath;
+		cmpath->joinrestrictinfo = mpath->jpath.joinrestrictinfo;
+		cmpath->path_mergeclauses = mpath->path_mergeclauses;
+		cmpath->outersortkeys = mpath->outersortkeys;
+		cmpath->innersortkeys = mpath->innersortkeys;
+		cmpath->materialize_inner = mpath->materialize_inner;
+
+		add_path(joinrel, &cmpath->cpath.path);
+	}
+	else
+	{
+		/* Waste no memory when we reject a path here */
+		bms_free(required_outer);
+	}
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.	If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static inline bool
+clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
+						RelOptInfo *innerrel)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrel->relids) &&
+		bms_is_subset(rinfo->right_relids, innerrel->relids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrel->relids) &&
+			 bms_is_subset(rinfo->right_relids, outerrel->relids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+/*
+ * sort_inner_and_outer
+ *	  Create mergejoin join paths by explicitly sorting both the outer and
+ *	  inner join relations on each available merge ordering.
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+sort_inner_and_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	Path	   *outer_path;
+	Path	   *inner_path;
+	List	   *all_pathkeys;
+	ListCell   *l;
+
+	/*
+	 * We only consider the cheapest-total-cost input paths, since we are
+	 * assuming here that a sort is required.  We will consider
+	 * cheapest-startup-cost input paths later, and only if they don't need a
+	 * sort.
+	 *
+	 * This function intentionally does not consider parameterized input
+	 * paths, except when the cheapest-total is parameterized.	If we did so,
+	 * we'd have a combinatorial explosion of mergejoin paths of dubious
+	 * value.  This interacts with decisions elsewhere that also discriminate
+	 * against mergejoins with parameterized inputs; see comments in
+	 * src/backend/optimizer/README.
+	 */
+	outer_path = outerrel->cheapest_total_path;
+	inner_path = innerrel->cheapest_total_path;
+
+	/*
+	 * If either cheapest-total path is parameterized by the other rel, we
+	 * can't use a mergejoin.  (There's no use looking for alternative input
+	 * paths, since these should already be the least-parameterized available
+	 * paths.)
+	 */
+	if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
+		PATH_PARAM_BY_REL(inner_path, outerrel))
+		return;
+
+	/*
+	 * If unique-ification is requested, do it and then handle as a plain
+	 * inner join.
+	 */
+	if (jointype == JOIN_UNIQUE_OUTER)
+	{
+		outer_path = (Path *) create_unique_path(root, outerrel,
+												 outer_path, sjinfo);
+		Assert(outer_path);
+		jointype = JOIN_INNER;
+	}
+	else if (jointype == JOIN_UNIQUE_INNER)
+	{
+		inner_path = (Path *) create_unique_path(root, innerrel,
+												 inner_path, sjinfo);
+		Assert(inner_path);
+		jointype = JOIN_INNER;
+	}
+
+	/*
+	 * Each possible ordering of the available mergejoin clauses will generate
+	 * a differently-sorted result path at essentially the same cost.  We have
+	 * no basis for choosing one over another at this level of joining, but
+	 * some sort orders may be more useful than others for higher-level
+	 * mergejoins, so it's worth considering multiple orderings.
+	 *
+	 * Actually, it's not quite true that every mergeclause ordering will
+	 * generate a different path order, because some of the clauses may be
+	 * partially redundant (refer to the same EquivalenceClasses).	Therefore,
+	 * what we do is convert the mergeclause list to a list of canonical
+	 * pathkeys, and then consider different orderings of the pathkeys.
+	 *
+	 * Generating a path for *every* permutation of the pathkeys doesn't seem
+	 * like a winning strategy; the cost in planning time is too high. For
+	 * now, we generate one path for each pathkey, listing that pathkey first
+	 * and the rest in random order.  This should allow at least a one-clause
+	 * mergejoin without re-sorting against any other possible mergejoin
+	 * partner path.  But if we've not guessed the right ordering of secondary
+	 * keys, we may end up evaluating clauses as qpquals when they could have
+	 * been done as mergeclauses.  (In practice, it's rare that there's more
+	 * than two or three mergeclauses, so expending a huge amount of thought
+	 * on that is probably not worth it.)
+	 *
+	 * The pathkey order returned by select_outer_pathkeys_for_merge() has
+	 * some heuristics behind it (see that function), so be sure to try it
+	 * exactly as-is as well as making variants.
+	 */
+	all_pathkeys = select_outer_pathkeys_for_merge(root,
+												   mergeclause_list,
+												   joinrel);
+
+	foreach(l, all_pathkeys)
+	{
+		List	   *front_pathkey = (List *) lfirst(l);
+		List	   *cur_mergeclauses;
+		List	   *outerkeys;
+		List	   *innerkeys;
+		List	   *merge_pathkeys;
+
+		/* Make a pathkey list with this guy first */
+		if (l != list_head(all_pathkeys))
+			outerkeys = lcons(front_pathkey,
+							  list_delete_ptr(list_copy(all_pathkeys),
+											  front_pathkey));
+		else
+			outerkeys = all_pathkeys;	/* no work at first one... */
+
+		/* Sort the mergeclauses into the corresponding ordering */
+		cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
+														  outerkeys,
+														  true,
+														  mergeclause_list);
+
+		/* Should have used them all... */
+		Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));
+
+		/* Build sort pathkeys for the inner side */
+		innerkeys = make_inner_pathkeys_for_merge(root,
+												  cur_mergeclauses,
+												  outerkeys);
+
+		/* Build pathkeys representing output sort order */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerkeys);
+
+		/*
+		 * And now we can make the path.
+		 *
+		 * Note: it's possible that the cheapest paths will already be sorted
+		 * properly.  try_mergejoin_path will detect that case and suppress an
+		 * explicit sort step, so we needn't do so here.
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outer_path,
+						   inner_path,
+						   restrictlist,
+						   merge_pathkeys,
+						   cur_mergeclauses,
+						   outerkeys,
+						   innerkeys);
+	}
+}
+
+/*
+ * match_unsorted_outer
+ *	  Creates possible join paths for processing a single join relation
+ *	  'joinrel' by employing either iterative substitution or
+ *	  mergejoining on each of its possible outer paths (considering
+ *	  only outer paths that are already ordered well enough for merging).
+ *
+ * We always generate a nestloop path for each available outer path.
+ * In fact we may generate as many as five: one on the cheapest-total-cost
+ * inner path, one on the same with materialization, one on the
+ * cheapest-startup-cost inner path (if different), one on the
+ * cheapest-total inner-indexscan path (if any), and one on the
+ * cheapest-startup inner-indexscan path (if different).
+ *
+ * We also consider mergejoins if mergejoin clauses are available.	We have
+ * two ways to generate the inner path for a mergejoin: sort the cheapest
+ * inner path, or use an inner path that is already suitably ordered for the
+ * merge.  If we have several mergeclauses, it could be that there is no inner
+ * path (or only a very expensive one) for the full list of mergeclauses, but
+ * better paths exist if we truncate the mergeclause list (thereby discarding
+ * some sort key requirements).  So, we consider truncations of the
+ * mergeclause list as well as the full list.  (Ideally we'd consider all
+ * subsets of the mergeclause list, but that seems way too expensive.)
+ *
+ * 'joinrel' is the join relation
+ * 'outerrel' is the outer join relation
+ * 'innerrel' is the inner join relation
+ * 'restrictlist' contains all of the RestrictInfo nodes for restriction
+ *		clauses that apply to this join
+ * 'mergeclause_list' is a list of RestrictInfo nodes for available
+ *		mergejoin clauses in this join
+ * 'jointype' is the type of join to do
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'param_source_rels' are OK targets for parameterization of result paths
+ * 'extra_lateral_rels' are additional parameterization for result paths
+ */
+void
+match_unsorted_outer(PlannerInfo *root,
+					 RelOptInfo *joinrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel,
+					 List *restrictlist,
+					 List *mergeclause_list,
+					 JoinType jointype,
+					 SpecialJoinInfo *sjinfo,
+					 SemiAntiJoinFactors *semifactors,
+					 Relids param_source_rels,
+					 Relids extra_lateral_rels)
+{
+	JoinType	save_jointype = jointype;
+	bool		nestjoinOK;
+	bool		useallclauses;
+	Path	   *inner_cheapest_total = innerrel->cheapest_total_path;
+	Path	   *matpath = NULL;
+	ListCell   *lc1;
+
+	/*
+	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
+	 * are doing a right or full mergejoin, we must use *all* the mergeclauses
+	 * as join clauses, else we will not have a valid plan.  (Although these
+	 * two flags are currently inverses, keep them separate for clarity and
+	 * possible future changes.)
+	 */
+	switch (jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_LEFT:
+		case JOIN_SEMI:
+		case JOIN_ANTI:
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			nestjoinOK = false;
+			useallclauses = true;
+			break;
+		case JOIN_UNIQUE_OUTER:
+		case JOIN_UNIQUE_INNER:
+			jointype = JOIN_INNER;
+			nestjoinOK = true;
+			useallclauses = false;
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) jointype);
+			nestjoinOK = false; /* keep compiler quiet */
+			useallclauses = false;
+			break;
+	}
+
+	/*
+	 * If inner_cheapest_total is parameterized by the outer rel, ignore it;
+	 * we will consider it below as a member of cheapest_parameterized_paths,
+	 * but the other possibilities considered in this routine aren't usable.
+	 */
+	if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+		inner_cheapest_total = NULL;
+
+	/*
+	 * If we need to unique-ify the inner path, we will consider only the
+	 * cheapest-total inner.
+	 */
+	if (save_jointype == JOIN_UNIQUE_INNER)
+	{
+		/* No way to do this with an inner path parameterized by outer rel */
+		if (inner_cheapest_total == NULL)
+			return;
+		inner_cheapest_total = (Path *)
+			create_unique_path(root, innerrel, inner_cheapest_total, sjinfo);
+		Assert(inner_cheapest_total);
+	}
+	else if (nestjoinOK)
+	{
+		/*
+		 * Consider materializing the cheapest inner path, unless
+		 * enable_material is off or the path in question materializes its
+		 * output anyway.
+		 */
+		if (enable_material && inner_cheapest_total != NULL &&
+			!ExecMaterializesOutput(inner_cheapest_total->pathtype))
+			matpath = (Path *)
+				create_material_path(innerrel, inner_cheapest_total);
+	}
+
+	foreach(lc1, outerrel->pathlist)
+	{
+		Path	   *outerpath = (Path *) lfirst(lc1);
+		List	   *merge_pathkeys;
+		List	   *mergeclauses;
+		List	   *innersortkeys;
+		List	   *trialsortkeys;
+		Path	   *cheapest_startup_inner;
+		Path	   *cheapest_total_inner;
+		int			num_sortkeys;
+		int			sortkeycnt;
+
+		/*
+		 * We cannot use an outer path that is parameterized by the inner rel.
+		 */
+		if (PATH_PARAM_BY_REL(outerpath, innerrel))
+			continue;
+
+		/*
+		 * If we need to unique-ify the outer path, it's pointless to consider
+		 * any but the cheapest outer.	(XXX we don't consider parameterized
+		 * outers, nor inners, for unique-ified cases.	Should we?)
+		 */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+		{
+			if (outerpath != outerrel->cheapest_total_path)
+				continue;
+			outerpath = (Path *) create_unique_path(root, outerrel,
+													outerpath, sjinfo);
+			Assert(outerpath);
+		}
+
+		/*
+		 * The result will have this sort order (even if it is implemented as
+		 * a nestloop, and even if some of the mergeclauses are implemented by
+		 * qpquals rather than as true mergeclauses):
+		 */
+		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
+											 outerpath->pathkeys);
+
+		if (save_jointype == JOIN_UNIQUE_INNER)
+		{
+			/*
+			 * Consider nestloop join, but only with the unique-ified cheapest
+			 * inner path
+			 */
+			try_nestloop_path(root,
+							  joinrel,
+							  jointype,
+							  sjinfo,
+							  semifactors,
+							  param_source_rels,
+							  extra_lateral_rels,
+							  outerpath,
+							  inner_cheapest_total,
+							  restrictlist,
+							  merge_pathkeys);
+		}
+		else if (nestjoinOK)
+		{
+			/*
+			 * Consider nestloop joins using this outer path and various
+			 * available paths for the inner relation.	We consider the
+			 * cheapest-total paths for each available parameterization of the
+			 * inner relation, including the unparameterized case.
+			 */
+			ListCell   *lc2;
+
+			foreach(lc2, innerrel->cheapest_parameterized_paths)
+			{
+				Path	   *innerpath = (Path *) lfirst(lc2);
+
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  innerpath,
+								  restrictlist,
+								  merge_pathkeys);
+			}
+
+			/* Also consider materialized form of the cheapest inner path */
+			if (matpath != NULL)
+				try_nestloop_path(root,
+								  joinrel,
+								  jointype,
+								  sjinfo,
+								  semifactors,
+								  param_source_rels,
+								  extra_lateral_rels,
+								  outerpath,
+								  matpath,
+								  restrictlist,
+								  merge_pathkeys);
+		}
+
+		/* Can't do anything else if outer path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_OUTER)
+			continue;
+
+		/* Can't do anything else if inner rel is parameterized by outer */
+		if (inner_cheapest_total == NULL)
+			continue;
+
+		/* Look for useful mergeclauses (if any) */
+		mergeclauses = find_mergeclauses_for_pathkeys(root,
+													  outerpath->pathkeys,
+													  true,
+													  mergeclause_list);
+
+		/*
+		 * Done with this outer path if no chance for a mergejoin.
+		 *
+		 * Special corner case: for "x FULL JOIN y ON true", there will be no
+		 * join clauses at all.  Ordinarily we'd generate a clauseless
+		 * nestloop path, but since mergejoin is our only join type that
+		 * supports FULL JOIN without any join clauses, it's necessary to
+		 * generate a clauseless mergejoin path instead.
+		 */
+		if (mergeclauses == NIL)
+		{
+			if (jointype == JOIN_FULL)
+				 /* okay to try for mergejoin */ ;
+			else
+				continue;
+		}
+		if (useallclauses && list_length(mergeclauses) != list_length(mergeclause_list))
+			continue;
+
+		/* Compute the required ordering of the inner path */
+		innersortkeys = make_inner_pathkeys_for_merge(root,
+													  mergeclauses,
+													  outerpath->pathkeys);
+
+		/*
+		 * Generate a mergejoin on the basis of sorting the cheapest inner.
+		 * Since a sort will be needed, only cheapest total cost matters. (But
+		 * try_mergejoin_path will do the right thing if inner_cheapest_total
+		 * is already correctly sorted.)
+		 */
+		try_mergejoin_path(root,
+						   joinrel,
+						   jointype,
+						   sjinfo,
+						   param_source_rels,
+						   extra_lateral_rels,
+						   outerpath,
+						   inner_cheapest_total,
+						   restrictlist,
+						   merge_pathkeys,
+						   mergeclauses,
+						   NIL,
+						   innersortkeys);
+
+		/* Can't do anything else if inner path needs to be unique'd */
+		if (save_jointype == JOIN_UNIQUE_INNER)
+			continue;
+
+		/*
+		 * Look for presorted inner paths that satisfy the innersortkey list
+		 * --- or any truncation thereof, if we are allowed to build a
+		 * mergejoin using a subset of the merge clauses.  Here, we consider
+		 * both cheap startup cost and cheap total cost.
+		 *
+		 * Currently we do not consider parameterized inner paths here. This
+		 * interacts with decisions elsewhere that also discriminate against
+		 * mergejoins with parameterized inputs; see comments in
+		 * src/backend/optimizer/README.
+		 *
+		 * As we shorten the sortkey list, we should consider only paths that
+		 * are strictly cheaper than (in particular, not the same as) any path
+		 * found in an earlier iteration.  Otherwise we'd be intentionally
+		 * using fewer merge keys than a given path allows (treating the rest
+		 * as plain joinquals), which is unlikely to be a good idea.  Also,
+		 * eliminating paths here on the basis of compare_path_costs is a lot
+		 * cheaper than building the mergejoin path only to throw it away.
+		 *
+		 * If inner_cheapest_total is well enough sorted to have not required
+		 * a sort in the path made above, we shouldn't make a duplicate path
+		 * with it, either.  We handle that case with the same logic that
+		 * handles the previous consideration, by initializing the variables
+		 * that track cheapest-so-far properly.  Note that we do NOT reject
+		 * inner_cheapest_total if we find it matches some shorter set of
+		 * pathkeys.  That case corresponds to using fewer mergekeys to avoid
+		 * sorting inner_cheapest_total, whereas we did sort it above, so the
+		 * plans being considered are different.
+		 */
+		if (pathkeys_contained_in(innersortkeys,
+								  inner_cheapest_total->pathkeys))
+		{
+			/* inner_cheapest_total didn't require a sort */
+			cheapest_startup_inner = inner_cheapest_total;
+			cheapest_total_inner = inner_cheapest_total;
+		}
+		else
+		{
+			/* it did require a sort, at least for the full set of keys */
+			cheapest_startup_inner = NULL;
+			cheapest_total_inner = NULL;
+		}
+		num_sortkeys = list_length(innersortkeys);
+		if (num_sortkeys > 1 && !useallclauses)
+			trialsortkeys = list_copy(innersortkeys);	/* need modifiable copy */
+		else
+			trialsortkeys = innersortkeys;		/* won't really truncate */
+
+		for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
+		{
+			Path	   *innerpath;
+			List	   *newclauses = NIL;
+
+			/*
+			 * Look for an inner path ordered well enough for the first
+			 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
+			 * destructively, which is why we made a copy...
+			 */
+			trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   TOTAL_COST);
+			if (innerpath != NULL &&
+				(cheapest_total_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_total_inner,
+									TOTAL_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				/* Select the right mergeclauses, if we didn't already */
+				if (sortkeycnt < num_sortkeys)
+				{
+					newclauses =
+						find_mergeclauses_for_pathkeys(root,
+													   trialsortkeys,
+													   false,
+													   mergeclauses);
+					Assert(newclauses != NIL);
+				}
+				else
+					newclauses = mergeclauses;
+				try_mergejoin_path(root,
+								   joinrel,
+								   jointype,
+								   sjinfo,
+								   param_source_rels,
+								   extra_lateral_rels,
+								   outerpath,
+								   innerpath,
+								   restrictlist,
+								   merge_pathkeys,
+								   newclauses,
+								   NIL,
+								   NIL);
+				cheapest_total_inner = innerpath;
+			}
+			/* Same on the basis of cheapest startup cost ... */
+			innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
+													   trialsortkeys,
+													   NULL,
+													   STARTUP_COST);
+			if (innerpath != NULL &&
+				(cheapest_startup_inner == NULL ||
+				 compare_path_costs(innerpath, cheapest_startup_inner,
+									STARTUP_COST) < 0))
+			{
+				/* Found a cheap (or even-cheaper) sorted path */
+				if (innerpath != cheapest_total_inner)
+				{
+					/*
+					 * Avoid rebuilding clause list if we already made one;
+					 * saves memory in big join trees...
+					 */
+					if (newclauses == NIL)
+					{
+						if (sortkeycnt < num_sortkeys)
+						{
+							newclauses =
+								find_mergeclauses_for_pathkeys(root,
+															   trialsortkeys,
+															   false,
+															   mergeclauses);
+							Assert(newclauses != NIL);
+						}
+						else
+							newclauses = mergeclauses;
+					}
+					try_mergejoin_path(root,
+									   joinrel,
+									   jointype,
+									   sjinfo,
+									   param_source_rels,
+									   extra_lateral_rels,
+									   outerpath,
+									   innerpath,
+									   restrictlist,
+									   merge_pathkeys,
+									   newclauses,
+									   NIL,
+									   NIL);
+				}
+				cheapest_startup_inner = innerpath;
+			}
+
+			/*
+			 * Don't consider truncated sortkeys if we need all clauses.
+			 */
+			if (useallclauses)
+				break;
+		}
+	}
+}
+
+/*
+ * select_mergejoin_clauses
+ *	  Select mergejoin clauses that are usable for a particular join.
+ *	  Returns a list of RestrictInfo nodes for those clauses.
+ *
+ * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
+ * this is a right/full join and there are nonmergejoinable join clauses.
+ * The executor's mergejoin machinery cannot handle such cases, so we have
+ * to avoid generating a mergejoin plan.  (Note that this flag does NOT
+ * consider whether there are actually any mergejoinable clauses.  This is
+ * correct because in some cases we need to build a clauseless mergejoin.
+ * Simply returning NIL is therefore not enough to distinguish safe from
+ * unsafe cases.)
+ *
+ * We also mark each selected RestrictInfo to show which side is currently
+ * being considered as outer.  These are transient markings that are only
+ * good for the duration of the current add_paths_to_joinrel() call!
+ *
+ * We examine each restrictinfo clause known for the join to see
+ * if it is mergejoinable and involves vars from the two sub-relations
+ * currently of interest.
+ */
+List *
+select_mergejoin_clauses(PlannerInfo *root,
+						 RelOptInfo *joinrel,
+						 RelOptInfo *outerrel,
+						 RelOptInfo *innerrel,
+						 List *restrictlist,
+						 JoinType jointype,
+						 bool *mergejoin_allowed)
+{
+	List	   *result_list = NIL;
+	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+	bool		have_nonmergeable_joinclause = false;
+	ListCell   *l;
+
+	foreach(l, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+
+		/*
+		 * If processing an outer join, only use its own join clauses in the
+		 * merge.  For inner joins we can use pushed-down clauses too. (Note:
+		 * we don't set have_nonmergeable_joinclause here because pushed-down
+		 * clauses will become otherquals not joinquals.)
+		 */
+		if (isouterjoin && restrictinfo->is_pushed_down)
+			continue;
+
+		/* Check that clause is a mergeable operator clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+		{
+			/*
+			 * The executor can handle extra joinquals that are constants, but
+			 * not anything else, when doing right/full merge join.  (The
+			 * reason to support constants is so we can do FULL JOIN ON
+			 * FALSE.)
+			 */
+			if (!restrictinfo->clause || !IsA(restrictinfo->clause, Const))
+				have_nonmergeable_joinclause = true;
+			continue;			/* not mergejoinable */
+		}
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer".
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* no good for these input relations */
+		}
+
+		/*
+		 * Insist that each side have a non-redundant eclass.  This
+		 * restriction is needed because various bits of the planner expect
+		 * that each clause in a merge be associatable with some pathkey in a
+		 * canonical pathkey list, but redundant eclasses can't appear in
+		 * canonical sort orderings.  (XXX it might be worth relaxing this,
+		 * but not enough time to address it for 8.3.)
+		 *
+		 * Note: it would be bad if this condition failed for an otherwise
+		 * mergejoinable FULL JOIN clause, since that would result in
+		 * undesirable planner failure.  I believe that is not possible
+		 * however; a variable involved in a full join could only appear in
+		 * below_outer_join eclasses, which aren't considered redundant.
+		 *
+		 * This case *can* happen for left/right join clauses: the outer-side
+		 * variable could be equated to a constant.  Because we will propagate
+		 * that constant across the join clause, the loss of ability to do a
+		 * mergejoin is not really all that big a deal, and so it's not clear
+		 * that improving this is important.
+		 */
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+			EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+		{
+			have_nonmergeable_joinclause = true;
+			continue;			/* can't handle redundant eclasses */
+		}
+
+		result_list = lappend(result_list, restrictinfo);
+	}
+
+	/*
+	 * Report whether mergejoin is allowed (see comment at top of function).
+	 */
+	switch (jointype)
+	{
+		case JOIN_RIGHT:
+		case JOIN_FULL:
+			*mergejoin_allowed = !have_nonmergeable_joinclause;
+			break;
+		default:
+			*mergejoin_allowed = true;
+			break;
+	}
+
+	return result_list;
+}
diff --git a/contrib/custmj/nodeMergejoin.c b/contrib/custmj/nodeMergejoin.c
new file mode 100644
index 0000000..62dd8c0
--- /dev/null
+++ b/contrib/custmj/nodeMergejoin.c
@@ -0,0 +1,1694 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeMergejoin.c
+ *	  routines supporting merge joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/nodeMergejoin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ *		ExecMergeJoin			mergejoin outer and inner relations.
+ *		ExecInitMergeJoin		creates and initializes run time states
+ *		ExecEndMergeJoin		cleans up the node.
+ *
+ * NOTES
+ *
+ *		Merge-join is done by joining the inner and outer tuples satisfying
+ *		join clauses of the form ((= outerKey innerKey) ...).
+ *		The join clause list is provided by the query planner and may contain
+ *		more than one (= outerKey innerKey) clause (for composite sort key).
+ *
+ *		However, the query executor needs to know whether an outer
+ *		tuple is "greater/smaller" than an inner tuple so that it can
+ *		"synchronize" the two relations. For example, consider the following
+ *		relations:
+ *
+ *				outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
+ *				inner: (1 ^3 5 5 5 5 6)			current tuple: 3
+ *
+ *		To continue the merge-join, the executor needs to scan both inner
+ *		and outer relations until it reaches the matching 5 tuples. It needs to know
+ *		that currently inner tuple 3 is "greater" than outer tuple 1 and
+ *		therefore it should scan the outer relation first to find a
+ *		matching tuple and so on.
+ *
+ *		Therefore, rather than directly executing the merge join clauses,
+ *		we evaluate the left and right key expressions separately and then
+ *		compare the columns one at a time (see MJCompare).  The planner
+ *		passes us enough information about the sort ordering of the inputs
+ *		to allow us to determine how to make the comparison.  We may use the
+ *		appropriate btree comparison function, since Postgres' only notion
+ *		of ordering is specified by btree opfamilies.
+ *
+ *
+ *		Consider the above relations and suppose that the executor has
+ *		just joined the first outer "5" with the last inner "5". The
+ *		next step is of course to join the second outer "5" with all
+ *		the inner "5's". This requires repositioning the inner "cursor"
+ *		to point at the first inner "5". This is done by "marking" the
+ *		first inner 5 so we can restore the "cursor" to it before joining
+ *		with the second outer 5. The access method interface provides
+ *		routines to mark and restore to a tuple.
+ *
+ *
+ *		Essential operation of the merge join algorithm is as follows:
+ *
+ *		Join {
+ *			get initial outer and inner tuples				INITIALIZE
+ *			do forever {
+ *				while (outer != inner) {					SKIP_TEST
+ *					if (outer < inner)
+ *						advance outer						SKIPOUTER_ADVANCE
+ *					else
+ *						advance inner						SKIPINNER_ADVANCE
+ *				}
+ *				mark inner position							SKIP_TEST
+ *				do forever {
+ *					while (outer == inner) {
+ *						join tuples							JOINTUPLES
+ *						advance inner position				NEXTINNER
+ *					}
+ *					advance outer position					NEXTOUTER
+ *					if (outer == mark)						TESTOUTER
+ *						restore inner position to mark		TESTOUTER
+ *					else
+ *						break	// return to top of outer loop
+ *				}
+ *			}
+ *		}
+ *
+ *		The merge join operation is coded in the fashion
+ *		of a state machine.  At each state, we do something and then
+ *		proceed to another state.  This state is stored in the node's
+ *		execution state information and is preserved across calls to
+ *		ExecMergeJoin. -cim 10/31/89
+ */
+#include "postgres.h"
+
+#include "access/nbtree.h"
+#include "executor/execdebug.h"
+/* #include "executor/nodeMergejoin.h" */
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "custmj.h"
+
+/*
+ * States of the ExecMergeJoin state machine
+ */
+#define EXEC_MJ_INITIALIZE_OUTER		1
+#define EXEC_MJ_INITIALIZE_INNER		2
+#define EXEC_MJ_JOINTUPLES				3
+#define EXEC_MJ_NEXTOUTER				4
+#define EXEC_MJ_TESTOUTER				5
+#define EXEC_MJ_NEXTINNER				6
+#define EXEC_MJ_SKIP_TEST				7
+#define EXEC_MJ_SKIPOUTER_ADVANCE		8
+#define EXEC_MJ_SKIPINNER_ADVANCE		9
+#define EXEC_MJ_ENDOUTER				10
+#define EXEC_MJ_ENDINNER				11
+
+/*
+ * Runtime data for each mergejoin clause
+ */
+typedef struct MergeJoinClauseData
+{
+	/* Executable expression trees */
+	ExprState  *lexpr;			/* left-hand (outer) input expression */
+	ExprState  *rexpr;			/* right-hand (inner) input expression */
+
+	/*
+	 * If we have a current left or right input tuple, the values of the
+	 * expressions are loaded into these fields:
+	 */
+	Datum		ldatum;			/* current left-hand value */
+	Datum		rdatum;			/* current right-hand value */
+	bool		lisnull;		/* and their isnull flags */
+	bool		risnull;
+
+	/*
+	 * Everything we need to know to compare the left and right values is
+	 * stored here.
+	 */
+	SortSupportData ssup;
+}	MergeJoinClauseData;
+
+/* Result type for MJEvalOuterValues and MJEvalInnerValues */
+typedef enum
+{
+	MJEVAL_MATCHABLE,			/* normal, potentially matchable tuple */
+	MJEVAL_NONMATCHABLE,		/* tuple cannot join because it has a null */
+	MJEVAL_ENDOFJOIN			/* end of input (physical or effective) */
+} MJEvalResult;
+
+
+#define MarkInnerTuple(innerTupleSlot, mergestate) \
+	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))
+
+
+/*
+ * MJExamineQuals
+ *
+ * This deconstructs the list of mergejoinable expressions, which is given
+ * to us by the planner in the form of a list of "leftexpr = rightexpr"
+ * expression trees in the order matching the sort columns of the inputs.
+ * We build an array of MergeJoinClause structs containing the information
+ * we will need at runtime.  Each struct essentially tells us how to compare
+ * the two expressions from the original clause.
+ *
+ * In addition to the expressions themselves, the planner passes the btree
+ * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or
+ * BTGreaterStrategyNumber), and nulls-first flag that identify the intended
+ * sort ordering for each merge key.  The mergejoinable operator is an
+ * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * ordered in either increasing or decreasing (respectively) order according
+ * to the opfamily and collation, with nulls at the indicated end of the range.
+ * This allows us to obtain the needed comparison function from the opfamily.
+ */
+static MergeJoinClause
+MJExamineQuals(List *mergeclauses,
+			   Oid *mergefamilies,
+			   Oid *mergecollations,
+			   int *mergestrategies,
+			   bool *mergenullsfirst,
+			   PlanState *parent)
+{
+	MergeJoinClause clauses;
+	int			nClauses = list_length(mergeclauses);
+	int			iClause;
+	ListCell   *cl;
+
+	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+
+	iClause = 0;
+	foreach(cl, mergeclauses)
+	{
+		OpExpr	   *qual = (OpExpr *) lfirst(cl);
+		MergeJoinClause clause = &clauses[iClause];
+		Oid			opfamily = mergefamilies[iClause];
+		Oid			collation = mergecollations[iClause];
+		StrategyNumber opstrategy = mergestrategies[iClause];
+		bool		nulls_first = mergenullsfirst[iClause];
+		int			op_strategy;
+		Oid			op_lefttype;
+		Oid			op_righttype;
+		Oid			sortfunc;
+
+		if (!IsA(qual, OpExpr))
+			elog(ERROR, "mergejoin clause is not an OpExpr");
+
+		/*
+		 * Prepare the input expressions for execution.
+		 */
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+
+		/* Set up sort support data */
+		clause->ssup.ssup_cxt = CurrentMemoryContext;
+		clause->ssup.ssup_collation = collation;
+		if (opstrategy == BTLessStrategyNumber)
+			clause->ssup.ssup_reverse = false;
+		else if (opstrategy == BTGreaterStrategyNumber)
+			clause->ssup.ssup_reverse = true;
+		else	/* planner screwed up */
+			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+		clause->ssup.ssup_nulls_first = nulls_first;
+
+		/* Extract the operator's declared left/right datatypes */
+		get_op_opfamily_properties(qual->opno, opfamily, false,
+								   &op_strategy,
+								   &op_lefttype,
+								   &op_righttype);
+		if (op_strategy != BTEqualStrategyNumber)		/* should not happen */
+			elog(ERROR, "cannot merge using non-equality operator %u",
+				 qual->opno);
+
+		/* And get the matching support or comparison function */
+		sortfunc = get_opfamily_proc(opfamily,
+									 op_lefttype,
+									 op_righttype,
+									 BTSORTSUPPORT_PROC);
+		if (OidIsValid(sortfunc))
+		{
+			/* The sort support function should provide a comparator */
+			OidFunctionCall1(sortfunc, PointerGetDatum(&clause->ssup));
+			Assert(clause->ssup.comparator != NULL);
+		}
+		else
+		{
+			/* opfamily doesn't provide sort support, get comparison func */
+			sortfunc = get_opfamily_proc(opfamily,
+										 op_lefttype,
+										 op_righttype,
+										 BTORDER_PROC);
+			if (!OidIsValid(sortfunc))	/* should not happen */
+				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+					 BTORDER_PROC, op_lefttype, op_righttype, opfamily);
+			/* We'll use a shim to call the old-style btree comparator */
+			PrepareSortSupportComparisonShim(sortfunc, &clause->ssup);
+		}
+
+		iClause++;
+	}
+
+	return clauses;
+}
+
+/*
+ * MJEvalOuterValues
+ *
+ * Compute the values of the mergejoined expressions for the current
+ * outer tuple.  We also detect whether it's impossible for the current
+ * outer tuple to match anything --- this is true if it yields a NULL
+ * input, since we assume mergejoin operators are strict.  If the NULL
+ * is in the first join column, and that column sorts nulls last, then
+ * we can further conclude that no following tuple can match anything
+ * either, since they must all have nulls in the first column.  However,
+ * that case is only interesting if we're not in FillOuter mode, else
+ * we have to visit all the tuples anyway.
+ *
+ * For the convenience of callers, we also make this routine responsible
+ * for testing for end-of-input (null outer tuple), and returning
+ * MJEVAL_ENDOFJOIN when that's seen.  This allows the same code to be used
+ * for both real end-of-input and the effective end-of-input represented by
+ * a first-column NULL.
+ *
+ * We evaluate the values in OuterEContext, which can be reset each
+ * time we move to a new tuple.
+ */
+static MJEvalResult
+MJEvalOuterValues(CustomMergeJoinState *mergestate)
+{
+	ExprContext *econtext = mergestate->mj_OuterEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of outer subplan */
+	if (TupIsNull(mergestate->mj_OuterTupleSlot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_outertuple = mergestate->mj_OuterTupleSlot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->ldatum = ExecEvalExpr(clause->lexpr, econtext,
+									  &clause->lisnull, NULL);
+		if (clause->lisnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillOuter)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJEvalInnerValues
+ *
+ * Same as above, but for the inner tuple.  Here, we have to be prepared
+ * to load data from either the true current inner, or the marked inner,
+ * so caller must tell us which slot to load from.
+ */
+static MJEvalResult
+MJEvalInnerValues(CustomMergeJoinState *mergestate, TupleTableSlot *innerslot)
+{
+	ExprContext *econtext = mergestate->mj_InnerEContext;
+	MJEvalResult result = MJEVAL_MATCHABLE;
+	int			i;
+	MemoryContext oldContext;
+
+	/* Check for end of inner subplan */
+	if (TupIsNull(innerslot))
+		return MJEVAL_ENDOFJOIN;
+
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	econtext->ecxt_innertuple = innerslot;
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		clause->rdatum = ExecEvalExpr(clause->rexpr, econtext,
+									  &clause->risnull, NULL);
+		if (clause->risnull)
+		{
+			/* match is impossible; can we end the join early? */
+			if (i == 0 && !clause->ssup.ssup_nulls_first &&
+				!mergestate->mj_FillInner)
+				result = MJEVAL_ENDOFJOIN;
+			else if (result == MJEVAL_MATCHABLE)
+				result = MJEVAL_NONMATCHABLE;
+		}
+	}
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+/*
+ * MJCompare
+ *
+ * Compare the mergejoinable values of the current two input tuples
+ * and return 0 if they are equal (ie, the mergejoin equalities all
+ * succeed), >0 if outer > inner, <0 if outer < inner.
+ *
+ * MJEvalOuterValues and MJEvalInnerValues must already have been called
+ * for the current outer and inner tuples, respectively.
+ */
+static int
+MJCompare(CustomMergeJoinState *mergestate)
+{
+	int			result = 0;
+	bool		nulleqnull = false;
+	ExprContext *econtext = mergestate->cps.ps.ps_ExprContext;
+	int			i;
+	MemoryContext oldContext;
+
+	/*
+	 * Call the comparison functions in short-lived context, in case they leak
+	 * memory.
+	 */
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	for (i = 0; i < mergestate->mj_NumClauses; i++)
+	{
+		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+
+		/*
+		 * Special case for NULL-vs-NULL, else use standard comparison.
+		 */
+		if (clause->lisnull && clause->risnull)
+		{
+			nulleqnull = true;	/* NULL "=" NULL */
+			continue;
+		}
+
+		result = ApplySortComparator(clause->ldatum, clause->lisnull,
+									 clause->rdatum, clause->risnull,
+									 &clause->ssup);
+
+		if (result != 0)
+			break;
+	}
+
+	/*
+	 * If we had any NULL-vs-NULL inputs, we do not want to report that the
+	 * tuples are equal.  Instead, if result is still 0, change it to +1. This
+	 * will result in advancing the inner side of the join.
+	 *
+	 * Likewise, if there was a constant-false joinqual, do not report
+	 * equality.  We have to check this as part of the mergequals, else the
+	 * rescan logic will do the wrong thing.
+	 */
+	if (result == 0 &&
+		(nulleqnull || mergestate->mj_ConstFalseJoin))
+		result = 1;
+
+	MemoryContextSwitchTo(oldContext);
+
+	return result;
+}
+
+
+/*
+ * Generate a fake join tuple with nulls for the inner tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillOuter(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_OuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_NullInnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * Qualification succeeded.  Now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning outer fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+/*
+ * Generate a fake join tuple with nulls for the outer tuple,
+ * and return it if it passes the non-join quals.
+ */
+static TupleTableSlot *
+MJFillInner(CustomMergeJoinState *node)
+{
+	ExprContext *econtext = node->cps.ps.ps_ExprContext;
+	List	   *otherqual = node->cps.ps.qual;
+
+	ResetExprContext(econtext);
+
+	econtext->ecxt_outertuple = node->mj_NullOuterTupleSlot;
+	econtext->ecxt_innertuple = node->mj_InnerTupleSlot;
+
+	if (ExecQual(otherqual, econtext, false))
+	{
+		/*
+		 * Qualification succeeded.  Now form the desired projection tuple and
+		 * return the slot containing it.
+		 */
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		MJ_printf("ExecMergeJoin: returning inner fill tuple\n");
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+
+		if (isDone != ExprEndResult)
+		{
+			node->cps.ps.ps_TupFromTlist =
+				(isDone == ExprMultipleResult);
+			return result;
+		}
+	}
+	else
+		InstrCountFiltered2(node, 1);
+
+	return NULL;
+}
+
+
+/*
+ * Check that a qual condition is constant true or constant false.
+ * If it is constant false (or null), set *is_const_false to TRUE.
+ *
+ * Constant true would normally be represented by a NIL list, but we allow an
+ * actual bool Const as well.  We do expect that the planner will have thrown
+ * away any non-constant terms that have been ANDed with a constant false.
+ */
+static bool
+check_constant_qual(List *qual, bool *is_const_false)
+{
+	ListCell   *lc;
+
+	foreach(lc, qual)
+	{
+		Const	   *con = (Const *) lfirst(lc);
+
+		if (!con || !IsA(con, Const))
+			return false;
+		if (con->constisnull || !DatumGetBool(con->constvalue))
+			*is_const_false = true;
+	}
+	return true;
+}
+
+
+/* ----------------------------------------------------------------
+ *		ExecMergeTupleDump
+ *
+ *		This function is called through the MJ_dump() macro
+ *		when EXEC_MERGEJOINDEBUG is defined
+ * ----------------------------------------------------------------
+ */
+#ifdef EXEC_MERGEJOINDEBUG
+
+static void
+ExecMergeTupleDumpOuter(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *outerSlot = mergestate->mj_OuterTupleSlot;
+
+	printf("==== outer tuple ====\n");
+	if (TupIsNull(outerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(outerSlot);
+}
+
+static void
+ExecMergeTupleDumpInner(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *innerSlot = mergestate->mj_InnerTupleSlot;
+
+	printf("==== inner tuple ====\n");
+	if (TupIsNull(innerSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(innerSlot);
+}
+
+static void
+ExecMergeTupleDumpMarked(CustomMergeJoinState *mergestate)
+{
+	TupleTableSlot *markedSlot = mergestate->mj_MarkedTupleSlot;
+
+	printf("==== marked tuple ====\n");
+	if (TupIsNull(markedSlot))
+		printf("(nil)\n");
+	else
+		MJ_debugtup(markedSlot);
+}
+
+static void
+ExecMergeTupleDump(CustomMergeJoinState *mergestate)
+{
+	printf("******** ExecMergeTupleDump ********\n");
+
+	ExecMergeTupleDumpOuter(mergestate);
+	ExecMergeTupleDumpInner(mergestate);
+	ExecMergeTupleDumpMarked(mergestate);
+
+	printf("******** \n");
+}
+#endif
+
+/* ----------------------------------------------------------------
+ *		_ExecMergeJoin
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+_ExecMergeJoin(CustomMergeJoinState *node)
+{
+	List	   *joinqual;
+	List	   *otherqual;
+	bool		qualResult;
+	int			compareResult;
+	PlanState  *innerPlan;
+	TupleTableSlot *innerTupleSlot;
+	PlanState  *outerPlan;
+	TupleTableSlot *outerTupleSlot;
+	ExprContext *econtext;
+	bool		doFillOuter;
+	bool		doFillInner;
+
+	/*
+	 * get information from node
+	 */
+	innerPlan = innerPlanState(node);
+	outerPlan = outerPlanState(node);
+	econtext = node->cps.ps.ps_ExprContext;
+	joinqual = node->joinqual;
+	otherqual = node->cps.ps.qual;
+	doFillOuter = node->mj_FillOuter;
+	doFillInner = node->mj_FillInner;
+
+	/*
+	 * Check to see if we're still projecting out tuples from a previous join
+	 * tuple (because there is a function-returning-set in the projection
+	 * expressions).  If so, try to project another one.
+	 */
+	if (node->cps.ps.ps_TupFromTlist)
+	{
+		TupleTableSlot *result;
+		ExprDoneCond isDone;
+
+		result = ExecProject(node->cps.ps.ps_ProjInfo, &isDone);
+		if (isDone == ExprMultipleResult)
+			return result;
+		/* Done with that source tuple... */
+		node->cps.ps.ps_TupFromTlist = false;
+	}
+
+	/*
+	 * Reset per-tuple memory context to free any expression evaluation
+	 * storage allocated in the previous tuple cycle.  Note this can't happen
+	 * until we're done projecting out tuples from a join tuple.
+	 */
+	ResetExprContext(econtext);
+
+	/*
+	 * OK, everything is set up; let's go to work.
+	 */
+	for (;;)
+	{
+		MJ_dump(node);
+
+		/*
+		 * get the current state of the join and do things accordingly.
+		 */
+		switch (node->mj_JoinState)
+		{
+				/*
+				 * EXEC_MJ_INITIALIZE_OUTER means that this is the first time
+				 * ExecMergeJoin() has been called and so we have to fetch the
+				 * first matchable tuple for both outer and inner subplans. We
+				 * do the outer side in INITIALIZE_OUTER state, then advance
+				 * to INITIALIZE_INNER state for the inner subplan.
+				 */
+			case EXEC_MJ_INITIALIZE_OUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_OUTER\n");
+
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* OK to go get the first inner tuple */
+						node->mj_JoinState = EXEC_MJ_INITIALIZE_INNER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Stay in same state to fetch next outer tuple */
+						if (doFillOuter)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * inner tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillOuter(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: nothing in outer subplan\n");
+						if (doFillInner)
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples. We set MatchedInner = true to
+							 * force the ENDOUTER state to advance inner.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							node->mj_MatchedInner = true;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+			case EXEC_MJ_INITIALIZE_INNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_INITIALIZE_INNER\n");
+
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * OK, we have the initial tuples.	Begin by skipping
+						 * non-matching tuples.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Mark before advancing, if wanted */
+						if (node->mj_ExtraMarks)
+							ExecMarkPos(innerPlan);
+						/* Stay in same state to fetch next inner tuple */
+						if (doFillInner)
+						{
+							/*
+							 * Generate a fake join tuple with nulls for the
+							 * outer tuple, and return it if it passes the
+							 * non-join quals.
+							 */
+							TupleTableSlot *result;
+
+							result = MJFillInner(node);
+							if (result)
+								return result;
+						}
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: nothing in inner subplan\n");
+						if (doFillOuter)
+						{
+							/*
+							 * Need to emit left-join tuples for all outer
+							 * tuples, including the one we just fetched.  We
+							 * set MatchedOuter = false to force the ENDINNER
+							 * state to emit first tuple before advancing
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							node->mj_MatchedOuter = false;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_JOINTUPLES means we have two tuples which satisfied
+				 * the merge clause so we join them and then proceed to get
+				 * the next inner tuple (EXEC_MJ_NEXTINNER).
+				 */
+			case EXEC_MJ_JOINTUPLES:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_JOINTUPLES\n");
+
+				/*
+				 * Set the next state machine state.  The right things will
+				 * happen whether we return this join tuple or just fall
+				 * through to continue the state machine execution.
+				 */
+				node->mj_JoinState = EXEC_MJ_NEXTINNER;
+
+				/*
+				 * Check the extra qual conditions to see if we actually want
+				 * to return this join tuple.  If not, can proceed with merge.
+				 * We must distinguish the additional joinquals (which must
+				 * pass to consider the tuples "matched" for outer-join logic)
+				 * from the otherquals (which must pass before we actually
+				 * return the tuple).
+				 *
+				 * We don't bother with a ResetExprContext here, on the
+				 * assumption that we just did one while checking the merge
+				 * qual.  One per tuple should be sufficient.  We do have to
+				 * set up the econtext links to the tuples for ExecQual to
+				 * use.
+				 */
+				outerTupleSlot = node->mj_OuterTupleSlot;
+				econtext->ecxt_outertuple = outerTupleSlot;
+				innerTupleSlot = node->mj_InnerTupleSlot;
+				econtext->ecxt_innertuple = innerTupleSlot;
+
+				qualResult = (joinqual == NIL ||
+							  ExecQual(joinqual, econtext, false));
+				MJ_DEBUG_QUAL(joinqual, qualResult);
+
+				if (qualResult)
+				{
+					node->mj_MatchedOuter = true;
+					node->mj_MatchedInner = true;
+
+					/* In an antijoin, we never return a matched tuple */
+					if (node->jointype == JOIN_ANTI)
+					{
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					}
+
+					/*
+					 * In a semijoin, we'll consider returning the first
+					 * match, but after that we're done with this outer tuple.
+					 */
+					if (node->jointype == JOIN_SEMI)
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+
+					qualResult = (otherqual == NIL ||
+								  ExecQual(otherqual, econtext, false));
+					MJ_DEBUG_QUAL(otherqual, qualResult);
+
+					if (qualResult)
+					{
+						/*
+						 * qualification succeeded.  now form the desired
+						 * projection tuple and return the slot containing it.
+						 */
+						TupleTableSlot *result;
+						ExprDoneCond isDone;
+
+						MJ_printf("ExecMergeJoin: returning tuple\n");
+
+						result = ExecProject(node->cps.ps.ps_ProjInfo,
+											 &isDone);
+
+						if (isDone != ExprEndResult)
+						{
+							node->cps.ps.ps_TupFromTlist =
+								(isDone == ExprMultipleResult);
+							return result;
+						}
+					}
+					else
+						InstrCountFiltered2(node, 1);
+				}
+				else
+					InstrCountFiltered1(node, 1);
+				break;
+
+				/*
+				 * EXEC_MJ_NEXTINNER means advance the inner scan to the next
+				 * tuple. If the tuple is not nil, we then proceed to test it
+				 * against the join qualification.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_NEXTINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTINNER\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next inner tuple, if any.  If there's none,
+				 * advance to next outer tuple (which may be able to join to
+				 * previously marked tuples).
+				 *
+				 * NB: must NOT do "extraMarks" here, since we may need to
+				 * return to previously marked tuples.
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+
+						/*
+						 * Test the new inner tuple to see if it matches
+						 * outer.
+						 *
+						 * If they do match, then we join them and move on to
+						 * the next inner tuple (EXEC_MJ_JOINTUPLES).
+						 *
+						 * If they do not match then advance to next outer
+						 * tuple.
+						 */
+						compareResult = MJCompare(node);
+						MJ_DEBUG_COMPARE(compareResult);
+
+						if (compareResult == 0)
+							node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+						else
+						{
+							Assert(compareResult < 0);
+							node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						}
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * It contains a NULL and hence can't match any outer
+						 * tuple, so we can skip the comparison and assume the
+						 * new tuple is greater than current outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+
+						/*
+						 * No more inner tuples.  However, this might be only
+						 * effective and not physical end of inner plan, so
+						 * force mj_InnerTupleSlot to null to make sure we
+						 * don't fetch more inner tuples.  (We need this hack
+						 * because we are not transiting to a state where the
+						 * inner plan is assumed to be exhausted.)
+						 */
+						node->mj_InnerTupleSlot = NULL;
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+				}
+				break;
+
+				/*-------------------------------------------
+				 * EXEC_MJ_NEXTOUTER means
+				 *
+				 *				outer inner
+				 * outer tuple -  5		5  - marked tuple
+				 *				  5		5
+				 *				  6		6  - inner tuple
+				 *				  7		7
+				 *
+				 * we know we just bumped into the
+				 * first inner tuple > current outer tuple (or possibly
+				 * the end of the inner stream)
+				 * so get a new outer tuple and then
+				 * proceed to test it against the marked tuple
+				 * (EXEC_MJ_TESTOUTER)
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 *------------------------------------------------
+				 */
+			case EXEC_MJ_NEXTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_NEXTOUTER\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the marked tuple */
+						node->mj_JoinState = EXEC_MJ_TESTOUTER;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_NEXTOUTER;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*--------------------------------------------------------
+				 * EXEC_MJ_TESTOUTER If the new outer tuple and the marked
+				 * tuple satisfy the merge clause then we know we have
+				 * duplicates in the outer scan so we have to restore the
+				 * inner scan to the marked tuple and proceed to join the
+				 * new outer tuple with the inner tuples.
+				 *
+				 * This is the case when
+				 *						  outer inner
+				 *							4	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	5	  5
+				 *							6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple == marked tuple
+				 *
+				 * If the outer tuple fails the test, then we are done
+				 * with the marked tuples, and we have to look for a
+				 * match to the current inner tuple.  So we will
+				 * proceed to skip outer tuples until outer >= inner
+				 * (EXEC_MJ_SKIP_TEST).
+				 *
+				 *		This is the case when
+				 *
+				 *						  outer inner
+				 *							5	  5  - marked tuple
+				 *			 outer tuple -	5	  5
+				 *		 new outer tuple -	6	  8  - inner tuple
+				 *							7	 12
+				 *
+				 *				new outer tuple > marked tuple
+				 *
+				 *---------------------------------------------------------
+				 */
+			case EXEC_MJ_TESTOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_TESTOUTER\n");
+
+				/*
+				 * Here we must compare the outer tuple with the marked inner
+				 * tuple.  (We can ignore the result of MJEvalInnerValues,
+				 * since the marked inner tuple is certainly matchable.)
+				 */
+				innerTupleSlot = node->mj_MarkedTupleSlot;
+				(void) MJEvalInnerValues(node, innerTupleSlot);
+
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					/*
+					 * the merge clause matched so now we restore the inner
+					 * scan position to the first mark, and go join that tuple
+					 * (and any following ones) to the new outer.
+					 *
+					 * NOTE: we do not need to worry about the MatchedInner
+					 * state for the rescanned inner tuples.  We know all of
+					 * them will match this new outer tuple and therefore
+					 * won't be emitted as fill tuples.  This works *only*
+					 * because we require the extra joinquals to be constant
+					 * when doing a right or full join --- otherwise some of
+					 * the rescanned tuples might fail the extra joinquals.
+					 * This obviously won't happen for a constant-true extra
+					 * joinqual, while the constant-false case is handled by
+					 * forcing the merge clause to never match, so we never
+					 * get here.
+					 */
+					ExecRestrPos(innerPlan);
+
+					/*
+					 * ExecRestrPos probably should give us back a new Slot,
+					 * but since it doesn't, use the marked slot.  (The
+					 * previously returned mj_InnerTupleSlot cannot be assumed
+					 * to hold the required tuple.)
+					 */
+					node->mj_InnerTupleSlot = innerTupleSlot;
+					/* we need not do MJEvalInnerValues again */
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else
+				{
+					/* ----------------
+					 *	if the new outer tuple didn't match the marked inner
+					 *	tuple then we have a case like:
+					 *
+					 *			 outer inner
+					 *			   4	 4	- marked tuple
+					 * new outer - 5	 4
+					 *			   6	 5	- inner tuple
+					 *			   7
+					 *
+					 *	which means that all subsequent outer tuples will be
+					 *	larger than our marked inner tuples.  So we need not
+					 *	revisit any of the marked tuples but can proceed to
+					 *	look for a match to the current inner.	If there's
+					 *	no more inners, no more matches are possible.
+					 * ----------------
+					 */
+					Assert(compareResult > 0);
+					innerTupleSlot = node->mj_InnerTupleSlot;
+
+					/* reload comparison data for current inner */
+					switch (MJEvalInnerValues(node, innerTupleSlot))
+					{
+						case MJEVAL_MATCHABLE:
+							/* proceed to compare it to the current outer */
+							node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+							break;
+						case MJEVAL_NONMATCHABLE:
+
+							/*
+							 * current inner can't possibly match any outer;
+							 * better to advance the inner scan than the
+							 * outer.
+							 */
+							node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+							break;
+						case MJEVAL_ENDOFJOIN:
+							/* No more inner tuples */
+							if (doFillOuter)
+							{
+								/*
+								 * Need to emit left-join tuples for remaining
+								 * outer tuples.
+								 */
+								node->mj_JoinState = EXEC_MJ_ENDINNER;
+								break;
+							}
+							/* Otherwise we're done. */
+							return NULL;
+					}
+				}
+				break;
+
+				/*----------------------------------------------------------
+				 * EXEC_MJ_SKIP_TEST means compare tuples and if they do not
+				 * match, skip whichever is lesser.
+				 *
+				 * For example:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple -  6		8  - inner tuple
+				 *				  7    12
+				 *				  8    14
+				 *
+				 * we have to advance the outer scan
+				 * until we find the outer 8.
+				 *
+				 * On the other hand:
+				 *
+				 *				outer inner
+				 *				  5		5
+				 *				  5		5
+				 * outer tuple - 12		8  - inner tuple
+				 *				 14    10
+				 *				 17    12
+				 *
+				 * we have to advance the inner scan
+				 * until we find the inner 12.
+				 *----------------------------------------------------------
+				 */
+			case EXEC_MJ_SKIP_TEST:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIP_TEST\n");
+
+				/*
+				 * before we advance, make sure the current tuples do not
+				 * satisfy the mergeclauses.  If they do, then we update the
+				 * marked tuple position and go join them.
+				 */
+				compareResult = MJCompare(node);
+				MJ_DEBUG_COMPARE(compareResult);
+
+				if (compareResult == 0)
+				{
+					ExecMarkPos(innerPlan);
+
+					MarkInnerTuple(node->mj_InnerTupleSlot, node);
+
+					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
+				}
+				else if (compareResult < 0)
+					node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+				else
+					/* compareResult > 0 */
+					node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+				break;
+
+				/*
+				 * SKIPOUTER_ADVANCE: advance over an outer tuple that is
+				 * known not to join to any inner tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this outer tuple.
+				 */
+			case EXEC_MJ_SKIPOUTER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPOUTER_ADVANCE\n");
+
+				if (doFillOuter && !node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalOuterValues(node))
+				{
+					case MJEVAL_MATCHABLE:
+						/* Go test the new tuple against the current inner */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+						/* Can't match, so fetch next outer tuple */
+						node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more outer tuples */
+						MJ_printf("ExecMergeJoin: end of outer subplan\n");
+						innerTupleSlot = node->mj_InnerTupleSlot;
+						if (doFillInner && !TupIsNull(innerTupleSlot))
+						{
+							/*
+							 * Need to emit right-join tuples for remaining
+							 * inner tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDOUTER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * SKIPINNER_ADVANCE: advance over an inner tuple that is
+				 * known not to join to any outer tuple.
+				 *
+				 * Before advancing, we check to see if we must emit an
+				 * outer-join fill tuple for this inner tuple.
+				 */
+			case EXEC_MJ_SKIPINNER_ADVANCE:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_SKIPINNER_ADVANCE\n");
+
+				if (doFillInner && !node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				/* Compute join values and check for unmatchability */
+				switch (MJEvalInnerValues(node, innerTupleSlot))
+				{
+					case MJEVAL_MATCHABLE:
+						/* proceed to compare it to the current outer */
+						node->mj_JoinState = EXEC_MJ_SKIP_TEST;
+						break;
+					case MJEVAL_NONMATCHABLE:
+
+						/*
+						 * current inner can't possibly match any outer;
+						 * better to advance the inner scan than the outer.
+						 */
+						node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+						break;
+					case MJEVAL_ENDOFJOIN:
+						/* No more inner tuples */
+						MJ_printf("ExecMergeJoin: end of inner subplan\n");
+						outerTupleSlot = node->mj_OuterTupleSlot;
+						if (doFillOuter && !TupIsNull(outerTupleSlot))
+						{
+							/*
+							 * Need to emit left-join tuples for remaining
+							 * outer tuples.
+							 */
+							node->mj_JoinState = EXEC_MJ_ENDINNER;
+							break;
+						}
+						/* Otherwise we're done. */
+						return NULL;
+				}
+				break;
+
+				/*
+				 * EXEC_MJ_ENDOUTER means we have run out of outer tuples, but
+				 * are doing a right/full join and therefore must null-fill
+				 * any remaining unmatched inner tuples.
+				 */
+			case EXEC_MJ_ENDOUTER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDOUTER\n");
+
+				Assert(doFillInner);
+
+				if (!node->mj_MatchedInner)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the outer
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedInner = true;		/* do it only once */
+
+					result = MJFillInner(node);
+					if (result)
+						return result;
+				}
+
+				/* Mark before advancing, if wanted */
+				if (node->mj_ExtraMarks)
+					ExecMarkPos(innerPlan);
+
+				/*
+				 * now we get the next inner tuple, if any
+				 */
+				innerTupleSlot = ExecProcNode(innerPlan);
+				node->mj_InnerTupleSlot = innerTupleSlot;
+				MJ_DEBUG_PROC_NODE(innerTupleSlot);
+				node->mj_MatchedInner = false;
+
+				if (TupIsNull(innerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of inner subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDOUTER state and process next tuple. */
+				break;
+
+				/*
+				 * EXEC_MJ_ENDINNER means we have run out of inner tuples, but
+				 * are doing a left/full join and therefore must null-fill
+				 * any remaining unmatched outer tuples.
+				 */
+			case EXEC_MJ_ENDINNER:
+				MJ_printf("ExecMergeJoin: EXEC_MJ_ENDINNER\n");
+
+				Assert(doFillOuter);
+
+				if (!node->mj_MatchedOuter)
+				{
+					/*
+					 * Generate a fake join tuple with nulls for the inner
+					 * tuple, and return it if it passes the non-join quals.
+					 */
+					TupleTableSlot *result;
+
+					node->mj_MatchedOuter = true;		/* do it only once */
+
+					result = MJFillOuter(node);
+					if (result)
+						return result;
+				}
+
+				/*
+				 * now we get the next outer tuple, if any
+				 */
+				outerTupleSlot = ExecProcNode(outerPlan);
+				node->mj_OuterTupleSlot = outerTupleSlot;
+				MJ_DEBUG_PROC_NODE(outerTupleSlot);
+				node->mj_MatchedOuter = false;
+
+				if (TupIsNull(outerTupleSlot))
+				{
+					MJ_printf("ExecMergeJoin: end of outer subplan\n");
+					return NULL;
+				}
+
+				/* Else remain in ENDINNER state and process next tuple. */
+				break;
+
+				/*
+				 * broken state value?
+				 */
+			default:
+				elog(ERROR, "unrecognized mergejoin state: %d",
+					 (int) node->mj_JoinState);
+		}
+	}
+}
+
+/* ----------------------------------------------------------------
+ *		ExecInitMergeJoin
+ * ----------------------------------------------------------------
+ */
+MergeJoinState *
+_ExecInitMergeJoin(CustomMergeJoin *node, EState *estate, int eflags)
+{
+	MergeJoinState *mergestate;
+
+	/* check for unsupported flags */
+	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "initializing node");
+
+	/*
+	 * create state structure
+	 */
+	mergestate = makeNode(MergeJoinState);
+	mergestate->js.ps.plan = (Plan *) node;
+	mergestate->js.ps.state = estate;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &mergestate->js.ps);
+
+	/*
+	 * we need two additional econtexts in which we can compute the join
+	 * expressions from the left and right input tuples.  The node's regular
+	 * econtext won't do because it gets reset too often.
+	 */
+	mergestate->mj_OuterEContext = CreateExprContext(estate);
+	mergestate->mj_InnerEContext = CreateExprContext(estate);
+
+	/*
+	 * initialize child expressions
+	 */
+	mergestate->js.ps.targetlist = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.targetlist,
+					 (PlanState *) mergestate);
+	mergestate->js.ps.qual = (List *)
+		ExecInitExpr((Expr *) node->cplan.plan.qual,
+					 (PlanState *) mergestate);
+	mergestate->js.jointype = node->jointype;
+	mergestate->js.joinqual = (List *)
+		ExecInitExpr((Expr *) node->joinqual,
+					 (PlanState *) mergestate);
+	mergestate->mj_ConstFalseJoin = false;
+	/* mergeclauses are handled below */
+
+	/*
+	 * initialize child nodes
+	 *
+	 * inner child must support MARK/RESTORE.
+	 */
+	outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+	innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
+											  eflags | EXEC_FLAG_MARK);
+
+	/*
+	 * For certain types of inner child nodes, it is advantageous to issue
+	 * MARK every time we advance past an inner tuple we will never return to.
+	 * For other types, MARK on a tuple we cannot return to is a waste of
+	 * cycles.	Detect which case applies and set mj_ExtraMarks if we want to
+	 * issue "unnecessary" MARK calls.
+	 *
+	 * Currently, only Material wants the extra MARKs, and it will be helpful
+	 * only if eflags doesn't specify REWIND.
+	 */
+	if (IsA(innerPlan(node), Material) &&
+		(eflags & EXEC_FLAG_REWIND) == 0)
+		mergestate->mj_ExtraMarks = true;
+	else
+		mergestate->mj_ExtraMarks = false;
+
+	/*
+	 * tuple table initialization
+	 */
+	ExecInitResultTupleSlot(estate, &mergestate->js.ps);
+
+	mergestate->mj_MarkedTupleSlot = ExecInitExtraTupleSlot(estate);
+	ExecSetSlotDescriptor(mergestate->mj_MarkedTupleSlot,
+						  ExecGetResultType(innerPlanState(mergestate)));
+
+	switch (node->jointype)
+	{
+		case JOIN_INNER:
+		case JOIN_SEMI:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = false;
+			break;
+		case JOIN_LEFT:
+		case JOIN_ANTI:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = false;
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+			break;
+		case JOIN_RIGHT:
+			mergestate->mj_FillOuter = false;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("RIGHT JOIN is only supported with merge-joinable join conditions")));
+			break;
+		case JOIN_FULL:
+			mergestate->mj_FillOuter = true;
+			mergestate->mj_FillInner = true;
+			mergestate->mj_NullOuterTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(outerPlanState(mergestate)));
+			mergestate->mj_NullInnerTupleSlot =
+				ExecInitNullTupleSlot(estate,
+							  ExecGetResultType(innerPlanState(mergestate)));
+
+			/*
+			 * Can't handle right or full join with non-constant extra
+			 * joinclauses.  This should have been caught by planner.
+			 */
+			if (!check_constant_qual(node->joinqual,
+									 &mergestate->mj_ConstFalseJoin))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("FULL JOIN is only supported with merge-joinable join conditions")));
+			break;
+		default:
+			elog(ERROR, "unrecognized join type: %d",
+				 (int) node->jointype);
+	}
+
+	/*
+	 * initialize tuple type and projection info
+	 */
+	ExecAssignResultTypeFromTL(&mergestate->js.ps);
+	ExecAssignProjectionInfo(&mergestate->js.ps, NULL);
+
+	/*
+	 * preprocess the merge clauses
+	 */
+	mergestate->mj_NumClauses = list_length(node->mergeclauses);
+	mergestate->mj_Clauses = MJExamineQuals(node->mergeclauses,
+											node->mergeFamilies,
+											node->mergeCollations,
+											node->mergeStrategies,
+											node->mergeNullsFirst,
+											(PlanState *) mergestate);
+
+	/*
+	 * initialize join state
+	 */
+	mergestate->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	mergestate->js.ps.ps_TupFromTlist = false;
+	mergestate->mj_MatchedOuter = false;
+	mergestate->mj_MatchedInner = false;
+	mergestate->mj_OuterTupleSlot = NULL;
+	mergestate->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * initialization successful
+	 */
+	MJ1_printf("ExecInitMergeJoin: %s\n",
+			   "node initialized");
+
+	return mergestate;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecEndMergeJoin
+ *
+ * old comments
+ *		frees storage allocated through C routines.
+ * ----------------------------------------------------------------
+ */
+void
+_ExecEndMergeJoin(CustomMergeJoinState *node)
+{
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "ending node processing");
+
+	/*
+	 * Free the exprcontext
+	 */
+	ExecFreeExprContext(&node->cps.ps);
+
+	/*
+	 * clean out the tuple table
+	 */
+	ExecClearTuple(node->cps.ps.ps_ResultTupleSlot);
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	/*
+	 * shut down the subplans
+	 */
+	ExecEndNode(innerPlanState(node));
+	ExecEndNode(outerPlanState(node));
+
+	MJ1_printf("ExecEndMergeJoin: %s\n",
+			   "node processing ended");
+}
+
+void
+_ExecReScanMergeJoin(CustomMergeJoinState *node)
+{
+	ExecClearTuple(node->mj_MarkedTupleSlot);
+
+	node->mj_JoinState = EXEC_MJ_INITIALIZE_OUTER;
+	node->cps.ps.ps_TupFromTlist = false;
+	node->mj_MatchedOuter = false;
+	node->mj_MatchedInner = false;
+	node->mj_OuterTupleSlot = NULL;
+	node->mj_InnerTupleSlot = NULL;
+
+	/*
+	 * if chgParam of subnodes is not null then plans will be re-scanned by
+	 * first ExecProcNode.
+	 */
+	if (node->cps.ps.lefttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.lefttree);
+	if (node->cps.ps.righttree->chgParam == NULL)
+		ExecReScan(node->cps.ps.righttree);
+
+}
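
For readers following the state machine in the hunk above, here is a minimal standalone sketch of the mark/restore idea that the SKIP_TEST, JOINTUPLES, NEXTINNER, NEXTOUTER and TESTOUTER states implement together. It is not part of the patch: `toy_merge_join` and `emit_fn` are hypothetical names, the inputs are plain ascending int arrays rather than plan nodes, and only the inner-join case is covered (no null-fill tuples, no extra join quals, no NULL keys).

```c
#include <stddef.h>

/* Hypothetical callback type, used only by this sketch. */
typedef void (*emit_fn)(int outer_val, int inner_val, void *arg);

/*
 * Inner merge join of two ascending int arrays.
 * Returns the number of joined pairs; calls emit() for each pair if non-NULL.
 */
static size_t
toy_merge_join(const int *outer, size_t nouter,
               const int *inner, size_t ninner,
               emit_fn emit, void *arg)
{
    size_t      o = 0;
    size_t      i = 0;
    size_t      npairs = 0;

    while (o < nouter && i < ninner)
    {
        if (outer[o] < inner[i])
            o++;                /* roughly SKIPOUTER_ADVANCE */
        else if (outer[o] > inner[i])
            i++;                /* roughly SKIPINNER_ADVANCE */
        else
        {
            /* Keys match: remember where the inner group starts (the "mark"). */
            size_t      mark = i;
            int         key = outer[o];

            /* Each outer tuple with this key rescans the marked inner group. */
            while (o < nouter && outer[o] == key)
            {
                /* "Restore" to the mark, then join until inner > key. */
                for (i = mark; i < ninner && inner[i] == key; i++)
                {
                    if (emit)
                        emit(outer[o], inner[i], arg);
                    npairs++;
                }
                o++;            /* roughly NEXTOUTER followed by TESTOUTER */
            }
            /* Here i sits on the first inner value greater than key. */
        }
    }
    return npairs;
}
```

With outer = {4, 5, 5, 6, 7} and inner = {5, 5, 8, 12}, as in the TESTOUTER comment, each of the two outer 5s joins both inner 5s. The real executor cannot index backwards into its input, which is why the inner child must support MARK/RESTORE and why the node keeps a copy of the marked tuple in mj_MarkedTupleSlot.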
diff --git a/contrib/custmj/setrefs.c b/contrib/custmj/setrefs.c
new file mode 100644
index 0000000..9eb0b14
--- /dev/null
+++ b/contrib/custmj/setrefs.c
@@ -0,0 +1,326 @@
+/*-------------------------------------------------------------------------
+ *
+ * setrefs.c
+ *	  Post-processing of a completed plan tree: fix references to subplan
+ *	  vars, compute regproc values for operators, etc
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/custmj/setrefs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/transam.h"
+#include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/tlist.h"
+#include "tcop/utility.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+#include "custmj.h"
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *outer_itlist;
+	indexed_tlist *inner_itlist;
+	Index		acceptable_rel;
+	int			rtoffset;
+} fix_join_expr_context;
+
+typedef struct
+{
+	PlannerInfo *root;
+	indexed_tlist *subplan_itlist;
+	Index		newvarno;
+	int			rtoffset;
+} fix_upper_expr_context;
+
+static Var *search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist,
+								 Index newvarno);
+static Node *fix_join_expr_mutator(Node *node,
+					  fix_join_expr_context *context);
+/*
+ * copyVar
+ *		Copy a Var node.
+ *
+ * fix_scan_expr and friends do this enough times that it's worth having
+ * a bespoke routine instead of using the generic copyObject() function.
+ */
+static inline Var *
+copyVar(Var *var)
+{
+	Var		   *newvar = (Var *) palloc(sizeof(Var));
+
+	*newvar = *var;
+	return newvar;
+}
+
+/*
+ * build_tlist_index --- build an index data structure for a child tlist
+ *
+ * In most cases, subplan tlists will be "flat" tlists with only Vars,
+ * so we try to optimize that case by extracting information about Vars
+ * in advance.	Matching a parent tlist to a child is still an O(N^2)
+ * operation, but at least with a much smaller constant factor than plain
+ * tlist_member() searches.
+ *
+ * The result of this function is an indexed_tlist struct to pass to
+ * search_indexed_tlist_for_var() or search_indexed_tlist_for_non_var().
+ * When done, the indexed_tlist may be freed with a single pfree().
+ */
+indexed_tlist *
+build_tlist_index(List *tlist)
+{
+	indexed_tlist *itlist;
+	tlist_vinfo *vinfo;
+	ListCell   *l;
+
+	/* Create data structure with enough slots for all tlist entries */
+	itlist = (indexed_tlist *)
+		palloc(offsetof(indexed_tlist, vars) +
+			   list_length(tlist) * sizeof(tlist_vinfo));
+
+	itlist->tlist = tlist;
+	itlist->has_ph_vars = false;
+	itlist->has_non_vars = false;
+
+	/* Find the Vars and fill in the index array */
+	vinfo = itlist->vars;
+	foreach(l, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(l);
+
+		if (tle->expr && IsA(tle->expr, Var))
+		{
+			Var		   *var = (Var *) tle->expr;
+
+			vinfo->varno = var->varno;
+			vinfo->varattno = var->varattno;
+			vinfo->resno = tle->resno;
+			vinfo++;
+		}
+		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
+			itlist->has_ph_vars = true;
+		else
+			itlist->has_non_vars = true;
+	}
+
+	itlist->num_vars = (vinfo - itlist->vars);
+
+	return itlist;
+}
+
+/*
+ * search_indexed_tlist_for_var --- find a Var in an indexed tlist
+ *
+ * If a match is found, return a copy of the given Var with suitably
+ * modified varno/varattno (to wit, newvarno and the resno of the TLE entry).
+ * Also ensure that varnoold is incremented by rtoffset.
+ * If no match, return NULL.
+ */
+static Var *
+search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist,
+							 Index newvarno, int rtoffset)
+{
+	Index		varno = var->varno;
+	AttrNumber	varattno = var->varattno;
+	tlist_vinfo *vinfo;
+	int			i;
+
+	vinfo = itlist->vars;
+	i = itlist->num_vars;
+	while (i-- > 0)
+	{
+		if (vinfo->varno == varno && vinfo->varattno == varattno)
+		{
+			/* Found a match */
+			Var		   *newvar = copyVar(var);
+
+			newvar->varno = newvarno;
+			newvar->varattno = vinfo->resno;
+			if (newvar->varnoold > 0)
+				newvar->varnoold += rtoffset;
+			return newvar;
+		}
+		vinfo++;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * search_indexed_tlist_for_non_var --- find a non-Var in an indexed tlist
+ *
+ * If a match is found, return a Var constructed to reference the tlist item.
+ * If no match, return NULL.
+ *
+ * NOTE: it is a waste of time to call this unless itlist->has_ph_vars or
+ * itlist->has_non_vars
+ */
+static Var *
+search_indexed_tlist_for_non_var(Node *node,
+								 indexed_tlist *itlist, Index newvarno)
+{
+	TargetEntry *tle;
+
+	tle = tlist_member(node, itlist->tlist);
+	if (tle)
+	{
+		/* Found a matching subplan output expression */
+		Var		   *newvar;
+
+		newvar = makeVarFromTargetEntry(newvarno, tle);
+		newvar->varnoold = 0;	/* wasn't ever a plain Var */
+		newvar->varoattno = 0;
+		return newvar;
+	}
+	return NULL;				/* no match */
+}
+
+/*
+ * fix_join_expr
+ *	   Create a new set of targetlist entries or join qual clauses by
+ *	   changing the varno/varattno values of variables in the clauses
+ *	   to reference target list values from the outer and inner join
+ *	   relation target lists.  Also perform opcode lookup and add
+ *	   regclass OIDs to root->glob->relationOids.
+ *
+ * This is used in two different scenarios: a normal join clause, where all
+ * the Vars in the clause *must* be replaced by OUTER_VAR or INNER_VAR
+ * references; and a RETURNING clause, which may contain both Vars of the
+ * target relation and Vars of other relations.  In the latter case we want
+ * to replace the other-relation Vars by OUTER_VAR references, while leaving
+ * target Vars alone.
+ *
+ * For a normal join, acceptable_rel should be zero so that any failure to
+ * match a Var will be reported as an error.  For the RETURNING case, pass
+ * inner_itlist = NULL and acceptable_rel = the ID of the target relation.
+ *
+ * 'clauses' is the targetlist or list of join clauses
+ * 'outer_itlist' is the indexed target list of the outer join relation
+ * 'inner_itlist' is the indexed target list of the inner join relation,
+ *		or NULL
+ * 'acceptable_rel' is either zero or the rangetable index of a relation
+ *		whose Vars may appear in the clause without provoking an error
+ * 'rtoffset': how much to increment varnoold by
+ *
+ * Returns the new expression tree.  The original clause structure is
+ * not modified.
+ */
+List *
+fix_join_expr(PlannerInfo *root,
+			  List *clauses,
+			  indexed_tlist *outer_itlist,
+			  indexed_tlist *inner_itlist,
+			  Index acceptable_rel,
+			  int rtoffset)
+{
+	fix_join_expr_context context;
+
+	context.root = root;
+	context.outer_itlist = outer_itlist;
+	context.inner_itlist = inner_itlist;
+	context.acceptable_rel = acceptable_rel;
+	context.rtoffset = rtoffset;
+	return (List *) fix_join_expr_mutator((Node *) clauses, &context);
+}
+
+static Node *
+fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
+{
+	Var		   *newvar;
+
+	if (node == NULL)
+		return NULL;
+	if (IsA(node, Var))
+	{
+		Var		   *var = (Var *) node;
+
+		/* First look for the var in the input tlists */
+		newvar = search_indexed_tlist_for_var(var,
+											  context->outer_itlist,
+											  OUTER_VAR,
+											  context->rtoffset);
+		if (newvar)
+			return (Node *) newvar;
+		if (context->inner_itlist)
+		{
+			newvar = search_indexed_tlist_for_var(var,
+												  context->inner_itlist,
+												  INNER_VAR,
+												  context->rtoffset);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If it's for acceptable_rel, adjust and return it */
+		if (var->varno == context->acceptable_rel)
+		{
+			var = copyVar(var);
+			var->varno += context->rtoffset;
+			if (var->varnoold > 0)
+				var->varnoold += context->rtoffset;
+			return (Node *) var;
+		}
+
+		/* No referent found for Var */
+		elog(ERROR, "variable not found in subplan target lists");
+	}
+	if (IsA(node, PlaceHolderVar))
+	{
+		PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+		/* See if the PlaceHolderVar has bubbled up from a lower plan node */
+		if (context->outer_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->outer_itlist,
+													  OUTER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+		if (context->inner_itlist && context->inner_itlist->has_ph_vars)
+		{
+			newvar = search_indexed_tlist_for_non_var((Node *) phv,
+													  context->inner_itlist,
+													  INNER_VAR);
+			if (newvar)
+				return (Node *) newvar;
+		}
+
+		/* If not supplied by input plans, evaluate the contained expr */
+		return fix_join_expr_mutator((Node *) phv->phexpr, context);
+	}
+	/* Try matching more complex expressions too, if tlists have any */
+	if (context->outer_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->outer_itlist,
+												  OUTER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	if (context->inner_itlist && context->inner_itlist->has_non_vars)
+	{
+		newvar = search_indexed_tlist_for_non_var(node,
+												  context->inner_itlist,
+												  INNER_VAR);
+		if (newvar)
+			return (Node *) newvar;
+	}
+	fix_expr_common(context->root, node);
+	return expression_tree_mutator(node,
+								   fix_join_expr_mutator,
+								   (void *) context);
+}
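
Reviewer's note: build_tlist_index() and search_indexed_tlist_for_var() above flatten the Var entries of a child target list into one array, so that each parent Var is matched by a simple linear probe over (varno, varattno) pairs. The following standalone sketch shows the same idea with plain C types. All Sketch* names are hypothetical and simplified; they are not the PostgreSQL structures, and malloc() stands in for palloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for Var and TargetEntry. */
typedef struct { unsigned varno; int varattno; } SketchVar;

typedef struct
{
    const SketchVar *var;   /* NULL when the entry is not a plain Var */
    int              resno; /* 1-based position in the child target list */
} SketchTargetEntry;

typedef struct { unsigned varno; int varattno; int resno; } SketchVarInfo;

typedef struct
{
    int           has_non_vars; /* nonzero if any entry is not a plain Var */
    int           num_vars;     /* number of filled slots in vars[] */
    SketchVarInfo vars[];       /* one slot per plain-Var entry */
} SketchIndexedTlist;

/* Build the index in one pass; release it with a single free(). */
SketchIndexedTlist *
sketch_build_tlist_index(const SketchTargetEntry *tlist, int ntlist)
{
    SketchIndexedTlist *itlist;
    int                 i;

    assert(ntlist >= 0);
    itlist = malloc(offsetof(SketchIndexedTlist, vars) +
                    (size_t) ntlist * sizeof(SketchVarInfo));
    if (itlist == NULL)
        return NULL;

    itlist->has_non_vars = 0;
    itlist->num_vars = 0;
    for (i = 0; i < ntlist; i++)
    {
        if (tlist[i].var != NULL)
        {
            SketchVarInfo *vinfo = &itlist->vars[itlist->num_vars++];

            vinfo->varno = tlist[i].var->varno;
            vinfo->varattno = tlist[i].var->varattno;
            vinfo->resno = tlist[i].resno;
        }
        else
            itlist->has_non_vars = 1;
    }
    return itlist;
}

/* Return the resno of the matching entry, or 0 when there is no match. */
int
sketch_search_tlist_index(const SketchIndexedTlist *itlist,
                          unsigned varno, int varattno)
{
    int i;

    for (i = 0; i < itlist->num_vars; i++)
    {
        if (itlist->vars[i].varno == varno &&
            itlist->vars[i].varattno == varattno)
            return itlist->vars[i].resno;
    }
    return 0;
}
```

In the real code a match yields a copied Var whose varno is replaced by OUTER_VAR or INNER_VAR and whose varattno becomes the matched resno; the sketch returns only the resno to stay self-contained.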
diff --git a/contrib/custmj/sql/custmj.sql b/contrib/custmj/sql/custmj.sql
new file mode 100644
index 0000000..ffb6d9d
--- /dev/null
+++ b/contrib/custmj/sql/custmj.sql
@@ -0,0 +1,79 @@
+-- regression test for custmj extension
+
+--
+-- initial setup
+--
+CREATE TABLE t1 (a int, b text);
+CREATE TABLE t2 (x int, y text);
+CREATE TABLE t3 (n int primary key, m text);
+CREATE TABLE t4 (s int references t3(n), t text);
+
+INSERT INTO t1 (SELECT x, md5(x::text) FROM generate_series(  1,600) x);
+INSERT INTO t2 (SELECT x, md5(x::text) FROM generate_series(401,800) x);
+INSERT INTO t3 (SELECT x, md5(x::text) FROM generate_series(  1,800) x);
+INSERT INTO t4 (SELECT x, md5(x::text) FROM generate_series(201,600) x);
+
+VACUUM ANALYZE t1;
+VACUUM ANALYZE t2;
+VACUUM ANALYZE t3;
+VACUUM ANALYZE t4;
+-- LOAD this extension
+LOAD 'custmj';
+
+--
+-- explain output
+--
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off hash_join
+SET enable_hashjoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO bmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO bmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO bmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO bmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- force off built-in merge_join
+SET enable_mergejoin = off;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 JOIN t2 ON a = x;
+SELECT * INTO cmj1 FROM t1 JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;
+SELECT * INTO cmj2 FROM t1 FULL JOIN t2 ON a = x;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 JOIN t4 ON n = s;
+SELECT * INTO cmj3 FROM t3 JOIN t4 ON n = s;
+EXPLAIN (verbose, costs off) SELECT * FROM t3 FULL JOIN t4 ON n = s;
+SELECT * INTO cmj4 FROM t3 FULL JOIN t4 ON n = s;
+
+-- compare the difference of simple result
+SELECT * FROM bmj1 EXCEPT SELECT * FROM cmj1;
+SELECT * FROM cmj1 EXCEPT SELECT * FROM bmj1;
+SELECT * FROM bmj2 EXCEPT SELECT * FROM cmj2;
+SELECT * FROM cmj2 EXCEPT SELECT * FROM bmj2;
+SELECT * FROM bmj3 EXCEPT SELECT * FROM cmj3;
+SELECT * FROM cmj3 EXCEPT SELECT * FROM bmj3;
+SELECT * FROM bmj4 EXCEPT SELECT * FROM cmj4;
+SELECT * FROM cmj4 EXCEPT SELECT * FROM bmj4;
+
+-- a little bit complicated
+EXPLAIN (verbose, costs off)
+  SELECT (a + x + n) % s AS c1, md5(b || y || m || t) AS c2
+  FROM ((t1 join t2 on a = x) join t3 on y = m) join t4 on n = s
+  WHERE b like '%ab%' AND y like '%cd%' AND m like t;
+
+PREPARE p1(int,int) AS
+SELECT * FROM t1 JOIN t3 ON a = n WHERE n BETWEEN $1 AND $2;
+EXPLAIN (verbose, costs off) EXECUTE p1(100,100);
+EXPLAIN (verbose, costs off) EXECUTE p1(100,1000);
+
+EXPLAIN (verbose, costs off)
+SELECT * FROM t1 JOIN t2 ON a = x WHERE x IN (SELECT n % 100 FROM t3);
+
+-- check GetSpecialCustomVar stuff
+SET client_min_messages = debug;
+EXPLAIN (verbose, costs off) SELECT * FROM t1 FULL JOIN t2 ON a = x;

Attachment: pgsql-v9.4-custom-scan.part-1.v12.patch (application/octet-stream)

 doc/src/sgml/custom-plan.sgml           | 315 ++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml              |   1 +
 doc/src/sgml/postgres.sgml              |   1 +
 src/backend/commands/explain.c          |  45 ++++-
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  23 +++
 src/backend/executor/execProcnode.c     |  15 ++
 src/backend/executor/nodeCustom.c       |  73 ++++++++
 src/backend/nodes/copyfuncs.c           |  42 +++++
 src/backend/nodes/outfuncs.c            |  40 ++++
 src/backend/optimizer/path/allpaths.c   |  34 ++--
 src/backend/optimizer/path/joinpath.c   |  16 ++
 src/backend/optimizer/plan/createplan.c |  55 ++++--
 src/backend/optimizer/plan/setrefs.c    |  25 ++-
 src/backend/optimizer/plan/subselect.c  | 128 +++++++------
 src/backend/utils/adt/ruleutils.c       |  56 ++++++
 src/include/commands/explain.h          |   1 +
 src/include/executor/nodeCustom.h       |  30 +++
 src/include/nodes/execnodes.h           |  12 ++
 src/include/nodes/nodes.h               |   6 +
 src/include/nodes/plannodes.h           |  77 ++++++++
 src/include/nodes/relation.h            |  29 +++
 src/include/optimizer/paths.h           |  17 ++
 src/include/optimizer/planmain.h        |  12 ++
 src/include/optimizer/subselect.h       |   7 +
 25 files changed, 958 insertions(+), 104 deletions(-)

diff --git a/doc/src/sgml/custom-plan.sgml b/doc/src/sgml/custom-plan.sgml
new file mode 100644
index 0000000..8d456f9
--- /dev/null
+++ b/doc/src/sgml/custom-plan.sgml
@@ -0,0 +1,315 @@
+<!-- doc/src/sgml/custom-plan.sgml -->
+
+<chapter id="custom-plan">
+ <title>Writing A Custom Plan Provider</title>
+
+ <indexterm zone="custom-plan">
+  <primary>custom plan</primary>
+  <secondary>handler for</secondary>
+ </indexterm>
+ <para>
+  The custom-plan interface enables an extension to supply its own
+  plan node in place of a built-in one, with the choice between them
+  made by the cost-based optimizer.
+  Its key component is the <literal>CustomPlan</> node, which has the usual
+  <literal>Plan</> field and a table of function pointers. These pointers
+  act like the methods of a base class in an object-oriented language,
+  so a <literal>CustomPlan</> node works as a polymorphic plan / execution
+  node.
+  The core backend assumes nothing about the behavior of this node
+  type, so it is the responsibility of the custom-plan provider
+  to make its custom node behave like the built-in plan / execution node
+  it replaces.
+ </para>
+ <para>
+  The overall steps for using the custom-plan interface are as follows.
+ </para>
+ <para>
+  A custom-plan provider can add a <literal>CustomPath</> for a particular
+  relation scan using <literal>add_scan_path_hook</>, or for a particular
+  join of relations using <literal>add_join_path_hook</>.
+  The planner then chooses the cheapest path for that
+  scan or join from among the built-in and custom paths.
+  The <literal>CustomPath</> node must therefore carry a proper cost
+  estimate so that the right plan is selected.
+ </para>
+ <para>
+  Usually, a custom-plan provider extends the <literal>CustomPath</> type
+  with its own private fields, like:
+<programlisting>
+typedef struct {
+    CustomPath    cpath;
+        :
+    List         *something_private;
+        :
+} YourOwnCustomPath;
+</programlisting>
+  You can also extend the <literal>CustomPlan</> and <literal>CustomPlanState</>
+  types in a similar manner.
+ </para>
+ <para>
+  <literal>CustomPathMethods</> is the table of function pointers
+  for <literal>CustomPath</>, and <literal>CustomPlanMethods</> is
+  the table of function pointers for <literal>CustomPlan</> and
+  <literal>CustomPlanState</>.
+  An extension has to implement these functions according to the
+  specification in the next section.
+ </para>
+
+ <sect1 id="custom-plan-spec">
+  <title>Specification of Custom Plan Interface</title>
+  <sect2 id="custom-scan-register">
+   <title>Registration of custom-plan path</title>
+   <para>
+    The first task of a custom-plan provider is to add a <literal>CustomPath</>
+    for a particular relation scan or join of relations.
+    Right now the planner supports custom paths only for scans and joins, so
+    cost-based optimization applies to those; other kinds of nodes (such as
+    sort and aggregate) are not supported.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *baserel,
+                                        RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+</programlisting>
+    A custom-plan provider can add its custom path using
+    <literal>add_scan_path_hook</> to provide an alternative way to scan
+    the specified relation.
+   </para>
+   <para>
+<programlisting>
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        RelOptInfo *outerrel,
+                                        RelOptInfo *innerrel,
+                                        JoinType jointype,
+                                        SpecialJoinInfo *sjinfo,
+                                        List *restrictlist,
+                                        Relids param_source_rels,
+                                        Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
+</programlisting>
+    Likewise, a custom-plan provider can add its custom path using
+    <literal>add_join_path_hook</> to provide an alternative way to join
+    the two specified relations (note that either or both of them may
+    themselves be joined relations, not only base relations).
+   </para>
+  </sect2>
+
+  <sect2 id="custom-path-methods">
+   <title>Methods of CustomPath</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPath</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CreateCustomPlan(PlannerInfo *root,
+                 CustomPath *custom_path);
+</programlisting>
+    This method populates a node object that (at least) extends the
+    <literal>CustomPlan</> data type, according to the supplied
+    <literal>CustomPath</>.
+    If this custom plan supports mark-and-restore of its position, its
+    node tag should be <literal>CustomPlanMarkPos</> instead of
+    <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPath(StringInfo str, Node *node);
+</programlisting>
+    This method is needed so that <literal>nodeToString</> can also dump
+    the private fields of your custom path type.
+    The output format has to follow the conventions in <filename>outfuncs.c</>.
+   </para>
+  </sect2>
+  <sect2 id="custom-plan-methods">
+   <title>Methods of CustomPlan</title>
+   <para>
+    This section introduces the method functions of <literal>CustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+SetCustomPlanRef(PlannerInfo *root,
+                 CustomPlan *custom_plan,
+                 int rtoffset);
+</programlisting>
+    This method requires the custom-plan provider to adjust <literal>Var</> node
+    references in the supplied <literal>CustomPlan</> node.
+    Usually, each reference is shifted by <literal>rtoffset</>, or replaced by
+    <literal>OUTER_VAR</> or <literal>INNER_VAR</> if it references the
+    left or right subplan, respectively.
+   </para>
+   <para>
+<programlisting>
+bool
+SupportBackwardScan(CustomPlan *custom_plan);
+</programlisting>
+    This optional method informs the core backend whether this custom plan
+    supports backward scans.
+    If this method is implemented and returns <literal>true</>, the
+    custom-plan node supports backward scans. Otherwise, backward scans
+    are not available.
+   </para>
+   <para>
+<programlisting>
+void
+FinalizeCustomPlan(PlannerInfo *root,
+                   CustomPlan *custom_plan,
+                   Bitmapset **paramids,
+                   Bitmapset **valid_params,
+                   Bitmapset **scan_params);
+</programlisting>
+    This optional method informs the core backend which query parameters
+    are referenced in this custom-plan node, in addition to the ones
+    found in the <literal>targetlist</> and <literal>qual</> fields
+    of the base <literal>Plan</> node.
+    If parameters appear in private data fields managed by the
+    custom-plan provider, the method must update the supplied bitmapsets
+    as <literal>finalize_plan()</> expects.
+   </para>
+   <para>
+<programlisting>
+CustomPlanState *
+BeginCustomPlan(CustomPlan *custom_plan,
+                EState *estate,
+                int eflags);
+</programlisting>
+    This method populates a <literal>CustomPlanState</> object according to
+    the supplied <literal>CustomPlan</> and initializes execution of this
+    custom-plan node. It is called before any other execution method.
+   </para>
+   <para>
+<programlisting>
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It fetches one tuple from this custom-plan node. If a tuple is available,
+    the node has to store it in <literal>ps_ResultTupleSlot</> and return
+    that slot; otherwise it returns <literal>NULL</> to inform the upper node
+    that the scan has reached its end.
+   </para>
+   <para>
+<programlisting>
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    Unlike <literal>ExecCustomPlan</>, it allows the upper node to fetch
+    multiple tuples at once. Pay attention to the data format and to
+    how the result is returned, because both depend entirely on the type
+    of the upper node.
+   </para>
+   <para>
+<programlisting>
+void
+EndCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It ends the execution of this custom-plan node and releases the
+    resources it allocated. Memory in the per-execution memory context
+    usually need not be released explicitly, but the custom-plan provider
+    is responsible for any other resources it acquired on its own.
+   </para>
+   <para>
+<programlisting>
+void
+ReScanCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It restarts the current scan from the beginning.
+    Note that the parameters the scan depends on may have changed values,
+    so the rewound scan need not return exactly the same tuples.
+   </para>
+   <para>
+<programlisting>
+void
+MarkPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if the node tag is
+    <literal>CustomPlanMarkPos</> rather than <literal>CustomPlan</>.
+    It saves the current position of the custom plan somewhere in its
+    private state so that the position can be restored later.
+   </para>
+   <para>
+<programlisting>
+void
+RestrPosCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It is optional, but should be implemented if the node tag is
+    <literal>CustomPlanMarkPos</> rather than <literal>CustomPlan</>.
+    It restores the position of the custom plan from the private
+    state saved by <literal>MarkPosCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlanTargetRel(CustomPlanState *cpstate,
+                           ExplainState *es);
+</programlisting>
+    It shows the target relation if this custom-plan node replaced
+    a particular relation scan. For implementation reasons, this
+    method is separate from <literal>ExplainCustomPlan</>.
+   </para>
+   <para>
+<programlisting>
+void
+ExplainCustomPlan(CustomPlanState *cpstate,
+                  List *ancestors,
+                  ExplainState *es);
+</programlisting>
+    It puts the properties of this custom-plan node into the supplied
+    <literal>ExplainState</>, following the usual <command>EXPLAIN</>
+    conventions.
+   </para>
+   <para>
+<programlisting>
+Bitmapset *
+GetRelidsCustomPlan(CustomPlanState *cpstate);
+</programlisting>
+    It returns the set of range-table indexes scanned by this
+    custom-plan node. When multiple relations underlie the node, the
+    result is not necessarily a singleton bitmap.
+   </para>
+   <para>
+<programlisting>
+Node *
+GetSpecialCustomVar(CustomPlanState *cpstate,
+                    Var *varnode);
+</programlisting>
+    This optional method returns the expression node referenced by
+    the supplied <literal>varnode</>, which has a special <literal>varno</>
+    (<literal>INNER_VAR</>, <literal>OUTER_VAR</> or <literal>INDEX_VAR</>).
+    The <command>EXPLAIN</> command shows the column names referenced in the
+    targetlist or qualifiers of plan nodes. If a var node has a special
+    <literal>varno</>, it recursively walks down the underlying subplan to
+    find the actual expression referenced by this special varno.
+    When a custom-plan node replaces a join node but has no
+    underlying subplans as its left and right trees, the usual logic
+    cannot be used, so the custom-plan provider has to implement this method
+    to tell the core backend which expression node is referenced by
+    the supplied <literal>varnode</> with the special <literal>varno</>.
+    If this method is not implemented or returns <literal>NULL</>,
+    the core backend resolves the special varnode reference as usual.
+   </para>
+   <para>
+<programlisting>
+void
+TextOutCustomPlan(StringInfo str, const CustomPlan *node);
+</programlisting>
+    This method is needed so that <literal>nodeToString</> can also dump
+    the private fields of your custom plan type.
+    The output format has to follow the conventions in <filename>outfuncs.c</>.
+   </para>
+   <para>
+<programlisting>
+CustomPlan *
+CopyCustomPlan(const CustomPlan *from);
+</programlisting>
+    This method is needed so that <literal>copyObject</> can also copy
+    the private fields of your custom plan type.
+   </para>
+  </sect2>
+ </sect1>
+</chapter>
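
Reviewer's note: the chapter above describes CustomPlan as a base node carrying a table of function pointers, which providers extend by embedding the base struct as the first member of their own type. The following standalone sketch shows that pattern, including the "optional method may be NULL" convention the patch uses for callbacks such as SupportBackwardScan. Every name here (SketchPlan, SketchPlanMethods, CounterPlan, and the functions) is hypothetical, and returning -1 stands in for returning NULL at end of scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchPlan SketchPlan;

/* Method table; optional entries may be left NULL by the provider. */
typedef struct
{
    const char *name;
    bool      (*support_backward_scan) (const SketchPlan *plan); /* optional */
    int       (*exec) (SketchPlan *plan);                        /* required */
} SketchPlanMethods;

/* Base node known to the "core" side. */
struct SketchPlan
{
    const SketchPlanMethods *methods;
};

/*
 * Provider-specific node.  The base struct is the first member, so a
 * pointer to CounterPlan can be converted to a SketchPlan pointer and back.
 */
typedef struct
{
    SketchPlan base;
    int        next_value;  /* private field: next value to emit */
    int        limit;       /* private field: last value to emit */
} CounterPlan;

/* Emit next_value..limit one at a time, then -1 for end of scan. */
int
counter_exec(SketchPlan *plan)
{
    CounterPlan *cp = (CounterPlan *) plan;

    if (cp->next_value > cp->limit)
        return -1;
    return cp->next_value++;
}

/* This provider does not implement the optional backward-scan method. */
const SketchPlanMethods counter_methods = { "counter", NULL, counter_exec };

/* Core-side dispatch: a missing optional method means "not supported". */
bool
sketch_supports_backward_scan(const SketchPlan *plan)
{
    if (plan->methods->support_backward_scan)
        return plan->methods->support_backward_scan(plan);
    return false;
}

/* Core-side dispatch for a required method. */
int
sketch_exec(SketchPlan *plan)
{
    assert(plan->methods->exec != NULL);
    return plan->methods->exec(plan);
}
```

The core side only ever sees SketchPlan pointers and the method table, which is why the provider alone is responsible for making the node behave like the built-in node it replaces.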
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index ab6fcf7..f3b8362 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -92,6 +92,7 @@
 <!ENTITY nls        SYSTEM "nls.sgml">
 <!ENTITY plhandler  SYSTEM "plhandler.sgml">
 <!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-plan  SYSTEM "custom-plan.sgml">
 <!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
 <!ENTITY protocol   SYSTEM "protocol.sgml">
 <!ENTITY sources    SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 9bde108..5f415c6 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
   &nls;
   &plhandler;
   &fdwhandler;
+  &custom-plan;
   &geqo;
   &indexam;
   &gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 08f3167..ff9fc7b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -51,7 +52,6 @@ static void ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
 static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
 				ExplainState *es);
 static double elapsed_time(instr_time *starttime);
-static void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 static void ExplainPreScanMemberNodes(List *plans, PlanState **planstates,
 						  Bitmapset **rels_used);
 static void ExplainPreScanSubPlans(List *plans, Bitmapset **rels_used);
@@ -700,7 +700,7 @@ elapsed_time(instr_time *starttime)
  * This ensures that we don't confusingly assign un-suffixed aliases to RTEs
  * that never appear in the EXPLAIN output (such as inheritance parents).
  */
-static void
+void
 ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 {
 	Plan	   *plan = planstate->plan;
@@ -721,6 +721,16 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
 			*rels_used = bms_add_member(*rels_used,
 										((Scan *) plan)->scanrelid);
 			break;
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlanState	   *cpstate = (CustomPlanState *)planstate;
+				Bitmapset		   *temp
+					= cpstate->methods->GetRelidsCustomPlan(cpstate);
+
+				*rels_used = bms_union(*rels_used, temp);
+			}
+			break;
 		case T_ModifyTable:
 			/* cf ExplainModifyTarget */
 			*rels_used = bms_add_member(*rels_used,
@@ -847,6 +857,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	const char *custom_name = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -935,6 +946,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomPlan:
+			sname = "Custom";
+			custom_name = ((CustomPlan *) plan)->methods->CustomName;
+			if (custom_name != NULL)
+				pname = psprintf("Custom (%s)", custom_name);
+			else
+				pname = sname;
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -1036,6 +1055,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainPropertyText("Parent Relationship", relationship, es);
 		if (plan_name)
 			ExplainPropertyText("Subplan Name", plan_name, es);
+		if (custom_name)
+			ExplainPropertyText("Custom", custom_name, es);
 	}
 
 	switch (nodeTag(plan))
@@ -1051,6 +1072,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_CustomPlan:
+			{
+				CustomPlanState	*cps = (CustomPlanState *)planstate;
+
+				if (cps->methods->ExplainCustomPlanTargetRel)
+					cps->methods->ExplainCustomPlanTargetRel(cps, es);
+			}
+			break;
 		case T_IndexScan:
 			{
 				IndexScan  *indexscan = (IndexScan *) plan;
@@ -1347,6 +1376,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomPlan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			if (((CustomPlanState *) planstate)->methods->ExplainCustomPlan)
+			{
+				CustomPlanState *cpstate = (CustomPlanState *) planstate;
+
+				cpstate->methods->ExplainCustomPlan(cpstate, ancestors, es);
+			}
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..47e7a3c 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecReScanCustomPlan((CustomPlanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
 			ExecValuesMarkPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomMarkPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialMarkPos((MaterialState *) node);
 			break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
 			ExecValuesRestrPos((ValuesScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecCustomRestrPos((CustomPlanState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecMaterialRestrPos((MaterialState *) node);
 			break;
@@ -390,6 +403,7 @@ ExecSupportsMarkRestore(NodeTag plantype)
 		case T_ValuesScan:
 		case T_Material:
 		case T_Sort:
+		case T_CustomPlanMarkPos:
 			return true;
 
 		case T_Result:
@@ -465,6 +479,15 @@ ExecSupportsBackwardScan(Plan *node)
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) node;
+
+				if (cplan->methods->SupportBackwardScan)
+					return cplan->methods->SupportBackwardScan(cplan);
+			}
+			return false;
+
 		case T_Material:
 		case T_Sort:
 			/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..5aa117b 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -244,6 +245,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 													   estate, eflags);
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			result = (PlanState *) ExecInitCustomPlan((CustomPlan *) node,
+													  estate, eflags);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -442,6 +449,10 @@ ExecProcNode(PlanState *node)
 			result = ExecForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			result = ExecCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
@@ -678,6 +689,10 @@ ExecEndNode(PlanState *node)
 			ExecEndForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecEndCustomPlan((CustomPlanState *) node);
+			break;
+
 			/*
 			 * join nodes
 			 */
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..e3c8f58
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,73 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ *    Routines to handle execution of custom plan node
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "nodes/execnodes.h"
+#include "nodes/plannodes.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+CustomPlanState *
+ExecInitCustomPlan(CustomPlan *custom_plan, EState *estate, int eflags)
+{
+	CustomPlanState	   *cpstate
+		= custom_plan->methods->BeginCustomPlan(custom_plan, estate, eflags);
+
+	Assert(IsA(cpstate, CustomPlanState));
+
+	return cpstate;
+}
+
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ExecCustomPlan != NULL);
+	return cpstate->methods->ExecCustomPlan(cpstate);
+}
+
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MultiExecCustomPlan != NULL);
+	return cpstate->methods->MultiExecCustomPlan(cpstate);
+}
+
+void
+ExecEndCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->EndCustomPlan != NULL);
+	cpstate->methods->EndCustomPlan(cpstate);
+}
+
+void
+ExecReScanCustomPlan(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->ReScanCustomPlan != NULL);
+	cpstate->methods->ReScanCustomPlan(cpstate);
+}
+
+void
+ExecCustomMarkPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->MarkPosCustomPlan != NULL);
+	cpstate->methods->MarkPosCustomPlan(cpstate);
+}
+
+void
+ExecCustomRestrPos(CustomPlanState *cpstate)
+{
+	Assert(cpstate->methods->RestrPosCustomPlan != NULL);
+	cpstate->methods->RestrPosCustomPlan(cpstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c89d808..18505cd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,42 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+	CustomPlan *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlan));
+	return newnode;
+}
+
+/*
+ * _copyCustomPlanMarkPos
+ */
+static CustomPlanMarkPos *
+_copyCustomPlanMarkPos(const CustomPlanMarkPos *from)
+{
+	CustomPlanMarkPos *newnode = from->methods->CopyCustomPlan(from);
+
+	Assert(IsA(newnode, CustomPlanMarkPos));
+	return newnode;
+}
+
+/* copy common part of CustomPlan */
+void
+CopyCustomPlanCommon(const Node *__from, Node *__newnode)
+{
+	CustomPlan *from = (CustomPlan *) __from;
+	CustomPlan *newnode = (CustomPlan *) __newnode;
+
+	((Node *) newnode)->type = nodeTag(from);
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+	COPY_SCALAR_FIELD(methods);
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -3983,6 +4019,12 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomPlan:
+			retval = _copyCustomPlan(from);
+			break;
+		case T_CustomPlanMarkPos:
+			retval = _copyCustomPlanMarkPos(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bfb4b9f..8a93bc5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -563,6 +563,27 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	WRITE_BOOL_FIELD(fsSystemCol);
 }
 
+/* dump common part of CustomPlan structure */
+static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLAN");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
+static void
+_outCustomPlanMarkPos(StringInfo str, const CustomPlanMarkPos *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLANMARKPOS");
+	_outPlanInfo(str, (const Plan *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPlan(str, node);
+}
+
 static void
 _outJoin(StringInfo str, const Join *node)
 {
@@ -1581,6 +1602,16 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 }
 
 static void
+_outCustomPath(StringInfo str, const CustomPath *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPATH");
+	_outPathInfo(str, (const Path *) node);
+	appendStringInfo(str, " :methods ");
+	_outToken(str, node->methods->CustomName);
+	node->methods->TextOutCustomPath(str, (Node *)node);
+}
+
+static void
 _outAppendPath(StringInfo str, const AppendPath *node)
 {
 	WRITE_NODE_TYPE("APPENDPATH");
@@ -2828,6 +2859,12 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomPlan:
+				_outCustomPlan(str, obj);
+				break;
+			case T_CustomPlanMarkPos:
+				_outCustomPlanMarkPos(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
@@ -3036,6 +3073,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignPath:
 				_outForeignPath(str, obj);
 				break;
+			case T_CustomPath:
+				_outCustomPath(str, obj);
+				break;
 			case T_AppendPath:
 				_outAppendPath(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..6c1ea7e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -47,6 +47,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -323,7 +325,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				}
 				break;
 			case RTE_SUBQUERY:
-				/* Subquery --- fully handled during set_rel_size */
+				/* Subquery --- path was added during set_rel_size */
 				break;
 			case RTE_FUNCTION:
 				/* RangeFunction */
@@ -334,12 +336,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				set_values_pathlist(root, rel, rte);
 				break;
 			case RTE_CTE:
-				/* CTE reference --- fully handled during set_rel_size */
+				/* CTE reference --- path was added during set_rel_size */
 				break;
 			default:
 				elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
 				break;
 		}
+
+		/* Also consider paths added by custom scan providers */
+		if (add_scan_path_hook)
+			(*add_scan_path_hook)(root, rel, rte);
+
+		/* Select cheapest path */
+		set_cheapest(rel);
 	}
 
 #ifdef OPTIMIZER_DEBUG
@@ -388,9 +397,6 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
-
-	/* Now find the cheapest of the paths for this rel */
-	set_cheapest(rel);
 }
 
 /*
@@ -416,9 +422,6 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 {
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
-
-	/* Select cheapest path */
-	set_cheapest(rel);
 }
 
 /*
@@ -1235,9 +1238,6 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1306,9 +1306,6 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel,
 										   pathkeys, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1329,9 +1326,6 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1398,9 +1392,6 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
@@ -1451,9 +1442,6 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
-
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..2fb6678 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,8 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +261,20 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by a custom execution provider.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..055a818 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -42,11 +42,7 @@
 #include "utils/lsyscache.h"
 
 
-static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
 static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
-static List *build_path_tlist(PlannerInfo *root, Path *path);
-static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
-static void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 static Plan *create_gating_plan(PlannerInfo *root, Plan *plan, List *quals);
 static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
 static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
@@ -77,23 +73,20 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomPlan *create_custom_plan(PlannerInfo *root,
+									  CustomPath *best_path);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
 					  Plan *outer_plan, Plan *inner_plan);
 static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
-static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
 static void process_subquery_nestloop_params(PlannerInfo *root,
 								 List *subplan_params);
 static List *fix_indexqual_references(PlannerInfo *root, IndexPath *index_path);
 static List *fix_indexorderby_references(PlannerInfo *root, IndexPath *index_path);
 static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
-static List *get_switched_clauses(List *clauses, Relids outerrelids);
-static List *order_qual_clauses(PlannerInfo *root, List *clauses);
-static void copy_path_costsize(Plan *dest, Path *src);
-static void copy_plan_costsize(Plan *dest, Plan *src);
 static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 			   Oid indexid, List *indexqual, List *indexqualorig,
@@ -215,7 +208,7 @@ create_plan(PlannerInfo *root, Path *best_path)
  * create_plan_recurse
  *	  Recursive guts of create_plan().
  */
-static Plan *
+Plan *
 create_plan_recurse(PlannerInfo *root, Path *best_path)
 {
 	Plan	   *plan;
@@ -261,6 +254,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 			plan = create_unique_plan(root,
 									  (UniquePath *) best_path);
 			break;
+		case T_CustomPlan:
+			plan = (Plan *) create_custom_plan(root, (CustomPath *) best_path);
+			break;
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -430,7 +426,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 /*
  * Build a target list (ie, a list of TargetEntry) for the Path's output.
  */
-static List *
+List *
 build_path_tlist(PlannerInfo *root, Path *path)
 {
 	RelOptInfo *rel = path->parent;
@@ -466,7 +462,7 @@ build_path_tlist(PlannerInfo *root, Path *path)
  *		Decide whether to use a tlist matching relation structure,
  *		rather than only those Vars actually referenced.
  */
-static bool
+bool
 use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			i;
@@ -526,7 +522,7 @@ use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
  * undo the decision made by use_physical_tlist().	Currently, Hash, Sort,
  * and Material nodes want this, so they don't have to store useless columns.
  */
-static void
+void
 disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
 {
 	/* Only need to undo it for path types handled by create_scan_plan() */
@@ -569,7 +565,7 @@ disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path)
  * in most cases we have only a very bad idea of the probability of the gating
  * qual being true.
  */
-static Plan *
+Plan *
 create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
 {
 	List	   *pseudoconstants;
@@ -1072,6 +1068,26 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
 	return plan;
 }
 
+/*
+ * create_custom_plan
+ *   Returns a custom plan built from 'best_path' by the provider's
+ *   CreateCustomPlan callback; cost data is copied from the path afterwards.
+ */
+static CustomPlan *
+create_custom_plan(PlannerInfo *root, CustomPath *best_path)
+{
+	CustomPlan	   *cplan;
+
+	/* Populate CustomPlan according to the CustomPath */
+	Assert(best_path->methods->CreateCustomPlan != NULL);
+	cplan = best_path->methods->CreateCustomPlan(root, best_path);
+	Assert(IsA(cplan, CustomPlan) || IsA(cplan, CustomPlanMarkPos));
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&cplan->plan, &best_path->path);
+
+	return cplan;
+}
 
 /*****************************************************************************
  *
@@ -2006,7 +2022,6 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
-
 /*****************************************************************************
  *
  *	JOIN METHODS
@@ -2540,7 +2555,7 @@ create_hashjoin_plan(PlannerInfo *root,
  * root->curOuterRels are replaced by Params, and entries are added to
  * root->curOuterParams if not already present.
  */
-static Node *
+Node *
 replace_nestloop_params(PlannerInfo *root, Node *expr)
 {
 	/* No setup needed for tree walk, so away we go */
@@ -3023,7 +3038,7 @@ fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol)
  *	  touched; a modified list is returned.  We do, however, set the transient
  *	  outer_is_left field in each RestrictInfo to show which side was which.
  */
-static List *
+List *
 get_switched_clauses(List *clauses, Relids outerrelids)
 {
 	List	   *t_list = NIL;
@@ -3089,7 +3104,7 @@ get_switched_clauses(List *clauses, Relids outerrelids)
  * instead of bare clauses.  It's OK because we only sort by cost, but
  * a cost/selectivity combination would likely do the wrong thing.
  */
-static List *
+List *
 order_qual_clauses(PlannerInfo *root, List *clauses)
 {
 	typedef struct
@@ -3156,7 +3171,7 @@ order_qual_clauses(PlannerInfo *root, List *clauses)
  * Copy cost and size info from a Path node to the Plan node created from it.
  * The executor usually won't use this info, but it's needed by EXPLAIN.
  */
-static void
+void
 copy_path_costsize(Plan *dest, Path *src)
 {
 	if (src)
@@ -3179,7 +3194,7 @@ copy_path_costsize(Plan *dest, Path *src)
  * Copy cost and size info from a lower plan node to an inserted node.
  * (Most callers alter the info after copying it.)
  */
-static void
+void
 copy_plan_costsize(Plan *dest, Plan *src)
 {
 	if (src)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..e0fd9a2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -17,6 +17,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_type.h"
+#include "executor/nodeCustom.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -86,7 +87,6 @@ static void add_rtes_to_flat_rtable(PlannerInfo *root, bool recursing);
 static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
 static bool flatten_rtes_walker(Node *node, PlannerGlobal *glob);
 static void add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte);
-static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
 static Plan *set_indexonlyscan_references(PlannerInfo *root,
 							 IndexOnlyScan *plan,
 							 int rtoffset);
@@ -94,7 +94,6 @@ static Plan *set_subqueryscan_references(PlannerInfo *root,
 							SubqueryScan *plan,
 							int rtoffset);
 static bool trivial_subqueryscan(SubqueryScan *plan);
-static Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset);
 static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
 static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
 static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
@@ -419,7 +418,7 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
 /*
  * set_plan_refs: recurse through the Plan nodes of a single subquery level
  */
-static Plan *
+Plan *
 set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 {
 	ListCell   *l;
@@ -576,6 +575,22 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 			}
 			break;
 
+		case T_CustomPlan:
+		case T_CustomPlanMarkPos:
+			{
+				CustomPlan	   *cplan = (CustomPlan *) plan;
+
+				/*
+				 * The extension is responsible for handling
+				 * set-reference correctly.
+				 */
+				Assert(cplan->methods->SetCustomPlanRef != NULL);
+				cplan->methods->SetCustomPlanRef(root,
+												 cplan,
+												 rtoffset);
+			}
+			break;
+
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
@@ -1057,7 +1072,7 @@ copyVar(Var *var)
  * We assume it's okay to update opcode info in-place.  So this could possibly
  * scribble on the planner's input data structures, but it's OK.
  */
-static void
+void
 fix_expr_common(PlannerInfo *root, Node *node)
 {
 	/* We assume callers won't call us on a NULL pointer */
@@ -1126,7 +1141,7 @@ fix_expr_common(PlannerInfo *root, Node *node)
  * looking up operator opcode info for OpExpr and related nodes,
  * and adding OIDs from regclass Const nodes into root->glob->relationOids.
  */
-static Node *
+Node *
 fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset)
 {
 	fix_scan_expr_context context;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..6b0c762 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -75,12 +75,8 @@ static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect,
 static Node *replace_correlation_vars_mutator(Node *node, PlannerInfo *root);
 static Node *process_sublinks_mutator(Node *node,
 						 process_sublinks_context *context);
-static Bitmapset *finalize_plan(PlannerInfo *root,
-			  Plan *plan,
-			  Bitmapset *valid_params,
-			  Bitmapset *scan_params);
-static bool finalize_primnode(Node *node, finalize_primnode_context *context);
-
+static bool finalize_primnode_walker(Node *node,
+									 finalize_primnode_context *context);
 
 /*
  * Select a PARAM_EXEC number to identify the given Var as a parameter for
@@ -2045,7 +2041,7 @@ SS_finalize_plan(PlannerInfo *root, Plan *plan, bool attach_initplans)
  * The return value is the computed allParam set for the given Plan node.
  * This is just an internal notational convenience.
  */
-static Bitmapset *
+Bitmapset *
 finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			  Bitmapset *scan_params)
 {
@@ -2070,15 +2066,15 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 	 */
 
 	/* Find params in targetlist and qual */
-	finalize_primnode((Node *) plan->targetlist, &context);
-	finalize_primnode((Node *) plan->qual, &context);
+	finalize_primnode_walker((Node *) plan->targetlist, &context);
+	finalize_primnode_walker((Node *) plan->qual, &context);
 
 	/* Check additional node-type-specific fields */
 	switch (nodeTag(plan))
 	{
 		case T_Result:
-			finalize_primnode(((Result *) plan)->resconstantqual,
-							  &context);
+			finalize_primnode_walker(((Result *) plan)->resconstantqual,
+									 &context);
 			break;
 
 		case T_SeqScan:
@@ -2086,10 +2082,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexScan:
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *)((IndexScan *)plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2100,10 +2096,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_IndexOnlyScan:
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexqual,
-							  &context);
-			finalize_primnode((Node *) ((IndexOnlyScan *) plan)->indexorderby,
-							  &context);
+			finalize_primnode_walker((Node *)((IndexOnlyScan *) plan)->indexqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((IndexOnlyScan *) plan)->indexorderby,
+									 &context);
 
 			/*
 			 * we need not look at indextlist, since it cannot contain Params.
@@ -2112,8 +2108,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapIndexScan:
-			finalize_primnode((Node *) ((BitmapIndexScan *) plan)->indexqual,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapIndexScan *) plan)->indexqual,
+									 &context);
 
 			/*
 			 * we need not look at indexqualorig, since it will have the same
@@ -2122,14 +2118,14 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_BitmapHeapScan:
-			finalize_primnode((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
-							  &context);
+			finalize_primnode_walker((Node *) ((BitmapHeapScan *) plan)->bitmapqualorig,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
 		case T_TidScan:
-			finalize_primnode((Node *) ((TidScan *) plan)->tidquals,
-							  &context);
+			finalize_primnode_walker((Node *) ((TidScan *) plan)->tidquals,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2167,7 +2163,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 					funccontext = context;
 					funccontext.paramids = NULL;
 
-					finalize_primnode(rtfunc->funcexpr, &funccontext);
+					finalize_primnode_walker(rtfunc->funcexpr, &funccontext);
 
 					/* remember results for execution */
 					rtfunc->funcparams = funccontext.paramids;
@@ -2183,8 +2179,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ValuesScan:
-			finalize_primnode((Node *) ((ValuesScan *) plan)->values_lists,
-							  &context);
+			finalize_primnode_walker((Node *) ((ValuesScan *) plan)->values_lists,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
@@ -2231,11 +2227,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_ForeignScan:
-			finalize_primnode((Node *) ((ForeignScan *) plan)->fdw_exprs,
-							  &context);
+			finalize_primnode_walker((Node *)((ForeignScan *) plan)->fdw_exprs,
+									 &context);
 			context.paramids = bms_add_members(context.paramids, scan_params);
 			break;
 
+		case T_CustomPlan:
+			{
+				CustomPlan *cplan = (CustomPlan *) plan;
+
+				if (cplan->methods->FinalizeCustomPlan)
+					cplan->methods->FinalizeCustomPlan(root,
+													   cplan,
+													   &context.paramids,
+													   &valid_params,
+													   &scan_params);
+			}
+			break;
+
 		case T_ModifyTable:
 			{
 				ModifyTable *mtplan = (ModifyTable *) plan;
@@ -2247,8 +2256,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 											  locally_added_param);
 				scan_params = bms_add_member(bms_copy(scan_params),
 											 locally_added_param);
-				finalize_primnode((Node *) mtplan->returningLists,
-								  &context);
+				finalize_primnode_walker((Node *) mtplan->returningLists,
+										 &context);
 				foreach(l, mtplan->plans)
 				{
 					context.paramids =
@@ -2329,8 +2338,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			{
 				ListCell   *l;
 
-				finalize_primnode((Node *) ((Join *) plan)->joinqual,
-								  &context);
+				finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+										 &context);
 				/* collect set of params that will be passed to right child */
 				foreach(l, ((NestLoop *) plan)->nestParams)
 				{
@@ -2343,24 +2352,24 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_MergeJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((MergeJoin *) plan)->mergeclauses,
-							  &context);
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((MergeJoin *) plan)->mergeclauses,
+									 &context);
 			break;
 
 		case T_HashJoin:
-			finalize_primnode((Node *) ((Join *) plan)->joinqual,
-							  &context);
-			finalize_primnode((Node *) ((HashJoin *) plan)->hashclauses,
+			finalize_primnode_walker((Node *) ((Join *) plan)->joinqual,
+									 &context);
+			finalize_primnode_walker((Node *) ((HashJoin *) plan)->hashclauses,
 							  &context);
 			break;
 
 		case T_Limit:
-			finalize_primnode(((Limit *) plan)->limitOffset,
-							  &context);
-			finalize_primnode(((Limit *) plan)->limitCount,
-							  &context);
+			finalize_primnode_walker(((Limit *) plan)->limitOffset,
+									 &context);
+			finalize_primnode_walker(((Limit *) plan)->limitCount,
+									 &context);
 			break;
 
 		case T_RecursiveUnion:
@@ -2381,10 +2390,10 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 			break;
 
 		case T_WindowAgg:
-			finalize_primnode(((WindowAgg *) plan)->startOffset,
-							  &context);
-			finalize_primnode(((WindowAgg *) plan)->endOffset,
-							  &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->startOffset,
+									 &context);
+			finalize_primnode_walker(((WindowAgg *) plan)->endOffset,
+									 &context);
 			break;
 
 		case T_Hash:
@@ -2473,8 +2482,21 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
  * finalize_primnode: add IDs of all PARAM_EXEC params appearing in the given
  * expression tree to the result set.
  */
+Bitmapset *
+finalize_primnode(PlannerInfo *root, Node *node, Bitmapset *paramids)
+{
+	finalize_primnode_context	context;
+
+	context.root = root;
+	context.paramids = paramids;
+
+	finalize_primnode_walker(node, &context);
+
+	return context.paramids;
+}
+
 static bool
-finalize_primnode(Node *node, finalize_primnode_context *context)
+finalize_primnode_walker(Node *node, finalize_primnode_context *context)
 {
 	if (node == NULL)
 		return false;
@@ -2496,7 +2518,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		Bitmapset  *subparamids;
 
 		/* Recurse into the testexpr, but not into the Plan */
-		finalize_primnode(subplan->testexpr, context);
+		finalize_primnode_walker(subplan->testexpr, context);
 
 		/*
 		 * Remove any param IDs of output parameters of the subplan that were
@@ -2513,7 +2535,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 		}
 
 		/* Also examine args list */
-		finalize_primnode((Node *) subplan->args, context);
+		finalize_primnode_walker((Node *) subplan->args, context);
 
 		/*
 		 * Add params needed by the subplan to paramids, but excluding those
@@ -2528,7 +2550,7 @@ finalize_primnode(Node *node, finalize_primnode_context *context)
 
 		return false;			/* no more to do here */
 	}
-	return expression_tree_walker(node, finalize_primnode,
+	return expression_tree_walker(node, finalize_primnode_walker,
 								  (void *) context);
 }
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 566b4c9..934d796 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5292,6 +5292,25 @@ get_utility_query_def(Query *query, deparse_context *context)
 	}
 }
 
+/*
+ * GetSpecialCustomVar
+ *
+ * Utility routine to call optional GetSpecialCustomVar method of
+ * CustomPlanState
+ */
+static Node *
+GetSpecialCustomVar(PlanState *ps, Var *varnode)
+{
+	CustomPlanState *cps = (CustomPlanState *) ps;
+
+	Assert(IsA(ps, CustomPlanState));
+	Assert(IS_SPECIAL_VARNO(varnode->varno));
+
+	if (cps->methods->GetSpecialCustomVar)
+		return (Node *)cps->methods->GetSpecialCustomVar(cps, varnode);
+
+	return NULL;
+}
 
 /*
  * Display a Var appropriately.
@@ -5323,6 +5342,7 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 	deparse_columns *colinfo;
 	char	   *refname;
 	char	   *attname;
+	Node	   *expr;
 
 	/* Find appropriate nesting depth */
 	netlevelsup = var->varlevelsup + levelsup;
@@ -5345,6 +5365,22 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
 		colinfo = deparse_columns_fetch(var->varno, dpns);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		/*
+		 * Force parentheses because our caller probably assumed a Var is a
+		 * simple expression.
+		 */
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(buf, ')');
+
+		return NULL;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
@@ -5633,6 +5669,26 @@ get_name_for_var_field(Var *var, int fieldno,
 		rte = rt_fetch(var->varno, dpns->rtable);
 		attnum = var->varattno;
 	}
+	else if (IS_SPECIAL_VARNO(var->varno) &&
+			 IsA(dpns->planstate, CustomPlanState) &&
+			 (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+	{
+		StringInfo		saved = context->buf;
+		StringInfoData	temp;
+
+		initStringInfo(&temp);
+		context->buf = &temp;
+
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, '(');
+		get_rule_expr((Node *) expr, context, true);
+		if (!IsA(expr, Var))
+			appendStringInfoChar(context->buf, ')');
+
+		context->buf = saved;
+
+		return temp.data;
+	}
 	else if (var->varno == OUTER_VAR && dpns->outer_tlist)
 	{
 		TargetEntry *tle;
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3488be3..f914696 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
 typedef const char *(*explain_get_index_name_hook_type) (Oid indexId);
 extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
 
+extern void ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used);
 
 extern void ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			 ParamListInfo params, DestReceiver *dest);
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..e6e049e
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,30 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "nodes/plannodes.h"
+#include "nodes/execnodes.h"
+
+/*
+ * General executor code
+ */
+extern CustomPlanState *ExecInitCustomPlan(CustomPlan *custom_plan,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomPlan(CustomPlanState *cpstate);
+extern Node *MultiExecCustomPlan(CustomPlanState *cpstate);
+extern void ExecEndCustomPlan(CustomPlanState *cpstate);
+
+extern void ExecReScanCustomPlan(CustomPlanState *cpstate);
+extern void ExecCustomMarkPos(CustomPlanState *cpstate);
+extern void ExecCustomRestrPos(CustomPlanState *cpstate);
+
+#endif	/* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a301a08..8af5bf2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,18 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *	 CustomPlanState information
+ *
+ *		CustomPlan nodes are used to execute custom code within executor.
+ * ----------------
+ */
+typedef struct CustomPlanState
+{
+	PlanState	ps;
+	const CustomPlanMethods	   *methods;
+} CustomPlanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..f4a1246 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,8 @@ typedef enum NodeTag
 	T_CteScan,
 	T_WorkTableScan,
 	T_ForeignScan,
+	T_CustomPlan,
+	T_CustomPlanMarkPos,
 	T_Join,
 	T_NestLoop,
 	T_MergeJoin,
@@ -107,6 +109,7 @@ typedef enum NodeTag
 	T_CteScanState,
 	T_WorkTableScanState,
 	T_ForeignScanState,
+	T_CustomPlanState,
 	T_JoinState,
 	T_NestLoopState,
 	T_MergeJoinState,
@@ -224,6 +227,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
 	T_ResultPath,
@@ -513,6 +517,8 @@ extern void *stringToNode(char *str);
  */
 extern void *copyObject(const void *obj);
 
+extern void CopyCustomPlanCommon(const Node *from, Node *newnode);
+
 /*
  * nodes/equalfuncs.c
  */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..7468d4c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -15,8 +15,10 @@
 #define PLANNODES_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/primnodes.h"
+#include "nodes/relation.h"
 
 
 /* ----------------------------------------------------------------
@@ -479,6 +481,81 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomPlan node
+ * ----------------
+ */
+struct CustomPlanMethods;
+
+typedef struct CustomPlan
+{
+	Plan		plan;
+	const struct CustomPlanMethods *methods;
+} CustomPlan;
+
+/* almost same to CustomPlan, but support MarkPos/RestorePos */
+typedef CustomPlan CustomPlanMarkPos;
+
+/* not to include execnodes.h here */
+typedef struct CustomPlanState CustomPlanState;
+typedef struct EState EState;
+typedef struct ExplainState	ExplainState;
+typedef struct TupleTableSlot TupleTableSlot;
+
+typedef void (*SetCustomPlanRef_function)(PlannerInfo *root,
+										  CustomPlan *custom_plan,
+										  int rtoffset);
+typedef bool (*SupportCustomBackwardScan_function)(CustomPlan *custom_plan);
+typedef void (*FinalizeCustomPlan_function)(PlannerInfo *root,
+											CustomPlan *custom_plan,
+											Bitmapset **paramids,
+											Bitmapset **valid_params,
+											Bitmapset **scan_params);
+typedef CustomPlanState *(*BeginCustomPlan_function)(CustomPlan *custom_plan,
+													 EState *estate,
+													 int eflags);
+typedef TupleTableSlot *(*ExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*MultiExecCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*EndCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ReScanCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*MarkPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*RestrPosCustomPlan_function)(CustomPlanState *cpstate);
+typedef void (*ExplainCustomPlanTargetRel_function)(CustomPlanState *cpstate,
+													ExplainState *es);
+typedef void (*ExplainCustomPlan_function)(CustomPlanState *cpstate,
+										   List *ancestors,
+										   ExplainState *es);
+typedef Bitmapset *(*GetRelidsCustomPlan_function)(CustomPlanState *cpstate);
+typedef Node *(*GetSpecialCustomVar_function)(CustomPlanState *cpstate,
+											  Var *varnode);
+typedef void (*TextOutCustomPlan_function)(StringInfo str,
+										   const CustomPlan *node);
+typedef CustomPlan *(*CopyCustomPlan_function)(const CustomPlan *from);
+
+typedef struct CustomPlanMethods
+{
+	const char						   *CustomName;
+	/* callbacks for the planner stage */
+	SetCustomPlanRef_function			SetCustomPlanRef;
+	SupportCustomBackwardScan_function	SupportBackwardScan;
+	FinalizeCustomPlan_function			FinalizeCustomPlan;
+	/* callbacks for the executor stage */
+	BeginCustomPlan_function			BeginCustomPlan;
+	ExecCustomPlan_function				ExecCustomPlan;
+	MultiExecCustomPlan_function		MultiExecCustomPlan;
+	EndCustomPlan_function				EndCustomPlan;
+	ReScanCustomPlan_function			ReScanCustomPlan;
+	MarkPosCustomPlan_function			MarkPosCustomPlan;
+	RestrPosCustomPlan_function			RestrPosCustomPlan;
+	/* callbacks for EXPLAIN */
+	ExplainCustomPlanTargetRel_function	ExplainCustomPlanTargetRel;
+	ExplainCustomPlan_function			ExplainCustomPlan;
+	GetRelidsCustomPlan_function		GetRelidsCustomPlan;
+	GetSpecialCustomVar_function		GetSpecialCustomVar;
+	/* callbacks for general node management */
+	TextOutCustomPlan_function			TextOutCustomPlan;
+	CopyCustomPlan_function				CopyCustomPlan;
+} CustomPlanMethods;
 
 /*
  * ==========
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c607b36..cbbf1e0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "lib/stringinfo.h"
 #include "nodes/params.h"
 #include "nodes/parsenodes.h"
 #include "storage/block.h"
@@ -878,6 +879,34 @@ typedef struct ForeignPath
 } ForeignPath;
 
 /*
+ * CustomPath represents a scan using custom logic
+ *
+ * custom_flags is a set of CUSTOM_* bits to control its behavior.
+ * custom_methods is a set of function pointers that are declared in
+ * CustomPathMethods structure; extension has to set up correctly.
+ */
+struct CustomPathMethods;
+
+typedef struct CustomPath
+{
+	Path		path;
+	const struct CustomPathMethods   *methods;
+} CustomPath;
+
+typedef struct CustomPlan CustomPlan;
+
+typedef CustomPlan *(*CreateCustomPlan_function)(PlannerInfo *root,
+												 CustomPath *custom_path);
+typedef void (*TextOutCustomPath_function)(StringInfo str, Node *node);
+
+typedef struct CustomPathMethods
+{
+	const char				   *CustomName;
+	CreateCustomPlan_function	CreateCustomPlan;
+	TextOutCustomPath_function	TextOutCustomPath;
+} CustomPathMethods;
+
+/*
  * AppendPath represents an Append plan, ie, successive execution of
  * several member plans.
  *
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9b22fda..3047d3d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,23 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 8bdb7db..28b89d9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,6 +41,10 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
  * prototypes for plan/createplan.c
  */
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
+extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
+extern List *build_path_tlist(PlannerInfo *root, Path *path);
+extern bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern void disuse_physical_tlist(PlannerInfo *root, Plan *plan, Path *path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
@@ -86,6 +90,11 @@ extern ModifyTable *make_modifytable(PlannerInfo *root,
 				 List *withCheckOptionLists, List *returningLists,
 				 List *rowMarks, int epqParam);
 extern bool is_projection_capable_plan(Plan *plan);
+extern List *order_qual_clauses(PlannerInfo *root, List *clauses);
+extern List *get_switched_clauses(List *clauses, Relids outerrelids);
+extern void copy_path_costsize(Plan *dest, Path *src);
+extern void copy_plan_costsize(Plan *dest, Plan *src);
+extern Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 
 /*
  * prototypes for plan/initsplan.c
@@ -127,6 +136,9 @@ extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
  * prototypes for plan/setrefs.c
  */
 extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
+extern Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
+extern void fix_expr_common(PlannerInfo *root, Node *node);
+extern Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset);
 extern void fix_opfuncids(Node *node);
 extern void set_opfuncid(OpExpr *opexpr);
 extern void set_sa_opfuncid(ScalarArrayOpExpr *opexpr);
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index 5607e98..138b60b 100644
--- a/src/include/optimizer/subselect.h
+++ b/src/include/optimizer/subselect.h
@@ -29,6 +29,13 @@ extern void SS_finalize_plan(PlannerInfo *root, Plan *plan,
 				 bool attach_initplans);
 extern Param *SS_make_initplan_from_plan(PlannerInfo *root, Plan *plan,
 					Oid resulttype, int32 resulttypmod, Oid resultcollation);
+extern Bitmapset *finalize_plan(PlannerInfo *root,
+								Plan *plan,
+								Bitmapset *valid_params,
+								Bitmapset *scan_params);
+extern Bitmapset *finalize_primnode(PlannerInfo *root,
+									Node *node,
+									Bitmapset *paramids);
 extern Param *assign_nestloop_param_var(PlannerInfo *root, Var *var);
 extern Param *assign_nestloop_param_placeholdervar(PlannerInfo *root,
 									 PlaceHolderVar *phv);

#98

Simon Riggs

simon@2ndQuadrant.com

over 11 years ago

In reply to: Kouhei Kaigai (#97)

Re: Custom Scan APIs (Re: Custom Plan node)

On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Brief summary of the current approach, as revised from my
original submission through the discussion on pgsql-hackers:

The plan node was renamed from CustomScan to CustomPlan because
it dropped all the hardcoded portions that assumed the custom node would
act as an alternative scan or join method; that assumption prevented the
custom node from acting as something else, such as a sort or an append.
Following Tom's suggestion, I put a structure that contains
several function pointers on the new CustomPlan node, and an extension
allocates an object that extends the CustomPlan node.
This resembles polymorphism in object-oriented programming languages.
The core backend knows only the abstract set of methods defined in the
tables of function pointers, and an extension can implement its own logic
in the callbacks, using private state on the extended object.
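
The "extended object plus method table" arrangement described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions: the Plan type is a stand-in rather than the backend definition, the ExecCustomPlan signature is simplified (in the patch it takes a CustomPlanState and returns a TupleTableSlot), and CtidScanPlan, make_ctidscan_plan and core_exec_custom_plan are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the backend's Plan node; NOT the real definition. */
typedef struct Plan
{
	int			type;
} Plan;

struct CustomPlanMethods;

/* Same shape as the CustomPlan node in the patch. */
typedef struct CustomPlan
{
	Plan		plan;
	const struct CustomPlanMethods *methods;
} CustomPlan;

/* Reduced method table; the callback signature is simplified (assumption). */
typedef struct CustomPlanMethods
{
	const char *CustomName;
	int			(*ExecCustomPlan) (CustomPlan *node);
} CustomPlanMethods;

/*
 * Hypothetical extension node.  CustomPlan is the first member, so a
 * pointer to CtidScanPlan can be handed to the core as a CustomPlan.
 */
typedef struct CtidScanPlan
{
	CustomPlan	cplan;
	int			first_block;	/* private state the core never looks at */
} CtidScanPlan;

static int
ctidscan_exec(CustomPlan *node)
{
	/* The callback knows the concrete type and may downcast. */
	CtidScanPlan *cscan = (CtidScanPlan *) node;

	return cscan->first_block;
}

static const CustomPlanMethods ctidscan_methods = {
	"ctidscan",
	ctidscan_exec
};

/* Extension side: allocate the extended object and attach the table. */
CustomPlan *
make_ctidscan_plan(int first_block)
{
	CtidScanPlan *cscan = calloc(1, sizeof(CtidScanPlan));

	if (cscan == NULL)
		return NULL;
	cscan->cplan.methods = &ctidscan_methods;
	cscan->first_block = first_block;
	return &cscan->cplan;
}

/* Core side: dispatch through the table without knowing the subtype. */
int
core_exec_custom_plan(CustomPlan *node)
{
	assert(node != NULL && node->methods != NULL);
	return node->methods->ExecCustomPlan(node);
}
```

The core-side function never mentions CtidScanPlan; everything provider-specific is reached through the methods pointer.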

I just wanted to add some review comments here. I also apologise for
not reviewing this earlier; I had misunderstood the maturity of the
patch and had assumed it was a request for comments/WIP.

Overall, I very much support the concept of providing for alternate
scans. I like the placement of calls in the optimizer and we'll be
able to do much with that. Other comments in order that I consider
them important.

* There is no declarative structure for this at all. I was expecting
to see a way to declare that a specific table might have an alternate
scan path, but we just call the plugin always and it has to separately
make a cache lookup to see if anything extra is needed. The Index AM
allows us to perform scans, yet indexes are very clearly declared and
easily and clearly identifiable. We need the same thing for alternate
plans.

* There is no mechanism at all for maintaining other data structures.
Are we supposed to use the Index AM? Triggers? Or? The lack of clarity
there is disturbing, though I could be simply missing something big
and obvious.

* There is no catalog support. Complex APIs in Postgres typically have
a structure like pg_am which allows the features to be clearly
identified. I'd be worried about our ability to keep track of so many
calls in such pivotal places without that. I want to be able to know
what a plugin is doing, especially when it will likely come in binary
form. I don't see an easy way to have plugins partially override each
other or work together. What happens when I want to use Mr.X's clever
new join plugin at the same time as Mr.Y's GPU accelerator?

* How do we control security? What stops the Custom Scan API from
overriding privileges? Shouldn't the alternate data structures be
recognised as objects so we can grant privileges? Or do we simply say
if an alternate data structure is linked to a heap then has implied
privileges. It would be a shame to implement better security in one
patch and then ignore it in another (from the same author).

All of the above I can let pass in this release, but in the longer
term we need to look for more structure around these ideas so we can
manage and control what happens. The way this is now is quite raw -
suitable for R&D but not for longer term production usage by a wider
audience, IMHO. I wouldn't like to make commitments about the
longevity of this API either; if we accept it, it should have a big
"may change radically" sign hanging on it. Having said that, I am
interested in progress here and I accept that things will look like
this at this stage of the ideas process, so these are not things to
cause delay.

Some things I would like to see change on in this release are...

* It's not clear to me when we load/create the alternate data
structures. That can't happen at _init time. I was expecting this to
look like an infrastructure for unlogged indexes, but it doesn't look
like that either.

* The name Custom makes me nervous. It sounds too generic, as if the
design or objectives for this are a little unclear. AlternateScan
sounds like a better name since it's clearer that we are scanning an
alternate data structure rather than the main heap.

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like
an issue and my patch rejection indicator is flashing red around that.

Two additional use cases I will be looking to explore will be

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather than the fully
detailed data, if the result would be within acceptable tolerance for
user

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#99

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Simon Riggs (#98)

Re: Custom Scan APIs (Re: Custom Plan node)

Simon Riggs <simon@2ndQuadrant.com> writes:

[ assorted comments about custom-scan patch, but particularly ]

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like
an issue and my patch rejection indicator is flashing red around that.

Yeah. After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type. Yeah, you can plug in some new execution node types, but
actually doing anything interesting is going to require patching other
parts of the system. Are we going to say to all comers, "sure, we'll put
a hook call anywhere you like, just ask"? I can't see this as being the
way to go.

Another way of describing the problem is that it's not clear where the API
boundaries are for potential users of a custom-scan feature. (Simon said
several things that are closely related to this point.) One thing I don't
like at all about the patch is its willingness to turn anything whatsoever
into a publicly exported function, which basically says that the design
attitude is there *are* no boundaries. But that's not going to lead to
anything maintainable. We're certainly not going to want to guarantee
that these suddenly-exported functions will all now have stable APIs
forevermore.

Overall I concur with Simon's conclusion that this might be of interest
for R&D purposes, but it's hard to see anyone wanting to support a
production feature built on this. It would be only marginally less
painful than supporting a patch that just adds the equivalent code
to the backend in the traditional way.

So I'm feeling that this was kind of a dead end. It was worth doing
the legwork to see if this sort of approach could be useful, but the
answer seems like "no".

regards, tom lane


#100

Kouhei Kaigai

kaigai@ak.jp.nec.com

over 11 years ago

In reply to: Simon Riggs (#98)

Re: Custom Scan APIs (Re: Custom Plan node)

On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Brief summary of the current approach, as revised from my
original submission through the discussion on pgsql-hackers:

The plan node was renamed from CustomScan to CustomPlan because
it dropped all the hardcoded portions that assumed the custom node
would act as an alternative scan or join method; that assumption
prevented the custom node from acting as something else, such as a
sort or an append.
Following Tom's suggestion, I put a structure that contains
several function pointers on the new CustomPlan node, and an extension
allocates an object that extends the CustomPlan node.
This resembles polymorphism in object-oriented programming languages.
The core backend knows only the abstract set of methods defined in the
tables of function pointers, and an extension can implement its own
logic in the callbacks, using private state on the extended object.

I just wanted to add some review comments here. I also apologise for not
reviewing this earlier; I had misunderstood the maturity of the patch and
had assumed it was a request for comments/WIP.

Thanks for your interest and many comments.

Overall, I very much support the concept of providing for alternate scans.
I like the placement of calls in the optimizer and we'll be able to do much
with that. Other comments in order that I consider them important.

* There is no declarative structure for this at all. I was expecting to
see a way to declare that a specific table might have an alternate scan
path, but we just call the plugin always and it has to separately make a
cache lookup to see if anything extra is needed. The Index AM allows us
to perform scans, yet indexes are very clearly declared and easily and
clearly identifiable. We need the same thing for alternate plans.

* There is no mechanism at all for maintaining other data structures.
Are we supposed to use the Index AM? Triggers? Or? The lack of clarity there
is disturbing, though I could be simply missing something big and obvious.

* There is no catalog support. Complex APIs in Postgres typically have a
structure like pg_am which allows the features to be clearly identified.
I'd be worried about our ability to keep track of so many calls in such
pivotal places without that. I want to be able to know what a plugin is
doing, especially when it will likely come in binary form. I don't see an
easy way to have plugins partially override each other or work together.
What happens when I want to use Mr.X's clever new join plugin at the same
time as Mr.Y's GPU accelerator?

That was an implementation choice. I simply followed the usual PostgreSQL
hook convention, in which a loaded extension saves the original function
pointer and chains to it when its own hook is invoked. Consequently, the
call walks a chain of extensions if multiple custom providers are loaded.
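
As a rough illustration of that convention, here is a minimal sketch of two providers chaining on one hook. It assumes a drastically simplified hook signature (the patch's add_scan_path_hook takes PlannerInfo, RelOptInfo and RangeTblEntry arguments), and provider_x_init, provider_y_init and core_collect_custom_paths are hypothetical names standing in for each extension's _PG_init and for the planner call site.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for add_scan_path_hook_type (assumption). */
typedef void (*add_scan_path_hook_type) (int *npaths);

/* The single global hook variable owned by the core. */
add_scan_path_hook_type add_scan_path_hook = NULL;

/* Each provider remembers whoever was installed before it. */
static add_scan_path_hook_type prev_hook_x = NULL;
static add_scan_path_hook_type prev_hook_y = NULL;

static void
provider_x_add_paths(int *npaths)
{
	if (prev_hook_x)
		prev_hook_x(npaths);	/* chain to the earlier provider first */
	(*npaths)++;				/* then offer this provider's own path */
}

static void
provider_y_add_paths(int *npaths)
{
	if (prev_hook_y)
		prev_hook_y(npaths);
	(*npaths)++;
}

/* What each extension would do at load time. */
void
provider_x_init(void)
{
	prev_hook_x = add_scan_path_hook;
	add_scan_path_hook = provider_x_add_paths;
}

void
provider_y_init(void)
{
	prev_hook_y = add_scan_path_hook;
	add_scan_path_hook = provider_y_add_paths;
}

/* Core side: one call reaches every loaded provider through the chain. */
int
core_collect_custom_paths(void)
{
	int			npaths = 0;

	if (add_scan_path_hook)
		add_scan_path_hook(&npaths);
	return npaths;
}
```

Note that the chain works only as long as every provider cooperates by calling the saved pointer; the core has no list of who is installed, which is the visibility concern raised above.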

Although I initially chose this design, catalog support is an option for
tracking registered custom-scan providers and their metadata: which
function generates paths, which flags are set, what kinds of relations
are supported, and so on. The optimizer could then probably skip
extensions that obviously do not support the target relation.

* How do we control security? What stops the Custom Scan API from overriding
privileges? Shouldn't the alternate data structures be recognised as
objects so we can grant privileges? Or do we simply say if an alternate
data structure is linked to a heap then has implied privileges. It would
be a shame to implement better security in one patch and then ignore it
in another (from the same author).

In general, we have no mechanism to prevent C extensions from overriding
the privilege mechanism. An extension can override existing hooks and modify
the requiredPerms bits of a RangeTblEntry, which eventually causes a
privilege bypass. But the custom-scan API itself is neutral in this respect.
Even if alternative scan logic is implemented on top of the API, the core
backend still checks the required privileges in ExecCheckRTPerms, which is
called at the start of the executor (unless the extension author does
something strange).

All of the above I can let pass in this release, but in the longer term
we need to look for more structure around these ideas so we can manage and
control what happens. The way this is now is quite raw - suitable for R&D
but not for longer term production usage by a wider audience, IMHO. I
wouldn't like to make commitments about the longevity of this API either;
if we accept it, it should have a big "may change radically" sign hanging
on it. Having said that, I am interested in progress here and I accept that
things will look like this at this stage of the ideas process, so these
are not things to cause delay.

Some things I would like to see change on in this release are...

* It's not clear to me when we load/create the alternate data structures.
That can't happen at _init time. I was expecting this to look like an
infrastructure for unlogged indexes, but it doesn't look like that either.

I expected the *_preload_libraries GUCs to be used to load extensions.
If we had catalog support, an extension would be loaded prior to its first
invocation, when the optimizer asks the registered provider whether it can
offer an alternative scan. I love the idea.

* The name Custom makes me nervous. It sounds too generic, as if the design
or objectives for this are a little unclear. AlternateScan sounds like
a better name since it's clearer that we are scanning an alternate data
structure rather than the main heap.

I have no particular preference regarding its name.

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like an
issue and my patch rejection indicator is flashing red around that.

All I want is to invalidate the cache at the time vacuum runs; the
proposed prune hook may not be the only answer.
When an extension manages its own cache data structure, what way do we
have to invalidate it? I am not wedded to the existing proposal.

Two additional use cases I will be looking to explore will be

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather than the fully detailed
data, if the result would be within acceptable tolerance for user

Let me investigate how to implement them. The materialized-view idea is
probably the simpler one to do.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#101

Kouhei Kaigai

kaigai@ak.jp.nec.com

over 11 years ago

In reply to: Tom Lane (#99)

Re: Custom Scan APIs (Re: Custom Plan node)

Simon Riggs <simon@2ndQuadrant.com> writes:

[ assorted comments about custom-scan patch, but particularly ]

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like
an issue and my patch rejection indicator is flashing red around that.

Yeah. After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type. Yeah, you can plug in some new execution node types, but actually
doing anything interesting is going to require patching other parts of the
system. Are we going to say to all comers, "sure, we'll put a hook call
anywhere you like, just ask"? I can't see this as being the way to go.

There are two different points to discuss here: one is generic to the
custom-plan API, and the other is specific to my cache-only scan
implementation.

Because the existing plan/exec nodes are all built in, and related
functionality is consolidated in particular source files (such as
createplan.c and setrefs.c), it causes no problem for commonly called
functions to be declared static.
The custom-plan API changes this assumption; in other words, it allows part
of the work of createplan.c or setrefs.c to be done externally, so the
commonly used functions need to be external.
Because I went through trial and error during development, I could not list
all the functions to be made public at once. However, this is not a
fundamental matter and should be resolved during the discussion on
pgsql-hackers.

Regarding the portion specific to the cache-only scan, the same need would
arise if we wanted to create an extension that tracks vacuuming, independent
of custom-scan.
Usually, an extension uses multiple hooks and interfaces to implement
the feature it wants. In the case of the cache-only scan, unfortunately,
PG lacks a way to track heap vacuuming, even though that is needed to
invalidate cached data. This issue is unrelated to the custom-scan API. We
would see the same problem if I tried to create an extension that counts the
number of records being vacuumed.

Another way of describing the problem is that it's not clear where the API
boundaries are for potential users of a custom-scan feature. (Simon said
several things that are closely related to this point.) One thing I don't
like at all about the patch is its willingness to turn anything whatsoever
into a publicly exported function, which basically says that the design
attitude is there *are* no boundaries. But that's not going to lead to
anything maintainable. We're certainly not going to want to guarantee that
these suddenly-exported functions will all now have stable APIs
forevermore.

I'd like to have *several* existing static functions as (almost) stable
APIs, but not all of them. Indeed, my patch may look as if it picks static
functions at random to redefine as external; however, that does not mean
custom-plan eventually requires all of these functions to be external.
According to my investigation, there are two types of functions to be exposed:
- A function that walks the plan/exec node tree recursively
  (e.g. create_plan_recurse)
- A function that adjusts internal state of the core backend
  (e.g. fix_expr_common)

At least, these functions are not the majority. I don't think this should be
a strong blocker for this new feature.
(I may have overlooked something, of course; please point it out.)

Overall I concur with Simon's conclusion that this might be of interest
for R&D purposes, but it's hard to see anyone wanting to support a production
feature built on this. It would be only marginally less painful than
supporting a patch that just adds the equivalent code to the backend in
the traditional way.

Just as we adjusted the FDW APIs over their first several releases, any
kind of interface generally takes time to stabilize. Even if it is
*initially* limited to R&D purposes (I don't deny that), it can be polished
up to production quality over time. I think a feature for R&D purposes is
a good starting point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#102

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Tom Lane (#99)

Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

[ assorted comments about custom-scan patch, but particularly ]

* The prune hook makes me feel very uneasy. It seems weirdly specific
implementation detail, made stranger by the otherwise lack of data
maintenance API calls. Calling that for every dirty page sounds like
an issue and my patch rejection indicator is flashing red around that.

Yeah. After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type. Yeah, you can plug in some new execution node types, but
actually doing anything interesting is going to require patching other
parts of the system. Are we going to say to all comers, "sure, we'll put
a hook call anywhere you like, just ask"? I can't see this as being the
way to go.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me. I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks? I sure hope not. When
hooks are proposed here, we evaluate them on their merits and
attempt to judge the likelihood that a hook in a particular place will
be useful, but generally we're not averse to adding them, and as long
as the paths aren't too performance-critical, I don't think we should
be averse to adding them.

We have a great system today for letting people add new data types and
things of that sort, but anything that penetrates more deeply into the
heart of the system pretty much can't be done; this is why various
companies, such as our respective employers, have developed and
maintained forks of the PostgreSQL code base instead of just hooking
in to the existing code. We probably can't solve that problem
completely, but that doesn't mean we should throw in the towel.

And in particular, I think it's pretty normal that a new facility like
custom scans might create additional demand for new hooks. If
something was completely impossible before, and the new facility makes
it almost-possible, then why shouldn't someone ask for a hook there?
A prune hook probably has no business in the custom scan patch proper,
but whether it's a good idea or a bad one should be decided on the
merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#103

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Robert Haas (#102)

Re: Custom Scan APIs (Re: Custom Plan node)

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type. Yeah, you can plug in some new execution node types, but
actually doing anything interesting is going to require patching other
parts of the system.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me. I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks? I sure hope not.

No, that's not what I said. ISTM that the argument for the custom-scan
API is that it allows interesting new things to be done *without further
modifying the core code*. But the example application (cache_scan) fails
to demonstrate that, and indeed seems to be a counterexample. Whether
we'd accept cache_scan on its own merits is a separate question. The
problem for me is that custom-scan isn't showing that it can support what
was claimed without doing serious damage to modularity and maintainability
of the core code.

What this may mean is that we need more attention to refactoring of the
core code. But just removing "static" from any function that looks like
it might be handy isn't my idea of well-considered refactoring. More the
opposite in fact: if those things turn into APIs that we have to support,
it's going to kill any ability to do such refactoring.

A concrete example here is setrefs.c, whose responsibilities tend to
change from release to release. I think if we committed custom-scan
as is, we'd have great difficulty changing setrefs.c's transformations
ever again, at least if we hoped to not break users of the custom-scan
API. I'm not sure what the solution is --- but turning setrefs into
a white box instead of a black box isn't it.

regards, tom lane


#104

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Tom Lane (#103)

Re: Custom Scan APIs (Re: Custom Plan node)

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

A concrete example here is setrefs.c, whose responsibilities tend to
change from release to release. I think if we committed custom-scan
as is, we'd have great difficulty changing setrefs.c's transformations
ever again, at least if we hoped to not break users of the custom-scan
API. I'm not sure what the solution is --- but turning setrefs into
a white box instead of a black box isn't it.

Yeah, this was my (general) complaint as well and the answer that I kept
getting back is "well, it's ok, you can still break it between major
releases and the custom scan users will just have to deal with it".

I'm a bit on the fence about that, itself, but the other half of that
coin is that we could end up with parts of the *core* code that think
it's ok to go pulling in these functions, once they're exposed, and that
could end up making things quite ugly and difficult to maintain going
forward.

Thanks,

Stephen

#105

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Tom Lane (#103)

Re: Custom Scan APIs (Re: Custom Plan node)

On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. After a fast review of the custom-scan and cache-scan patches, it
seems to me that my original fears are largely confirmed: the custom scan
patch is not going to be sufficient to allow development of any truly new
plan type. Yeah, you can plug in some new execution node types, but
actually doing anything interesting is going to require patching other
parts of the system.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me. I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks? I sure hope not.

No, that's not what I said. ISTM that the argument for the custom-scan
API is that it allows interesting new things to be done *without further
modifying the core code*. But the example application (cache_scan) fails
to demonstrate that, and indeed seems to be a counterexample. Whether
we'd accept cache_scan on its own merits is a separate question. The
problem for me is that custom-scan isn't showing that it can support what
was claimed without doing serious damage to modularity and maintainability
of the core code.

I think there's two separate things in there, one of which I agree
with and one of which I disagree with. I agree that we must avoid
damaging the modularity and maintainability of the core code; I don't
agree that custom-scan needs to be able to do interesting things with
zero additional changes to the core code. If we come up with three
interesting applications for custom scan that require 5 new hooks
between them, I'll consider that a major success - assuming those
hooks don't unduly limit future changes we may wish to make in the
core code. I think your concern about exposing APIs that may not be
terribly stable is well-founded, but I don't think that means we
shouldn't expose *anything*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#106

Andres Freund

andres@2ndquadrant.com

over 11 years ago

In reply to: Robert Haas (#105)

Re: Custom Scan APIs (Re: Custom Plan node)

Hi,

On 2014-04-15 11:07:11 -0400, Robert Haas wrote:

On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

[ discussion ]

What I think this discussion shows is that this patch isn't ready for
9.4. The first iteration of the patch came in 2013-11-06. Imo that's
pretty damn late for a relatively complex patch. And obviously we don't
have agreement on the course forward.
I don't think we need to stop discussing, but I think it's pretty clear
that this isn't 9.4 material. And that it's far from "Ready for Committer".

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#107

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Andres Freund (#106)

Re: Custom Scan APIs (Re: Custom Plan node)

Andres Freund <andres@2ndquadrant.com> writes:

What I think this discussion shows is that this patch isn't ready for
9.4. The first iteration of the patch came in 2013-11-06. Imo that's
pretty damn late for a relatively complex patch. And obviously we don't
have agreement on the course forward.
I don't think we need to stop discussing, but I think it's pretty clear
that this isn't 9.4 material. And that it's far from "Ready for Committer".

Yeah. I'm still not exactly convinced that custom-scan will ever allow
independent development of new plan types (which, with all due respect to
Robert, is what it was being sold as last year in Ottawa). But I'm not
opposed in principle to committing it, if we can find a way to have a
cleaner API for things like setrefs.c. It seems like late-stage planner
processing in general is an issue for this patch (createplan.c and
subselect.c are also looking messy). EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder
whether things would get better if we had the equivalent of
expression_tree_walker/mutator capability for plan nodes. The state
of affairs in setrefs and subselect, at least, is a bit reminiscent
of the bad old days when we had lots of different bespoke code for
traversing expression trees.

regards, tom lane


#108

Kouhei Kaigai

kaigai@ak.jp.nec.com

over 11 years ago

In reply to: Tom Lane (#107)

Re: Custom Scan APIs (Re: Custom Plan node)

On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. After a fast review of the custom-scan and cache-scan
patches, it seems to me that my original fears are largely
confirmed: the custom scan patch is not going to be sufficient to
allow development of any truly new plan type. Yeah, you can plug in
some new execution node types, but actually doing anything
interesting is going to require patching other parts of the system.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me. I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks? I sure hope not.

No, that's not what I said. ISTM that the argument for the
custom-scan API is that it allows interesting new things to be done
*without further modifying the core code*. But the example
application (cache_scan) fails to demonstrate that, and indeed seems
to be a counterexample. Whether we'd accept cache_scan on its own
merits is a separate question. The problem for me is that custom-scan
isn't showing that it can support what was claimed without doing
serious damage to modularity and maintainability of the core code.

I think there's two separate things in there, one of which I agree with
and one of which I disagree with. I agree that we must avoid damaging the
modularity and maintainability of the core code; I don't agree that
custom-scan needs to be able to do interesting things with zero additional
changes to the core code. If we come up with three interesting applications
for custom scan that require 5 new hooks between them, I'll consider that
a major success - assuming those hooks don't unduly limit future changes
we may wish to make in the core code. I think your concern about exposing
APIs that may not be terribly stable is well-founded, but I don't think
that means we shouldn't expose *anything*.

I agree 100%.

We usually change hook definitions from release to release, and it is the
author's responsibility to follow the newer interface if they continue to
maintain the extension on the newer release.
This is probably a gray area, neither black nor white. If we could design
a perfect interface, that would be good, but it would leave no room for
further evolution.
Of course, that does not justify a poorly designed interface, but the
important thing is to find the best way available at this moment. It may take
core refactoring, not just exposing static functions. What I tried to
implement is the only way to implement it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#109

Kouhei Kaigai

kaigai@ak.jp.nec.com

over 11 years ago

In reply to: Tom Lane (#107)

Re: Custom Scan APIs (Re: Custom Plan node)

Andres Freund <andres@2ndquadrant.com> writes:

What I think this discussion shows is that this patch isn't ready for
9.4. The first iteration of the patch came in 2013-11-06. Imo that's
pretty damn late for a relatively complex patch. And obviously we
don't have agreement on the course forward.
I don't think we need to stop discussing, but I think it's pretty
clear that this isn't 9.4 material. And that it's far from "Ready for
Committer".

Yep, today is the expected feature freeze date for v9.4.
Unfortunately, it is a little too late to get this into v9.4.

Yeah. I'm still not exactly convinced that custom-scan will ever allow
independent development of new plan types (which, with all due respect to
Robert, is what it was being sold as last year in Ottawa). But I'm not
opposed in principle to committing it, if we can find a way to have a cleaner
API for things like setrefs.c. It seems like late-stage planner processing
in general is an issue for this patch (createplan.c and subselect.c are
also looking messy). EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder whether
things would get better if we had the equivalent of
expression_tree_walker/mutator capability for plan nodes. The state of
affairs in setrefs and subselect, at least, is a bit reminiscent of the
bad old days when we had lots of different bespoke code for traversing
expression trees.

Hmm. If we had something like expression_tree_walker/mutator for plan nodes,
we could pass a pointer to the walker/mutator function instead of exposing
the static functions that do the recursive work.
If a custom-plan provider (one that has sub-plans) got a callback with a
walker/mutator pointer, all it would have to do for its sub-plans is call
this new plan-tree walking support routine with the supplied walker/mutator.
That seems to me a simpler design than what I did.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#110

Kohei KaiGai

kaigai@kaigai.gr.jp

over 11 years ago

In reply to: Kouhei Kaigai (#109)

Re: Custom Scan APIs (Re: Custom Plan node)

Yeah. I'm still not exactly convinced that custom-scan will ever allow
independent development of new plan types (which, with all due respect to
Robert, is what it was being sold as last year in Ottawa). But I'm not
opposed in principle to committing it, if we can find a way to have a cleaner
API for things like setrefs.c. It seems like late-stage planner processing
in general is an issue for this patch (createplan.c and subselect.c are
also looking messy). EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder whether
things would get better if we had the equivalent of
expression_tree_walker/mutator capability for plan nodes. The state of
affairs in setrefs and subselect, at least, is a bit reminiscent of the
bad old days when we had lots of different bespoke code for traversing
expression trees.

Hmm. If we had something like expression_tree_walker/mutator for plan nodes,
we could pass a pointer to the walker/mutator function instead of exposing
the static functions that do the recursive work.
If a custom-plan provider (one that has sub-plans) got a callback with a
walker/mutator pointer, all it would have to do for its sub-plans is call
this new plan-tree walking support routine with the supplied walker/mutator.
That seems to me a simpler design than what I did.

I tried to code similar walker/mutator functions for the plan-node tree;
however, I could not make these routines simple enough, because the job
of the walker/mutator functions is not uniform, so the caller side also
needs a large switch-case.

I picked setrefs.c for my investigation.
set_plan_refs() applies fix_scan_list() to the expression trees that
appear in a plan node if the node is derived from Scan; however, it
applies set_join_references() to subclasses of Join, and
set_dummy_tlist_references() to some other plan nodes.
This implies that a walker/mutator function for Plan nodes has to apply
a different operation depending on the type of the Plan node. I'm not
certain how many different forms are needed.
(In addition, set_plan_refs() usually behaves like a walker, but
it also behaves as a mutator in some cases, such as a trivial subquery....)

I'm expecting a function like the one below. It allows a plan_walker
function to be called for each plan node, and also allows an expr_walker
function to be called for each expression node on the plan node.

bool
plan_tree_walker(Plan *plan,
                 bool (*plan_walker) (),
                 bool (*expr_walker) (),
                 void *context)

I'd like to hear whether there is some other form in which to implement
this routine.
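
To make the shape of that proposal concrete, here is a minimal sketch on toy
data structures. It is only an illustration under stated assumptions: ToyPlan,
ToyExpr, toy_plan_tree_walker and the counting callbacks are all hypothetical
names, not PostgreSQL's Plan/Expr nodes or any existing backend routine. It
borrows only the expression_tree_walker convention that a callback returning
true aborts the walk, and, unlike expression_tree_walker, the driver itself
recurses into child plans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the real PostgreSQL Plan/Expr node definitions. */
typedef struct ToyExpr
{
    int         varno;
} ToyExpr;

typedef struct ToyPlan
{
    struct ToyPlan *lefttree;   /* outer sub-plan, or NULL */
    struct ToyPlan *righttree;  /* inner sub-plan, or NULL */
    ToyExpr    *quals;          /* expressions attached to this node */
    size_t      nquals;
} ToyPlan;

typedef bool (*toy_plan_walker_fn) (ToyPlan *plan, void *context);
typedef bool (*toy_expr_walker_fn) (ToyExpr *expr, void *context);

/*
 * Visit every plan node and every expression attached to it.
 * Returns true as soon as any callback returns true (walk aborted).
 */
static bool
toy_plan_tree_walker(ToyPlan *plan,
                     toy_plan_walker_fn plan_walker,
                     toy_expr_walker_fn expr_walker,
                     void *context)
{
    if (plan == NULL)
        return false;
    if (plan_walker != NULL && plan_walker(plan, context))
        return true;
    if (expr_walker != NULL)
    {
        for (size_t i = 0; i < plan->nquals; i++)
        {
            if (expr_walker(&plan->quals[i], context))
                return true;
        }
    }
    if (toy_plan_tree_walker(plan->lefttree, plan_walker, expr_walker, context))
        return true;
    return toy_plan_tree_walker(plan->righttree, plan_walker, expr_walker, context);
}

/* Example callbacks: count the nodes and expressions that were visited. */
typedef struct Counts
{
    int         plans;
    int         exprs;
} Counts;

static bool
count_plan(ToyPlan *plan, void *context)
{
    (void) plan;
    ((Counts *) context)->plans++;
    return false;               /* keep walking */
}

static bool
count_expr(ToyExpr *expr, void *context)
{
    (void) expr;
    ((Counts *) context)->exprs++;
    return false;               /* keep walking */
}

/* Build a join over two scans and walk it. */
static Counts
toy_count_demo(void)
{
    ToyExpr     outer_quals[1] = {{1}};
    ToyExpr     inner_quals[2] = {{2}, {2}};
    ToyPlan     outer = {NULL, NULL, outer_quals, 1};
    ToyPlan     inner = {NULL, NULL, inner_quals, 2};
    ToyPlan     join = {&outer, &inner, NULL, 0};
    Counts      counts = {0, 0};
    bool        aborted;

    aborted = toy_plan_tree_walker(&join, count_plan, count_expr, &counts);
    assert(!aborted);
    return counts;
}
```

Note that this toy treats every node uniformly, which is exactly what the
real set_plan_refs() does not do; the per-node-type differences described
above are what make the real routine harder to write.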

One alternative way to give a custom-plan provider a chance to
handle its subplans is to pass it function pointers (1) to handle
recursion into the plan tree and (2) to set up the backend's internal
state.
In the case of setrefs.c, set_plan_refs() and fix_expr_common()
are the minimum that extensions need. This also eliminates the need
to export static functions.
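
As a rough sketch of this alternative, again on toy types: the core keeps
its routines static and hands the provider a small struct of function
pointers. Everything here is hypothetical (ToyPlan, ToySetrefsHelpers, the
hook variable, and the "add an offset" behaviour are invented for
illustration); toy_set_plan_refs and toy_fix_varno merely stand in for the
roles of set_plan_refs() and fix_expr_common() and do not reproduce what
those functions actually do.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_SCAN, TOY_CUSTOM };

/* Toy plan node; NOT the real PostgreSQL Plan struct. */
typedef struct ToyPlan
{
    int         kind;           /* TOY_SCAN or TOY_CUSTOM */
    int         varno;          /* value the "core" adjusts */
    struct ToyPlan *subplan;    /* single child, or NULL */
} ToyPlan;

/* Hypothetical bundle of core routines handed to a provider. */
typedef struct ToySetrefsHelpers
{
    void        (*recurse) (ToyPlan *plan, int rtoffset);   /* (1) recursion */
    void        (*fix_varno) (ToyPlan *plan, int rtoffset); /* (2) state fixup */
} ToySetrefsHelpers;

typedef void (*toy_custom_setrefs_fn) (ToyPlan *plan, int rtoffset,
                                       const ToySetrefsHelpers *helpers);

/* Set by the extension; NULL means no provider is loaded. */
static toy_custom_setrefs_fn toy_custom_setrefs_hook = NULL;

/* Core-internal routines stay static; only pointers to them escape. */
static void toy_set_plan_refs(ToyPlan *plan, int rtoffset);

static void
toy_fix_varno(ToyPlan *plan, int rtoffset)
{
    plan->varno += rtoffset;
}

static const ToySetrefsHelpers toy_helpers = {toy_set_plan_refs, toy_fix_varno};

static void
toy_set_plan_refs(ToyPlan *plan, int rtoffset)
{
    if (plan == NULL)
        return;
    if (plan->kind == TOY_CUSTOM && toy_custom_setrefs_hook != NULL)
    {
        /* The provider decides how to treat its own node and subplans. */
        toy_custom_setrefs_hook(plan, rtoffset, &toy_helpers);
        return;
    }
    toy_fix_varno(plan, rtoffset);
    toy_set_plan_refs(plan->subplan, rtoffset);
}

/* Extension side: uses only the pointers it was given. */
static void
my_provider_setrefs(ToyPlan *plan, int rtoffset,
                    const ToySetrefsHelpers *helpers)
{
    helpers->fix_varno(plan, rtoffset);
    helpers->recurse(plan->subplan, rtoffset);
}

/* Custom node on top of a scan; returns the scan's adjusted varno. */
static int
toy_setrefs_demo(void)
{
    ToyPlan     scan = {TOY_SCAN, 1, NULL};
    ToyPlan     custom = {TOY_CUSTOM, 2, &scan};

    toy_custom_setrefs_hook = my_provider_setrefs;
    toy_set_plan_refs(&custom, 10);
    assert(custom.varno == 12);
    return scan.varno;
}
```

The point of the design is that the provider never links against the static
routines by name, so the core remains free to rename or restructure them as
long as the helper struct keeps its shape.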

What do you think?
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
