Custom Plan node

Started by Kohei KaiGai over 12 years ago, 20 messages

kaigai@kaigai.gr.jp

over 12 years ago

1 attachment(s)

Hi,

The attached patch adds a new plan node type, CustomPlan, that enables
extensions to get control during query execution via registered callbacks.
Right now, all of the executor's jobs are built in, except for foreign scan,
so an extension has no way to run its own code in place of a particular
portion of the plan tree. This is painful for people who want to implement
an edge feature on the executor, because the only option is to replace the
whole executor, which carries an unreasonable maintenance burden.

Using CustomPlan takes two steps from an extension: registering a set of
callbacks, and manipulating the plan tree.
First, the extension has to register a set of callbacks under a unique name
using RegisterCustomPlan(). The callbacks are defined as follows, and the
extension is responsible for making these routines work correctly.

void BeginCustomPlan(CustomPlanState *cestate, int eflags);
TupleTableSlot *ExecCustomPlan(CustomPlanState *node);
Node *MultiExecCustomPlan(CustomPlanState *node);
void EndCustomPlan(CustomPlanState *node);
void ExplainCustomPlan(CustomPlanState *node, ExplainState *es);
void ReScanCustomPlan(CustomPlanState *node);
void ExecMarkPosCustomPlan(CustomPlanState *node);
void ExecRestrPosCustomPlan(CustomPlanState *node);

These callbacks are invoked if the plan tree contains a CustomPlan node.
However, the usual code path never constructs this node type for any
SQL input, so the extension needs to manipulate the plan tree after it has
been constructed.
That is the second job. The extension installs its own code on planner_hook
to reference and manipulate the PlannedStmt object. It can replace particular
nodes in the plan tree with CustomPlan, or inject one at an arbitrary point.
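
The plan-tree manipulation described above can be sketched with a toy tree.
This is only an illustration under stated assumptions, not code from the
patch: ToyPlan, toy_make_plan, toy_inject_custom and toy_count_nodes are
hypothetical names standing in for PostgreSQL's Plan nodes and for a walker
that an extension would call from its planner_hook. The walker rewrites the
children first, then puts a wrapper node on top of the node itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a plan node; NOT PostgreSQL's real Plan struct. */
typedef struct ToyPlan
{
    const char     *label;      /* e.g. "Sort", "Seq Scan", "CustomPlan" */
    struct ToyPlan *lefttree;   /* outer child, or NULL */
    struct ToyPlan *righttree;  /* inner child, or NULL */
} ToyPlan;

/* Allocate a node with the given label and children. */
ToyPlan *
toy_make_plan(const char *label, ToyPlan *left, ToyPlan *right)
{
    ToyPlan *plan;

    assert(label != NULL);
    plan = malloc(sizeof(ToyPlan));
    if (plan == NULL)
        abort();
    plan->label = label;
    plan->lefttree = left;
    plan->righttree = right;
    return plan;
}

/*
 * Walk the tree bottom-up: rewrite both children first, then return a
 * new "CustomPlan" wrapper whose only child is the original node.
 */
ToyPlan *
toy_inject_custom(ToyPlan *plan)
{
    if (plan == NULL)
        return NULL;
    plan->lefttree = toy_inject_custom(plan->lefttree);
    plan->righttree = toy_inject_custom(plan->righttree);
    return toy_make_plan("CustomPlan", plan, NULL);
}

/* Count the nodes in the tree, wrappers included. */
int
toy_count_nodes(const ToyPlan *plan)
{
    if (plan == NULL)
        return 0;
    return 1 + toy_count_nodes(plan->lefttree)
             + toy_count_nodes(plan->righttree);
}
```

Wrapping a two-node tree (a Sort over a Seq Scan) this way yields four nodes,
with a wrapper above each original node. A real extension would also have to
build a compatible targetlist for each wrapper and may need to skip some
nodes, which this sketch leaves out.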

Though my intention is to implement GPU-accelerated table scan and other
stuff on top of this feature, other useful features can probably be built
on it as well. At the developer meeting, someone suggested it may be useful
for PG-XC folks to implement clustered scan. I also have an idea to implement
an in-memory query cache that cuts off a particular branch of the plan tree.
Probably, other folks have other ideas.

The contrib/xtime module is a simple example that records the elapsed time
of the underlying plan node, then prints it at the end of execution.
For example, this query constructs the following plan tree, as we usually see.

postgres=# EXPLAIN (costs off)
           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                     QUERY PLAN
-----------------------------------------------------
 Sort
   Sort Key: t2.y
   ->  Nested Loop
         ->  Seq Scan on t2
               Filter: ((x >= 1000) AND (x <= 1200))
         ->  Index Scan using t1_pkey on t1
               Index Cond: (a = t2.x)
(7 rows)

Once the xtime module manipulates the plan tree to inject CustomPlan,
it becomes as follows:

postgres=# LOAD '$libdir/xtime';
LOAD
postgres=# EXPLAIN (costs off)
           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
                           QUERY PLAN
-----------------------------------------------------------------
 CustomPlan:xtime
   ->  Sort
         Sort Key: y
         ->  CustomPlan:xtime
               ->  Nested Loop
                     ->  CustomPlan:xtime on t2
                           Filter: ((x >= 1000) AND (x <= 1200))
                     ->  CustomPlan:xtime
                           ->  Index Scan using t1_pkey on t1
                                 Index Cond: (a = x)
(10 rows)

You can see that CustomPlan nodes named "xtime" appear in the plan tree;
the executor then calls the functions registered as callbacks of "xtime"
when it meets a CustomPlan node during recursive execution.
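
The way a wrapper node can record the elapsed time of the node beneath it
can be sketched as follows. This is a simplified, standalone illustration
and not the xtime code itself: ToyTimer, toy_elapsed_usec, toy_timed_call
and toy_square are hypothetical names, and a plain function call stands in
for executing the underlying plan node. It takes a timestamp before and
after the call and accumulates the difference in microseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Accumulated elapsed time of one wrapped node (hypothetical type). */
typedef struct ToyTimer
{
    long long total_usec;
} ToyTimer;

/* Difference between two timestamps, in microseconds. */
long long
toy_elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long long) (end->tv_sec - start->tv_sec) * 1000000LL
         + (long long) (end->tv_usec - start->tv_usec);
}

/* Stand-in for "execute the underlying node". */
int
toy_square(int x)
{
    return x * x;
}

/*
 * Call fn(arg) and add the wall-clock time it took to the timer.
 * The result of fn is passed through unchanged, the way a wrapper
 * node hands its child's tuple up to the parent.
 */
int
toy_timed_call(ToyTimer *timer, int (*fn) (int), int arg)
{
    struct timeval tv1, tv2;
    int         result;

    assert(timer != NULL && fn != NULL);
    gettimeofday(&tv1, NULL);
    result = fn(arg);
    gettimeofday(&tv2, NULL);
    timer->total_usec += toy_elapsed_usec(&tv1, &tv2);
    return result;
}
```

Keeping the running total as a single microsecond counter avoids having to
normalize separate seconds and microseconds fields after each addition.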

An extension has to set at least the name of the custom plan provider when
it constructs a CustomPlan node and puts it on the target plan tree.
The set of callbacks is looked up by that name and installed on the
CustomPlanState object for execution, in ExecInitNode().
The reason why I didn't put function pointers directly on the node is that
plan nodes need to be compliant with copyObject() and others.
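
The lookup-by-name design can be sketched with a small registry. This is a
toy model under stated assumptions, not the patch's implementation:
ToyRoutine, toy_register, toy_lookup and toy_double are hypothetical names,
and a single function pointer stands in for the whole callback set. The
point it shows is that the node only has to carry a string, which is easy to
copy and serialize, while the function pointers stay in a process-local
table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_LEN        64
#define TOY_MAX_PROVIDERS   16

/* One registered provider: a name plus its callback (hypothetical type). */
typedef struct ToyRoutine
{
    char        name[TOY_NAME_LEN];
    int         (*exec) (int input);
} ToyRoutine;

static ToyRoutine toy_registry[TOY_MAX_PROVIDERS];
static int  toy_nproviders = 0;

/* Find a provider by name; returns NULL if it was never registered. */
const ToyRoutine *
toy_lookup(const char *name)
{
    int         i;

    assert(name != NULL);
    for (i = 0; i < toy_nproviders; i++)
    {
        if (strcmp(toy_registry[i].name, name) == 0)
            return &toy_registry[i];
    }
    return NULL;
}

/*
 * Register a callback under a unique name.
 * Returns 0 on success, -1 if the table is full, the name is too long,
 * or the name is already taken.
 */
int
toy_register(const char *name, int (*exec) (int))
{
    assert(name != NULL && exec != NULL);
    if (toy_nproviders >= TOY_MAX_PROVIDERS ||
        strlen(name) >= TOY_NAME_LEN ||
        toy_lookup(name) != NULL)
        return -1;
    strcpy(toy_registry[toy_nproviders].name, name);
    toy_registry[toy_nproviders].exec = exec;
    toy_nproviders++;
    return 0;
}

/* Example callback used only to exercise the registry. */
int
toy_double(int x)
{
    return x * 2;
}
```

At executor start-up, the equivalent of toy_lookup would be called once with
the name stored on the plan node, and the result kept on the state node, so
the string comparison is not repeated for every tuple.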

Any comments are welcome.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-plan-node.v1.patch (application/octet-stream)

 contrib/Makefile                           |   3 +-
 contrib/xtime/Makefile                     |  14 +
 contrib/xtime/xtime.c                      | 646 +++++++++++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/filelist.sgml                 |   1 +
 doc/src/sgml/xtime.sgml                    | 110 +++++
 src/backend/commands/explain.c             |  69 +++
 src/backend/executor/Makefile              |   3 +-
 src/backend/executor/execAmi.c             |  18 +
 src/backend/executor/execProcnode.c        |  18 +
 src/backend/executor/nodeCustomPlan.c      | 325 +++++++++++++++
 src/backend/nodes/copyfuncs.c              |  26 ++
 src/backend/nodes/outfuncs.c               |  15 +
 src/include/executor/nodeCustomPlan.h      |  55 +++
 src/include/nodes/execnodes.h              |  28 ++
 src/include/nodes/nodes.h                  |   2 +
 src/include/nodes/plannodes.h              |  37 ++
 src/test/regress/GNUmakefile               |  14 +-
 src/test/regress/input/custom_exec.source  | 184 ++++++++
 src/test/regress/output/custom_exec.source | 541 ++++++++++++++++++++++++
 src/test/regress/parallel_schedule         |   2 +-
 src/test/regress/serial_schedule           |   1 +
 22 files changed, 2107 insertions(+), 6 deletions(-)

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..33a5c42 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -53,7 +53,8 @@ SUBDIRS = \
 		tsearch2	\
 		unaccent	\
 		vacuumlo	\
-		worker_spi
+		worker_spi	\
+		xtime
 
 ifeq ($(with_openssl),yes)
 SUBDIRS += sslinfo
diff --git a/contrib/xtime/Makefile b/contrib/xtime/Makefile
new file mode 100644
index 0000000..ee64b58
--- /dev/null
+++ b/contrib/xtime/Makefile
@@ -0,0 +1,14 @@
+# contrib/xtime/Makefile
+
+MODULES = xtime
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/xtime
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/xtime/xtime.c b/contrib/xtime/xtime.c
new file mode 100644
index 0000000..3b92484
--- /dev/null
+++ b/contrib/xtime/xtime.c
@@ -0,0 +1,646 @@
+/*
+ * xtime.c
+ *
+ * An example module for custom executor APIs. It prints the time taken to
+ * execute the underlying execution node.
+ *
+ * Copyright (C) 2013, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+
+#include "access/relscan.h"
+#include "executor/nodeCustomPlan.h"
+#include "commands/explain.h"
+#include "lib/ilist.h"
+#include "nodes/execnodes.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/planner.h"
+#include "parser/parsetree.h"
+#include "utils/guc.h"
+#include "utils/plancache.h"
+#include "utils/rel.h"
+#include "sys/time.h"
+
+PG_MODULE_MAGIC;
+
+void _PG_init(void);
+
+typedef struct {
+	int				nest_level;
+	struct timeval	elapsed_time;
+	const char	   *signature;
+} xtime_state;
+
+/* saved planner hook */
+static planner_hook_type		original_planner_hook = NULL;
+
+/*
+ * stuff related to xtime.mode.
+ *
+ * Besides "on" and "off", there is a third mode for regression test
+ * usage. The "regtest" mode behaves like "on", except that it does
+ * not print the elapsed time being tracked.
+ */
+#define XTIME_MODE_DISABLED		0
+#define XTIME_MODE_ENABLED		1
+#define XTIME_MODE_REGTEST		2
+static int						xtime_mode = XTIME_MODE_DISABLED;
+static struct config_enum_entry	xtime_mode_options[] = {
+	{ "off",		XTIME_MODE_DISABLED,	false },
+	{ "disabled",	XTIME_MODE_DISABLED,	true },
+	{ "false",		XTIME_MODE_DISABLED,	true },
+	{ "on",			XTIME_MODE_ENABLED,		false },
+	{ "enabled",	XTIME_MODE_ENABLED,		true },
+	{ "true",		XTIME_MODE_ENABLED,		true },
+	{ "regtest",	XTIME_MODE_REGTEST,		false },
+};
+
+/*
+ * We need to reset the plan cache if xtime.mode is updated, because
+ * saved plans were constructed based on the older mode.
+ */
+static void
+xtime_mode_assign(int newval, void *extra)
+{
+	if (newval != xtime_mode)
+		ResetPlanCache();
+}
+
+/*
+ * xtime_plan_signature
+ *
+ * It returns a signature string of the provided plan.
+ */
+static const char *
+xtime_plan_signature(Plan *plan, List *rtable)
+{
+	RangeTblEntry  *rte;
+	char			namebuf[80 + NAMEDATALEN];
+
+	switch (nodeTag(plan))
+	{
+		case T_Result:
+			return "Result";
+		case T_ModifyTable:
+			switch (((ModifyTable *) plan)->operation)
+			{
+				case CMD_INSERT:
+					return "ModifyTable (Insert)";
+				case CMD_UPDATE:
+					return "ModifyTable (Update)";
+				case CMD_DELETE:
+					return "ModifyTable (Delete)";
+				default:
+					return "ModifyTable (unknown)";
+			}
+			break;
+		case T_Append:
+			return "Append";
+		case T_MergeAppend:
+			return "Merge Append";
+		case T_RecursiveUnion:
+			return "Recursive Union";
+		case T_BitmapAnd:
+			return "BitmapAnd";
+		case T_BitmapOr:
+			return "BitmapOr";
+		case T_NestLoop:
+			return "Nested Loop";
+		case T_MergeJoin:
+			return "Merge Join";
+		case T_HashJoin:
+			return "Hash Join";
+		case T_SeqScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Seq Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_IndexScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Index Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_IndexOnlyScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Index Only Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_BitmapIndexScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Bitmap Index Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_BitmapHeapScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Bitmap Heap Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_TidScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Tid Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_SubqueryScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Subquery Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_FunctionScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Function Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_ValuesScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Values Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_CteScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "CTE Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_WorkTableScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "WorkTable Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_ForeignScan:
+			rte = rt_fetch(((Scan *)plan)->scanrelid, rtable);
+			snprintf(namebuf, sizeof(namebuf),
+					 "Foreign Scan on %s", rte->eref->aliasname);
+			return pstrdup(namebuf);
+		case T_Material:
+			return "Materialize";
+		case T_Sort:
+			return "Sort";
+		case T_Group:
+			return "Group";
+		case T_Agg:
+			switch (((Agg *) plan)->aggstrategy)
+			{
+				case AGG_PLAIN:
+					return "Aggregate (Plain)";
+				case AGG_SORTED:
+					return "GroupAggregate (Sorted)";
+				case AGG_HASHED:
+					return "HashAggregate (Hashed)";
+				default:
+					return "Aggregate (unknown)";
+			}
+			break;
+		case T_WindowAgg:
+			return "WindowAgg";
+		case T_Unique:
+			return "Unique";
+		case T_SetOp:
+			switch (((SetOp *) plan)->strategy)
+			{
+				case SETOP_SORTED:
+					return "SetOp (Sorted)";
+				case SETOP_HASHED:
+					return "HashSetOp (Hashed)";
+				default:
+					return "SetOp (unknown)";
+			}
+			break;
+		case T_LockRows:
+			return "LockRows";
+		case T_Limit:
+			return "Limit";
+		case T_Hash:
+			return "Hash";
+		default:
+			break;
+	}
+	return "???";
+}
+
+/*
+ * xtime_begin
+ *
+ * BeginCustomPlan handler
+ */
+static void
+xtime_begin(CustomPlanState *node, int eflags)
+{
+	CustomPlan	   *custom = (CustomPlan *)node->ss.ps.plan;
+	xtime_state	   *xstate;
+	ListCell	   *cell;
+	int				nest_level = 0;
+
+	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+		return;
+
+	foreach (cell, custom->cust_private)
+	{
+		DefElem	   *defel = lfirst(cell);
+
+		if (strcmp(defel->defname, "nest_level") == 0)
+			nest_level = intVal(defel->arg);
+		else
+			elog(ERROR, "unexpected xtime planner info: %s",
+				 nodeToString(defel));
+	}
+
+	xstate = palloc0(sizeof(xtime_state));
+	xstate->nest_level = nest_level;
+
+	/*
+	 * This routine is called back with a valid relation handle if
+	 * CustomPlan replaced an existing SeqScan plan and set a valid
+	 * index on scanrelid. In this case, the xtime custom plan behaves
+	 * as SeqScan does, so it also initializes the scan descriptor in
+	 * the manner of a sequential scan.
+	 *
+	 * Otherwise, this node just calls its underlying node, so there is
+	 * no need to set up anything extra for scanning a relation itself.
+	 */
+	if (node->ss.ss_currentRelation)
+	{
+		Relation	relation = node->ss.ss_currentRelation;
+		Snapshot	snapshot = node->ss.ps.state->es_snapshot;
+		char		namebuf[80 + NAMEDATALEN];
+
+		node->ss.ss_currentScanDesc = heap_beginscan(relation,
+													 snapshot,
+													 0,
+													 NULL);
+		snprintf(namebuf, sizeof(namebuf),
+				 "CustomPlan:xtime on %s",
+				 RelationGetRelationName(relation));
+		xstate->signature = pstrdup(namebuf);
+	}
+	else
+	{
+		Plan   *subplan = innerPlanState(node)->plan;
+		List   *rtables = node->ss.ps.state->es_range_table;
+
+		xstate->signature = xtime_plan_signature(subplan, rtables);
+	}
+	node->cust_state = xstate;
+}
+
+/*
+ * xtime_rel_getnext
+ *
+ * Access method for the ExecCustomPlan handler (xtime_exec).
+ * xtime_exec has to behave as ExecSeqScan does if this custom plan scans
+ * a relation by itself. So, it calls ExecScan with two callbacks that
+ * do the same jobs as SeqNext and SeqRecheck.
+ */
+static TupleTableSlot *
+xtime_rel_getnext(CustomPlanState *node)
+{
+	HeapTuple	tuple;
+
+	tuple = heap_getnext(node->ss.ss_currentScanDesc,
+						 node->ss.ps.state->es_direction);
+	if (tuple)
+		ExecStoreTuple(tuple,
+					   node->ss.ss_ScanTupleSlot,
+					   node->ss.ss_currentScanDesc->rs_cbuf,
+					   false);
+	else
+		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+	return node->ss.ss_ScanTupleSlot;
+}
+
+static bool
+xtime_rel_recheck(ScanState *node, TupleTableSlot *slot)
+{
+	return true;
+}
+
+static TupleTableSlot *
+xtime_exec(CustomPlanState *node)
+{
+	xtime_state	   *xstate = node->cust_state;
+	TupleTableSlot *slot;
+	struct timeval	tv1, tv2;
+
+	if (xtime_mode != XTIME_MODE_DISABLED)
+		gettimeofday(&tv1, NULL);
+
+	if (node->ss.ss_currentRelation)
+	{
+		slot = ExecScan((ScanState *)node,
+						(ExecScanAccessMtd) xtime_rel_getnext,
+						(ExecScanRecheckMtd) xtime_rel_recheck);
+	}
+	else
+	{
+		slot = ExecProcNode(innerPlanState(node));
+	}
+
+	if (xtime_mode != XTIME_MODE_DISABLED)
+	{
+		gettimeofday(&tv2, NULL);
+		xstate->elapsed_time.tv_sec += tv2.tv_sec - tv1.tv_sec;
+		xstate->elapsed_time.tv_usec += tv2.tv_usec - tv1.tv_usec;
+	}
+	return slot;
+}
+
+/*
+ * xtime_multi_exec
+ *
+ * MultiExecCustomPlan handler
+ */
+static Node *
+xtime_multi_exec(CustomPlanState *node)
+{
+	xtime_state    *xstate = node->cust_state;
+	Node		   *result;
+	struct timeval	tv1, tv2;
+
+	Assert(node->ss.ss_currentRelation == NULL);
+
+	if (xtime_mode != XTIME_MODE_DISABLED)
+		gettimeofday(&tv1, NULL);
+
+	result = MultiExecProcNode(innerPlanState(node));
+
+	if (xtime_mode != XTIME_MODE_DISABLED)
+	{
+		gettimeofday(&tv2, NULL);
+		xstate->elapsed_time.tv_sec += tv2.tv_sec - tv1.tv_sec;
+		xstate->elapsed_time.tv_usec += tv2.tv_usec - tv1.tv_usec;
+	}
+	return result;
+}
+
+/*
+ * xtime_end
+ *
+ * EndCustomPlan handler
+ */
+static void
+xtime_end(CustomPlanState *node)
+{
+	xtime_state    *xstate = node->cust_state;
+	StringInfoData	str;
+
+	if (!xstate)
+		return;
+
+	/*
+	 * Release only the resources acquired by this node itself; the rest,
+	 * such as the per-scan memory context, are released by the framework,
+	 * so the extension does not need to care about them.
+	 * In this case, all we have to clean up is the scan descriptor, if the
+	 * custom plan scans a relation by itself.
+	 */
+	if (node->ss.ss_currentRelation)
+		heap_endscan(node->ss.ss_currentScanDesc);
+
+	if (xtime_mode == XTIME_MODE_DISABLED)
+		return;
+
+	initStringInfo(&str);
+	appendStringInfoSpaces(&str, xstate->nest_level);
+	appendStringInfo(&str, "execution time of %s: ", xstate->signature);
+	if (xtime_mode == XTIME_MODE_ENABLED)
+	{
+		double	elapsed = ((double)(xstate->elapsed_time.tv_sec * 1000000 +
+									xstate->elapsed_time.tv_usec)) / 1000.0;
+		appendStringInfo(&str, "% .3f ms", elapsed);
+	}
+	else
+		appendStringInfo(&str, "**.*** ms");
+
+	elog(INFO, "%s", str.data);
+	pfree(str.data);
+}
+
+/*
+ * xtime_rescan
+ *
+ * ReScanCustomPlan handler
+ */
+static void
+xtime_rescan(CustomPlanState *node)
+{
+	if (node->ss.ss_currentRelation)
+	{
+		heap_rescan(node->ss.ss_currentScanDesc, NULL);
+
+		ExecScanReScan((ScanState *) node);
+	}
+	else if (innerPlanState(node)->chgParam == NULL)
+		ExecReScan(innerPlanState(node));
+}
+
+/*
+ * xtime_mark_pos
+ *
+ * ExecMarkPosCustomPlan handler
+ */
+static void
+xtime_mark_pos(CustomPlanState *node)
+{
+	if (node->ss.ss_currentRelation)
+	{
+		heap_markpos(node->ss.ss_currentScanDesc);
+	}
+	else
+	{
+		ExecMarkPos(innerPlanState(node));
+	}	
+}
+
+/*
+ * xtime_restr_pos
+ *
+ * ExecRestrPosCustomPlan
+ */
+static void
+xtime_restr_pos(CustomPlanState *node)
+{
+	if (node->ss.ss_currentRelation)
+	{
+		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+
+		heap_restrpos(node->ss.ss_currentScanDesc);
+	}
+	else
+	{
+		ExecRestrPos(innerPlanState(node));
+	}
+}
+
+static Plan *
+xtime_create_plan(Plan *plan, int nest_level)
+{
+	CustomPlan *custom = makeNode(CustomPlan);
+	DefElem	   *defel;
+
+	/*
+	 * There is no good way to recompute cost parameters after all the
+	 * planning jobs. So, we simply copy the cost parameters from the
+	 * original or underlying plan node.
+	 */
+	custom->scan.plan.startup_cost	= plan->startup_cost;
+	custom->scan.plan.total_cost	= plan->total_cost;
+	custom->scan.plan.plan_rows		= plan->plan_rows;
+	custom->scan.plan.plan_width	= plan->plan_width;
+
+	/* paramIDs being affected are not changed */
+	custom->scan.plan.extParam	= bms_copy(plan->extParam);
+	custom->scan.plan.allParam	= bms_copy(plan->allParam);
+
+	/*
+	 * To use this module as an example of custom plan node, we demonstrate
+	 * two different usages. The first case replaces an existing SeqScan node
+	 * with a custom plan node that scans the relation by itself. So, we set
+	 * a valid scanrelid to tell the framework to open the target relation
+	 * with a suitable lock level. It also makes sense to skip projection
+	 * if the targetlist is compatible with the relation form. The second
+	 * case inserts a custom plan node on top of the target plan.
+	 */
+	if (IsA(plan, SeqScan))
+	{
+		custom->scan.scanrelid = ((SeqScan *)plan)->scanrelid;
+		custom->scan.plan.targetlist = ((SeqScan *)plan)->plan.targetlist;
+		custom->scan.plan.qual = ((SeqScan *)plan)->plan.qual;
+	}
+	else
+	{
+		List	   *my_tlist = NIL;
+		ListCell   *cell;
+
+		/*
+		 * Construct a targetlist that just references the underlying plan
+		 * node, because the upper node assumes a compatible TupleDesc.
+		 */
+		foreach (cell, plan->targetlist)
+		{
+			TargetEntry	*tle = lfirst(cell);
+			TargetEntry	*my_tle;
+			Var			*my_var;
+			char		*resname;
+
+			my_var = makeVar(INNER_VAR,
+							 tle->resno,
+							 exprType((Node *)tle->expr),
+							 exprTypmod((Node *)tle->expr),
+							 exprCollation((Node *)tle->expr),
+							 0);
+			resname = (tle->resname ? pstrdup(tle->resname) : NULL);
+			my_tle = makeTargetEntry((Expr *)my_var,
+									 tle->resno,
+									 resname,
+									 tle->resjunk);
+			my_tlist = lappend(my_tlist, my_tle);
+		}
+		custom->scan.plan.targetlist = my_tlist;
+		custom->scan.plan.qual = NIL;
+		innerPlan(custom) = plan;
+	}
+	custom->cust_name = pstrdup("xtime");
+	defel = makeDefElem("nest_level", (Node *)makeInteger(nest_level));
+	custom->cust_private = list_make1((Node *)defel);
+
+	return (Plan *)custom;
+}
+
+static Plan *
+xtime_subplan_walker(Plan *plan, int nest_level)
+{
+	ListCell   *cell;
+
+	if (IsA(plan, Append))
+	{
+		foreach (cell, ((Append *) plan)->appendplans)
+			lfirst(cell) = xtime_subplan_walker((Plan *)lfirst(cell),
+												nest_level + 1);
+	}
+	else if (IsA(plan, ModifyTable))
+	{
+		foreach (cell, ((ModifyTable *)plan)->plans)
+			lfirst(cell) = xtime_subplan_walker((Plan *)lfirst(cell),
+												nest_level + 1);
+	}
+
+	if (plan->lefttree)
+		plan->lefttree = xtime_subplan_walker(plan->lefttree,
+											  nest_level + 1);
+	if (plan->righttree)
+		plan->righttree = xtime_subplan_walker(plan->righttree,
+											   nest_level + 1);
+	/*
+	 * Note that a Hash node is tightly coupled to HashJoin, so it causes
+	 * problems if a CustomPlan node is injected between them.
+	 */
+	if (IsA(plan, Hash))
+		return plan;
+
+	return xtime_create_plan(plan, nest_level);
+}
+
+/*
+ * xtime_planner
+ *
+ * It tries to rewrite the plan tree being constructed at the planner.
+ */
+static PlannedStmt *
+xtime_planner(Query *parse,
+			  int cursorOptions,
+			  ParamListInfo boundParams)
+{
+	PlannedStmt	*result;
+
+	if (original_planner_hook)
+		result = original_planner_hook(parse, cursorOptions, boundParams);
+	else
+		result = standard_planner(parse, cursorOptions, boundParams);
+
+	/* walk the underlying plan tree to inject CustomPlan nodes */
+	if (xtime_mode != XTIME_MODE_DISABLED)
+		result->planTree = xtime_subplan_walker(result->planTree, 0);
+
+	return result;
+}
+
+void
+_PG_init(void)
+{
+	CustomPlanRoutine	routine;
+
+	/*
+	 * Add custom GUC variable
+	 */
+	DefineCustomEnumVariable("xtime.mode",
+							 "performing mode of xtime extension",
+							 NULL,
+							 &xtime_mode,
+							 XTIME_MODE_ENABLED,
+							 xtime_mode_options,
+							 PGC_USERSET,
+							 GUC_NOT_IN_SAMPLE,
+							 NULL, xtime_mode_assign, NULL);
+
+	/*
+	 * Registration of custom executor provider
+	 */
+	strcpy(routine.CustomPlanName, "xtime");
+	routine.IsSupportBackwardScan	= true;
+	routine.BeginCustomPlan		= xtime_begin;
+	routine.ExecCustomPlan		= xtime_exec;
+	routine.MultiExecCustomPlan	= xtime_multi_exec;
+	routine.EndCustomPlan		= xtime_end;
+	routine.ReScanCustomPlan	= xtime_rescan;
+	routine.ExecMarkPosCustomPlan	= xtime_mark_pos;
+	routine.ExecRestrPosCustomPlan	= xtime_restr_pos;
+	routine.ExplainCustomPlan	= NULL;	/* no additional information */
+	RegisterCustomPlan(&routine);
+
+	/*
+	 * Registration of planner_hook
+	 */
+	if (planner_hook)
+		original_planner_hook = planner_hook;
+	planner_hook = xtime_planner;
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index dd8e09e..7ad68ca 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -144,6 +144,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
  &unaccent;
  &uuid-ossp;
  &xml2;
+ &xtime;
 
 </appendix>
 
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 914090d..8bd8243 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -148,6 +148,7 @@
 <!ENTITY uuid-ossp       SYSTEM "uuid-ossp.sgml">
 <!ENTITY vacuumlo        SYSTEM "vacuumlo.sgml">
 <!ENTITY xml2            SYSTEM "xml2.sgml">
+<!ENTITY xtime           SYSTEM "xtime.sgml">
 
 <!-- appendixes -->
 <!ENTITY contacts   SYSTEM "contacts.sgml">
diff --git a/doc/src/sgml/xtime.sgml b/doc/src/sgml/xtime.sgml
new file mode 100644
index 0000000..3074ec2
--- /dev/null
+++ b/doc/src/sgml/xtime.sgml
@@ -0,0 +1,110 @@
+<!-- doc/src/sgml/xtime.sgml -->
+
+<sect1 id="xtime" xreflabel="xtime">
+ <title>xtime</title>
+
+ <indexterm zone="xtime">
+  <primary>xtime</primary>
+ </indexterm>
+
+ <para>
+  <filename>xtime</filename> is designed to provide a simple code
+  example that uses the custom plan node feature to override a part
+  of the executor logic, but it also exposes how much time each node
+  consumes during its execution.
+ </para>
+ <para>
+  Once <filename>xtime</filename> is enabled, it injects its
+  <literal>CustomPlan</> node on top of each executor node.
+  <xref linkend="sql-explain"> shows how it works.
+  The example below shows the plan tree of a simple query that
+  contains a sort, a join, an index scan and a sequential scan.
+<programlisting>
+postgres=# EXPLAIN (costs off)
+           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
+                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
+                     QUERY PLAN
+-----------------------------------------------------
+ Sort
+   Sort Key: t2.y
+   ->  Nested Loop
+         ->  Seq Scan on t2
+               Filter: ((x >= 1000) AND (x <= 1200))
+         ->  Index Scan using t1_pkey on t1
+               Index Cond: (a = t2.x)
+(7 rows)
+</programlisting>
+  Once <filename>xtime</filename> is loaded, it rewrites the plan
+  tree being constructed, using <literal>planner_hook</literal>,
+  as follows:
+<programlisting>
+postgres=# LOAD '$libdir/xtime';
+LOAD
+postgres=# EXPLAIN (costs off)
+           SELECT * FROM t1 JOIN t2 ON t1.a = t2.x
+                    WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
+                           QUERY PLAN
+-----------------------------------------------------------------
+ CustomPlan:xtime
+   ->  Sort
+         Sort Key: y
+         ->  CustomPlan:xtime
+               ->  Nested Loop
+                     ->  CustomPlan:xtime on t2
+                           Filter: ((x >= 1000) AND (x <= 1200))
+                     ->  CustomPlan:xtime
+                           ->  Index Scan using t1_pkey on t1
+                                 Index Cond: (a = x)
+(10 rows)
+</programlisting>
+  Each <literal>CustomPlan</> of <filename>xtime</> measures the
+  time between the start and the end of each call to the underlying
+  executor node, then prints the total time consumed at the end
+  of the query execution.
+<programlisting>
+postgres=# \timing
+Timing is on.
+postgres=# SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE x BETWEEN 1000 AND 1200 ORDER BY y;
+INFO:  execution time of Sort:  28.508 ms
+INFO:   execution time of Nested Loop:  28.100 ms
+INFO:    execution time of CustomPlan:xtime on t2:  26.658 ms
+INFO:    execution time of Index Scan on t1:  1.183 ms
+        :
+   &lt;snip&gt;
+        :
+Time: 30.794 ms
+</programlisting>
+ </para>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>xtime.mode</varname> (<type>enum</type>)
+    </term>
+    <indexterm>
+     <primary><varname>xtime.mode</> configuration parameter</primary>
+    </indexterm>
+    <listitem>
+     <para>
+      One of <literal>on</>, <literal>off</> or <literal>regtest</> can
+      be set. <literal>on</> means the functionality is enabled and works
+      as described above. <literal>off</> means the functionality is disabled,
+      so it behaves as if <filename>xtime</> were not loaded.
+      <literal>regtest</> behaves almost like <literal>on</>, but it does
+      not print time consumption, so that regression tests run correctly.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+
+ <sect2>
+  <title>Author</title>
+  <para>
+   KaiGai Kohei <email>kaigai@ak.jp.nec.com</email>
+  </para>
+ </sect2>
+</sect1>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 91bea51..2d523f6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
 #include "commands/createas.h"
 #include "commands/defrem.h"
 #include "commands/prepare.h"
+#include "executor/nodeCustomPlan.h"
 #include "executor/hashjoin.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_custom_exec_info(CustomPlanState *cestate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -809,6 +811,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *sname;			/* node type name for non-text output */
 	const char *strategy = NULL;
 	const char *operation = NULL;
+	char		namebuf[NAMEDATALEN + 32];
 	int			save_indent = es->indent;
 	bool		haschildren;
 
@@ -961,6 +964,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			pname = sname = "Hash";
 			break;
+		case T_CustomPlan:
+			sname = "CustomPlan";
+			snprintf(namebuf, sizeof(namebuf), "CustomPlan:%s",
+					 ((CustomPlan *) plan)->cust_name);
+			pname = namebuf;
+			break;
 		default:
 			pname = sname = "???";
 			break;
@@ -1121,6 +1130,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					ExplainPropertyText("Command", setopcmd, es);
 			}
 			break;
+		case T_CustomPlan:
+			if (((Scan *)plan)->scanrelid > 0)
+				ExplainScanTarget((Scan *) plan, es);
+			break;
 		default:
 			break;
 	}
@@ -1357,6 +1370,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info((HashState *) planstate, es);
 			break;
+		case T_CustomPlan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_custom_exec_info((CustomPlanState *) planstate, es);
+			break;
 		default:
 			break;
 	}
@@ -1477,6 +1497,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(plan, CustomPlan) && ((CustomPlan *)plan)->cust_subplans) ||
 		planstate->subPlan;
 	if (haschildren)
 	{
@@ -1531,6 +1552,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_CustomPlan:
+			if (((CustomPlan *) plan)->cust_subplans)
+				ExplainMemberNodes(((CustomPlan *) plan)->cust_subplans,
+							   ((CustomPlanState *) planstate)->cust_subplans,
+							   ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -1858,6 +1885,18 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for a CustomPlan node
+ */
+static void
+show_custom_exec_info(CustomPlanState *cestate, ExplainState *es)
+{
+	CustomPlanRoutine  *cust_routine = cestate->cust_routine;
+
+	if (cust_routine->ExplainCustomPlan != NULL)
+		cust_routine->ExplainCustomPlan(cestate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2025,6 +2064,36 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomPlan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace =
+						get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else if (rte->rtekind == RTE_FUNCTION)
+			{
+				Node   *funcexpr = ((FunctionScan *) plan)->funcexpr;
+
+				if (funcexpr && IsA(funcexpr, FuncExpr))
+				{
+					Oid		funcid = ((FuncExpr *) funcexpr)->funcid;
+
+					objectname = get_func_name(funcid);
+					if (es->verbose)
+						namespace =
+							get_namespace_name(get_func_namespace(funcid));
+				}
+				objecttag = "Function Name";
+			}
+			else if (rte->rtekind == RTE_CTE)
+			{
+				objectname = rte->ctename;
+				objecttag = "CTE Name";
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..b808bd7 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -24,6 +24,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
        nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
        nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
-       nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
+       nodeForeignscan.o nodeWindowAgg.o nodeCustomPlan.o \
+       tstoreReceiver.o spi.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..f545e01 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustomPlan.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -249,6 +250,10 @@ ExecReScan(PlanState *node)
 			ExecReScanLimit((LimitState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecReScanCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			break;
@@ -303,6 +308,10 @@ ExecMarkPos(PlanState *node)
 			ExecResultMarkPos((ResultState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecMarkPosCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			/* don't make hard error unless caller asks to restore... */
 			elog(DEBUG2, "unrecognized node type: %d", (int) nodeTag(node));
@@ -360,6 +369,10 @@ ExecRestrPos(PlanState *node)
 			ExecResultRestrPos((ResultState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecRestrPosCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			break;
@@ -475,6 +488,11 @@ ExecSupportsBackwardScan(Plan *node)
 			/* these don't evaluate tlist */
 			return ExecSupportsBackwardScan(outerPlan(node));
 
+		case T_CustomPlan:
+			if (CustomPlanSupportBackwardScan((CustomPlan *)node))
+				return TargetListSupportsBackwardScan(node->targetlist);
+			return false;
+
 		default:
 			return false;
 	}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 76dd62f..f1ab93e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustomPlan.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -315,6 +316,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 												 estate, eflags);
 			break;
 
+		case T_CustomPlan:
+			result = (PlanState *) ExecInitCustomPlan((CustomPlan *) node,
+													  estate, eflags);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			result = NULL;		/* keep compiler quiet */
@@ -500,6 +506,10 @@ ExecProcNode(PlanState *node)
 			result = ExecLimit((LimitState *) node);
 			break;
 
+		case T_CustomPlanState:
+			result = ExecCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			result = NULL;
@@ -558,6 +568,10 @@ MultiExecProcNode(PlanState *node)
 			result = MultiExecBitmapOr((BitmapOrState *) node);
 			break;
 
+		case T_CustomPlanState:
+			result = MultiExecCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			result = NULL;
@@ -736,6 +750,10 @@ ExecEndNode(PlanState *node)
 			ExecEndLimit((LimitState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecEndCustomPlan((CustomPlanState *) node);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
 			break;
diff --git a/src/backend/executor/nodeCustomPlan.c b/src/backend/executor/nodeCustomPlan.c
new file mode 100644
index 0000000..a61622e
--- /dev/null
+++ b/src/backend/executor/nodeCustomPlan.c
@@ -0,0 +1,325 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustomPlan.c
+ *    Routines to handle execution of custom plan node and management
+ *    of its provider.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "executor/nodeCustomPlan.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+/* table of registered custom execution providers */
+static HTAB *custom_exec_hash = NULL;
+
+/*
+ * RegisterCustomPlan
+ *
+ * Registers a set of callbacks under a unique name as a custom plan
+ * provider. An extension can then use the provider by injecting CustomPlan
+ * nodes that carry this name into the plan tree.
+ * Typical usage takes two steps. First, the extension registers its
+ * provider in _PG_init(), which is called when the module is loaded.
+ * Second, it manipulates the plan tree to add CustomPlan nodes with the
+ * registered name.
+ * The callbacks are attached to the CustomPlanState node at executor
+ * startup, and the executor invokes them as it runs, which lets the
+ * extension override part of the executor.
+ * The extension is responsible for fetching tuples from the underlying
+ * plan nodes and for returning appropriate results to the upper node.
+ */
+void
+RegisterCustomPlan(const CustomPlanRoutine *routine)
+{
+	CustomPlanRoutine *entry;
+	bool		found;
+
+	if (routine->CustomPlanName[0] == '\0')
+		elog(ERROR, "name of custom plan was not provided");
+
+	if (!custom_exec_hash)
+	{
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.hcxt = CacheMemoryContext;
+		ctl.keysize = NAMEDATALEN;
+		ctl.entrysize = sizeof(CustomPlanRoutine);
+
+		custom_exec_hash = hash_create("custom plan provider hash",
+									   128,
+									   &ctl,
+									   HASH_ELEM | HASH_CONTEXT);
+	}
+
+	entry = hash_search(custom_exec_hash,
+						routine->CustomPlanName,
+						HASH_ENTER, &found);
+	if (found)
+		elog(ERROR, "custom plan '%s' was already registered",
+			 routine->CustomPlanName);
+
+	Assert(strcmp(routine->CustomPlanName, entry->CustomPlanName) == 0);
+	memcpy(entry, routine, sizeof(CustomPlanRoutine));
+}
+
+/*
+ * get_custom_plan_rouine
+ *
+ * It looks up a registered custom plan by the given name.
+ */
+static CustomPlanRoutine *
+get_custom_plan_rouine(const char *cust_name)
+{
+	CustomPlanRoutine *entry;
+
+	/* lookup custom execution provider */
+	if (!custom_exec_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("no custom execution provider was registered")));
+
+	entry = (CustomPlanRoutine *) hash_search(custom_exec_hash,
+											  cust_name, HASH_FIND, NULL);
+	if (!entry)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("custom execution provider \"%s\" was not registered",
+						cust_name)));
+
+	return entry;
+}
+
+/*
+ * CustomPlanSupportBackwardScan
+ *
+ * Checks whether the given custom plan supports backward scan.
+ * Note that it checks the left/right subtrees and cust_subplans, but not
+ * the expressions in the target list; the caller must check those itself.
+ */
+bool
+CustomPlanSupportBackwardScan(CustomPlan *node)
+{
+	CustomPlanRoutine *entry = get_custom_plan_rouine(node->cust_name);
+	ListCell	   *cell;
+
+	if (!entry->IsSupportBackwardScan)
+		return false;
+	if (innerPlan(node) && !ExecSupportsBackwardScan(innerPlan(node)))
+		return false;
+	if (outerPlan(node) && !ExecSupportsBackwardScan(outerPlan(node)))
+		return false;
+	foreach (cell, node->cust_subplans)
+	{
+		if (!ExecSupportsBackwardScan((Plan *) lfirst(cell)))
+			return false;
+	}
+	return true;
+}
+
+/*
+ * ExecInitCustomPlan
+ *
+ * Constructs a CustomPlanState node for the supplied CustomPlan and
+ * recursively initializes the underlying left/right subtrees and subplans,
+ * if any.
+ * If the CustomPlan has a valid 'scanrelid', it also opens the relation
+ * with a suitable lock level. Setting 'scanrelid' is the recommended way to
+ * implement a custom plan that scans a particular relation in place of a
+ * built-in Scan node, because projection can then be skipped when the
+ * target list matches the definition of the target relation.
+ */
+CustomPlanState *
+ExecInitCustomPlan(CustomPlan *node, EState *estate, int eflags)
+{
+	Plan			   *plan = &node->scan.plan;
+	CustomPlanRoutine  *entry = get_custom_plan_rouine(node->cust_name);
+	CustomPlanState	   *custom;
+	ListCell		   *cell;
+	int					index;
+
+	/*
+	 * create state structure
+	 */
+	custom = makeNode(CustomPlanState);
+	custom->ss.ps.plan = (Plan *) node;
+	custom->ss.ps.state = estate;
+	if (node->cust_subplans != NIL)
+	{
+		custom->cust_numplans = list_length(node->cust_subplans);
+		custom->cust_subplans = palloc0(sizeof(PlanState *) *
+										custom->cust_numplans);
+	}
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &custom->ss.ps);
+
+	custom->ss.ps.ps_TupFromTlist = false;
+
+	/*
+	 * initialize child expressions
+	 */
+	custom->ss.ps.targetlist =
+		(List *) ExecInitExpr((Expr *) plan->targetlist,
+							  (PlanState *) custom);
+	custom->ss.ps.qual =
+		(List *) ExecInitExpr((Expr *) plan->qual,
+							  (PlanState *) custom);
+
+	/* tuple table initialization */
+	ExecInitResultTupleSlot(estate, &custom->ss.ps);
+
+	/* initialization if custom-exec scan on relation */
+	if (node->scan.scanrelid > 0)
+	{
+		Relation	rel;
+
+		ExecInitScanTupleSlot(estate, &custom->ss);
+		rel = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+
+		custom->ss.ss_currentRelation = rel;
+		ExecAssignScanType(&custom->ss, RelationGetDescr(rel));
+	}
+
+	/* initialize underlying subplans, if exist */
+	outerPlanState(custom) = ExecInitNode(outerPlan(node), estate, eflags);
+	innerPlanState(custom) = ExecInitNode(innerPlan(node), estate, eflags);
+	index = 0;
+	foreach (cell, node->cust_subplans)
+	{
+		custom->cust_subplans[index++]
+			= ExecInitNode((Plan *)lfirst(cell), estate, eflags);
+	}
+
+	/*
+	 * Initialize result tuple type and projection info.
+	 */
+	ExecAssignResultTypeFromTL(&custom->ss.ps);
+	if (node->scan.scanrelid > 0)
+		ExecAssignScanProjectionInfo(&custom->ss);
+	else
+		ExecAssignProjectionInfo(&custom->ss.ps, NULL);
+
+	/*
+	 * Tell the custom-exec provider to initiate this plan
+	 */
+	custom->cust_routine = entry;
+	custom->cust_state = NULL;
+	custom->cust_routine->BeginCustomPlan(custom, eflags);
+
+	return custom;
+}
+
+/*
+ * ExecCustomPlan
+ *
+ * Calls back the extension to fetch the next tuple, stored in a
+ * TupleTableSlot. NULL means no more tuples can be fetched.
+ * Note that the extension is responsible for executing the underlying
+ * plans at the appropriate time.
+ */
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *node)
+{
+	Assert(node->cust_routine->ExecCustomPlan != NULL);
+
+	return node->cust_routine->ExecCustomPlan(node);
+}
+
+/*
+ * MultiExecCustomPlan
+ *
+ * A variant of ExecCustomPlan for the case where the CustomPlan sits below
+ * a node type that expects its child to return multiple tuples at once,
+ * in a form specific to that parent.
+ * Note that HashJoin and Hash are tightly coupled, and their protocol for
+ * returning the scanned result is somewhat ad hoc. An extension must take
+ * care if it tries to replace a Hash plan.
+ */
+Node *
+MultiExecCustomPlan(CustomPlanState *node)
+{
+	Assert(node->cust_routine->MultiExecCustomPlan != NULL);
+
+	return node->cust_routine->MultiExecCustomPlan(node);
+}
+
+/*
+ * ExecEndCustomPlan
+ *
+ * Ends this custom plan. The extension is called back first so that it
+ * can release the resources it used for execution.
+ */
+void
+ExecEndCustomPlan(CustomPlanState *node)
+{
+	int		index;
+
+	/* Let the custom-exec shut down */
+	node->cust_routine->EndCustomPlan(node);
+
+	/* Free the exprcontext */
+	ExecFreeExprContext(&node->ss.ps);
+
+	ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+	if (node->ss.ss_ScanTupleSlot)
+		ExecClearTuple(node->ss.ss_ScanTupleSlot);
+	if (node->ss.ss_currentRelation)
+		ExecCloseScanRelation(node->ss.ss_currentRelation);
+
+	/* End the underlying exec-nodes also */
+	ExecEndNode(outerPlanState(node));
+	ExecEndNode(innerPlanState(node));
+	for (index=0; index < node->cust_numplans; index++)
+		ExecEndNode(node->cust_subplans[index]);
+}
+
+/*
+ * ExecReScanCustomPlan
+ *
+ * Calls back the extension to reset the current position of this scan.
+ */
+void
+ExecReScanCustomPlan(CustomPlanState *node)
+{
+	node->cust_routine->ReScanCustomPlan(node);
+
+	if (node->ss.ss_currentRelation)
+		ExecScanReScan(&node->ss);
+}
+
+/*
+ * ExecMarkPosCustomPlan
+ */
+void
+ExecMarkPosCustomPlan(CustomPlanState *node)
+{
+	if (node->cust_routine->ExecMarkPosCustomPlan)
+		node->cust_routine->ExecMarkPosCustomPlan(node);
+	else
+		elog(DEBUG2, "CustomPlan:%s does not support ExecMarkPos",
+			 node->cust_routine->CustomPlanName);
+}
+
+/*
+ * ExecRestrPosCustomPlan
+ */
+void
+ExecRestrPosCustomPlan(CustomPlanState *node)
+{
+	if (node->cust_routine->ExecRestrPosCustomPlan)
+		node->cust_routine->ExecRestrPosCustomPlan(node);
+	else
+		elog(ERROR, "CustomPlan:%s does not support ExecRestrPos",
+			 node->cust_routine->CustomPlanName);
+}
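As a reading aid for nodeCustomPlan.c above, the following standalone C sketch models two behaviours of that file: registration of providers under a unique name, and the asymmetric handling of the optional mark/restore callbacks (a missing mark callback is tolerated, a missing restore callback is an error). It is not PostgreSQL code and is not part of the patch. Every name in it (`ToyRoutine`, `ToyState`, `toy_register`, `toy_lookup`, `toy_mark_pos`, `toy_restr_pos`, `toy_save`, `toy_restore`) is a hypothetical stand-in, a fixed array replaces the dynahash table, and return codes replace elog().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAMELEN        64	/* stand-in for NAMEDATALEN (assumption) */
#define TOY_MAX_PROVIDERS  16	/* the patch uses a hash table instead */

/* Hypothetical per-node state: a scan position and a saved mark. */
typedef struct ToyState
{
	int		position;
	int		saved;
} ToyState;

/* Hypothetical, reduced analogue of CustomPlanRoutine. */
typedef struct ToyRoutine
{
	char	name[TOY_NAMELEN];
	void	(*mark_pos) (ToyState *state);	/* optional */
	void	(*restr_pos) (ToyState *state);	/* optional */
} ToyRoutine;

static ToyRoutine toy_table[TOY_MAX_PROVIDERS];
static int	toy_count = 0;

/* Look up a provider by name; NULL if it was never registered. */
const ToyRoutine *
toy_lookup(const char *name)
{
	int		i;

	for (i = 0; i < toy_count; i++)
	{
		if (strncmp(toy_table[i].name, name, TOY_NAMELEN) == 0)
			return &toy_table[i];
	}
	return NULL;
}

/*
 * Register a provider. Returns 0 on success, or -1 for an empty name,
 * a duplicate name, or a full table (the patch raises an error instead).
 */
int
toy_register(const ToyRoutine *routine)
{
	assert(routine != NULL);

	if (routine->name[0] == '\0')
		return -1;
	if (toy_lookup(routine->name) != NULL)
		return -1;
	if (toy_count >= TOY_MAX_PROVIDERS)
		return -1;

	toy_table[toy_count++] = *routine;	/* copy, as the patch does */
	return 0;
}

/* Mark: returns 1 if the callback ran, 0 if it is absent (a no-op). */
int
toy_mark_pos(const ToyRoutine *routine, ToyState *state)
{
	if (routine->mark_pos == NULL)
		return 0;
	routine->mark_pos(state);
	return 1;
}

/* Restore: returns 0 if the callback ran, -1 if it is absent (an error). */
int
toy_restr_pos(const ToyRoutine *routine, ToyState *state)
{
	if (routine->restr_pos == NULL)
		return -1;
	routine->restr_pos(state);
	return 0;
}

/* Example callbacks a provider might supply. */
void
toy_save(ToyState *state)
{
	state->saved = state->position;
}

void
toy_restore(ToyState *state)
{
	state->position = state->saved;
}
```

A provider that fills in only `mark_pos` registers successfully, and the gap shows up only when a caller asks to restore a position. This is the same split as DEBUG2 in ExecMarkPosCustomPlan versus ERROR in ExecRestrPosCustomPlan.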
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 788907e..cf4e817 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -938,6 +938,29 @@ _copyLimit(const Limit *from)
 }
 
 /*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+	CustomPlan	   *newnode = makeNode(CustomPlan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(cust_name);
+	COPY_NODE_FIELD(cust_private);
+	COPY_NODE_FIELD(cust_subplans);
+
+	return newnode;
+}
+
+/*
  * _copyNestLoopParam
  */
 static NestLoopParam *
@@ -3970,6 +3993,9 @@ copyObject(const void *from)
 		case T_Limit:
 			retval = _copyLimit(from);
 			break;
+		case T_CustomPlan:
+			retval = _copyCustomPlan(from);
+			break;
 		case T_NestLoopParam:
 			retval = _copyNestLoopParam(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index cff4734..c93938e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -823,6 +823,18 @@ _outLimit(StringInfo str, const Limit *node)
 }
 
 static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(cust_name);
+	WRITE_NODE_FIELD(cust_private);
+	WRITE_NODE_FIELD(cust_subplans);
+}
+
+static void
 _outNestLoopParam(StringInfo str, const NestLoopParam *node)
 {
 	WRITE_NODE_TYPE("NESTLOOPPARAM");
@@ -2855,6 +2867,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_Limit:
 				_outLimit(str, obj);
 				break;
+			case T_CustomPlan:
+				_outCustomPlan(str, obj);
+				break;
 			case T_NestLoopParam:
 				_outNestLoopParam(str, obj);
 				break;
diff --git a/src/include/executor/nodeCustomPlan.h b/src/include/executor/nodeCustomPlan.h
new file mode 100644
index 0000000..79fc2ae
--- /dev/null
+++ b/src/include/executor/nodeCustomPlan.h
@@ -0,0 +1,55 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustomPlan.h
+ *
+ * prototypes for custom plan nodes
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOMPLAN_H
+#define NODECUSTOMPLAN_H
+
+#include "commands/explain.h"
+#include "nodes/execnodes.h"
+
+typedef void (*BeginCustomPlan_function)(CustomPlanState *cestate,
+										 int eflags);
+typedef TupleTableSlot *(*ExecCustomPlan_function)(CustomPlanState *node);
+typedef Node *(*MultiExecCustomPlan_function)(CustomPlanState *node);
+typedef void (*ReScanCustomPlan_function)(CustomPlanState *node);
+typedef void (*EndCustomPlan_function)(CustomPlanState *node);
+typedef void (*ExplainCustomPlan_function)(CustomPlanState *node,
+										   ExplainState *es);
+typedef void (*ExecMarkPosCustomPlan_function)(CustomPlanState *node);
+typedef void (*ExecRestrPosCustomPlan_function)(CustomPlanState *node);
+
+typedef struct CustomPlanRoutine
+{
+	char							CustomPlanName[NAMEDATALEN];
+	bool							IsSupportBackwardScan;
+	BeginCustomPlan_function		BeginCustomPlan;
+	ExecCustomPlan_function			ExecCustomPlan;
+	MultiExecCustomPlan_function	MultiExecCustomPlan;
+	EndCustomPlan_function			EndCustomPlan;
+	ExplainCustomPlan_function		ExplainCustomPlan;
+	ReScanCustomPlan_function		ReScanCustomPlan;
+	ExecMarkPosCustomPlan_function	ExecMarkPosCustomPlan;
+	ExecRestrPosCustomPlan_function	ExecRestrPosCustomPlan;
+} CustomPlanRoutine;
+
+extern void RegisterCustomPlan(const CustomPlanRoutine *routine);
+extern bool CustomPlanSupportBackwardScan(CustomPlan *node);
+
+extern CustomPlanState *ExecInitCustomPlan(CustomPlan *node,
+										   EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomPlan(CustomPlanState *node);
+extern Node *MultiExecCustomPlan(CustomPlanState *node);
+extern void ExecEndCustomPlan(CustomPlanState *node);
+extern void ExecReScanCustomPlan(CustomPlanState *node);
+extern void ExecMarkPosCustomPlan(CustomPlanState *node);
+extern void ExecRestrPosCustomPlan(CustomPlanState *node);
+
+#endif	/* NODECUSTOMPLAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3b430e0..4f557db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1887,4 +1887,32 @@ typedef struct LimitState
 	TupleTableSlot *subSlot;	/* tuple last obtained from subplan */
 } LimitState;
 
+/*
+ * ----------------
+ *	CustomPlanState information
+ *
+ */
+typedef struct
+{
+	ScanState	ss;
+
+	/*
+	 * callback routines of this custom plan provider. Note that,
+	 * we use struct pointer to avoid including nodeCustomPlan.h here.
+	 */
+	struct CustomPlanRoutine *cust_routine;
+
+	/*
+	 * provider of custom-executor can keep private state here.
+	 */
+	void	   *cust_state;
+
+	/*
+	 * NULL, or an array of PlanStates for the inputs if this custom
+	 * executor behaves like Append.
+	 */
+	PlanState **cust_subplans;
+	int			cust_numplans;
+} CustomPlanState;
+
 #endif   /* EXECNODES_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 78368c6..2eb142c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -76,6 +76,7 @@ typedef enum NodeTag
 	T_SetOp,
 	T_LockRows,
 	T_Limit,
+	T_CustomPlan,
 	/* these aren't subclasses of Plan: */
 	T_NestLoopParam,
 	T_PlanRowMark,
@@ -121,6 +122,7 @@ typedef enum NodeTag
 	T_SetOpState,
 	T_LockRowsState,
 	T_LimitState,
+	T_CustomPlanState,
 
 	/*
 	 * TAGS FOR PRIMITIVE NODES (primnodes.h)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..2484e54 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -750,6 +750,43 @@ typedef struct Limit
 	Node	   *limitCount;		/* COUNT parameter, or NULL if none */
 } Limit;
 
+/* ----------------
+ *		custom plan node
+ *
+ * Note: we assume that an extension adds a custom plan, or replaces part
+ * of the given plan tree with one, in its planner_hook. The custom plan
+ * provider is identified by its registered name, and the matching set of
+ * callbacks is chosen prior to execution.
+ * ----------------
+ */
+typedef struct CustomPlan
+{
+	/*
+	 * Common fields of Scan nodes. scanrelid may hold a valid index
+	 * into the range table if this node scans a particular table.
+	 * In that case, PostgreSQL opens the relation with a suitable
+	 * lock level before invoking the provider's callbacks.
+	 * Otherwise, set scanrelid to 0.
+	 */
+	Scan		scan;
+
+	/*
+	 * Name of the custom plan provider; must be set to identify
+	 * which provider shall run this node.
+	 */
+	char	   *cust_name;
+
+	/*
+	 * Private information to be passed to executor callback
+	 */
+	List	   *cust_private;
+
+	/*
+	 * List of subplans, for a custom node that has multiple children
+	 * in the way an Append node does.
+	 */
+	List	   *cust_subplans;
+} CustomPlan;
 
 /*
  * RowMarkType -
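The CustomPlan comment above assumes that an extension rewrites the finished plan tree from its planner_hook. The standalone C sketch below models one such rewrite on a toy tree: every node except Hash gets a custom node placed above it, and Hash is left alone because of the tight HashJoin/Hash coupling noted in the MultiExecCustomPlan comment. This is an illustration under stated assumptions, not PostgreSQL code and not the xtime module: `ToyPlan`, `ToyTag`, `toy_make_plan`, `toy_inject`, and `toy_free` are hypothetical, and a real hook would operate on PlannedStmt and Plan nodes instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node tags; the real ones live in nodes.h. */
typedef enum ToyTag
{
	TOY_SEQSCAN,
	TOY_HASH,
	TOY_HASHJOIN,
	TOY_CUSTOM
} ToyTag;

/* Hypothetical, reduced analogue of Plan plus CustomPlan.cust_name. */
typedef struct ToyPlan
{
	ToyTag		tag;
	struct ToyPlan *lefttree;
	struct ToyPlan *righttree;
	const char *cust_name;		/* set only for TOY_CUSTOM nodes */
} ToyPlan;

/* Allocate one node; aborts on allocation failure to keep the sketch short. */
ToyPlan *
toy_make_plan(ToyTag tag, ToyPlan *left, ToyPlan *right)
{
	ToyPlan    *plan = calloc(1, sizeof(ToyPlan));

	if (plan == NULL)
		abort();
	plan->tag = tag;
	plan->lefttree = left;
	plan->righttree = right;
	return plan;
}

/*
 * Walk the tree bottom-up and place a custom node above every node
 * except TOY_HASH. Returns the new root of the (sub)tree.
 */
ToyPlan *
toy_inject(ToyPlan *plan, const char *cust_name)
{
	ToyPlan    *wrapper;

	if (plan == NULL)
		return NULL;

	plan->lefttree = toy_inject(plan->lefttree, cust_name);
	plan->righttree = toy_inject(plan->righttree, cust_name);

	/* keep Hash directly below HashJoin */
	if (plan->tag == TOY_HASH)
		return plan;

	wrapper = toy_make_plan(TOY_CUSTOM, plan, NULL);
	wrapper->cust_name = cust_name;
	return wrapper;
}

/* Release a whole tree. */
void
toy_free(ToyPlan *plan)
{
	if (plan == NULL)
		return;
	toy_free(plan->lefttree);
	toy_free(plan->righttree);
	free(plan);
}
```

The wrapper keeps the original node as its left subtree, so a provider's exec callback can pull tuples from it and pass them upward unchanged.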
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index d5935b6..d73a675 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -100,7 +100,7 @@ installdirs-tests: installdirs
 
 # Get some extra C modules from contrib/spi and contrib/dummy_seclabel...
 
-all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+all: refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) xtime$(DLSUFFIX)
 
 refint$(DLSUFFIX): $(top_builddir)/contrib/spi/refint$(DLSUFFIX)
 	cp $< $@
@@ -111,19 +111,27 @@ autoinc$(DLSUFFIX): $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX)
 dummy_seclabel$(DLSUFFIX): $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX)
 	cp $< $@
 
+xtime$(DLSUFFIX): $(top_builddir)/contrib/xtime/xtime$(DLSUFFIX)
+	cp $< $@
+
 $(top_builddir)/contrib/spi/refint$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/spi/autoinc$(DLSUFFIX): | submake-contrib-spi ;
 
 $(top_builddir)/contrib/dummy_seclabel/dummy_seclabel$(DLSUFFIX): | submake-contrib-dummy_seclabel ;
 
+$(top_builddir)/contrib/xtime/xtime$(DLSUFFIX): | submake-contrib-xtime ;
+
 submake-contrib-spi:
 	$(MAKE) -C $(top_builddir)/contrib/spi
 
 submake-contrib-dummy_seclabel:
 	$(MAKE) -C $(top_builddir)/contrib/dummy_seclabel
 
-.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel
+submake-contrib-xtime:
+	$(MAKE) -C $(top_builddir)/contrib/xtime
+
+.PHONY: submake-contrib-spi submake-contrib-dummy_seclabel submake-contrib-xtime
 
 # Tablespace setup
 
@@ -170,7 +178,7 @@ bigcheck: all tablespace-setup
 
 clean distclean maintainer-clean: clean-lib
 # things built by `all' target
-	rm -f $(OBJS) refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX)
+	rm -f $(OBJS) refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) xtime$(DLSUFFIX)
 	rm -f pg_regress_main.o pg_regress.o pg_regress$(X)
 # things created by various check targets
 	rm -f $(output_files) $(input_files)
diff --git a/src/test/regress/input/custom_exec.source b/src/test/regress/input/custom_exec.source
new file mode 100644
index 0000000..054fb1e
--- /dev/null
+++ b/src/test/regress/input/custom_exec.source
@@ -0,0 +1,184 @@
+--
+-- Test for custom executor nodes
+--
+
+-- Clean up in case a prior regression run failed
+SET client_min_messages TO 'warning';
+
+DROP SCHEMA IF EXISTS custom_exec_test CASCADE;
+
+RESET client_min_messages;
+
+-- Setting up tables for tests
+CREATE SCHEMA custom_exec_test;
+SET search_path TO custom_exec_test, pg_temp, public;
+
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT v, md5(v::text) FROM generate_series(1,10000) AS v);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+    x   int references t1(a),
+    y   text,
+    z   float
+);
+INSERT INTO t2 (SELECT v, md5(v::text), (v::text || '.' || v::text)::float FROM generate_series(1,10000,2) AS v);
+CREATE INDEX t2_x_idx ON t2(x);
+VACUUM ANALYZE t2;
+
+CREATE TABLE s1 (
+    a   int,
+    b   text
+);
+INSERT INTO s1 (SELECT v, md5((v+10)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+20)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+30)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+40)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+50)::text) FROM generate_series(1,2000) AS v);
+CREATE INDEX s1_a_idx ON s1(a);
+VACUUM ANALYZE s1;
+
+CREATE TABLE u1 (
+    x   int,
+    y   text,
+    z   float
+);
+CREATE INDEX u1_x_idx ON u1(x);
+INSERT INTO u1 (SELECT v, md5(v::text) FROM generate_series(1,5000) AS v);
+VACUUM ANALYZE u1;
+
+CREATE TABLE u2 () INHERITS (u1);
+INSERT INTO u2 (SELECT v, md5((v+5000)::text) FROM generate_series(1,5000) AS v);
+CREATE UNIQUE INDEX u2_x_idx ON u2(x);
+VACUUM ANALYZE u2;
+
+CREATE TABLE u3 () INHERITS (u1);
+INSERT INTO u3 (SELECT v, md5((v+10000)::text) FROM generate_series(1,5000) AS v);
+VACUUM ANALYZE u3;
+
+-- Load example custom execution provider 
+LOAD '@libdir@/xtime@DLSUFFIX@';
+SET xtime.regression_test = on;
+
+-- test for Hash Join
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for SeqScan replace
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for target list with projection
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP q1 FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for Bitmap Scan
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a between 100 and 200;
+SELECT * INTO TEMP no_cust FROM s1 WHERE a between 100 and 200;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a between 100 and 200;
+SELECT * INTO TEMP with_cust FROM s1 WHERE a between 100 and 200;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for Sort/Limit
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+SELECT * INTO TEMP no_cust FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+SELECT * INTO TEMP with_cust FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for complicated query
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for inherited tables
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT x+z AS c1, y||y AS c2 FROM u1 WHERE x BETWEEN 1111 AND 4444;
+SELECT x+z AS c1, y||y AS c2 INTO no_cust FROM u1 WHERE x BETWEEN 1111 AND 4444;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT x+z AS c1, y||y AS c2 FROM u1 WHERE x BETWEEN 1111 AND 4444;
+SELECT x+z AS c1, y||y AS c2 INTO with_cust FROM u1 WHERE x BETWEEN 1111 AND 4444;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+
+-- test for inherited tables with index scan
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT x-z AS c1, y||y AS c2 FROM u1 WHERE x = 3333;
+SELECT x-z AS c1, y||y AS c2 INTO no_cust FROM u1 WHERE x = 3333;
+
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT x-z AS c1, y||y AS c2 FROM u1 WHERE x = 3333;
+SELECT x-z AS c1, y||y AS c2 INTO with_cust FROM u1 WHERE x = 3333;
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+DROP TABLE IF EXISTS no_cust, with_cust;
+
+-- test for invalidation of prepared statement
+SET xtime.mode = off;
+PREPARE p1(int,int) AS SELECT * FROM u1 WHERE x BETWEEN $1 AND $2;
+EXPLAIN EXECUTE p1(100,200);
+
+SET xtime.mode = on;
+PREPARE p2(int) AS SELECT * FROM u1 WHERE x = $1;
+EXPLAIN EXECUTE p1(100,200);
+EXPLAIN EXECUTE p2(2222);
+
+SET xtime.mode = off;
+EXPLAIN EXECUTE p1(100,200);
+EXPLAIN EXECUTE p2(2222);
+
+-- Clean up resources
+DROP SCHEMA IF EXISTS custom_exec_test CASCADE;
diff --git a/src/test/regress/output/custom_exec.source b/src/test/regress/output/custom_exec.source
new file mode 100644
index 0000000..146e468
--- /dev/null
+++ b/src/test/regress/output/custom_exec.source
@@ -0,0 +1,541 @@
+--
+-- Test for custom executor nodes
+--
+-- Clean up in case a prior regression run failed
+SET client_min_messages TO 'warning';
+DROP SCHEMA IF EXISTS custom_exec_test CASCADE;
+RESET client_min_messages;
+-- Setting up tables for tests
+CREATE SCHEMA custom_exec_test;
+SET search_path TO custom_exec_test, pg_temp, public;
+CREATE TABLE t1 (
+    a   int primary key,
+    b   text
+);
+INSERT INTO t1 (SELECT v, md5(v::text) FROM generate_series(1,10000) AS v);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+    x   int references t1(a),
+    y   text,
+    z   float
+);
+INSERT INTO t2 (SELECT v, md5(v::text), (v::text || '.' || v::text)::float FROM generate_series(1,10000,2) AS v);
+CREATE INDEX t2_x_idx ON t2(x);
+VACUUM ANALYZE t2;
+CREATE TABLE s1 (
+    a   int,
+    b   text
+);
+INSERT INTO s1 (SELECT v, md5((v+10)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+20)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+30)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+40)::text) FROM generate_series(1,2000) AS v);
+INSERT INTO s1 (SELECT v, md5((v+50)::text) FROM generate_series(1,2000) AS v);
+CREATE INDEX s1_a_idx ON s1(a);
+VACUUM ANALYZE s1;
+CREATE TABLE u1 (
+    x   int,
+    y   text,
+    z   float
+);
+CREATE INDEX u1_x_idx ON u1(x);
+INSERT INTO u1 (SELECT v, md5(v::text) FROM generate_series(1,5000) AS v);
+VACUUM ANALYZE u1;
+CREATE TABLE u2 () INHERITS (u1);
+INSERT INTO u2 (SELECT v, md5((v+5000)::text) FROM generate_series(1,5000) AS v);
+CREATE UNIQUE INDEX u2_x_idx ON u2(x);
+VACUUM ANALYZE u2;
+CREATE TABLE u3 () INHERITS (u1);
+INSERT INTO u3 (SELECT v, md5((v+10000)::text) FROM generate_series(1,5000) AS v);
+VACUUM ANALYZE u3;
+-- Load example custom execution provider 
+LOAD '@libdir@/xtime@DLSUFFIX@';
+SET xtime.regression_test = on;
+-- test for Hash Join
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Hash Join
+   Hash Cond: (t2.x = t1.a)
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Index Scan using t1_pkey on t1
+               Index Cond: ((a >= 100) AND (a <= 200))
+(6 rows)
+
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ CustomPlan:xtime
+   ->  Hash Join
+         Hash Cond: (x = t1.a)
+         ->  CustomPlan:xtime on t2
+         ->  Hash
+               ->  CustomPlan:xtime
+                     ->  Index Scan using t1_pkey on t1
+                           Index Cond: ((a >= 100) AND (a <= 200))
+(8 rows)
+
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE a BETWEEN 100 AND 200;
+INFO:  execution time of Hash Join: **.*** ms
+INFO:   execution time of CustomPlan:xtime on t2: **.*** ms
+INFO:    execution time of Index Scan on t1: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ a | b | x | y | z 
+---+---+---+---+---
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for SeqScan replace
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+         QUERY PLAN         
+----------------------------
+ Hash Join
+   Hash Cond: (t1.a = t2.x)
+   ->  Seq Scan on t1
+         Filter: (a > 1000)
+   ->  Hash
+         ->  Seq Scan on t2
+(6 rows)
+
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+                QUERY PLAN                
+------------------------------------------
+ CustomPlan:xtime
+   ->  Hash Join
+         Hash Cond: (t1.a = t2.x)
+         ->  CustomPlan:xtime on t1
+               Filter: (a > 1000)
+         ->  Hash
+               ->  CustomPlan:xtime on t2
+(7 rows)
+
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE a > 1000;
+INFO:  execution time of Hash Join: **.*** ms
+INFO:   execution time of CustomPlan:xtime on t1: **.*** ms
+INFO:    execution time of CustomPlan:xtime on t2: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ a | b | x | y | z 
+---+---+---+---+---
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for target list with projection
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Nested Loop
+   Output: ((t1.a * 10000) + t2.x), ((t1.b || '_'::text) || 'y'::text)
+   ->  Seq Scan on custom_exec_test.t2
+         Output: t2.x, t2.y, t2.z
+         Filter: ((t2.x % 2) = 0)
+   ->  Index Scan using t1_pkey on custom_exec_test.t1
+         Output: t1.a, t1.b
+         Index Cond: (t1.a = t2.x)
+(8 rows)
+
+SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP no_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP q1 FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ CustomPlan:xtime
+   Output: (((t1.a * 10000) + x)), (((t1.b || '_'::text) || 'y'::text))
+   ->  Nested Loop
+         Output: ((t1.a * 10000) + x), ((t1.b || '_'::text) || 'y'::text)
+         ->  CustomPlan:xtime on custom_exec_test.t2
+               Output: x, y, z
+               Filter: ((x % 2) = 0)
+         ->  CustomPlan:xtime
+               Output: t1.a, t1.b
+               ->  Index Scan using t1_pkey on custom_exec_test.t1
+                     Output: t1.a, t1.b
+                     Index Cond: (t1.a = x)
+(12 rows)
+
+SELECT a * 10000 + x AS c1, b || '_' || 'y' AS c2
+        INTO TEMP with_cust FROM t1 JOIN t2 ON a = x WHERE x % 2 = 0;
+INFO:  execution time of Nested Loop: **.*** ms
+INFO:   execution time of CustomPlan:xtime on t2: **.*** ms
+INFO:   execution time of Index Scan on t1: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ c1 | c2 
+----+----
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for Bitmap Scan
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a between 100 and 200;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Bitmap Heap Scan on s1
+   Recheck Cond: ((a >= 100) AND (a <= 200))
+   ->  Bitmap Index Scan on s1_a_idx
+         Index Cond: ((a >= 100) AND (a <= 200))
+(4 rows)
+
+SELECT * INTO TEMP no_cust FROM s1 WHERE a between 100 and 200;
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a between 100 and 200;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ CustomPlan:xtime
+   ->  Bitmap Heap Scan on s1
+         Recheck Cond: ((a >= 100) AND (a <= 200))
+         ->  CustomPlan:xtime
+               ->  Bitmap Index Scan on s1_a_idx
+                     Index Cond: ((a >= 100) AND (a <= 200))
+(6 rows)
+
+SELECT * INTO TEMP with_cust FROM s1 WHERE a between 100 and 200;
+INFO:  execution time of Bitmap Heap Scan on s1: **.*** ms
+INFO:   execution time of Bitmap Index Scan on s1: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ a | b 
+---+---
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for Sort/Limit
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+                 QUERY PLAN                  
+---------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: b
+         ->  Index Scan using s1_a_idx on s1
+               Index Cond: (a > 9000)
+(5 rows)
+
+SELECT * INTO TEMP no_cust FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+                          QUERY PLAN                           
+---------------------------------------------------------------
+ CustomPlan:xtime
+   ->  Limit
+         ->  CustomPlan:xtime
+               ->  Sort
+                     Sort Key: b
+                     ->  CustomPlan:xtime
+                           ->  Index Scan using s1_a_idx on s1
+                                 Index Cond: (a > 9000)
+(8 rows)
+
+SELECT * INTO TEMP with_cust FROM s1 WHERE a > 9000 ORDER BY b LIMIT 20;
+INFO:  execution time of Limit: **.*** ms
+INFO:   execution time of Sort: **.*** ms
+INFO:    execution time of Index Scan on s1: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ a | b 
+---+---
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for complicated query
+SET xtime.mode = off;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Nested Loop
+   ->  Merge Join
+         Merge Cond: (t2.x = s1.a)
+         ->  Index Scan using t2_x_idx on t2
+         ->  Sort
+               Sort Key: s1.a
+               ->  HashAggregate
+                     ->  Limit
+                           ->  Sort
+                                 Sort Key: s1.b
+                                 ->  Seq Scan on s1
+                                       Filter: (b ~~ '%abc%'::text)
+   ->  Index Scan using t1_pkey on t1
+         Index Cond: (a = t2.x)
+(14 rows)
+
+SELECT * INTO TEMP no_cust FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+SET xtime.mode = regtest;
+EXPLAIN(costs off) SELECT * FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
+ CustomPlan:xtime
+   ->  Nested Loop
+         ->  CustomPlan:xtime
+               ->  Merge Join
+                     Merge Cond: (t2.x = a)
+                     ->  CustomPlan:xtime
+                           ->  Index Scan using t2_x_idx on t2
+                     ->  CustomPlan:xtime
+                           ->  Sort
+                                 Sort Key: a
+                                 ->  CustomPlan:xtime
+                                       ->  HashAggregate
+                                             ->  CustomPlan:xtime
+                                                   ->  Limit
+                                                         ->  CustomPlan:xtime
+                                                               ->  Sort
+                                                                     Sort Key: b
+                                                                     ->  CustomPlan:xtime on s1
+                                                                           Filter: (b ~~ '%abc%'::text)
+         ->  CustomPlan:xtime
+               ->  Index Scan using t1_pkey on t1
+                     Index Cond: (a = t2.x)
+(22 rows)
+
+SELECT * INTO TEMP with_cust FROM t1 JOIN t2 ON a = x
+       WHERE a IN (SELECT a FROM s1 WHERE b like '%abc%' ORDER BY b LIMIT 100);
+INFO:  execution time of Nested Loop: **.*** ms
+INFO:   execution time of Merge Join: **.*** ms
+INFO:    execution time of Sort: **.*** ms
+INFO:     execution time of HashAggregate (Hashed): **.*** ms
+INFO:      execution time of Limit: **.*** ms
+INFO:       execution time of Sort: **.*** ms
+INFO:        execution time of CustomPlan:xtime on s1: **.*** ms
+INFO:    execution time of Index Scan on t2: **.*** ms
+INFO:   execution time of Index Scan on t1: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ a | b | x | y | z 
+---+---+---+---+---
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for inherited tables
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT x+z AS c1, y||y AS c2 FROM u1 WHERE x BETWEEN 1111 AND 4444;
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Result
+   Output: ((u1.x)::double precision + u1.z), (u1.y || u1.y)
+   ->  Append
+         ->  Seq Scan on custom_exec_test.u1
+               Output: u1.x, u1.z, u1.y
+               Filter: ((u1.x >= 1111) AND (u1.x <= 4444))
+         ->  Seq Scan on custom_exec_test.u2
+               Output: u2.x, u2.z, u2.y
+               Filter: ((u2.x >= 1111) AND (u2.x <= 4444))
+         ->  Seq Scan on custom_exec_test.u3
+               Output: u3.x, u3.z, u3.y
+               Filter: ((u3.x >= 1111) AND (u3.x <= 4444))
+(12 rows)
+
+SELECT x+z AS c1, y||y AS c2 INTO no_cust FROM u1 WHERE x BETWEEN 1111 AND 4444;
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT x+z AS c1, y||y AS c2 FROM u1 WHERE x BETWEEN 1111 AND 4444;
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
+ CustomPlan:xtime
+   Output: (((u1_1.x)::double precision + u1_1.z)), ((u1_1.y || u1_1.y))
+   ->  Result
+         Output: ((u1_1.x)::double precision + u1_1.z), (u1_1.y || u1_1.y)
+         ->  CustomPlan:xtime
+               Output: u1_1.x, u1_1.z, u1_1.y
+               ->  Append
+                     ->  CustomPlan:xtime on custom_exec_test.u1 u1_1
+                           Output: u1_1.x, u1_1.z, u1_1.y
+                           Filter: ((u1_1.x >= 1111) AND (u1_1.x <= 4444))
+                     ->  CustomPlan:xtime on custom_exec_test.u2
+                           Output: u2.x, u2.z, u2.y
+                           Filter: ((u2.x >= 1111) AND (u2.x <= 4444))
+                     ->  CustomPlan:xtime on custom_exec_test.u3
+                           Output: u3.x, u3.z, u3.y
+                           Filter: ((u3.x >= 1111) AND (u3.x <= 4444))
+(16 rows)
+
+SELECT x+z AS c1, y||y AS c2 INTO with_cust FROM u1 WHERE x BETWEEN 1111 AND 4444;
+INFO:  execution time of Result: **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of CustomPlan:xtime on u1: **.*** ms
+INFO:    execution time of CustomPlan:xtime on u2: **.*** ms
+INFO:    execution time of CustomPlan:xtime on u3: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ c1 | c2 
+----+----
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for inherited tables with index scan
+SET xtime.mode = off;
+EXPLAIN(costs off, verbose) SELECT x-z AS c1, y||y AS c2 FROM u1 WHERE x = 3333;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Result
+   Output: ((u1.x)::double precision - u1.z), (u1.y || u1.y)
+   ->  Append
+         ->  Index Scan using u1_x_idx on custom_exec_test.u1
+               Output: u1.x, u1.z, u1.y
+               Index Cond: (u1.x = 3333)
+         ->  Index Scan using u2_x_idx on custom_exec_test.u2
+               Output: u2.x, u2.z, u2.y
+               Index Cond: (u2.x = 3333)
+         ->  Seq Scan on custom_exec_test.u3
+               Output: u3.x, u3.z, u3.y
+               Filter: (u3.x = 3333)
+(12 rows)
+
+SELECT x-z AS c1, y||y AS c2 INTO no_cust FROM u1 WHERE x = 3333;
+SET xtime.mode = regtest;
+EXPLAIN(costs off, verbose) SELECT x-z AS c1, y||y AS c2 FROM u1 WHERE x = 3333;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ CustomPlan:xtime
+   Output: (((u1.x)::double precision - u1.z)), ((u1.y || u1.y))
+   ->  Result
+         Output: ((u1.x)::double precision - u1.z), (u1.y || u1.y)
+         ->  CustomPlan:xtime
+               Output: u1.x, u1.z, u1.y
+               ->  Append
+                     ->  CustomPlan:xtime
+                           Output: u1.x, u1.z, u1.y
+                           ->  Index Scan using u1_x_idx on custom_exec_test.u1
+                                 Output: u1.x, u1.z, u1.y
+                                 Index Cond: (u1.x = 3333)
+                     ->  CustomPlan:xtime
+                           Output: u2.x, u2.z, u2.y
+                           ->  Index Scan using u2_x_idx on custom_exec_test.u2
+                                 Output: u2.x, u2.z, u2.y
+                                 Index Cond: (u2.x = 3333)
+                     ->  CustomPlan:xtime on custom_exec_test.u3 u1
+                           Output: x, z, y
+                           Filter: (x = 3333)
+(20 rows)
+
+SELECT x-z AS c1, y||y AS c2 INTO with_cust FROM u1 WHERE x = 3333;
+INFO:  execution time of Result: **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Index Scan on u1: **.*** ms
+INFO:    execution time of Index Scan on u1: **.*** ms
+INFO:    execution time of CustomPlan:xtime on u3: **.*** ms
+SELECT * FROM no_cust EXCEPT SELECT * FROM with_cust;	-- should be empty
+INFO:  execution time of HashSetOp (Hashed): **.*** ms
+INFO:   execution time of Append: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 1: **.*** ms
+INFO:    execution time of Subquery Scan on *SELECT* 2: **.*** ms
+ c1 | c2 
+----+----
+(0 rows)
+
+DROP TABLE IF EXISTS no_cust, with_cust;
+-- test for invalidation of prepared statement
+SET xtime.mode = off;
+PREPARE p1(int,int) AS SELECT * FROM u1 WHERE x BETWEEN $1 AND $2;
+EXPLAIN EXECUTE p1(100,200);
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Append  (cost=0.28..137.56 rows=300 width=45)
+   ->  Index Scan using u1_x_idx on u1  (cost=0.28..10.28 rows=100 width=45)
+         Index Cond: ((x >= 100) AND (x <= 200))
+   ->  Index Scan using u2_x_idx on u2  (cost=0.28..10.28 rows=100 width=45)
+         Index Cond: ((x >= 100) AND (x <= 200))
+   ->  Seq Scan on u3  (cost=0.00..117.00 rows=100 width=45)
+         Filter: ((x >= 100) AND (x <= 200))
+(7 rows)
+
+SET xtime.mode = on;
+PREPARE p2(int) AS SELECT * FROM u1 WHERE x = $1;
+EXPLAIN EXECUTE p1(100,200);
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ CustomPlan:xtime  (cost=0.28..137.56 rows=300 width=45)
+   ->  Append  (cost=0.28..137.56 rows=300 width=45)
+         ->  CustomPlan:xtime  (cost=0.28..10.28 rows=100 width=45)
+               ->  Index Scan using u1_x_idx on u1  (cost=0.28..10.28 rows=100 width=45)
+                     Index Cond: ((x >= 100) AND (x <= 200))
+         ->  CustomPlan:xtime  (cost=0.28..10.28 rows=100 width=45)
+               ->  Index Scan using u2_x_idx on u2  (cost=0.28..10.28 rows=100 width=45)
+                     Index Cond: ((x >= 100) AND (x <= 200))
+         ->  CustomPlan:xtime on u3 u1  (cost=0.00..117.00 rows=100 width=45)
+               Filter: ((x >= 100) AND (x <= 200))
+(10 rows)
+
+EXPLAIN EXECUTE p2(2222);
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ CustomPlan:xtime  (cost=0.28..121.10 rows=3 width=45)
+   ->  Append  (cost=0.28..121.10 rows=3 width=45)
+         ->  CustomPlan:xtime  (cost=0.28..8.30 rows=1 width=45)
+               ->  Index Scan using u1_x_idx on u1  (cost=0.28..8.30 rows=1 width=45)
+                     Index Cond: (x = 2222)
+         ->  CustomPlan:xtime  (cost=0.28..8.30 rows=1 width=45)
+               ->  Index Scan using u2_x_idx on u2  (cost=0.28..8.30 rows=1 width=45)
+                     Index Cond: (x = 2222)
+         ->  CustomPlan:xtime on u3 u1  (cost=0.00..104.50 rows=1 width=45)
+               Filter: (x = 2222)
+(10 rows)
+
+SET xtime.mode = off;
+EXPLAIN EXECUTE p1(100,200);
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Append  (cost=0.28..137.56 rows=300 width=45)
+   ->  Index Scan using u1_x_idx on u1  (cost=0.28..10.28 rows=100 width=45)
+         Index Cond: ((x >= 100) AND (x <= 200))
+   ->  Index Scan using u2_x_idx on u2  (cost=0.28..10.28 rows=100 width=45)
+         Index Cond: ((x >= 100) AND (x <= 200))
+   ->  Seq Scan on u3  (cost=0.00..117.00 rows=100 width=45)
+         Filter: ((x >= 100) AND (x <= 200))
+(7 rows)
+
+EXPLAIN EXECUTE p2(2222);
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ Append  (cost=0.28..121.10 rows=3 width=45)
+   ->  Index Scan using u1_x_idx on u1  (cost=0.28..8.30 rows=1 width=45)
+         Index Cond: (x = 2222)
+   ->  Index Scan using u2_x_idx on u2  (cost=0.28..8.30 rows=1 width=45)
+         Index Cond: (x = 2222)
+   ->  Seq Scan on u3  (cost=0.00..104.50 rows=1 width=45)
+         Filter: (x = 2222)
+(7 rows)
+
+-- Clean up resources
+DROP SCHEMA IF EXISTS custom_exec_test CASCADE;
+NOTICE:  drop cascades to 6 other objects
+DETAIL:  drop cascades to table t1
+drop cascades to table t2
+drop cascades to table s1
+drop cascades to table u1
+drop cascades to table u2
+drop cascades to table u3
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fd08e8d..dddc444 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -98,7 +98,7 @@ test: event_trigger
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json indirect_toast
+test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json indirect_toast custom_exec
 
 # ----------
 # Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 1ed059b..864928e 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -120,6 +120,7 @@ test: functional_deps
 test: advisory_lock
 test: json
 test: indirect_toast
+test: custom_exec
 test: plancache
 test: limit
 test: plpgsql

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Kohei KaiGai (#1)

Re: Custom Plan node

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

The attached patch adds a new plan node type; CustomPlan that enables
extensions to get control during query execution, via registered callbacks.

TBH, I think this is really an exercise in building useless mechanism.
I don't believe that any actually *interesting* new types of plan node can
be inserted into a query plan without invasive changes to the planner, and
so it's a bit pointless to set up hooks whereby you can avoid touching any
source code in the executor.

... Extension will put its local code on the planner_hook
to reference and manipulate PlannedStmt object.

That is hardly a credible design for doing anything interesting with
custom plans. It's got exactly the same problem you are complaining about
for the executor, ie you have to replace the whole of the planner if you
try to do things that way.

One other point here is: if you need more than one kind of custom plan
node, how will you tell what's what? I doubt you can completely eliminate
the need for IsA-style tests, especially in the planner area. The sample
contrib module here already exposes the failure mode I'm worried about:
it falls down as soon as it sees a plan node type it doesn't know. If you
could show me how this would work together with some other extension
that's also adding custom plan nodes of its own, then I might think you
had something.

In the same vein, the patch fails to provide credible behavior for
ExecSupportsMarkRestore, ExecMaterializesOutput, ExplainPreScanNode,
search_plan_tree, and probably some other places that need to know
about all possible plan node types.

Even if you'd covered every one of those bases, you've still only got
support for "generic" plan nodes having no particularly unique properties.
As an example of what I'm thinking about here, NestLoop, which might be
considered the most vanilla of all join plan nodes, actually has a lot of
specialized infrastructure in both the planner and the executor to support
its ability to pass outer-relation values into the inner-relation scan.
I think that as soon as you try to do anything of real interest with
custom plan nodes, you'll be finding you need special-purpose additions
that no set of generic hooks could reasonably cover.

In short, I don't understand or agree with this idea that major changes
should be implementable without touching any of the core code in any way.
This is open source --- if you need a modified version then modify it.
I used to build systems that needed hook-style extensibility because the
core code was burned into ROM; but that's not what we're dealing with
today, and I don't really see the argument for sacrificing readability
and performance by putting hooks everywhere, especially in places with
vague, ever-changing API contracts.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#2)

Re: Custom Plan node

On Fri, Sep 6, 2013 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

The attached patch adds a new plan node type; CustomPlan that enables
extensions to get control during query execution, via registered callbacks.

TBH, I think this is really an exercise in building useless mechanism.
I don't believe that any actually *interesting* new types of plan node can
be inserted into a query plan without invasive changes to the planner, and
so it's a bit pointless to set up hooks whereby you can avoid touching any
source code in the executor.

I find this a somewhat depressing response. Didn't we discuss this
exact design at the developer meeting in Ottawa? I thought it sounded
reasonable to you then, or at least I don't remember you panning it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#3)

Re: Custom Plan node

Robert Haas <robertmhaas@gmail.com> writes:

I find this a somewhat depressing response. Didn't we discuss this
exact design at the developer meeting in Ottawa? I thought it sounded
reasonable to you then, or at least I don't remember you panning it.

What I recall saying is that I didn't see how the planner side of it would
work ... and I still don't see that. I'd be okay with committing
executor-side fixes only if we had a vision of where we'd go on the
planner side; but this patch doesn't offer any path forward there.

This is not unlike the FDW stuff, where getting a reasonable set of
planner APIs in place was by far the hardest part (and isn't really done
even yet, since you still can't do remote joins or remote aggregation in
any reasonable fashion). But you can do simple stuff reasonably simply,
without reimplementing all of the planner along the way --- and I think
we should look for some equivalent level of usefulness from this before
we commit it.

regards, tom lane


Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: Tom Lane (#4)

Re: Custom Plan node

2013/9/7 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

I find this a somewhat depressing response. Didn't we discuss this
exact design at the developer meeting in Ottawa? I thought it sounded
reasonable to you then, or at least I don't remember you panning it.

What I recall saying is that I didn't see how the planner side of it would
work ... and I still don't see that. I'd be okay with committing
executor-side fixes only if we had a vision of where we'd go on the
planner side; but this patch doesn't offer any path forward there.

The reason this patch sticks to the executor side is that we concluded in
Ottawa not to patch the planner code at the outset, because of its
complexity.
I agree that planner support for custom plans would help to construct
better execution plans. However, I think it also makes sense for this
feature to begin as functionality that rearranges a plan tree that has
already been constructed.

Anyway, let me investigate what kind of APIs should be added for the
planner stage as well.

This is not unlike the FDW stuff, where getting a reasonable set of
planner APIs in place was by far the hardest part (and isn't really done
even yet, since you still can't do remote joins or remote aggregation in
any reasonable fashion). But you can do simple stuff reasonably simply,
without reimplementing all of the planner along the way --- and I think
we should look for some equivalent level of usefulness from this before
we commit it.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>


Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: Kohei KaiGai (#5)

Re: Custom Plan node

2013/9/7 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/9/7 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

I find this a somewhat depressing response. Didn't we discuss this
exact design at the developer meeting in Ottawa? I thought it sounded
reasonable to you then, or at least I don't remember you panning it.

What I recall saying is that I didn't see how the planner side of it would
work ... and I still don't see that. I'd be okay with committing
executor-side fixes only if we had a vision of where we'd go on the
planner side; but this patch doesn't offer any path forward there.

The reason this patch sticks to the executor side is that we concluded in
Ottawa not to patch the planner code at the outset, because of its
complexity.
I agree that planner support for custom plans would help to construct
better execution plans. However, I think it also makes sense for this
feature to begin as functionality that rearranges a plan tree that has
already been constructed.

Anyway, let me investigate what kind of APIs should be added for the
planner stage as well.

Here is a brief idea for adding planner support for custom nodes, if we
need it from the beginning. Of course, it has not been considered in detail
yet and needs much brushing up, but it may serve as a draft for
implementing this feature.

We may be able to categorize plan node types into three: scan, join
and others.

Even though the planner tests various combinations of joins and scans
to minimize the estimated cost, we have fewer options for other types
like T_Agg and so on. It seems to me the other types are mostly placed
according to the query's form, so it is not a big problem even
if all we can do is manipulate the plan tree at planner_hook.
That is similar to what the proposed patch is doing.

So, let's focus on join and scan. We need to give extensions a chance
to override a built-in path if they can offer a cheaper one.
That leads to an API that allows extensions to add alternative paths
while the built-in logic is constructing candidate paths. Once a path is
added, we can compare the candidates according to their estimated cost.
For example, let's assume a query tries to join foreign tables A and B
managed by the same postgres_fdw server; a remote join likely has a lower
cost than a local join. If an extension is capable of handling the case
correctly, it may be able to add an alternative "custom-join" path with
a lower cost.
Then, this path shall be transformed into a "CustomJoin" node that launches
a query to fetch the result of A join B, joined remotely.
In this case, it does not matter even if "CustomJoin" has underlying
ForeignScan nodes in its left and right subtrees, because the extension
can handle them however it likes.

So, at least the following APIs may be needed for planner support.

* API to add an alternative join path, in addition to built-in join logic.
* API to add an alternative scan path, in addition to built-in scan logic.
* API to construct "CustomJoin" according to the related path.
* API to construct "CustomScan" according to the related path.
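
To make the list above a little more concrete, here is a minimal,
self-contained sketch in plain C. Nothing in it is PostgreSQL code:
ToyPath, ToyRel, toy_add_path() and toy_custom_join_hook are hypothetical
names, and the cost numbers are made up. It only illustrates the intended
flow: the built-in logic adds its candidate, an extension hook gets a
chance to add an alternative, and the cheapest candidate wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Path and RelOptInfo; not the real structures. */
typedef enum { TOY_NESTLOOP, TOY_CUSTOM_JOIN } ToyPathKind;

typedef struct ToyPath {
    ToyPathKind kind;
    double      total_cost;
} ToyPath;

typedef struct ToyRel {
    ToyPath paths[8];
    int     npaths;
} ToyRel;

/* Hypothetical hook type: lets an extension offer extra join paths. */
typedef void (*toy_custom_join_hook_type)(ToyRel *joinrel);
static toy_custom_join_hook_type toy_custom_join_hook = NULL;

/* Record one candidate path for the relation. */
static void toy_add_path(ToyRel *rel, ToyPathKind kind, double total_cost)
{
    assert(rel->npaths < 8);
    rel->paths[rel->npaths].kind = kind;
    rel->paths[rel->npaths].total_cost = total_cost;
    rel->npaths++;
}

/* Built-in candidate first, then whatever the extension wants to add. */
static void toy_add_paths_to_joinrel(ToyRel *joinrel)
{
    toy_add_path(joinrel, TOY_NESTLOOP, 1000.0);   /* made-up local join cost */
    if (toy_custom_join_hook != NULL)
        toy_custom_join_hook(joinrel);
}

/* Pick the candidate with the lowest estimated total cost. */
static const ToyPath *toy_cheapest_path(const ToyRel *rel)
{
    const ToyPath *best = NULL;

    for (int i = 0; i < rel->npaths; i++)
        if (best == NULL || rel->paths[i].total_cost < best->total_cost)
            best = &rel->paths[i];
    return best;
}

/* Example extension callback: offers a remote join it believes is cheaper. */
static void toy_remote_join_provider(ToyRel *joinrel)
{
    toy_add_path(joinrel, TOY_CUSTOM_JOIN, 250.0); /* made-up remote join cost */
}
```

With the hook unset, only the built-in nested loop is considered; once
toy_remote_join_provider is installed, the custom join is chosen simply
because its estimated cost is lower. The remaining two APIs in the list
(turning the chosen path into a plan node) are not covered by this sketch.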

Any comments are welcome.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

David Fetter

david@fetter.org

over 12 years ago

In reply to: Kohei KaiGai (#6)

Re: Custom Plan node

The broad outlines look great.

Do we have any way, at least conceptually, to consider the graph of
the cluster with edges weighted by network bandwidth and latency?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: David Fetter (#7)

Re: Custom Plan node

2013/9/7 David Fetter <david@fetter.org>:

The broad outlines look great.

Do we have any way, at least conceptually, to consider the graph of
the cluster with edges weighted by network bandwidth and latency?

As postgres_fdw is doing now?
Its configuration allows adding the cost of connecting to the remote server
as startup cost, and also adding the cost of transferring data over the
network, multiplied by the estimated number of rows, according to
per-server configuration.
I think it is the responsibility of the custom plan provider, and fully
depends on the nature of what it wants to provide.
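
As a rough illustration of that arithmetic, here is a simplified sketch.
postgres_fdw has the per-server options fdw_startup_cost and
fdw_tuple_cost; the function below is not its actual costing code, only
the shape of the calculation described above. ToyCost,
toy_remote_scan_cost() and any numbers passed to it are assumptions made
for the example.

```c
#include <assert.h>

/* Simplified cost pair; the real planner tracks more than this. */
typedef struct ToyCost {
    double startup;   /* cost before the first row can be returned */
    double total;     /* cost to return all rows */
} ToyCost;

/*
 * Add the connection overhead to the startup cost, and a per-row
 * transfer charge (multiplied by the estimated row count) to the
 * total cost of a remote scan.
 */
static ToyCost toy_remote_scan_cost(double remote_startup, double remote_run,
                                    double rows,
                                    double fdw_startup_cost,
                                    double fdw_tuple_cost)
{
    ToyCost c;

    assert(rows >= 0.0);
    c.startup = remote_startup + fdw_startup_cost;
    c.total = c.startup + remote_run + fdw_tuple_cost * rows;
    return c;
}
```

A custom plan provider could use a different formula entirely, for
example one that accounts for more than one network hop; nothing in the
proposed infrastructure would dictate it.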

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

David Fetter

david@fetter.org

over 12 years ago

In reply to: Kohei KaiGai (#8)

Re: Custom Plan node

On Sat, Sep 07, 2013 at 05:21:31PM +0200, Kohei KaiGai wrote:

2013/9/7 David Fetter <david@fetter.org>:

The broad outlines look great.

Do we have any way, at least conceptually, to consider the graph
of the cluster with edges weighted by network bandwidth and
latency?

As postgres_fdw is doing now? Its configuration allows adding the cost
of connecting to the remote server as startup cost, and also adding the
cost of transferring data over the network, multiplied by the estimated
number of rows, according to per-server configuration. I think it is the
responsibility of the custom plan provider, and fully depends on the
nature of what it wants to provide.

Interesting :)

Sorry I was unclear. I meant something that could take into account
the bandwidths and latencies throughout the graph, not just the
connections directly from the originating node.

#10

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#4)

Re: Custom Plan node

On Fri, Sep 6, 2013 at 7:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I find this a somewhat depressing response. Didn't we discuss this
exact design at the developer meeting in Ottawa? I thought it sounded
reasonable to you then, or at least I don't remember you panning it.

What I recall saying is that I didn't see how the planner side of it would
work ... and I still don't see that. I'd be okay with committing
executor-side fixes only if we had a vision of where we'd go on the
planner side; but this patch doesn't offer any path forward there.

This is not unlike the FDW stuff, where getting a reasonable set of
planner APIs in place was by far the hardest part (and isn't really done
even yet, since you still can't do remote joins or remote aggregation in
any reasonable fashion). But you can do simple stuff reasonably simply,
without reimplementing all of the planner along the way --- and I think
we should look for some equivalent level of usefulness from this before
we commit it.

I do think there are problems with this as written. The example
consumer of the hook seems to contain a complete list of plan nodes,
which is an oxymoron in the face of a facility to add custom plan
nodes.

But, I guess I'm not yet convinced that one-for-one substitution of
nodes is impossible even with something about this simple. If someone
can do a post-pass over the plan tree and replace a SeqScan node with
an AwesomeSeqScan node or a Sort node with a RadixSort node, would
that constitute a sufficient POC to justify this infrastructure?
Obviously, what you'd really want is to be able to inject those nodes
(with proper costing) at the time they'd otherwise be generated, since
it could affect whether or not a path involving a substituted node
survives in the first place, but I'm not sure it's reasonable to
expect the planner infrastructure for such changes in the same path as
the executor hooks.
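
For readers who want to picture the post-pass being discussed, here is a
minimal sketch on a toy plan tree. ToyPlan, ToyTag and
toy_replace_nodes() are hypothetical and far simpler than the real Plan
nodes; "AwesomeSeqScan" is just the placeholder name used above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node tags; TOY_CUSTOM stands in for a custom plan node. */
typedef enum { TOY_SEQSCAN, TOY_SORT, TOY_HASHJOIN, TOY_CUSTOM } ToyTag;

typedef struct ToyPlan {
    ToyTag          tag;
    const char     *custom_name;   /* set only when tag == TOY_CUSTOM */
    struct ToyPlan *lefttree;
    struct ToyPlan *righttree;
} ToyPlan;

/*
 * Walk a finished plan tree and turn every node tagged 'from' into a
 * custom node. Cost estimates are not revisited, so the rest of the
 * tree was chosen without knowing about the replacement.
 * Returns the number of nodes replaced.
 */
static int toy_replace_nodes(ToyPlan *plan, ToyTag from, const char *custom_name)
{
    int replaced = 0;

    assert(from != TOY_CUSTOM);
    if (plan == NULL)
        return 0;

    if (plan->tag == from)
    {
        plan->tag = TOY_CUSTOM;
        plan->custom_name = custom_name;
        replaced++;
    }
    replaced += toy_replace_nodes(plan->lefttree, from, custom_name);
    replaced += toy_replace_nodes(plan->righttree, from, custom_name);
    return replaced;
}
```

The sketch makes the limitation visible: the join above the scans is left
exactly as the planner chose it, even if the substituted scan would have
made a different join strategy cheaper.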

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#10)

Re: Custom Plan node

Robert Haas <robertmhaas@gmail.com> writes:

But, I guess I'm not yet convinced that one-for-one substitution of
nodes is impossible even with something about this simple. If someone
can do a post-pass over the plan tree and replace a SeqScan node with
an AwesomeSeqScan node or a Sort node with a RadixSort node, would
that constitute a sufficient POC to justify this infrastructure?

No, for exactly the reason you mention: such a change wouldn't have been
accounted for in the planner's other choices, and thus this isn't anything
more than a kluge.

In these specific examples you'd have to ask whether it wouldn't make more
sense to be modifying or hooking the executor's code for the existing plan
node types, anyway. The main reason I can see for not attacking it like
that would be if you wanted the planner to do something different ---
which the above approach forecloses.

Let me be clear that I'm not against the concept of custom plan nodes.
But it was obvious from the beginning that making the executor deal with
them would be much easier than making the planner deal with them. I don't
think we should commit a bunch of executor-side infrastructure in the
absence of any (ahem) plan for doing something realistic on the planner
side. Either that infrastructure will go unused, or we'll be facing a
continual stream of demands for doubtless-half-baked planner changes
so that people can do something with it.

I'd be willing to put in the infrastructure as soon as it's clear that we
have a way forward, but not if it's never going to be more than a kluge.

regards, tom lane

#12

Stephen Frost

sfrost@snowman.net

over 12 years ago

In reply to: Robert Haas (#10)

Re: Custom Plan node

* Robert Haas (robertmhaas@gmail.com) wrote:

But, I guess I'm not yet convinced that one-for-one substitution of
nodes is impossible even with something about this simple.

Couldn't that be done with hooks in those specific plan nodes, or
similar..? Of course, as Tom points out, that wouldn't address how the
costing is done and it could end up being wrong if the implementation of
the node is completely different.

All that said, I've already been wishing for a way to change how Append
works to allow for parallel execution through FDWs; eg: you have a bunch
of foreign tables (say, 32) to independent PG clusters on independent
pieces of hardware which can all execute a given request in parallel.
With a UNION ALL view created over top of those tables, it'd be great if
we fired off all the queries at once and then went through collecting
the responses, instead of going through them serially..
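
A back-of-the-envelope sketch of why that matters, under the strong
assumptions that the remote queries are fully independent and that the
per-query latencies are known: run serially, the waits add up; started
all at once, the total wait is roughly that of the slowest one. The
function names are hypothetical, and result ordering, connection limits
and local processing time are all ignored.

```c
#include <assert.h>

/* Total wait when the remote queries run one after another. */
static double toy_serial_wait(const double *latency, int n)
{
    double total = 0.0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        total += latency[i];
    return total;
}

/* Total wait when all queries are started at once and then collected. */
static double toy_concurrent_wait(const double *latency, int n)
{
    double longest = 0.0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (latency[i] > longest)
            longest = latency[i];
    return longest;
}
```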

The same approach could actually be said for Appends which go across
tablespaces, if you consider that independent tablespaces mean
independent and parallelizable I/O access. Of course, all of this would
need to deal sanely with ORDER BY and LIMIT cases.

Thanks,

Stephen

#13

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#11)

Re: Custom Plan node

On Mon, Sep 9, 2013 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Let me be clear that I'm not against the concept of custom plan nodes.
But it was obvious from the beginning that making the executor deal with
them would be much easier than making the planner deal with them. I don't
think we should commit a bunch of executor-side infrastructure in the
absence of any (ahem) plan for doing something realistic on the planner
side. Either that infrastructure will go unused, or we'll be facing a
continual stream of demands for doubtless-half-baked planner changes
so that people can do something with it.

I'd be willing to put in the infrastructure as soon as it's clear that we
have a way forward, but not if it's never going to be more than a kluge.

Fair enough, I think. So the action item for KaiGai is to think of
how the planner integration might work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14

Greg Stark

stark@mit.edu

over 12 years ago

In reply to: Robert Haas (#13)

Re: Custom Plan node

On Tue, Sep 10, 2013 at 11:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I'd be willing to put in the infrastructure as soon as it's clear that we
have a way forward, but not if it's never going to be more than a kluge.

Fair enough, I think. So the action item for KaiGai is to think of
how the planner integration might work.

It's hard to imagine how the planner could possibly be pluggable in a
generally useful way. It sounds like putting an insurmountable barrier
in place that blocks a feature that would be useful in the Executor.

If you only want to handle nodes which directly replace one of the
existing nodes providing the same semantics then that might be
possible. But I think you also want to be able to interpose new nodes
in the tree, for example for the query cache idea.

Frankly, I think the planner is hard enough to change when you have
full access to the source; I can't imagine trying to manipulate it from
a distance like this.

--
greg

#15

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Greg Stark (#14)

Re: Custom Plan node

Greg Stark <stark@mit.edu> writes:

It's hard to imagine how the planner could possibly be pluggable in a
generally useful way. It sounds like putting an insurmountable barrier
in place that blocks a feature that would be useful in the Executor.

But it's *not* useful without a credible way to modify the planner.

Frankly, I think the planner is hard enough to change when you have
full access to the source; I can't imagine trying to manipulate it from
a distance like this.

Yeah. To tell the truth, I'm not really convinced that purely-plugin
addition of new plan node types is going to be worth anything. I think
pretty much any interesting project (for example, the parallelization of
Append nodes that was mumbled about above) is going to require changes
that won't fit within such an infrastructure. Still, I'm willing to
accept a patch for plugin plan nodes, *if* it addresses the hard part
of that problem and not only the easy part.

regards, tom lane

#16

Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: Robert Haas (#13)

Re: Custom Plan node

2013/9/10 Robert Haas <robertmhaas@gmail.com>:

On Mon, Sep 9, 2013 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Let me be clear that I'm not against the concept of custom plan nodes.
But it was obvious from the beginning that making the executor deal with
them would be much easier than making the planner deal with them. I don't
think we should commit a bunch of executor-side infrastructure in the
absence of any (ahem) plan for doing something realistic on the planner
side. Either that infrastructure will go unused, or we'll be facing a
continual stream of demands for doubtless-half-baked planner changes
so that people can do something with it.

I'd be willing to put in the infrastructure as soon as it's clear that we
have a way forward, but not if it's never going to be more than a kluge.

Fair enough, I think. So the action item for KaiGai is to think of
how the planner integration might work.

Do you think the idea I mentioned upthread is worth investigating
in more detail? Or does it seem to you like short-sighted
thinking to fit this infrastructure to the planner?

It categorizes plan nodes into three: join, scan and other stuff.
Cost-based estimation is applied mostly to joins and scans, so abstracted
scan and join nodes may make sense as a way to inform the core planner
what this custom plan node is trying to do.
On the other hand, other stuff, like Agg, must be added
according to the provided query, even if its estimated cost is not small,
in order to execute the query as written.
So, I thought the important part is integration of join and scan, while
manipulation of the plan tree is sufficient for other stuff.

What is your opinion?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#17

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Kohei KaiGai (#16)

Re: Custom Plan node

On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Fair enough, I think. So the action item for KaiGai is to think of
how the planner integration might work.

Do you think the idea I mentioned upthread is worth investigating
in more detail? Or does it seem to you like short-sighted
thinking to fit this infrastructure to the planner?

It categorizes plan nodes into three: join, scan and other stuff.
Cost-based estimation is applied mostly to joins and scans, so abstracted
scan and join nodes may make sense as a way to inform the core planner
what this custom plan node is trying to do.
On the other hand, other stuff, like Agg, must be added
according to the provided query, even if its estimated cost is not small,
in order to execute the query as written.
So, I thought the important part is integration of join and scan, while
manipulation of the plan tree is sufficient for other stuff.

What is your opinion?

Well, I don't know that I'm smart enough to predict every sort of
thing that someone might want to do here, unfortunately. This is a
difficult area: there are many possible things someone might want to
do, and as Tom points out, there's a lot of special handling of
particular node types that can make things difficult. And I can't
claim to be an expert in this area.

That having been said, I think the idea of a CustomScan node is
probably worth investigating. I don't know if that would work out
well or poorly, but I think it would be worth some experimentation.
Perhaps you could have a hook that gets called for each baserel, and
can decide whether or not it wishes to inject any additional paths;
and then a CustomScan node that could be used to introduce such paths.
I've been thinking that we really ought to have the ability to
optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some
constant'. We have a Tid Scan node, but it only handles equalities,
not inequalities. I suppose this functionality should really be in
core, but since it's not it might make an interesting test for the
infrastructure you're proposing. You may be able to think of
something else you like better; it's just a thought.
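
To sketch what such a CTID range test case would have to reason about:
ctid values order by block number first and then by offset within the
block, so for a qual like ctid > 'some constant' every block before the
constant's block can be skipped. The code below is a toy model only;
ToyTid, toy_tid_compare() and toy_blocks_to_scan() are hypothetical names
and do not correspond to functions in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Toy tuple identifier: (block number, offset within the block). */
typedef struct ToyTid {
    uint32_t block;
    uint16_t offset;
} ToyTid;

/* Order TIDs by block number first, then by offset. */
static int toy_tid_compare(ToyTid a, ToyTid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * For "ctid > bound" on a table of nblocks blocks, return how many
 * blocks a range-aware scan would have to read. Blocks before
 * bound.block cannot contain a match, so they are skipped; the
 * bound's own block is still read because later offsets may match.
 */
static uint32_t toy_blocks_to_scan(ToyTid bound, uint32_t nblocks)
{
    if (bound.block >= nblocks)
        return 0;
    return nblocks - bound.block;
}
```

A per-baserel hook could use an estimate like toy_blocks_to_scan() to
decide whether offering an extra path is worthwhile at all, and to cost
it relative to a full sequential scan.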

I am a little less sanguine about the chances of a CustomJoin node
working out well. I agree that we need something to handle join
pushdown, but it seems to me that might be done by providing a Foreign
Scan path into the joinrel rather than by adding a concept of foreign
joins per se. There are other possible new join types, like the Range
Join that Jeff Davis has mentioned in the past, which might be
interesting. But overall, I can't see us adding very many more join
types, so I'm not totally sure how much extensibility would help us
here. We've added a few scan types over the years (index only scans
in 9.2, and bitmap index scans in 8.1, I think) but all of our
existing join types go back to time immemorial.

And I think that lumping everything else together under "not a scan or
join" has the least promise of all. Is replacing Append really the
same as replacing Sort? I think we'll need to think harder here about
what we're trying to accomplish and how to get there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: Robert Haas (#17)

1 attachment(s)

Re: Custom Plan node

2013/9/13 Robert Haas <robertmhaas@gmail.com>:

On Tue, Sep 10, 2013 at 11:45 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Fair enough, I think. So the action item for KaiGai is to think of
how the planner integration might work.

Do you think the idea I mentioned upthread is worth investigating
in more detail? Or does it seem to you like short-sighted
thinking to fit this infrastructure to the planner?

It categorizes plan nodes into three: join, scan and other stuff.
Cost-based estimation is applied mostly to joins and scans, so abstracted
scan and join nodes may make sense as a way to inform the core planner
what this custom plan node is trying to do.
On the other hand, other stuff, like Agg, must be added
according to the provided query, even if its estimated cost is not small,
in order to execute the query as written.
So, I thought the important part is integration of join and scan, while
manipulation of the plan tree is sufficient for other stuff.

What is your opinion?

Well, I don't know that I'm smart enough to predict every sort of
thing that someone might want to do here, unfortunately. This is a
difficult area: there are many possible things someone might want to
do, and as Tom points out, there's a lot of special handling of
particular node types that can make things difficult. And I can't
claim to be an expert in this area.

Sorry for my late response. I've tried to investigate the planner code
to find a way to integrate this custom API, and it is still in
progress.
One piece of special handling I found is that create_join_plan() adjusts
root->curOuterRels prior to recursing into the inner tree for a NestLoop.
Probably, we need some flags to control this kind of special handling
in the core.
It is hard to list all of these up front, so it seems to me we need
to check them during code construction...

That having been said, I think the idea of a CustomScan node is
probably worth investigating. I don't know if that would work out
well or poorly, but I think it would be worth some experimentation.
Perhaps you could have a hook that gets called for each baserel, and
can decide whether or not it wishes to inject any additional paths;
and then a CustomScan node that could be used to introduce such paths.
I've been thinking that we really ought to have the ability to
optimize CTID range queries, like SELECT * FROM foo WHERE ctid > 'some
constant'. We have a Tid Scan node, but it only handles equalities,
not inequalities. I suppose this functionality should really be in
core, but since it's not it might make an interesting test for the
infrastructure you're proposing. You may be able to think of
something else you like better; it's just a thought.

The framework above is exactly what I had in mind.
Probably, we have to put a hook in the functions invoked by
set_base_rel_pathlist() to add another possible way to scan
the provided baserel; then set_cheapest() will choose the
most reasonable one.
The attached patch, which is just a work in progress, shows
where I am trying to put the hook. Please grep it
for "add_custom_scan_paths".
Regarding a "guest module" for this framework, another
idea I have is a built-in query cache module that returns
the cached result of a previous scan if the table contents have
not been updated since the previous run. Probably, it makes sense in
cases where most rows are filtered out by the scan.
Anyway, I'd like to come up with something useful to demonstrate
this API.
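
A minimal sketch of the query cache idea, assuming the table exposes some
counter that changes on every modification (PostgreSQL has no such
per-table version number; ToyTable.version is purely hypothetical, as are
all the other names here). The cached row count stands in for cached
tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table handle with a counter bumped on every modification. */
typedef struct ToyTable {
    unsigned long version;
} ToyTable;

/* Cache of one scan result; cached_rows stands in for the cached tuples. */
typedef struct ToyScanCache {
    bool          valid;
    unsigned long cached_version;
    long          cached_rows;
} ToyScanCache;

/* Callback that performs the real, expensive scan. */
typedef long (*toy_scan_fn)(const ToyTable *tbl);

/* The cache is usable only if the table is unchanged since it was filled. */
static bool toy_cache_usable(const ToyScanCache *cache, const ToyTable *tbl)
{
    return cache->valid && cache->cached_version == tbl->version;
}

/* Return the cached result, running the real scan only when necessary. */
static long toy_cached_scan(ToyScanCache *cache, const ToyTable *tbl,
                            toy_scan_fn real_scan)
{
    assert(real_scan != NULL);
    if (!toy_cache_usable(cache, tbl))
    {
        cache->cached_rows = real_scan(tbl);
        cache->cached_version = tbl->version;
        cache->valid = true;
    }
    return cache->cached_rows;
}

/* Example scan that counts how often it actually runs. */
static int toy_scan_calls = 0;

static long toy_filtering_scan(const ToyTable *tbl)
{
    (void) tbl;
    toy_scan_calls++;
    return 42;              /* made-up number of rows surviving the filter */
}
```

Calling toy_cached_scan() twice on an unchanged table runs the real scan
once; after the version changes, the next call scans again. How to detect
"table contents not updated" reliably is the hard part and is left open
here.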

I am a little less sanguine about the chances of a CustomJoin node
working out well. I agree that we need something to handle join
pushdown, but it seems to me that might be done by providing a Foreign
Scan path into the joinrel rather than by adding a concept of foreign
joins per se.

Indeed, if we have a hook in add_paths_to_joinrel(), it also makes
sense for foreign tables; probably, the planner will choose the foreign
path instead of an existing join node on top of foreign scans.

There are other possible new join types, like the Range
Join that Jeff Davis has mentioned in the past, which might be
interesting. But overall, I can't see us adding very many more join
types, so I'm not totally sure how much extensibility would help us
here. We've added a few scan types over the years (index only scans
in 9.2, and bitmap index scans in 8.1, I think) but all of our
existing join types go back to time immemorial.

That seems to me a significant point. Even if the custom node added
by an extension in place of a join looks like a scan to core PostgreSQL,
the extension will be able to run its own equivalent of the join.

I think the built-in query cache idea above makes sense as a demonstration
of this pseudo join node; it would be able to hold a materialized result
that has already been joined. At least, it seems sufficient for my target:
table joins accelerated with a GPU device.

And I think that lumping everything else together under "not a scan or
join" has the least promise of all. Is replacing Append really the
same as replacing Sort? I think we'll need to think harder here about
what we're trying to accomplish and how to get there.

As long as the extension modifies PlannedStmt in planner_hook,
I don't think it is so difficult, as I demonstrated in the
previous patch.
Unlike scan or join, the existing code is not designed to compare
multiple possible paths, so it seems to me a feature to adjust
an already constructed plan tree is sufficient for most uses,
because the extension can decide whether it can offer a cheaper
path than the built-in ones.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-plan-node.v2.patch (application/octet-stream)

 src/backend/commands/explain.c          |  84 ++++++++++++++++++
 src/backend/executor/Makefile           |   2 +-
 src/backend/executor/execAmi.c          |  39 +++++++++
 src/backend/nodes/copyfuncs.c           |  77 ++++++++++++++++
 src/backend/nodes/outfuncs.c            |  49 +++++++++++
 src/backend/optimizer/path/allpaths.c   |  23 +++++
 src/backend/optimizer/path/joinpath.c   |  19 ++++
 src/backend/optimizer/plan/createplan.c | 150 +++++++++++++++++++++++++++++++-
 src/backend/optimizer/util/pathnode.c   | 110 +++++++++++++++++++++++
 src/include/nodes/execnodes.h           |  45 ++++++++++
 src/include/nodes/nodes.h               |   9 ++
 src/include/nodes/plannodes.h           |  38 ++++++++
 src/include/nodes/relation.h            |  24 +++++
 src/include/optimizer/pathnode.h        |  26 ++++++
 src/include/optimizer/paths.h           |  25 ++++++
 15 files changed, 718 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 91bea51..e4d0120 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/hashjoin.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "optimizer/clauses.h"
 #include "parser/parsetree.h"
@@ -84,6 +85,7 @@ static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
 						   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
+static void show_custom_info(Node *cstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 						ExplainState *es);
@@ -861,6 +863,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			pname = "Hash";		/* "Join" gets added by jointype switch */
 			sname = "Hash Join";
 			break;
+		case T_CustomJoin:
+			pname = sname = "Custom Join";
+			break;
 		case T_SeqScan:
 			pname = sname = "Seq Scan";
 			break;
@@ -897,6 +902,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ForeignScan:
 			pname = sname = "Foreign Scan";
 			break;
+		case T_CustomScan:
+			pname = sname = "Custom Scan";
+			break;
 		case T_Material:
 			pname = sname = "Materialize";
 			break;
@@ -958,6 +966,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Limit:
 			pname = sname = "Limit";
 			break;
+		case T_CustomPlan:
+			pname = sname = "Custom Plan";
+			break;
 		case T_Hash:
 			pname = sname = "Hash";
 			break;
@@ -1011,6 +1022,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
 		case T_IndexScan:
@@ -1051,6 +1063,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_NestLoop:
 		case T_MergeJoin:
 		case T_HashJoin:
+		case T_CustomJoin:
 			{
 				const char *jointype;
 
@@ -1291,6 +1304,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 										   planstate, es);
 			show_foreignscan_info((ForeignScanState *) planstate, es);
 			break;
+		case T_CustomScan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_custom_info((Node *) planstate, es);
+			break;
 		case T_NestLoop:
 			show_upper_qual(((NestLoop *) plan)->join.joinqual,
 							"Join Filter", planstate, ancestors, es);
@@ -1328,6 +1348,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				show_instrumentation_count("Rows Removed by Filter", 2,
 										   planstate, es);
 			break;
+		case T_CustomJoin:
+			show_upper_qual(((CustomJoin *) plan)->join.joinqual,
+							"Join Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 2,
+										   planstate, es);
+			show_custom_info((Node *) planstate, es);
+			break;
 		case T_Agg:
 		case T_Group:
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
@@ -1357,6 +1385,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Hash:
 			show_hash_info((HashState *) planstate, es);
 			break;
+		case T_CustomPlan:
+			show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+			if (plan->qual)
+				show_instrumentation_count("Rows Removed by Filter", 1,
+										   planstate, es);
+			show_custom_info((Node *) planstate, es);
+			break;
 		default:
 			break;
 	}
@@ -1477,6 +1512,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(plan, CustomPlan) && ((CustomPlan *) plan)->custom_subplans) ||
 		planstate->subPlan;
 	if (haschildren)
 	{
@@ -1531,6 +1567,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_CustomPlan:
+			if (((CustomPlan *) plan)->custom_subplans)
+				ExplainMemberNodes(((CustomPlan *) plan)->custom_subplans,
+						 ((CustomPlanState *) planstate)->custom_subplans,
+								   ancestors, es);
+			break;
 		default:
 			break;
 	}
@@ -1858,6 +1900,35 @@ show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es)
 }
 
 /*
+ * Show extra information for Custom(Plan|Scan|Join) node
+ */
+static void
+show_custom_info(Node *cstate, ExplainState *es)
+{
+	CustomProvider *provider;
+	const char	   *custom_name;
+
+	switch (nodeTag(cstate))
+	{
+		case T_CustomPlanState:
+			custom_name = ((CustomPlan *) ((PlanState *) cstate)->plan)->custom_name;
+			break;
+		case T_CustomScanState:
+			custom_name = ((CustomScan *) ((PlanState *) cstate)->plan)->custom_name;
+			break;
+		case T_CustomJoinState:
+			custom_name = ((CustomJoin *) ((PlanState *) cstate)->plan)->custom_name;
+			break;
+		default:
+			return;
+	}
+
+	provider = get_custom_provider(custom_name);
+	if (provider->ExplainCustom)
+		(*provider->ExplainCustom)(cstate, es);
+}
+
+/*
  * Fetch the name of an index in an EXPLAIN
  *
  * We allow plugins to get control here so that plans involving hypothetical
@@ -2025,6 +2096,19 @@ ExplainTargetRel(Plan *plan, Index rti, ExplainState *es)
 			objectname = rte->ctename;
 			objecttag = "CTE Name";
 			break;
+		case T_CustomScan:
+			if (rte->rtekind == RTE_RELATION)
+			{
+				objectname = get_rel_name(rte->relid);
+				if (es->verbose)
+					namespace = get_namespace_name(get_rel_namespace(rte->relid));
+				objecttag = "Relation Name";
+			}
+			else
+			{
+				/* TODO: add support for other rtekind */
+			}
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..af707b0 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -16,7 +16,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
        nodeBitmapAnd.o nodeBitmapOr.o \
-       nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
+       nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeCustom.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
        nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index a078104..9cc6bdd 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
 #include "executor/nodeBitmapIndexscan.h"
 #include "executor/nodeBitmapOr.h"
 #include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
 #include "executor/nodeGroup.h"
@@ -149,6 +150,10 @@ ExecReScan(PlanState *node)
 			ExecReScanBitmapOr((BitmapOrState *) node);
 			break;
 
+		case T_CustomPlanState:
+			ExecReScanCustomPlan((CustomPlanState *) node);
+			break;
+
 		case T_SeqScanState:
 			ExecReScanSeqScan((SeqScanState *) node);
 			break;
@@ -197,6 +202,10 @@ ExecReScan(PlanState *node)
 			ExecReScanForeignScan((ForeignScanState *) node);
 			break;
 
+		case T_CustomScanState:
+			ExecReScanCustomScan((CustomScanState *) node);
+			break;
+
 		case T_NestLoopState:
 			ExecReScanNestLoop((NestLoopState *) node);
 			break;
@@ -209,6 +218,10 @@ ExecReScan(PlanState *node)
 			ExecReScanHashJoin((HashJoinState *) node);
 			break;
 
+		case T_CustomJoinState:
+			ExecReScanCustomJoin((CustomJoinState *) node);
+			break;
+
 		case T_MaterialState:
 			ExecReScanMaterial((MaterialState *) node);
 			break;
@@ -421,6 +434,9 @@ ExecSupportsMarkRestore(NodeTag plantype)
 bool
 ExecSupportsBackwardScan(Plan *node)
 {
+	ListCell   *cell;
+	uint32		flags;
+
 	if (node == NULL)
 		return false;
 
@@ -461,6 +477,29 @@ ExecSupportsBackwardScan(Plan *node)
 			return IndexSupportsBackwardScan(((IndexOnlyScan *) node)->indexid) &&
 				TargetListSupportsBackwardScan(node->targetlist);
 
+		case T_CustomPlan:
+			flags = ((CustomPlan *) node)->custom_flags;
+			if ((flags & CUSTOM_FLAGS_SUPPORT_BACKWARD_SCAN) == 0)
+				return false;
+
+			foreach(cell, ((CustomPlan *) node)->custom_subplans)
+			{
+				if (!ExecSupportsBackwardScan((Plan *) lfirst(cell)))
+					return false;
+			}
+			if (outerPlan(node) && !ExecSupportsBackwardScan(outerPlan(node)))
+				return false;
+			if (innerPlan(node) && !ExecSupportsBackwardScan(innerPlan(node)))
+				return false;
+			return TargetListSupportsBackwardScan(node->targetlist);
+
+		case T_CustomScan:
+			flags = ((CustomScan *) node)->custom_flags;
+			if ((flags & CUSTOM_FLAGS_SUPPORT_BACKWARD_SCAN) == 0)
+				return false;
+
+			return TargetListSupportsBackwardScan(node->targetlist);
+
 		case T_SubqueryScan:
 			return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
 				TargetListSupportsBackwardScan(node->targetlist);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65f3b98..974cd67 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -304,6 +304,27 @@ _copyBitmapOr(const BitmapOr *from)
 	return newnode;
 }
 
+/*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+	CustomPlan *newnode = makeNode(CustomPlan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyPlanFields((const Plan *) from, (Plan *) newnode);
+
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+	COPY_NODE_FIELD(custom_subplans);
+
+	return newnode;
+}
 
 /*
  * CopyScanFields
@@ -602,6 +623,30 @@ _copyForeignScan(const ForeignScan *from)
 }
 
 /*
+ * _copyCustomScan
+ */
+static CustomScan *
+_copyCustomScan(const CustomScan *from)
+{
+	CustomScan *newnode = makeNode(CustomScan);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyScanFields((const Scan *) from, (Scan *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	return newnode;
+}
+
+/*
  * CopyJoinFields
  *
  *		This function copies the fields of the Join node.  It is used by
@@ -707,6 +752,29 @@ _copyHashJoin(const HashJoin *from)
 	return newnode;
 }
 
+/*
+ * _copyCustomJoin
+ */
+static CustomJoin *
+_copyCustomJoin(const CustomJoin *from)
+{
+	CustomJoin *newnode = makeNode(CustomJoin);
+
+	/*
+	 * copy node superclass fields
+	 */
+	CopyJoinFields((const Join *) from, (Join *) newnode);
+
+	/*
+	 * copy remainder of node
+	 */
+	COPY_STRING_FIELD(custom_name);
+	COPY_SCALAR_FIELD(custom_flags);
+	COPY_NODE_FIELD(custom_private);
+	COPY_NODE_FIELD(custom_exprs);
+
+	return newnode;
+}
 
 /*
  * _copyMaterial
@@ -3890,6 +3958,9 @@ copyObject(const void *from)
 		case T_BitmapOr:
 			retval = _copyBitmapOr(from);
 			break;
+		case T_CustomPlan:
+			retval = _copyCustomPlan(from);
+			break;
 		case T_Scan:
 			retval = _copyScan(from);
 			break;
@@ -3929,6 +4000,9 @@ copyObject(const void *from)
 		case T_ForeignScan:
 			retval = _copyForeignScan(from);
 			break;
+		case T_CustomScan:
+			retval = _copyCustomScan(from);
+			break;
 		case T_Join:
 			retval = _copyJoin(from);
 			break;
@@ -3941,6 +4015,9 @@ copyObject(const void *from)
 		case T_HashJoin:
 			retval = _copyHashJoin(from);
 			break;
+		case T_CustomJoin:
+			retval = _copyCustomJoin(from);
+			break;
 		case T_Material:
 			retval = _copyMaterial(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a2903f9..d260763 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -423,6 +423,20 @@ _outBitmapOr(StringInfo str, const BitmapOr *node)
 }
 
 static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMPLAN");
+
+	_outPlanInfo(str, (const Plan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_UINT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+	WRITE_NODE_FIELD(custom_subplans);
+}
+
+static void
 _outScan(StringInfo str, const Scan *node)
 {
 	WRITE_NODE_TYPE("SCAN");
@@ -568,6 +582,19 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 }
 
 static void
+_outCustomScan(StringInfo str, const CustomScan *node)
+{
+	WRITE_NODE_TYPE("CUSTOMSCAN");
+
+	_outScanInfo(str, (const Scan *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_UINT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+}
+
+static void
 _outJoin(StringInfo str, const Join *node)
 {
 	WRITE_NODE_TYPE("JOIN");
@@ -627,6 +654,19 @@ _outHashJoin(StringInfo str, const HashJoin *node)
 }
 
 static void
+_outCustomJoin(StringInfo str, const CustomJoin *node)
+{
+	WRITE_NODE_TYPE("CUSTOMJOIN");
+
+	_outJoinPlanInfo(str, (const Join *) node);
+
+	WRITE_STRING_FIELD(custom_name);
+	WRITE_UINT_FIELD(custom_flags);
+	WRITE_NODE_FIELD(custom_private);
+	WRITE_NODE_FIELD(custom_exprs);
+}
+
+static void
 _outAgg(StringInfo str, const Agg *node)
 {
 	int			i;
@@ -2775,6 +2815,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_BitmapOr:
 				_outBitmapOr(str, obj);
 				break;
+			case T_CustomPlan:
+				_outCustomPlan(str, obj);
+				break;
 			case T_Scan:
 				_outScan(str, obj);
 				break;
@@ -2814,6 +2857,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_ForeignScan:
 				_outForeignScan(str, obj);
 				break;
+			case T_CustomScan:
+				_outCustomScan(str, obj);
+				break;
 			case T_Join:
 				_outJoin(str, obj);
 				break;
@@ -2826,6 +2872,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_HashJoin:
 				_outHashJoin(str, obj);
 				break;
+			case T_CustomJoin:
+				_outCustomJoin(str, obj);
+				break;
 			case T_Agg:
 				_outAgg(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bfd3809..9d0cbf5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -46,6 +46,8 @@ int			geqo_threshold;
 /* Hook for plugins to replace standard_join_search() */
 join_search_hook_type join_search_hook = NULL;
 
+/* Hook for plugins to add custom scan paths */
+add_scan_path_hook_type add_scan_path_hook = NULL;
 
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
@@ -399,6 +401,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 }
@@ -427,6 +432,9 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Call the FDW's GetForeignPaths function to generate path(s) */
 	rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path */
 	set_cheapest(rel);
 }
@@ -1246,6 +1254,9 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Generate appropriate path */
 	add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1269,6 +1280,9 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_functionscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1292,6 +1306,9 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1361,6 +1378,9 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_ctescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
@@ -1414,6 +1434,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	/* Generate appropriate path */
 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
 
+	/* Consider Custom scans */
+	add_custom_scan_paths(root,rel,rte);
+
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
 }
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5b477e5..162b21d 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -21,6 +21,9 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 
+/* Hook for plugins to add custom join paths */
+add_join_path_hook_type add_join_path_hook = NULL;
+
 
 #define PATH_PARAM_BY_REL(path, rel)  \
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -259,6 +262,22 @@ add_paths_to_joinrel(PlannerInfo *root,
 							 restrictlist, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
+
+	/*
+	 * 5. Also consider paths supplied by custom execution providers.
+	 */
+	if (add_join_path_hook)
+		(*add_join_path_hook)(root,
+							  joinrel,
+							  outerrel,
+							  innerrel,
+							  jointype,
+							  sjinfo,
+							  restrictlist,
+							  mergeclause_list,
+							  &semifactors,
+							  param_source_rels,
+							  extra_lateral_rels);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9b9eb2f..a165a9b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -21,6 +21,7 @@
 
 #include "access/skey.h"
 #include "catalog/pg_class.h"
+#include "executor/nodeCustom.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -77,12 +78,20 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
 						  List *tlist, List *scan_clauses);
 static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 						List *tlist, List *scan_clauses);
+static CustomScan *create_customscan_plan(PlannerInfo *root,
+                                          CustomScanPath *best_path,
+                                          List *tlist,
+                                          List *scan_clauses);
 static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
 static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
 					  Plan *outer_plan, Plan *inner_plan);
 static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
 					 Plan *outer_plan, Plan *inner_plan);
+static CustomJoin *create_customjoin_plan(PlannerInfo *root,
+										  CustomJoinPath *best_path,
+										  Plan *outer_plan,
+										  Plan *inner_plan);
 static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
 static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
 static void process_subquery_nestloop_params(PlannerInfo *root,
@@ -235,11 +244,13 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 		case T_CteScan:
 		case T_WorkTableScan:
 		case T_ForeignScan:
+		case T_CustomScan:
 			plan = create_scan_plan(root, best_path);
 			break;
 		case T_HashJoin:
 		case T_MergeJoin:
 		case T_NestLoop:
+		case T_CustomJoin:
 			plan = create_join_plan(root,
 									(JoinPath *) best_path);
 			break;
@@ -411,6 +422,13 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
 													scan_clauses);
 			break;
 
+		case T_CustomScan:
+			plan = (Plan *)create_customscan_plan(root,
+												  (CustomScanPath *)best_path,
+												  tlist,
+												  scan_clauses);
+			break;
+
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -607,9 +625,14 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
 	outer_plan = create_plan_recurse(root, best_path->outerjoinpath);
 
 	/* For a nestloop, include outer relids in curOuterRels for inner side */
-	if (best_path->path.pathtype == T_NestLoop)
+	if (best_path->path.pathtype == T_NestLoop ||
+		(best_path->path.pathtype == T_CustomJoin &&
+		 (((CustomJoinPath *) best_path)->custom_flags &
+		  CUSTOM_FLAGS_APPLY_PARAMS_ON_INNER) != 0))
+	{
 		root->curOuterRels = bms_union(root->curOuterRels,
 								   best_path->outerjoinpath->parent->relids);
+	}
 
 	inner_plan = create_plan_recurse(root, best_path->innerjoinpath);
 
@@ -637,6 +660,17 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
 												 outer_plan,
 												 inner_plan);
 			break;
+		case T_CustomJoin:
+			if (saveOuterRels != root->curOuterRels)
+			{
+				bms_free(root->curOuterRels);
+				root->curOuterRels = saveOuterRels;
+			}
+			plan = (Plan *)create_customjoin_plan(root,
+												  (CustomJoinPath *) best_path,
+												  outer_plan,
+												  inner_plan);
+			break;
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->path.pathtype);
@@ -2016,6 +2050,51 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	return scan_plan;
 }
 
+/*
+ * create_customscan_plan
+ *   Returns a custom-scan plan for the base relation scanned by 'best_path'
+ *   with restriction clauses 'scan_clauses' and targetlist 'tlist'.
+ */
+static CustomScan *
+create_customscan_plan(PlannerInfo *root,
+					   CustomScanPath *best_path,
+					   List *tlist,
+					   List *scan_clauses)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	CustomScan	   *scan_plan;
+	RelOptInfo	   *rel = best_path->path.parent;
+	Index			scan_relid = rel->relid;
+	RangeTblEntry  *rte;
+
+	/* it should be a base rel */
+	Assert(scan_relid > 0);
+	rte = planner_rt_fetch(scan_relid, root);
+
+	/* Sort clauses into best execution order. */
+	scan_clauses = order_qual_clauses(root, scan_clauses);
+
+	scan_plan = provider->GetCustomScan(root, rel, rte,
+										best_path,
+										tlist,
+										scan_clauses);
+	Assert(IsA(scan_plan, CustomScan));
+	Assert(strcmp(scan_plan->custom_name, provider->name) == 0);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+
+	/* Replace any outer-relation variables with nestloop params in the qual */
+	if (best_path->path.param_info)
+	{
+		scan_plan->scan.plan.qual = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->scan.plan.qual);
+		scan_plan->custom_exprs = (List *)
+			replace_nestloop_params(root, (Node *) scan_plan->custom_exprs);
+	}
+
+	return scan_plan;
+}
 
 /*****************************************************************************
  *
@@ -2534,6 +2613,69 @@ create_hashjoin_plan(PlannerInfo *root,
 	return join_plan;
 }
 
+static CustomJoin *
+create_customjoin_plan(PlannerInfo *root,
+					   CustomJoinPath *best_path,
+					   Plan *outer_plan,
+					   Plan *inner_plan)
+{
+	CustomProvider *provider = get_custom_provider(best_path->custom_name);
+	RelOptInfo *joinrel = best_path->jpath.path.parent;
+	CustomJoin *join_plan;
+	List	   *join_clauses;
+	List	   *other_clauses;
+	List	   *tlist;
+
+	/*
+	/*
+	 * Construct the target list of this join path here, because
+	 * build_path_tlist() is a static function that providers cannot call.
+	 */
+
+	/* Sort join qual clauses into best execution order */
+	join_clauses = order_qual_clauses(root, best_path->jpath.joinrestrictinfo);
+
+
+	/* Get the join qual clauses (in plain expression form) */
+	/* Any pseudoconstant clauses are ignored here */
+	if (IS_OUTER_JOIN(best_path->jpath.jointype))
+	{
+		extract_actual_join_clauses(join_clauses,
+									&join_clauses, &other_clauses);
+	}
+	else
+	{
+		/* We can treat all clauses alike for an inner join */
+		join_clauses = extract_actual_clauses(join_clauses, false);
+		other_clauses = NIL;
+	}
+
+	/* Replace any outer-relation variables with nestloop params */
+	if (best_path->jpath.path.param_info)
+	{
+		join_clauses = (List *)
+			replace_nestloop_params(root, (Node *) join_clauses);
+		other_clauses = (List *)
+			replace_nestloop_params(root, (Node *) other_clauses);
+	}
+
+	/* Construct CustomJoin node */
+	join_plan = provider->GetCustomJoin(root,
+										joinrel,
+										best_path,
+										tlist,
+										join_clauses,
+										other_clauses,
+										outer_plan,
+										inner_plan);
+	Assert(IsA(join_plan, CustomJoin));
+	Assert(strcmp(join_plan->custom_name, provider->name) == 0);
+
+	/* Copy cost data from Path to Plan; no need to make callback do this */
+	copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);
+
+	return join_plan;
+}
 
 /*****************************************************************************
  *
@@ -4863,6 +5005,12 @@ is_projection_capable_plan(Plan *plan)
 		case T_MergeAppend:
 		case T_RecursiveUnion:
 			return false;
+		/*
+		 * XXX - By their nature, CustomScan and CustomJoin nodes have to be
+		 * capable of projection. Whether a CustomPlan node is depends on its
+		 * implementation; however, core PostgreSQL does not add CustomPlan
+		 * nodes right now.
+		 */
 		default:
 			break;
 	}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b17051..7dc51be 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1738,6 +1738,46 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * create_customscan_path
+ *    Creates a path corresponding to a scan of a relation based on
+ *    logic provided by an extension.
+ *
+ * This function is never called from core PostgreSQL. It is normally
+ * invoked from callbacks on add_scan_path_hook. We make no assumptions
+ * about the custom scan logic, so the caller is responsible for
+ * supplying an adequate cost estimate here.
+ */
+CustomScanPath *
+create_customscan_path(PlannerInfo *root,
+					   RelOptInfo *baserel,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomScanPath *pathnode = makeNode(CustomScanPath);
+
+	pathnode->path.pathtype = T_CustomScan;
+	pathnode->path.parent = baserel;
+	pathnode->path.param_info = get_baserel_parampathinfo(root, baserel,
+														  required_outer);
+	pathnode->path.rows = rows;
+	pathnode->path.startup_cost = startup_cost;
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.pathkeys = pathkeys;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
@@ -2000,6 +2040,76 @@ create_hashjoin_path(PlannerInfo *root,
 }
 
 /*
+ * create_customjoin_path
+ *    Creates a pathnode corresponding to a join between two relations based
+ *    on custom logic provided by extension.
+ *
+ * 'joinrel' is the join relation
+ * 'jointype' is the type of join required
+ * 'rows' is number of estimated rows
+ * 'startup_cost' is estimated startup cost
+ * 'total_cost' is estimated total cost
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'outer_path' is the cheapest outer path
+ * 'inner_path' is the cheapest inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the path keys of the new join path
+ * 'required_outer' is the set of required outer rels
+ * 'custom_name' is the name of custom execution provider
+ * 'custom_flags' is a set of CUSTOM_* flags
+ * 'custom_private' is a list of private data
+ */
+CustomJoinPath *
+create_customjoin_path(PlannerInfo *root,
+					   RelOptInfo *joinrel,
+					   JoinType jointype,
+					   double rows,
+					   Cost startup_cost,
+					   Cost total_cost,
+					   SpecialJoinInfo *sjinfo,
+					   Path *outer_path,
+					   Path *inner_path,
+					   List *restrict_clauses,
+					   List *pathkeys,
+					   Relids required_outer,
+					   const char *custom_name,
+					   uint32 custom_flags,
+					   List *custom_private)
+{
+	CustomJoinPath *pathnode = makeNode(CustomJoinPath);
+
+	pathnode->jpath.path.pathtype = T_CustomJoin;
+	pathnode->jpath.path.parent = joinrel;
+	pathnode->jpath.path.param_info
+		= get_joinrel_parampathinfo(root,
+									joinrel,
+									outer_path,
+									inner_path,
+									sjinfo,
+									required_outer,
+									&restrict_clauses);
+	/*
+	 * Unlike other create_XXXjoin_path routines, it is the caller's job to
+	 * compute an adequate cost estimate for this custom join logic.
+	 */
+	pathnode->jpath.path.rows = rows;
+	pathnode->jpath.path.startup_cost = startup_cost;
+	pathnode->jpath.path.total_cost = total_cost;
+
+	pathnode->jpath.path.pathkeys = pathkeys;
+	pathnode->jpath.jointype = jointype;
+	pathnode->jpath.outerjoinpath = outer_path;
+	pathnode->jpath.innerjoinpath = inner_path;
+	pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+	pathnode->custom_name = pstrdup(custom_name);
+	pathnode->custom_flags = custom_flags;
+	pathnode->custom_private = custom_private;
+
+	return pathnode;
+}
+
+/*
  * reparameterize_path
  *		Attempt to modify a Path to have greater parameterization
  *
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3b430e0..ecf816c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1175,6 +1175,24 @@ typedef struct BitmapOrState
 	int			nplans;			/* number of input plans */
 } BitmapOrState;
 
+/* ----------------
+ *   CustomPlanState information
+ *
+ *   CustomPlan is used to implement custom behavior other than scan
+ *   and join.
+ */
+typedef struct CustomPlanState
+{
+	PlanState	ps;
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	uint32		custom_flags;
+	void	   *custom_state;
+	/* special handling if multiple subplans are here, as Append */
+	PlanState **custom_subplans;
+	int			custom_numplans;
+} CustomPlanState;
+
 /* ----------------------------------------------------------------
  *				 Scan State Information
  * ----------------------------------------------------------------
@@ -1494,6 +1512,20 @@ typedef struct ForeignScanState
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
+/* ----------------
+ *   CustomScanState information
+ *
+ *   CustomScan nodes are used to scan a relation based on custom logic.
+ */
+typedef struct CustomScanState
+{
+	ScanState	ss;
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	uint		custom_flags;
+	void	   *custom_state;
+} CustomScanState;
+
 /* ----------------------------------------------------------------
  *				 Join State Information
  * ----------------------------------------------------------------
@@ -1626,6 +1658,19 @@ typedef struct HashJoinState
 	bool		hj_OuterNotEmpty;
 } HashJoinState;
 
+/* ----------------
+ *   CustomJoinState information
+ *
+ *   CustomJoin nodes are used to join relations based on custom logic.
+ */
+typedef struct CustomJoinState
+{
+	JoinState	js;
+	/* use struct pointer to avoid including nodeCustom.h here */
+	struct CustomProvider *custom_provider;
+	uint		custom_flags;
+	void	   *custom_state;
+} CustomJoinState;
 
 /* ----------------------------------------------------------------
  *				 Materialization State Information
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 78368c6..d04eccd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -76,6 +76,9 @@ typedef enum NodeTag
 	T_SetOp,
 	T_LockRows,
 	T_Limit,
+	T_CustomPlan,
+	T_CustomScan,
+	T_CustomJoin,
 	/* these aren't subclasses of Plan: */
 	T_NestLoopParam,
 	T_PlanRowMark,
@@ -121,6 +124,9 @@ typedef enum NodeTag
 	T_SetOpState,
 	T_LockRowsState,
 	T_LimitState,
+	T_CustomPlanState,
+	T_CustomScanState,
+	T_CustomJoinState,
 
 	/*
 	 * TAGS FOR PRIMITIVE NODES (primnodes.h)
@@ -229,6 +235,9 @@ typedef enum NodeTag
 	T_ResultPath,
 	T_MaterialPath,
 	T_UniquePath,
+	T_CustomPlanPath,
+	T_CustomScanPath,
+	T_CustomJoinPath,
 	T_EquivalenceClass,
 	T_EquivalenceMember,
 	T_PathKey,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..aa12205 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -483,6 +483,18 @@ typedef struct ForeignScan
 	bool		fsSystemCol;	/* true if any "system column" is needed */
 } ForeignScan;
 
+/* ----------------
+ *		CustomScan node
+ *
+ */
+typedef struct CustomScan
+{
+	Scan		scan;
+	const char *custom_name;
+	uint		custom_flags;
+	List	   *custom_private;
+	List	   *custom_exprs;
+} CustomScan;
 
 /*
  * ==========
@@ -570,6 +582,19 @@ typedef struct HashJoin
 } HashJoin;
 
 /* ----------------
+ *		CustomJoin node
+ *
+ */
+typedef struct CustomJoin
+{
+	Join		join;
+	const char *custom_name;
+	uint		custom_flags;
+	List	   *custom_private;
+	List	   *custom_exprs;
+} CustomJoin;
+
+/* ----------------
  *		materialization node
  * ----------------
  */
@@ -750,6 +775,19 @@ typedef struct Limit
 	Node	   *limitCount;		/* COUNT parameter, or NULL if none */
 } Limit;
 
+/*
+ * CustomPlan node
+ *
+ */
+typedef struct CustomPlan
+{
+	Plan		plan;
+	const char *custom_name;
+	uint		custom_flags;
+	List	   *custom_private;
+	List	   *custom_exprs;
+	List	   *custom_subplans;	/* set if there are multiple subplans */
+} CustomPlan;
 
 /*
  * RowMarkType -
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2853fb..4a3c924 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1036,6 +1036,30 @@ typedef struct HashPath
 } HashPath;
 
 /*
+ * CustomScanPath
+ *
+ */
+typedef struct CustomScanPath
+{
+	Path		path;
+	const char *custom_name;
+	uint32		custom_flags;
+	List	   *custom_private;
+} CustomScanPath;
+
+/*
+ * CustomJoinPath
+ *
+ */
+typedef struct CustomJoinPath
+{
+	JoinPath	jpath;
+	const char *custom_name;
+	uint32		custom_flags;
+	List	   *custom_private;
+} CustomJoinPath;
+
+/*
  * Restriction clause info.
  *
  * We create one of these for each AND sub-clause of a restriction condition
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9686229..735dbec 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -82,6 +82,16 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						List *pathkeys,
 						Relids required_outer,
 						List *fdw_private);
+extern CustomScanPath *create_customscan_path(PlannerInfo *root,
+                                              RelOptInfo *baserel,
+                                              double rows,
+                                              Cost startup_cost,
+                                              Cost total_cost,
+                                              List *pathkeys,
+                                              Relids required_outer,
+                                              const char *custom_name,
+                                              uint32 custom_flags,
+                                              List *custom_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
 extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
@@ -124,6 +134,22 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
 					 Relids required_outer,
 					 List *hashclauses);
 
+extern CustomJoinPath *create_customjoin_path(PlannerInfo *root,
+											  RelOptInfo *joinrel,
+											  JoinType jointype,
+											  double rows,
+											  Cost startup_cost,
+											  Cost total_cost,
+											  SpecialJoinInfo *sjinfo,
+											  Path *outer_path,
+											  Path *inner_path,
+											  List *restrict_clauses,
+											  List *pathkeys,
+											  Relids required_outer,
+											  const char *custom_name,
+											  uint32 custom_flags,
+											  List *custom_private);
+
 extern Path *reparameterize_path(PlannerInfo *root, Path *path,
 					Relids required_outer,
 					double loop_count);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ef93c7..e9aa199 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -29,6 +29,31 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 														  List *initial_rels);
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
+/* Hook for plugins to add custom scan path, in addition to default ones */
+typedef void (*add_scan_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *baserel,
+										RangeTblEntry *rte);
+extern PGDLLIMPORT add_scan_path_hook_type add_scan_path_hook;
+
+#define add_custom_scan_paths(root,baserel,rte)				\
+	do {													\
+		if (add_scan_path_hook)								\
+			(*add_scan_path_hook)((root),(baserel),(rte));	\
+	} while(0)
+
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *outerrel,
+										RelOptInfo *innerrel,
+										JoinType jointype,
+										SpecialJoinInfo *sjinfo,
+										List *restrictlist,
+										List *mergeclause_list,
+										SemiAntiJoinFactors *semifactors,
+										Relids param_source_rels,
+										Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,

#19

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Kohei KaiGai (#18)

Re: Custom Plan node

On Thu, Oct 3, 2013 at 3:05 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Sorry for my late response. I've been investigating the planner code
to find a way to integrate this custom API, and that work is still in
progress.
One special case I found is that create_join_plan() adjusts
root->curOuterRels before recursing into the inner tree when the join
is a NestLoop. We will probably need some flags to control this kind of
special handling in the core.
It is hard to list every such case up front, so it seems to me we need
to check for them as the code is written...

I'm pretty sure that is only a very small tip of a very large iceberg.

The framework above is exactly what I had in mind.
We probably have to put a hook in the functions invoked by
set_base_rel_pathlist() to add another possible way to scan
the given baserel; set_cheapest() will then choose the
cheapest one.
The attached patch, which is just a work in progress, shows
where I am trying to put the hook. Please grep it
for "add_custom_scan_paths".

That seems fairly reasonable to me, but I think we'd want a working
demonstration.
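
A minimal sketch of the hook pattern being discussed, written as standalone C so it compiles without the PostgreSQL headers. Everything prefixed with mock_ or Mock (MockPath, MockRel, mock_add_path, mock_set_cheapest, mock_set_rel_pathlist), as well as my_add_scan_path and my_module_init, is a hypothetical stand-in invented for illustration; only the idea of a nullable function pointer that is consulted after the built-in paths are added, followed by a cheapest-path choice, comes from the patch and the text above. The costs are made-up numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL structs. */
typedef struct MockPath
{
	const char *name;
	double		total_cost;
} MockPath;

#define MOCK_MAX_PATHS 8

typedef struct MockRel
{
	MockPath	paths[MOCK_MAX_PATHS];
	int			npaths;
	const MockPath *cheapest;
} MockRel;

/* Same shape as add_scan_path_hook in the patch, minus root and rte. */
typedef void (*mock_add_scan_path_hook_type) (MockRel *rel);
mock_add_scan_path_hook_type mock_add_scan_path_hook = NULL;

/* Append one candidate path to the relation. */
void
mock_add_path(MockRel *rel, const char *name, double total_cost)
{
	assert(rel != NULL && rel->npaths < MOCK_MAX_PATHS);
	rel->paths[rel->npaths].name = name;
	rel->paths[rel->npaths].total_cost = total_cost;
	rel->npaths++;
}

/* Pick the path with the lowest total cost (a simplification). */
void
mock_set_cheapest(MockRel *rel)
{
	rel->cheapest = NULL;
	for (int i = 0; i < rel->npaths; i++)
	{
		if (rel->cheapest == NULL ||
			rel->paths[i].total_cost < rel->cheapest->total_cost)
			rel->cheapest = &rel->paths[i];
	}
}

/* Core side: add the built-in path, then let an extension add more. */
void
mock_set_rel_pathlist(MockRel *rel)
{
	rel->npaths = 0;
	mock_add_path(rel, "SeqScan", 100.0);
	if (mock_add_scan_path_hook)
		(*mock_add_scan_path_hook) (rel);
	mock_set_cheapest(rel);
}

/* Extension side: remember and chain any previously installed hook. */
mock_add_scan_path_hook_type prev_hook = NULL;

void
my_add_scan_path(MockRel *rel)
{
	if (prev_hook)
		(*prev_hook) (rel);
	mock_add_path(rel, "CustomScan", 40.0);
}

void
my_module_init(void)
{
	prev_hook = mock_add_scan_path_hook;
	mock_add_scan_path_hook = my_add_scan_path;
}
```

Calling mock_set_rel_pathlist() before my_module_init() leaves only the built-in path as a candidate; after the hook is installed, the custom path competes on cost and wins only because its made-up cost is lower. The extension never forces its path into the plan.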

Regarding the "guest module" for this framework, another
idea I have is a built-in query cache module that returns
the cached result of a previous scan if the table contents have not
been updated since the previous run. It probably makes sense
when most rows are filtered out by the scan.
Anyway, I'd like to come up with something useful to demonstrate
this API.

I doubt that has any chance of working well. Supposing that query
caching is a feature we want, I don't think that plan trees are the
right place to try to install it. That sounds like something that
ought to be done before we plan and execute the query in the first
place.

I am a little less sanguine about the chances of a CustomJoin node
working out well. I agree that we need something to handle join
pushdown, but it seems to me that might be done by providing a Foreign
Scan path into the joinrel rather than by adding a concept of foreign
joins per se.

Indeed, if we have a hook in add_paths_to_joinrel(), it also makes
sense for foreign tables; the planner would probably choose a foreign
path instead of an existing join node on top of foreign scans.

Yes, I think it's reasonable to think about injecting a scan path into
a join node.

And I think that lumping everything else together under "not a scan or
join" has the least promise of all. Is replacing Append really the
same as replacing Sort? I think we'll need to think harder here about
what we're trying to accomplish and how to get there.

As long as the extension modifies the PlannedStmt in the planner_hook,
I don't think it is so difficult, as I demonstrated in the
previous patch.
Unlike for scans and joins, the existing code is not designed to compare
multiple possible paths for these nodes, so it seems to me that a
feature to adjust an already constructed plan tree is sufficient for
most uses, because the extension can decide whether it can offer a
cheaper path than the built-in ones.

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Kohei KaiGai

kaigai@kaigai.gr.jp

over 12 years ago

In reply to: Robert Haas (#19)

Re: Custom Plan node

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

I am a little less sanguine about the chances of a CustomJoin node
working out well. I agree that we need something to handle join
pushdown, but it seems to me that might be done by providing a Foreign
Scan path into the joinrel rather than by adding a concept of foreign
joins per se.

Indeed, if we have a hook in add_paths_to_joinrel(), it also makes
sense for foreign tables; the planner would probably choose a foreign
path instead of an existing join node on top of foreign scans.

Yes, I think it's reasonable to think about injecting a scan path into
a join node.

And I think that lumping everything else together under "not a scan or
join" has the least promise of all. Is replacing Append really the
same as replacing Sort? I think we'll need to think harder here about
what we're trying to accomplish and how to get there.

As long as the extension modifies the PlannedStmt in the planner_hook,
I don't think it is so difficult, as I demonstrated in the
previous patch.
Unlike for scans and joins, the existing code is not designed to compare
multiple possible paths for these nodes, so it seems to me that a
feature to adjust an already constructed plan tree is sufficient for
most uses, because the extension can decide whether it can offer a
cheaper path than the built-in ones.

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, that makes it clear what we should target for v9.4 development.
For the next commitfest, I'm planning to develop the following
features:
* A CustomScan node that can run custom code instead of the built-in
scan nodes.
* Join pushdown for postgres_fdw using a hook placed in
add_paths_to_joinrel(), for demonstration purposes.
* Some new way to scan a relation; the ctid scan with an inequality
(less-than or greater-than) qualifier that you suggested is probably a
good example, also for demonstration purposes.
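
To illustrate why a ctid scan with an inequality qualifier could be a useful demonstration, here is a minimal standalone sketch of the block-range arithmetic. It assumes only that tuple identifiers order by block number and then by offset, and that offsets within a block start at 1. MockTid, mock_tid_compare and mock_blocks_to_scan are hypothetical names invented for this sketch, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier: (block number, offset). */
typedef struct MockTid
{
	uint32_t	block;
	uint16_t	offset;			/* offsets within a block start at 1 */
} MockTid;

/* Order TIDs by block first, then by offset within the block. */
int
mock_tid_compare(MockTid a, MockTid b)
{
	if (a.block != b.block)
		return (a.block < b.block) ? -1 : 1;
	if (a.offset != b.offset)
		return (a.offset < b.offset) ? -1 : 1;
	return 0;
}

/*
 * For a qualifier "ctid < upper", return how many leading blocks of a
 * relation with nblocks blocks can contain matching tuples.
 */
uint32_t
mock_blocks_to_scan(MockTid upper, uint32_t nblocks)
{
	/* The bound lies beyond the end of the relation: scan everything. */
	if (upper.block >= nblocks)
		return nblocks;

	/* Nothing in upper.block matches if the bound is its first slot or less. */
	return (upper.offset <= 1) ? upper.block : upper.block + 1;
}
```

With a bound of (10,5) on a 1000-block table, this sketch reads 11 blocks instead of all 1000, which is the kind of saving a sequential scan cannot obtain from the qualifier alone.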

This set of tasks will probably be the first chunk of this feature.
Other stuff such as Append, Sort, Aggregate and so on can come
later. That seems to me a reasonable strategy.

Although it is off-topic for this thread....

Regarding the "guest module" for this framework, another
idea I have is a built-in query cache module that returns
the cached result of a previous scan if the table contents have not
been updated since the previous run. It probably makes sense
when most rows are filtered out by the scan.
Anyway, I'd like to come up with something useful to demonstrate
this API.

I doubt that has any chance of working well. Supposing that query
caching is a feature we want, I don't think that plan trees are the
right place to try to install it. That sounds like something that
ought to be done before we plan and execute the query in the first
place.

The query cache I mentioned above is intended to cache the result of a
part of the query tree. For example, suppose an ad-hoc query joins A, B
and C, and a data analyst keeps modifying the qualifiers attached to C
while the join of A and B stays the same. I thought it would make sense
to cache the result of that unchanging join. The idea is similar to a
materialized view. If the executor can pick up the cached result of AxB,
all it needs to do is join C with the cached one.
Of course, I know it may have difficulties such as cache validation. :-(
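
A rough sketch of the validation problem, as standalone C. It assumes a hypothetical per-table change counter that is bumped on every write; mock_table_version, MockJoinCache, mock_cache_store and mock_cache_lookup are all invented for illustration and do not correspond to any existing PostgreSQL facility. It also ignores transaction visibility entirely, which a real implementation could not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_TABLES 4

/* Hypothetical per-table change counters, bumped on every write. */
uint64_t	mock_table_version[MOCK_MAX_TABLES];

typedef struct MockJoinCache
{
	bool		filled;
	int			ntables;
	int			table_ids[MOCK_MAX_TABLES];
	uint64_t	seen_version[MOCK_MAX_TABLES];
	long		cached_rows;	/* stand-in for the materialized join result */
} MockJoinCache;

/* Remember a join result together with the versions it was built from. */
void
mock_cache_store(MockJoinCache *cache, const int *table_ids, int ntables,
				 long rows)
{
	assert(ntables > 0 && ntables <= MOCK_MAX_TABLES);
	for (int i = 0; i < ntables; i++)
	{
		assert(table_ids[i] >= 0 && table_ids[i] < MOCK_MAX_TABLES);
		cache->table_ids[i] = table_ids[i];
		cache->seen_version[i] = mock_table_version[table_ids[i]];
	}
	cache->ntables = ntables;
	cache->cached_rows = rows;
	cache->filled = true;
}

/* Return true and the cached result only if no input table has changed. */
bool
mock_cache_lookup(const MockJoinCache *cache, long *rows)
{
	if (!cache->filled)
		return false;
	for (int i = 0; i < cache->ntables; i++)
	{
		if (mock_table_version[cache->table_ids[i]] != cache->seen_version[i])
			return false;
	}
	*rows = cache->cached_rows;
	return true;
}
```

In this sketch a write to C leaves a cached AxB entry usable, while a write to either A or B invalidates it. Deciding when such counters may safely be trusted is exactly the hard part mentioned above.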

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
