[v9.5] Custom Plan API
Before the development cycle for v9.5 gets under way, I'd like to reopen
the discussion of the custom-plan interface. Even though we discussed it
at length during the last three commit-fests, several issues are still
open, so I'd like to settle the direction of the implementation prior to
the first commit-fest.
(1) DDL support and system catalog
Simon suggested supporting a DDL command to track the custom-plan
providers that have been installed, and to avoid pointless hook calls
when it is obvious that no custom-plan provider can help. It also makes
sense as a chance to load an extension once it has been installed.
(In the previous design, I assumed modules were loaded via the LOAD
command or the *_preload_libraries parameters.)
I tried to implement the following syntax:

  CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

It registers a particular function as the entrypoint of a custom-plan
provider; this function is then called whenever the planner tries to find
the best path to scan or join relations. It takes one argument (of
INTERNAL type) that packs the information needed to construct and
register an alternative scan/join path, such as PlannerInfo, RelOptInfo
and so on.
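For example, the ctidscan module in the attached patch registers its
handler like this:

  CREATE FUNCTION ctidscanaddpath(internal)
    RETURNS pg_catalog.void
    AS 'MODULE_PATHNAME','CtidScanAddPath'
    LANGUAGE C STRICT;

  CREATE CUSTOM PLAN ctidscan FOR scan HANDLER ctidscanaddpath;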
(*) In case of a scan path, the data structure below is supplied:

  typedef struct {
      uint32          custom_class;
      PlannerInfo    *root;
      RelOptInfo     *baserel;
      RangeTblEntry  *rte;
  } customScanArg;
This function, usually implemented in C, can construct a custom object
derived from the CustomPath type, which contains a set of function
pointers, including ones that populate further objects derived from
CustomPlan or CustomPlanState, as I did in the patch towards v9.4
development.
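As a minimal sketch, a handler would look like the following (the names
my_scan_handler and my_path_methods are hypothetical, and cost estimation
is omitted; see CtidScanAddPath in the attached ctidscan module for a
complete example):

  Datum
  my_scan_handler(PG_FUNCTION_ARGS)
  {
      customScanArg  *arg = (customScanArg *) PG_GETARG_POINTER(0);
      CustomPath     *cpath;

      /* we are only interested in scan paths on regular relations */
      if (arg->custom_class != CUSTOM_PLAN_CLASS_SCAN ||
          arg->rte->rtekind != RTE_RELATION)
          PG_RETURN_VOID();

      /* construct a CustomPath that points to our methods table */
      cpath = palloc0(sizeof(CustomPath));
      cpath->path.type = T_CustomPath;
      cpath->path.pathtype = T_CustomPlan;
      cpath->path.parent = arg->baserel;
      cpath->methods = &my_path_methods;   /* CreateCustomPlan, etc. */

      /* estimate costs here, then register the alternative path */
      add_path(arg->baserel, &cpath->path);

      PG_RETURN_VOID();
  }
  PG_FUNCTION_INFO_V1(my_scan_handler);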
Properties of individual custom-plan providers are recorded in the
pg_custom_plan system catalog. Right now its definition is quite simple:
only a superuser can create or drop custom-plan providers, and a provider
does not belong to a particular namespace.
Because of this assumption (only a superuser can touch them), I did not
put a database ACL mechanism here.
What other characteristics should this catalog have?
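Just for illustration (the column names below are only my sketch and may
differ from the actual pg_custom_plan definition), a registered provider
would show up as one catalog row, roughly like:

  =# SELECT * FROM pg_custom_plan;
   cusname  | cusclass |   cushandler
  ----------+----------+-----------------
   ctidscan | s        | ctidscanaddpath
  (1 row)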
(2) Static functions to be exported
Tom was concerned that the custom-plan API requires several key functions
to be callable by extensions even though they are declared static, which
effectively makes them look like part of the interface.
Once people regard them as stable functions they can rely on across
version upgrades, they may become a barrier to future improvement of the
core code. Is that the right understanding?
One solution is to write a clear notice, like: "these exported functions
are not stable interfaces, so extensions should not assume they remain
available across future releases".
Nevertheless, more stable functions are kinder to the authors of
extensions, so I tried a few approaches.
First of all, we sorted the functions into three categories:
(A) Functions that walk a plan/expression tree recursively.
(B) Functions that modify the internal state of the core backend.
(C) Functions that are commonly useful but live in a particular source file.
Although there are not many such functions, the (A) and (B) functions
must be reachable from extensions. If they are unavailable, an extension
has to maintain its own copy of the code with its small enhancements, a
burden similar to simply forking the tree.
Examples of (A) are: create_plan_recurse, set_plan_refs, ...
Examples of (B) are: fix_expr_common, ...
On the other hand, the (C) functions are helpful if available, but
exporting them is not a mandatory requirement.
Our first attempt, following Tom's suggestion, was to investigate a
common walker function over the plan tree, like the one we already have
for expression trees. We expected to be able to hand extensions function
pointers to the key routines instead of exporting the static functions.
However, it did not work out well: the existing recursive calls do a
different kind of job for each plan-node type, so they do not fit the
structure of a walker function, which applies a uniform operation to
every node.
Note that I assumed walker functions like the following, which apply
plan_walker or expr_walker to the underlying plan/expression trees:

  bool
  plan_tree_walker(Plan *plan,
                   bool (*plan_walker) (),
                   bool (*expr_walker) (),
                   void *context)

Please tell me if this differs from your idea; I'll reconsider it.
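To make the mismatch concrete, below is a minimal sketch of what such a
walker would have to look like (my assumption only, with typed callback
signatures filled in); note how a switch over node types creeps right
back in:

  static bool
  plan_tree_walker(Plan *plan,
                   bool (*plan_walker) (Plan *plan, void *context),
                   bool (*expr_walker) (Node *node, void *context),
                   void *context)
  {
      ListCell   *lc;

      if (plan == NULL)
          return false;

      /* expressions common to all plan nodes */
      if (expression_tree_walker((Node *) plan->targetlist,
                                 expr_walker, context) ||
          expression_tree_walker((Node *) plan->qual,
                                 expr_walker, context))
          return true;

      /* per-node-type fields cannot be handled uniformly */
      switch (nodeTag(plan))
      {
          case T_SeqScan:
              break;          /* no extra fields */
          case T_IndexScan:
              if (expression_tree_walker((Node *)
                                         ((IndexScan *) plan)->indexqual,
                                         expr_walker, context))
                  return true;
              break;
          case T_Append:
              foreach(lc, ((Append *) plan)->appendplans)
              {
                  if (plan_tree_walker((Plan *) lfirst(lc),
                                       plan_walker, expr_walker, context))
                      return true;
              }
              break;
          default:
              break;          /* ... and so on for every node type */
      }

      /* common recursion to the child plan nodes */
      if (plan_tree_walker(plan->lefttree, plan_walker, expr_walker,
                           context) ||
          plan_tree_walker(plan->righttree, plan_walker, expr_walker,
                           context))
          return true;

      return plan_walker(plan, context);
  }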
Next, I tried another approach that passes function pointers to the (A)
and (B) functions as part of the custom-plan interface.
It is workable at least; however, it seems to me this interface
definition has little advantage over the original approach.
For example, below is the definition of the callback in setrefs.c:

+ void (*SetCustomPlanRef)(PlannerInfo *root,
+                          CustomPlan *custom_plan,
+                          int rtoffset,
+                          Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+                                                    Plan *plan,
+                                                    int rtoffset),
+                          void (*fn_fix_expr_common)(PlannerInfo *root,
+                                                     Node *node));
An extension needs at least set_plan_refs() and fix_expr_common(), so I
added function pointers for them. But this definition has to be updated
whenever these functions change in the future; it does not seem to me a
proper way to cushion the impact of future internal changes.
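For reference, a provider's callback then forwards to the supplied
pointers roughly like this (condensed from SetCtidScanPlanRef in the
attached patch; MyCustomPlan is a hypothetical node type, and
fix_scan_list() is the extension's own copy of the core routine):

  static void
  SetMyCustomPlanRef(PlannerInfo *root,
                     CustomPlan *custom_plan,
                     int rtoffset,
                     Plan *(*fn_set_plan_refs)(PlannerInfo *root,
                                               Plan *plan,
                                               int rtoffset),
                     void (*fn_fix_expr_common)(PlannerInfo *root,
                                                Node *node))
  {
      MyCustomPlan *my_plan = (MyCustomPlan *) custom_plan;

      my_plan->scanrelid += rtoffset;
      /* the copied fix_scan_list() needs fn_fix_expr_common for each
       * expression node it visits */
      my_plan->cplan.plan.targetlist =
          fix_scan_list(root, my_plan->cplan.plan.targetlist,
                        rtoffset, fn_fix_expr_common);
      my_plan->cplan.plan.qual =
          fix_scan_list(root, my_plan->cplan.plan.qual,
                        rtoffset, fn_fix_expr_common);
  }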
So, I'd like to find good common ground to settle this matter.
One idea is the first, simple solution: core PostgreSQL is developed
independently of out-of-tree modules, so we do not promise stability of
the declarations of internal functions, even when they are exported
across multiple source files. (I believe this is our usual manner.)
Another idea is to refactor the core backend to consolidate routines per
plan node rather than per processing stage. For example, createplan.c
contains most of the code commonly needed to create a plan, in addition
to the code for each individual plan node. If functions like
create_seqscan_plan() were moved into separate source files, the routines
that need to be exported would become clear.
One expected disadvantage is that this refactoring would complicate
back-patching.
Do you have any other ideas for implementing this well?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
Sent: Tuesday, April 29, 2014 10:07 AM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; Andres Freund; Robert Haas; Simon Riggs; PgHacker; Stephen
Frost; Shigeru Hanada; Jim Mlodgenski; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

> Yeah. I'm still not exactly convinced that custom-scan will ever
> allow independent development of new plan types (which, with all due
> respect to Robert, is what it was being sold as last year in Ottawa).
> But I'm not opposed in principle to committing it, if we can find a
> way to have a cleaner API for things like setrefs.c. It seems like
> late-stage planner processing in general is an issue for this patch
> (createplan.c and subselect.c are also looking messy). EXPLAIN isn't
> too great either.
>
> I'm not sure exactly what to do about those cases, but I wonder
> whether things would get better if we had the equivalent of
> expression_tree_walker/mutator capability for plan nodes. The state
> of affairs in setrefs and subselect, at least, is a bit reminiscent
> of the bad old days when we had lots of different bespoke code for
> traversing expression trees.

Hmm. If we had something like expression_tree_walker/mutator for plan
nodes, we could pass a walker/mutator function pointer instead of
exposing the static functions that take on the recursive jobs.
If a custom-plan provider (one that has sub-plans) got a callback with
the walker/mutator pointer, all it would have to do for its sub-plans
is call this new plan-tree walking support routine with the supplied
walker/mutator. That seems to me a simpler design than what I did.

I tried to code similar walker/mutator functions on the plan-node tree;
however, these routines could not be implemented simply enough, because
the jobs of the walker/mutator functions are not uniform, so the caller
side must also carry large switch-case branches.

I picked up setrefs.c for my investigation.
set_plan_refs() applies fix_scan_list() to the expression trees that
appear in a plan node if it is derived from Scan; however, it applies
set_join_references() for subclasses of Join, and
set_dummy_tlist_references() for some other plan nodes.
This implies that walker/mutator functions on Plan nodes have to apply
a different operation according to the type of the Plan node. I'm not
certain how many different forms are needed.
(In addition, set_plan_refs() usually behaves like a walker, but
sometimes behaves like a mutator for a trivial subquery....)

I'm expecting a function like the one below. It calls the plan_walker
function for each plan node, and also calls the expr_walker function
for each expression node on the plan node.

bool
plan_tree_walker(Plan *plan,
                 bool (*plan_walker) (),
                 bool (*expr_walker) (),
                 void *context)

I'd like to hear whether some other form would fit this routine better.

One alternative idea to give a custom-plan provider a chance to handle
its subplans is to give it function pointers (1) to handle recursion of
the plan tree and (2) to set up the backend's internal state.
In case of setrefs.c, set_plan_refs() and fix_expr_common() are the
minimum necessity for extensions. It also removes the need to export
the static functions.

How about your thoughts?
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan-with-ctidscan.v0.patch (application/octet-stream)
contrib/ctidscan/Makefile | 19 +
contrib/ctidscan/ctidscan--1.0.sql | 12 +
contrib/ctidscan/ctidscan.c | 1695 ++++++++++++++++++++++++++++
contrib/ctidscan/ctidscan.control | 5 +
contrib/ctidscan/expected/ctidscan.out | 294 +++++
contrib/ctidscan/sql/ctidscan.sql | 50 +
src/backend/catalog/Makefile | 2 +-
src/backend/catalog/dependency.c | 12 +-
src/backend/catalog/objectaddress.c | 60 +
src/backend/commands/Makefile | 2 +-
src/backend/commands/custom_plan.c | 335 ++++++
src/backend/commands/dropcmds.c | 5 +
src/backend/commands/explain.c | 38 +
src/backend/executor/Makefile | 2 +-
src/backend/executor/execAmi.c | 24 +
src/backend/executor/execProcnode.c | 19 +
src/backend/executor/nodeCustom.c | 100 ++
src/backend/nodes/copyfuncs.c | 47 +
src/backend/nodes/equalfuncs.c | 14 +
src/backend/nodes/outfuncs.c | 39 +
src/backend/optimizer/path/allpaths.c | 31 +-
src/backend/optimizer/path/joinpath.c | 7 +
src/backend/optimizer/plan/createplan.c | 30 +
src/backend/optimizer/plan/setrefs.c | 14 +-
src/backend/optimizer/plan/subselect.c | 21 +-
src/backend/parser/gram.y | 91 +-
src/backend/tcop/utility.c | 16 +
src/backend/utils/adt/ruleutils.c | 52 +
src/backend/utils/cache/syscache.c | 23 +
src/include/catalog/dependency.h | 1 +
src/include/catalog/indexing.h | 6 +
src/include/catalog/pg_custom_plan.h | 51 +
src/include/catalog/pg_operator.h | 3 +
src/include/commands/custom_plan.h | 59 +
src/include/executor/nodeCustom.h | 30 +
src/include/nodes/execnodes.h | 41 +
src/include/nodes/nodes.h | 5 +
src/include/nodes/parsenodes.h | 13 +
src/include/nodes/plannodes.h | 51 +
src/include/nodes/relation.h | 32 +
src/include/optimizer/subselect.h | 6 +
src/include/parser/kwlist.h | 3 +
src/include/utils/builtins.h | 2 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/sanity_check.out | 1 +
45 files changed, 3327 insertions(+), 38 deletions(-)
diff --git a/contrib/ctidscan/Makefile b/contrib/ctidscan/Makefile
new file mode 100644
index 0000000..1e476a6
--- /dev/null
+++ b/contrib/ctidscan/Makefile
@@ -0,0 +1,19 @@
+# contrib/ctidscan/Makefile
+
+MODULES = ctidscan
+
+EXTENSION = ctidscan
+DATA = ctidscan--1.0.sql
+
+REGRESS = ctidscan
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/ctidscan
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/ctidscan/ctidscan--1.0.sql b/contrib/ctidscan/ctidscan--1.0.sql
new file mode 100644
index 0000000..5aca9b9
--- /dev/null
+++ b/contrib/ctidscan/ctidscan--1.0.sql
@@ -0,0 +1,12 @@
+--
+-- Create ctidscan handler function
+--
+CREATE FUNCTION ctidscanaddpath(internal)
+ RETURNS pg_catalog.void
+ AS 'MODULE_PATHNAME','CtidScanAddPath'
+ LANGUAGE C STRICT;
+
+--
+-- Create a custom-plan provider
+--
+CREATE CUSTOM PLAN ctidscan FOR scan HANDLER ctidscanaddpath;
diff --git a/contrib/ctidscan/ctidscan.c b/contrib/ctidscan/ctidscan.c
new file mode 100644
index 0000000..5bf7ba6
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.c
@@ -0,0 +1,1695 @@
+/*
+ * ctidscan.c
+ *
+ * Definition of the custom TidScan implementation.
+ *
+ * It is designed to demonstrate the Custom Scan APIs, which allow an
+ * extension to override part of an executor node. This extension focuses
+ * on workloads that fetch records whose ctid is larger or smaller than
+ * a particular value. When such inequality operators are given, this
+ * module constructs a custom scan path that can skip records that need
+ * not be read; if it turns out to be the cheapest path, it is used to
+ * run the query. The Custom Scan APIs call back into this extension when
+ * the executor fetches the underlying records; it uses the existing
+ * heap_getnext(), but seeks to the records to be read before fetching
+ * the first one.
+ *
+ * Portions Copyright (c) 2014, PostgreSQL Global Development Group
+ */
+#include "postgres.h"
+#include "access/relscan.h"
+#include "access/sysattr.h"
+#include "catalog/pg_custom_plan.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/custom_plan.h"
+#include "commands/explain.h"
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/cost.h"
+#include "optimizer/paths.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
+#include "optimizer/placeholder.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/subselect.h"
+#include "parser/parsetree.h"
+#include "storage/bufmgr.h"
+#include "storage/itemptr.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+PG_MODULE_MAGIC;
+
+typedef struct {
+ CustomPath cpath;
+ List *ctid_quals;
+} CtidScanPath;
+
+typedef struct {
+ CustomPlan cplan;
+ Index scanrelid;
+ List *ctid_quals;
+} CtidScanPlan;
+
+typedef struct {
+ CustomPlanState cps;
+ Index scanrelid; /* range table index of the relation */
+ Relation scan_rel; /* relation to be scanned */
+ HeapScanDesc scan_desc; /* scan-descriptor of the relation */
+ TupleTableSlot *scan_slot; /* tuple slot for scanning */
+ List *ctid_quals; /* list of ExprState for inequality ops */
+ ItemPointerData ip_min; /* minimum ItemPointer */
+ ItemPointerData ip_max; /* maximum ItemPointer */
+ int ip_min_comp; /* comparison policy to ip_min */
+ int ip_max_comp; /* comparison policy to ip_max */
+ bool ip_needs_eval; /* true, if needs to seek again */
+} CtidScanPlanState;
+
+static CustomPathMethods ctidscan_path_methods;
+static CustomPlanMethods ctidscan_plan_methods;
+static CustomPlanStateMethods ctidscan_ps_methods;
+
+/* function declarations */
+void _PG_init(void);
+Datum CtidScanAddPath(PG_FUNCTION_ARGS);
+
+/*
+ * The functions below are copied from core PostgreSQL because they are
+ * declared static there. Even though they are helpful common code for
+ * custom-plan providers, the counter-argument is that exporting them
+ * may prevent future improvements if people come to regard them as a
+ * well-defined API set.
+ * So, right now, we copy these implementations from the core.
+ */
+static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+static List *order_qual_clauses(PlannerInfo *root, List *clauses);
+static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
+static List *fix_scan_list(PlannerInfo *root, List *node, int rtoffset,
+ void (*fn_fix_expr_common)(
+ PlannerInfo *root,
+ Node *node));
+static bool tlist_matches_tupdesc(PlanState *ps, List *tlist,
+ Index varno, TupleDesc tupdesc);
+static List *build_path_tlist(PlannerInfo *root, Path *path);
+
+#define IsCTIDVar(node,rtindex) \
+ ((node) != NULL && \
+ IsA((node), Var) && \
+ ((Var *) (node))->varno == (rtindex) && \
+ ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
+ ((Var *) (node))->varlevelsup == 0)
+
+/*
+ * CTidQualFromExpr
+ *
+ * It checks whether the given restriction clauses allow us to determine
+ * the zone to be scanned. If one or more suitable restriction clauses
+ * are found, it returns a list of them; otherwise it returns NIL.
+ * The caller may assume all the conditions are chained with the AND
+ * boolean operator, so every returned operator helps narrow down the
+ * scope of the custom tid scan.
+ */
+static List *
+CTidQualFromExpr(Node *expr, int varno)
+{
+ if (is_opclause(expr))
+ {
+ OpExpr *op = (OpExpr *) expr;
+ Node *arg1;
+ Node *arg2;
+ Node *other = NULL;
+
+ /* only inequality operators are candidates */
+ if (op->opno != TIDLessOperator &&
+ op->opno != TIDLessEqualOperator &&
+ op->opno != TIDGreaterOperator &&
+ op->opno != TIDGreaterEqualOperator)
+ return NULL;
+
+ if (list_length(op->args) != 2)
+ return NIL;
+
+ arg1 = linitial(op->args);
+ arg2 = lsecond(op->args);
+
+ if (IsCTIDVar(arg1, varno))
+ other = arg2;
+ else if (IsCTIDVar(arg2, varno))
+ other = arg1;
+ else
+ return NULL;
+ if (exprType(other) != TIDOID)
+ return NULL; /* probably can't happen */
+ /* The other argument must be a pseudoconstant */
+ if (!is_pseudo_constant_clause(other))
+ return NULL;
+
+ return list_make1(copyObject(op));
+ }
+ else if (and_clause(expr))
+ {
+ List *rlst = NIL;
+ ListCell *lc;
+
+ foreach(lc, ((BoolExpr *) expr)->args)
+ {
+ List *temp = CTidQualFromExpr((Node *) lfirst(lc), varno);
+
+ rlst = list_concat(rlst, temp);
+ }
+ return rlst;
+ }
+ return NIL;
+}
+
+/*
+ * CTidEstimateCosts
+ *
+ * It estimates the cost of scanning the target relation according to
+ * the given restriction clauses. Its scan logic is almost the same as
+ * SeqScan's, because it uses the regular heap_getnext(), except that
+ * the number of tuples to be scanned is reduced when the restriction
+ * clauses work well.
+ */
+static void
+CTidEstimateCosts(PlannerInfo *root,
+ RelOptInfo *baserel,
+ CtidScanPath *ctid_path)
+{
+ Path *path = &ctid_path->cpath.path;
+ List *ctid_quals = ctid_path->ctid_quals;
+ ListCell *lc;
+ double ntuples;
+ ItemPointerData ip_min;
+ ItemPointerData ip_max;
+ bool has_min_val = false;
+ bool has_max_val = false;
+ BlockNumber num_pages;
+ Cost startup_cost = 0;
+ Cost run_cost = 0;
+ Cost cpu_per_tuple;
+ QualCost qpqual_cost;
+ QualCost ctid_qual_cost;
+ double spc_random_page_cost;
+
+ /* Should only be applied to base relations */
+ Assert(baserel->relid > 0);
+ Assert(baserel->rtekind == RTE_RELATION);
+
+ /* Mark the path with the correct row estimate */
+ if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = baserel->rows;
+
+ /* Estimate how many tuples we may retrieve */
+ ItemPointerSet(&ip_min, 0, 0);
+ ItemPointerSet(&ip_max, MaxBlockNumber, MaxOffsetNumber);
+ foreach (lc, ctid_quals)
+ {
+ OpExpr *op = lfirst(lc);
+ Oid opno;
+ Node *other;
+
+ Assert(is_opclause(op));
+ if (IsCTIDVar(linitial(op->args), baserel->relid))
+ {
+ opno = op->opno;
+ other = lsecond(op->args);
+ }
+ else if (IsCTIDVar(lsecond(op->args), baserel->relid))
+ {
+ /* To simplify, treat it as if the Var node were the first argument */
+ opno = get_commutator(op->opno);
+ other = linitial(op->args);
+ }
+ else
+ elog(ERROR, "could not identify CTID variable");
+
+ if (IsA(other, Const))
+ {
+ ItemPointer ip = (ItemPointer)(((Const *) other)->constvalue);
+
+ /*
+ * Just a rough estimate; we don't distinguish between strict
+ * and non-strict inequality operators.
+ */
+ switch (opno)
+ {
+ case TIDLessOperator:
+ case TIDLessEqualOperator:
+ if (ItemPointerCompare(ip, &ip_max) < 0)
+ ItemPointerCopy(ip, &ip_max);
+ has_max_val = true;
+ break;
+ case TIDGreaterOperator:
+ case TIDGreaterEqualOperator:
+ if (ItemPointerCompare(ip, &ip_min) > 0)
+ ItemPointerCopy(ip, &ip_min);
+ has_min_val = true;
+ break;
+ default:
+ elog(ERROR, "unexpected operator code: %u", op->opno);
+ break;
+ }
+ }
+ }
+
+ /* estimated number of tuples in this relation */
+ ntuples = baserel->tuples;
+
+ if (has_min_val && has_max_val)
+ {
+ /* case of both side being bounded */
+ BlockNumber bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+ BlockNumber bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+ bnum_max = Min(bnum_max, baserel->pages);
+ bnum_min = Max(bnum_min, 0);
+ num_pages = Max(bnum_max - bnum_min + 1, 1);
+ }
+ else if (has_min_val)
+ {
+ /* case of only lower side being bounded */
+ BlockNumber bnum_max = baserel->pages;
+ BlockNumber bnum_min = BlockIdGetBlockNumber(&ip_min.ip_blkid);
+
+ bnum_min = Max(bnum_min, 0);
+ num_pages = Max(bnum_max - bnum_min + 1, 1);
+ }
+ else if (has_max_val)
+ {
+ /* case of only upper side being bounded */
+ BlockNumber bnum_max = BlockIdGetBlockNumber(&ip_max.ip_blkid);
+ BlockNumber bnum_min = 0;
+
+ bnum_max = Min(bnum_max, baserel->pages);
+ num_pages = Max(bnum_max - bnum_min + 1, 1);
+ }
+ else
+ {
+ /*
+ * Just a rough estimate. We assume half of the records will be
+ * read under this restriction clause, but it is nondeterministic
+ * until the executor actually runs.
+ */
+ num_pages = Max((baserel->pages + 1) / 2, 1);
+ }
+ ntuples *= ((double) num_pages) / ((double) baserel->pages);
+
+ /*
+ * The TID qual expressions will be computed once, any other baserestrict
+ * quals once per retrieved tuple.
+ */
+ cost_qual_eval(&ctid_qual_cost, ctid_quals, root);
+
+ /* fetch estimated page cost for tablespace containing table */
+ get_tablespace_page_costs(baserel->reltablespace,
+ &spc_random_page_cost,
+ NULL);
+
+ /* disk costs --- assume each tuple on a different page */
+ run_cost += spc_random_page_cost * ntuples;
+
+ /*
+ * Add scanning CPU costs
+ * (logic copied from get_restriction_qual_cost)
+ */
+ if (path->param_info)
+ {
+ /* Include costs of pushed-down clauses */
+ cost_qual_eval(&qpqual_cost, path->param_info->ppi_clauses, root);
+
+ qpqual_cost.startup += baserel->baserestrictcost.startup;
+ qpqual_cost.per_tuple += baserel->baserestrictcost.per_tuple;
+ }
+ else
+ qpqual_cost = baserel->baserestrictcost;
+
+ /*
+ * The ctid quals are evaluated once at startup. They remain a
+ * subset of qpquals, so we subtract their per-tuple cost here.
+ */
+ startup_cost += qpqual_cost.startup + ctid_qual_cost.per_tuple;
+ cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple -
+ ctid_qual_cost.per_tuple;
+ run_cost += cpu_per_tuple * ntuples;
+
+ path->startup_cost = startup_cost;
+ path->total_cost = startup_cost + run_cost;
+}
+
+/*
+ * CreateCtidScanPlan - A method of CustomPath; it populates a custom
+ * object derived from the CustomPlan type, according to the supplied
+ * CustomPath object.
+ */
+static CustomPlan *
+CreateCtidScanPlan(PlannerInfo *root,
+ CustomPath *best_path,
+ Plan *(*fn_create_plan)(
+ PlannerInfo *root,
+ Path *best_path))
+{
+ CtidScanPath *ctid_path = (CtidScanPath *) best_path;
+ CtidScanPlan *ctid_plan;
+ Path *path = &ctid_path->cpath.path;
+ RelOptInfo *rel = path->parent;
+ List *tlist;
+ List *scan_clauses;
+
+ Assert(rel->relid > 0);
+
+ /*
+ * XXX - logic was copied from create_scan_plan
+ */
+ if (use_physical_tlist(root, rel))
+ {
+ tlist = build_physical_tlist(root, rel);
+ if (tlist == NIL)
+ tlist = build_path_tlist(root, path);
+ }
+ else
+ tlist = build_path_tlist(root, path);
+
+ scan_clauses = rel->baserestrictinfo;
+ if (path->param_info)
+ scan_clauses = list_concat(list_copy(scan_clauses),
+ path->param_info->ppi_clauses);
+ /* Sort clauses into best execution order */
+ scan_clauses = order_qual_clauses(root, scan_clauses);
+ /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */
+ scan_clauses = extract_actual_clauses(scan_clauses, false);
+ /* Replace any outer-relation variables with nestloop params */
+ if (path->param_info)
+ {
+ scan_clauses = (List *)
+ replace_nestloop_params(root, (Node *) scan_clauses);
+ }
+
+ /* OK, set up CtidScanPlan */
+ ctid_plan = palloc0(sizeof(CtidScanPlan));
+ ctid_plan->cplan.plan.type = T_CustomPlan;
+ ctid_plan->cplan.plan.targetlist = tlist;
+ ctid_plan->cplan.plan.qual = scan_clauses;
+ ctid_plan->cplan.methods = &ctidscan_plan_methods;
+ ctid_plan->scanrelid = rel->relid;
+ ctid_plan->ctid_quals = ctid_path->ctid_quals;
+
+ return &ctid_plan->cplan;
+}
+
+/*
+ * TextOutCtidScanPath - A method of CustomPath; it shows a text
+ * representation of the supplied CustomPath object.
+ */
+static void
+TextOutCtidScanPath(StringInfo str, CustomPath *cpath)
+{
+ CtidScanPath *ctid_path = (CtidScanPath *)cpath;
+
+ appendStringInfo(str, " :ctid_quals %s",
+ nodeToString(ctid_path->ctid_quals));
+}
+
+/*
+ * SetCtidScanPlanRef - A method of CustomPlan; it fixes up the rtindex
+ * of Var nodes
+ */
+static void
+SetCtidScanPlanRef(PlannerInfo *root,
+ CustomPlan *custom_plan,
+ int rtoffset,
+ Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+ Plan *plan,
+ int rtoffset),
+ void (*fn_fix_expr_common)(PlannerInfo *root,
+ Node *node))
+{
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) custom_plan;
+
+ ctid_plan->scanrelid += rtoffset;
+ ctid_plan->cplan.plan.targetlist =
+ fix_scan_list(root, ctid_plan->cplan.plan.targetlist, rtoffset,
+ fn_fix_expr_common);
+ ctid_plan->cplan.plan.qual =
+ fix_scan_list(root, ctid_plan->cplan.plan.qual, rtoffset,
+ fn_fix_expr_common);
+ ctid_plan->ctid_quals =
+ fix_scan_list(root, ctid_plan->ctid_quals, rtoffset,
+ fn_fix_expr_common);
+}
+
+/*
+ * SupportCtidBackwardScan - A method of CustomPlan; it informs the core
+ * backend whether this custom-plan node supports backward scan or not.
+ */
+static bool
+SupportCtidBackwardScan(CustomPlan *custom_plan)
+{
+ return true;
+}
+
+/*
+ * FinalizeCtidScanPlan - A method of CustomPlan; it handles the finalization done in subselect.c
+ */
+static Bitmapset *
+FinalizeCtidScanPlan(PlannerInfo *root,
+ CustomPlan *custom_plan,
+ Bitmapset *paramids,
+ Bitmapset *valid_params,
+ Bitmapset *scan_params,
+ Bitmapset *(*fn_finalize_plan)(
+ PlannerInfo *root,
+ Plan *plan,
+ Bitmapset *valid_params,
+ Bitmapset *scan_params),
+ Bitmapset *(*fn_finalize_primnode)(
+ Node *node,
+ struct finalize_primnode_context *context))
+{
+ finalize_primnode_context context;
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) custom_plan;
+
+ context.root = root;
+ context.paramids = paramids;
+
+ /* applies finalize_primnode() on ctid_quals also */
+ fn_finalize_primnode((Node *)ctid_plan->ctid_quals, &context);
+
+ return bms_add_members(paramids, scan_params);
+}
+
+/*
+ * CreateCtidScanPlanState - A method of CustomPlan; it populates a custom
+ * object derived from the CustomPlanState type, according to the supplied
+ * CustomPlan object.
+ */
+static CustomPlanState *
+CreateCtidScanPlanState(CustomPlan *custom_plan)
+{
+ CtidScanPlanState *ctps = palloc0(sizeof(CtidScanPlanState));
+
+ ctps->cps.ps.type = T_CustomPlanState;
+ ctps->cps.methods = &ctidscan_ps_methods;
+
+ return &ctps->cps;
+}
+
+/*
+ * TextOutCtidScanPlan - A method of CustomPlan; it generates a text
+ * representation of the given object.
+ */
+static void
+TextOutCtidScanPlan(StringInfo str, const CustomPlan *node)
+{
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) node;
+
+ appendStringInfo(str, " :scanrelid %u",
+ ctid_plan->scanrelid);
+ appendStringInfo(str, " :ctid_quals %s",
+ nodeToString(ctid_plan->ctid_quals));
+}
+
+/*
+ * CopyCtidScanPlan - A method of CustomPlan; it creates a copy of the given object.
+ */
+static CustomPlan *
+CopyCtidScanPlan(const CustomPlan *from,
+ void (*fn_CopyPlanFields)(const Plan *from,
+ Plan *newnode))
+{
+ CtidScanPlan *oldnode = (CtidScanPlan *) from;
+ CtidScanPlan *newnode = palloc0(sizeof(CtidScanPlan));
+
+ newnode->cplan.plan.type = T_CustomPlan;
+
+ fn_CopyPlanFields(&oldnode->cplan.plan, &newnode->cplan.plan);
+
+ newnode->scanrelid = oldnode->scanrelid;
+ newnode->ctid_quals = copyObject(oldnode->ctid_quals);
+
+ return &newnode->cplan;
+}
+
+
+static bool
+CTidEvalScanZone(CtidScanPlanState *ctps)
+{
+ ExprContext *econtext = ctps->cps.ps.ps_ExprContext;
+ ListCell *lc;
+
+ /*
+ * See ItemPointerCompare(); its result is always 1, 0 or -1. To detect
+ * the end of a forward scan we check whether the comparison between the
+ * fetched tid and ip_max is greater than ip_max_comp, so the check can
+ * never fire while ip_max_comp is 1. That is, the bound behaves as
+ * "open ended", as if no termination condition were set; ip_min_comp
+ * of -1 is the symmetric open-ended initial value for the lower bound.
+ */
+ ctps->ip_min_comp = -1;
+ ctps->ip_max_comp = 1;
+
+ /* Walks on the inequality operators */
+ foreach (lc, ctps->ctid_quals)
+ {
+ FuncExprState *fexstate = (FuncExprState *) lfirst(lc);
+ OpExpr *op = (OpExpr *)fexstate->xprstate.expr;
+ Node *arg1 = linitial(op->args);
+ Node *arg2 = lsecond(op->args);
+ Oid opno;
+ ExprState *exstate;
+ ItemPointer itemptr;
+ bool isnull;
+
+ if (IsCTIDVar(arg1, ctps->scanrelid))
+ {
+ exstate = (ExprState *) lsecond(fexstate->args);
+ opno = op->opno;
+ }
+ else if (IsCTIDVar(arg2, ctps->scanrelid))
+ {
+ exstate = (ExprState *) linitial(fexstate->args);
+ opno = get_commutator(op->opno);
+ }
+ else
+ elog(ERROR, "could not identify CTID variable");
+
+ itemptr = (ItemPointer)
+ DatumGetPointer(ExecEvalExprSwitchContext(exstate,
+ econtext,
+ &isnull,
+ NULL));
+ if (!isnull)
+ {
+ /*
+ * OK, we could compute a particular TID that the fetched records
+ * must be larger than, less than, or equal to; this determines
+ * the upper or lower bound of this scan.
+ */
+ switch (opno)
+ {
+ case TIDLessOperator:
+ if (ctps->ip_max_comp > 0 ||
+ ItemPointerCompare(itemptr, &ctps->ip_max) <= 0)
+ {
+ ItemPointerCopy(itemptr, &ctps->ip_max);
+ ctps->ip_max_comp = -1;
+ }
+ break;
+ case TIDLessEqualOperator:
+ if (ctps->ip_max_comp > 0 ||
+ ItemPointerCompare(itemptr, &ctps->ip_max) < 0)
+ {
+ ItemPointerCopy(itemptr, &ctps->ip_max);
+ ctps->ip_max_comp = 0;
+ }
+ break;
+ case TIDGreaterOperator:
+ if (ctps->ip_min_comp < 0 ||
+ ItemPointerCompare(itemptr, &ctps->ip_min) >= 0)
+ {
+ ItemPointerCopy(itemptr, &ctps->ip_min);
+ ctps->ip_min_comp = 0;
+ }
+ break;
+ case TIDGreaterEqualOperator:
+ if (ctps->ip_min_comp < 0 ||
+ ItemPointerCompare(itemptr, &ctps->ip_min) > 0)
+ {
+ ItemPointerCopy(itemptr, &ctps->ip_min);
+ ctps->ip_min_comp = 1;
+ }
+ break;
+ default:
+ elog(ERROR, "unsupported operator");
+ break;
+ }
+ }
+ else
+ {
+ /*
+ * The restriction clauses are all chained with the AND boolean
+ * operator, so the whole condition becomes false if any clause
+ * evaluates to NULL. We can therefore stop the evaluation
+ * immediately and tell the caller there is no point in scanning.
+ */
+ return false;
+ }
+ }
+ return true;
+}
+
+/*
+ * BeginCtidScan - A method of CustomPlanState; it initializes
+ * the supplied CtidScanPlanState object at executor startup.
+ */
+static void
+BeginCtidScan(CustomPlanState *node, EState *estate, int eflags)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *) node;
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) node->ps.plan;
+ Index scanrelid = ctid_plan->scanrelid;
+ List *tlist;
+ TupleDesc tupdesc;
+
+ ctps->ctid_quals = (List *)
+ ExecInitExpr((Expr *)ctid_plan->ctid_quals, &node->ps);
+
+ /* Do nothing anymore in EXPLAIN (no ANALYZE) case. */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /* open relation and scan descriptor, then assign tuple slot */
+ ctps->scanrelid = scanrelid;
+ ctps->scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ ctps->scan_desc = heap_beginscan(ctps->scan_rel,
+ estate->es_snapshot, 0, NULL);
+ ctps->scan_slot = ExecAllocTableSlot(&estate->es_tupleTable);
+ ExecSetSlotDescriptor(ctps->scan_slot,
+ RelationGetDescr(ctps->scan_rel));
+
+ /*
+ * Logic copied from ExecAssignScanProjectionInfo - If target-list
+ * is equivalent to the relation's definition, clear the projection
+ * info.
+ */
+ tlist = ctid_plan->cplan.plan.targetlist;
+ tupdesc = ctps->scan_slot->tts_tupleDescriptor;
+ if (tlist_matches_tupdesc(&node->ps, tlist, scanrelid, tupdesc))
+ node->ps.ps_ProjInfo = NULL;
+ else
+ ExecAssignProjectionInfo(&node->ps, tupdesc);
+
+ /* enforce to evaluate item-pointer on the first iteration */
+ ctps->ip_needs_eval = true;
+}
+
+/*
+ * CTidSeekPosition
+ *
+ * It moves the current scan position to the specified point, so that
+ * the next heap_getnext() fetches a record from there. It returns true
+ * if the caller should fetch the first record with
+ * NoMovementScanDirection (see the note below about out-of-range
+ * positions); otherwise false is returned.
+ */
+static bool
+CTidSeekPosition(HeapScanDesc scan, ItemPointer pos, ScanDirection direction)
+{
+ BlockNumber bnum = BlockIdGetBlockNumber(&pos->ip_blkid);
+ ItemPointerData save_mctid;
+ int save_mindex;
+
+ Assert(direction == BackwardScanDirection ||
+ direction == ForwardScanDirection);
+
+ /*
+ * If the block number is out of range, it is obvious that no tuples
+ * can be fetched in the forward scan direction; on the other hand,
+ * nothing special is needed for the backward direction. Note that
+ * heap_getnext() returns a NULL tuple right after heap_rescan() if
+ * NoMovementScanDirection is given. The caller overrides the scan
+ * direction when 'true' is returned, which terminates the scan
+ * immediately.
+ */
+ if (bnum >= scan->rs_nblocks)
+ {
+ heap_rescan(scan, NULL);
+ /* Terminate this scan immediately */
+ if (direction == ForwardScanDirection)
+ return true;
+ /* Otherwise, the backward scan restarts from scratch */
+ return false;
+ }
+
+ /* save the marked position */
+ ItemPointerCopy(&scan->rs_mctid, &save_mctid);
+ save_mindex = scan->rs_mindex;
+
+ /*
+ * Ensure that the block containing the position gets loaded by
+ * heap_restrpos(). heap_restrpos() internally calls heapgettup() or
+ * heapgettup_pagemode(), which invoke heapgetpage() when rs_cblock
+ * differs from the block number pointed to by rs_mctid; so we store
+ * an invalid block number to guarantee a mismatch with the previous
+ * value.
+ */
+ scan->rs_cblock = InvalidBlockNumber;
+
+ /* Put a pseudo value as if heap_markpos() had saved a position. */
+ ItemPointerCopy(pos, &scan->rs_mctid);
+ if (scan->rs_pageatatime)
+ scan->rs_mindex = ItemPointerGetOffsetNumber(pos) - 1;
+
+ /* Seek to the point */
+ heap_restrpos(scan);
+
+ /* restore the marked position */
+ ItemPointerCopy(&save_mctid, &scan->rs_mctid);
+ scan->rs_mindex = save_mindex;
+
+ return true;
+}
+
+/*
+ * CTidAccessCustomScan
+ *
+ * Access method of ExecCtidScan below. It fetches a tuple from the
+ * underlying heap scan, which starts from the point determined by the
+ * tid clauses.
+ */
+static TupleTableSlot *
+CTidAccessCustomScan(CustomPlanState *node)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *) node;
+ HeapScanDesc scan = ctps->scan_desc;
+ TupleTableSlot *slot = ctps->scan_slot;
+ EState *estate = node->ps.state;
+ ScanDirection direction = estate->es_direction;
+ HeapTuple tuple;
+
+ if (ctps->ip_needs_eval)
+ {
+ /* Terminate this scan immediately if the result set is obviously empty. */
+ if (!CTidEvalScanZone(ctps))
+ return NULL;
+
+ if (direction == ForwardScanDirection)
+ {
+ /* seek to the point if min-tid was obvious */
+ if (ctps->ip_min_comp != -1)
+ {
+ if (CTidSeekPosition(scan, &ctps->ip_min, direction))
+ direction = NoMovementScanDirection;
+ }
+ else if (scan->rs_inited)
+ heap_rescan(scan, NULL);
+ }
+ else if (direction == BackwardScanDirection)
+ {
+ /* seek to the point if max-tid was obvious */
+ if (ctps->ip_max_comp != 1)
+ {
+ if (CTidSeekPosition(scan, &ctps->ip_max, direction))
+ direction = NoMovementScanDirection;
+ }
+ else if (scan->rs_inited)
+ heap_rescan(scan, NULL);
+ }
+ else
+ elog(ERROR, "unexpected scan direction");
+
+ ctps->ip_needs_eval = false;
+ }
+
+ /*
+ * get the next tuple from the table
+ */
+ tuple = heap_getnext(scan, direction);
+ if (!HeapTupleIsValid(tuple))
+ return NULL;
+
+ /*
+ * check whether the fetched tuple reached to the upper bound
+ * if forward scan, or the lower bound if backward scan.
+ */
+ if (direction == ForwardScanDirection)
+ {
+ if (ItemPointerCompare(&tuple->t_self,
+ &ctps->ip_max) > ctps->ip_max_comp)
+ return NULL;
+ }
+ else if (direction == BackwardScanDirection)
+ {
+ if (ItemPointerCompare(&scan->rs_ctup.t_self,
+ &ctps->ip_min) < ctps->ip_min_comp)
+ return NULL;
+ }
+ ExecStoreTuple(tuple, slot, scan->rs_cbuf, false);
+
+ return slot;
+}
+
+/*
+ * ExecCtidScan - A method of CustomPlanState; it fetches the next tuple
+ * from the relation, if any remain.
+ * Most of the logic was copied from ExecScan(), because CustomPlanState
+ * is not derived from ScanState, so we cannot use that routine as is.
+ */
+static TupleTableSlot *
+ExecCtidScan(CustomPlanState *node)
+{
+ ExprContext *econtext;
+ List *qual;
+ ProjectionInfo *projInfo;
+ ExprDoneCond isDone;
+ TupleTableSlot *resultSlot;
+
+ /*
+ * Fetch data from node
+ */
+ qual = node->ps.qual;
+ projInfo = node->ps.ps_ProjInfo;
+ econtext = node->ps.ps_ExprContext;
+
+ /*
+ * If we have neither a qual to check nor a projection to do, just skip
+ * all the overhead and return the raw scan tuple.
+ */
+ if (!qual && !projInfo)
+ {
+ ResetExprContext(econtext);
+ return CTidAccessCustomScan(node);
+ }
+
+ /*
+ * Check to see if we're still projecting out tuples from a previous scan
+ * tuple (because there is a function-returning-set in the projection
+ * expressions). If so, try to project another one.
+ */
+ if (node->ps.ps_TupFromTlist)
+ {
+ Assert(projInfo); /* can't get here if not projecting */
+ resultSlot = ExecProject(projInfo, &isDone);
+ if (isDone == ExprMultipleResult)
+ return resultSlot;
+ /* Done with that source tuple... */
+ node->ps.ps_TupFromTlist = false;
+ }
+
+ /*
+ * Reset per-tuple memory context to free any expression evaluation
+ * storage allocated in the previous tuple cycle. Note this can't happen
+ * until we're done projecting out tuples from a scan tuple.
+ */
+ ResetExprContext(econtext);
+
+ /*
+ * get a tuple from the access method. Loop until we obtain a tuple that
+ * passes the qualification.
+ */
+ for (;;)
+ {
+ TupleTableSlot *slot;
+
+ CHECK_FOR_INTERRUPTS();
+
+ slot = CTidAccessCustomScan(node);
+
+ /*
+ * if the slot returned by the accessMtd contains NULL, then it means
+ * there is nothing more to scan so we just return an empty slot,
+ * being careful to use the projection result slot so it has correct
+ * tupleDesc.
+ */
+ if (TupIsNull(slot))
+ {
+ if (projInfo)
+ return ExecClearTuple(projInfo->pi_slot);
+ else
+ return slot;
+ }
+
+ /*
+ * place the current tuple into the expr context
+ */
+ econtext->ecxt_scantuple = slot;
+
+ /*
+ * check that the current tuple satisfies the qual-clause
+ *
+ * check for non-nil qual here to avoid a function call to ExecQual()
+ * when the qual is nil ... saves only a few cycles, but they add up
+ * ...
+ */
+ if (!qual || ExecQual(qual, econtext, false))
+ {
+ /*
+ * Found a satisfactory scan tuple.
+ */
+ if (projInfo)
+ {
+ /*
+ * Form a projection tuple, store it in the result tuple slot
+ * and return it --- unless we find we can project no tuples
+ * from this scan tuple, in which case continue scan.
+ */
+ resultSlot = ExecProject(projInfo, &isDone);
+ if (isDone != ExprEndResult)
+ {
+ node->ps.ps_TupFromTlist = (isDone == ExprMultipleResult);
+ return resultSlot;
+ }
+ }
+ else
+ {
+ /*
+ * Here, we aren't projecting, so just return scan tuple.
+ */
+ return slot;
+ }
+ }
+ else
+ InstrCountFiltered1(node, 1);
+
+ /*
+ * Tuple fails qual, so free per-tuple memory and try again.
+ */
+ ResetExprContext(econtext);
+ }
+}
+
+/*
+ * MultiExecCtidScan - A method of CustomPlanState; it could return
+ * an opaque datum to the upper node; ctidscan does not support it.
+ */
+static Node *
+MultiExecCtidScan(CustomPlanState *node)
+{
+ elog(ERROR, "MultiExecProcNode is not supported");
+}
+
+/*
+ * EndCtidScan - A method of CustomPlanState; it closes the relation and
+ * scan descriptor, and releases other related resources.
+ */
+static void
+EndCtidScan(CustomPlanState *node)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *)node;
+
+ if (ctps->scan_slot)
+ ExecClearTuple(ctps->scan_slot);
+ if (ctps->scan_desc)
+ heap_endscan(ctps->scan_desc);
+ if (ctps->scan_rel)
+ ExecCloseScanRelation(ctps->scan_rel);
+}
+
+/*
+ * ReScanCtidScan - A method of CustomPlanState; it rewinds the current
+ * seek position.
+ */
+static void
+ReScanCtidScan(CustomPlanState *node)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *)node;
+ EState *estate = node->ps.state;
+
+ /* logic copied from ExecScanReScan */
+ if (estate->es_epqScanDone != NULL)
+ {
+ Index scanrelid = ((CtidScanPlan *) node->ps.plan)->scanrelid;
+
+ Assert(scanrelid > 0);
+
+ estate->es_epqScanDone[scanrelid - 1] = false;
+ }
+ ctps->ip_needs_eval = true;
+}
+
+/*
+ * ExplainCtidScanTargetRel - A method of CustomPlanState; it outputs
+ * the name of the relation to be scanned.
+ */
+static void
+ExplainCtidScanTargetRel(CustomPlanState *node, ExplainState *es)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *) node;
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) ctps->cps.ps.plan;
+ Index rti = ctid_plan->scanrelid;
+ RangeTblEntry *rte;
+ char *objectname = NULL;
+ char *namespace = NULL;
+ char *refname;
+
+ /* logic copied from ExplainTargetRel */
+ rte = rt_fetch(rti, es->rtable);
+ refname = (char *) list_nth(es->rtable_names, rti - 1);
+ if (refname == NULL)
+ refname = rte->eref->aliasname;
+
+ Assert(rte->rtekind == RTE_RELATION);
+ objectname = get_rel_name(rte->relid);
+ if (es->verbose)
+ namespace = get_namespace_name(get_rel_namespace(rte->relid));
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoString(es->str, " on");
+ if (namespace != NULL)
+ appendStringInfo(es->str, " %s.%s", quote_identifier(namespace),
+ quote_identifier(objectname));
+ else if (objectname != NULL)
+ appendStringInfo(es->str, " %s", quote_identifier(objectname));
+ if (objectname == NULL || strcmp(refname, objectname) != 0)
+ appendStringInfo(es->str, " %s", quote_identifier(refname));
+ }
+ else
+ {
+ if (objectname != NULL)
+ ExplainPropertyText("Relation Name", objectname, es);
+ if (namespace != NULL)
+ ExplainPropertyText("Schema", namespace, es);
+ ExplainPropertyText("Alias", refname, es);
+ }
+}
+
+/*
+ * ExplainCtidScan - A method of CustomPlanState; it shows extra
+ * information in the EXPLAIN output.
+ */
+static void
+ExplainCtidScan(CustomPlanState *node, List *ancestors, ExplainState *es)
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *) node;
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) ctps->cps.ps.plan;
+
+ /* logic copied from show_qual and show_expression */
+ if (ctid_plan->ctid_quals)
+ {
+ bool useprefix = es->verbose;
+ Node *qual;
+ List *context;
+ char *exprstr;
+
+ /* Convert AND list to explicit AND */
+ qual = (Node *) make_ands_explicit(ctid_plan->ctid_quals);
+
+ /* Set up deparsing context */
+ context = deparse_context_for_planstate((Node *)&node->ps,
+ ancestors,
+ es->rtable,
+ es->rtable_names);
+
+ /* Deparse the expression */
+ exprstr = deparse_expression(qual, context, useprefix, false);
+
+ /* And add to es->str */
+ ExplainPropertyText("ctid quals", exprstr, es);
+ }
+}
+
+/*
+ * ExplainCtidPreScanNode - A method of CustomPlanState; it tells the
+ * core backend which relation rtindexes are referenced, prior to the
+ * main EXPLAIN processing.
+ */
+static void
+ExplainCtidPreScanNode(CustomPlanState *node,
+ Bitmapset **rels_used,
+ void (*fn_ExplainPreScanNode)(
+ PlanState *planstate,
+ Bitmapset **rels_used))
+{
+ CtidScanPlanState *ctps = (CtidScanPlanState *) node;
+ CtidScanPlan *ctid_plan = (CtidScanPlan *) ctps->cps.ps.plan;
+
+ *rels_used = bms_add_member(*rels_used, ctid_plan->scanrelid);
+}
+
+/*
+ * Handler entrypoint of this custom-plan provider
+ */
+Datum
+CtidScanAddPath(PG_FUNCTION_ARGS)
+{
+ customScanArg *cscan_arg = (customScanArg *)PG_GETARG_POINTER(0);
+ PlannerInfo *root;
+ RangeTblEntry *rte;
+ RelOptInfo *baserel;
+ char relkind;
+ ListCell *lc;
+ List *ctid_quals = NIL;
+
+ if (cscan_arg->custom_class != CUSTOM_PLAN_CLASS_SCAN)
+ PG_RETURN_VOID();
+
+ root = cscan_arg->root;
+ rte = cscan_arg->rte;
+ baserel = cscan_arg->baserel;
+
+ /* all we can support is regular relations */
+ if (rte->rtekind != RTE_RELATION)
+ PG_RETURN_VOID();
+
+ relkind = get_rel_relkind(rte->relid);
+ if (relkind != RELKIND_RELATION &&
+ relkind != RELKIND_MATVIEW &&
+ relkind != RELKIND_TOASTVALUE)
+ PG_RETURN_VOID();
+
+ /* walk on the restrict info */
+ foreach (lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ List *temp;
+
+ if (!IsA(rinfo, RestrictInfo))
+ continue; /* probably should never happen */
+ temp = CTidQualFromExpr((Node *) rinfo->clause, baserel->relid);
+ ctid_quals = list_concat(ctid_quals, temp);
+ }
+
+ /*
+ * OK, some of the restriction clauses can reduce the number of
+ * tuples to be scanned, so we add the custom scan path provided
+ * by this module.
+ */
+ if (ctid_quals != NIL)
+ {
+ CtidScanPath *ctid_path;
+ Relids required_outer;
+
+ /*
+ * We don't support pushing join clauses into the quals of a ctidscan,
+ * but it could still have required parameterization due to LATERAL
+ * refs in its tlist.
+ */
+ required_outer = baserel->lateral_relids;
+
+ ctid_path = palloc0(sizeof(CtidScanPath));
+ ctid_path->cpath.path.type = T_CustomPath;
+ ctid_path->cpath.path.pathtype = T_CustomPlan;
+ ctid_path->cpath.path.parent = baserel;
+ ctid_path->cpath.path.param_info
+ = get_baserel_parampathinfo(root, baserel, required_outer);
+ ctid_path->cpath.methods = &ctidscan_path_methods;
+ ctid_path->ctid_quals = ctid_quals;
+
+ CTidEstimateCosts(root, baserel, ctid_path);
+
+ add_path(baserel, &ctid_path->cpath.path);
+ }
+ PG_RETURN_VOID();
+}
+PG_FUNCTION_INFO_V1(CtidScanAddPath);
+
+/*
+ * Entrypoint of this extension
+ */
+void
+_PG_init(void)
+{
+ /* setup ctidscan_path_methods */
+ ctidscan_path_methods.CustomName = "ctidscan";
+ ctidscan_path_methods.CreateCustomPlan = CreateCtidScanPlan;
+ ctidscan_path_methods.TextOutCustomPath = TextOutCtidScanPath;
+
+ /* setup ctidscan_plan_methods */
+ ctidscan_plan_methods.CustomName = "ctidscan";
+ ctidscan_plan_methods.SetCustomPlanRef = SetCtidScanPlanRef;
+ ctidscan_plan_methods.SupportCustomBackwardScan = SupportCtidBackwardScan;
+ ctidscan_plan_methods.FinalizeCustomPlan = FinalizeCtidScanPlan;
+ ctidscan_plan_methods.CreateCustomPlanState = CreateCtidScanPlanState;
+ ctidscan_plan_methods.TextOutCustomPlan = TextOutCtidScanPlan;
+ ctidscan_plan_methods.CopyCustomPlan = CopyCtidScanPlan;
+
+ /* setup ctidscan_planstate_methods */
+ ctidscan_ps_methods.CustomName = "ctidscan";
+ ctidscan_ps_methods.BeginCustomPlan = BeginCtidScan;
+ ctidscan_ps_methods.ExecCustomPlan = ExecCtidScan;
+ ctidscan_ps_methods.MultiExecCustomPlan = MultiExecCtidScan;
+ ctidscan_ps_methods.EndCustomPlan = EndCtidScan;
+ ctidscan_ps_methods.ReScanCustomPlan = ReScanCtidScan;
+ ctidscan_ps_methods.MarkPosCustomPlan = NULL;
+ ctidscan_ps_methods.RestrPosCustomPlan = NULL;
+ ctidscan_ps_methods.ExplainCustomPlanTargetRel = ExplainCtidScanTargetRel;
+ ctidscan_ps_methods.ExplainCustomPlan = ExplainCtidScan;
+ ctidscan_ps_methods.ExplainCustomPreScanNode = ExplainCtidPreScanNode;
+ ctidscan_ps_methods.GetSpecialCustomVar = NULL;
+}
+
+/* ----------------------------------------------------------------
+ *
+ * copied logic from the core PostgreSQL
+ *
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * use_physical_tlist
+ * Decide whether to use a tlist matching relation structure,
+ * rather than only those Vars actually referenced.
+ */
+static bool
+use_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ int i;
+ ListCell *lc;
+
+ /*
+ * We can do this for real relation scans, subquery scans, function scans,
+ * values scans, and CTE scans (but not for, eg, joins).
+ */
+ if (rel->rtekind != RTE_RELATION &&
+ rel->rtekind != RTE_SUBQUERY &&
+ rel->rtekind != RTE_FUNCTION &&
+ rel->rtekind != RTE_VALUES &&
+ rel->rtekind != RTE_CTE)
+ return false;
+
+ /*
+ * Can't do it with inheritance cases either (mainly because Append
+ * doesn't project).
+ */
+ if (rel->reloptkind != RELOPT_BASEREL)
+ return false;
+
+ /*
+ * Can't do it if any system columns or whole-row Vars are requested.
+ * (This could possibly be fixed but would take some fragile assumptions
+ * in setrefs.c, I think.)
+ */
+ for (i = rel->min_attr; i <= 0; i++)
+ {
+ if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
+ return false;
+ }
+
+ /*
+ * Can't do it if the rel is required to emit any placeholder expressions,
+ * either.
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ if (bms_nonempty_difference(phinfo->ph_needed, rel->relids) &&
+ bms_is_subset(phinfo->ph_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * order_qual_clauses
+ * Given a list of qual clauses that will all be evaluated at the same
+ * plan node, sort the list into the order we want to check the quals
+ * in at runtime.
+ *
+ * Ideally the order should be driven by a combination of execution cost and
+ * selectivity, but it's not immediately clear how to account for both,
+ * and given the uncertainty of the estimates the reliability of the decisions
+ * would be doubtful anyway. So we just order by estimated per-tuple cost,
+ * being careful not to change the order when (as is often the case) the
+ * estimates are identical.
+ *
+ * Although this will work on either bare clauses or RestrictInfos, it's
+ * much faster to apply it to RestrictInfos, since it can re-use cost
+ * information that is cached in RestrictInfos.
+ *
+ * Note: some callers pass lists that contain entries that will later be
+ * removed; this is the easiest way to let this routine see RestrictInfos
+ * instead of bare clauses. It's OK because we only sort by cost, but
+ * a cost/selectivity combination would likely do the wrong thing.
+ */
+static List *
+order_qual_clauses(PlannerInfo *root, List *clauses)
+{
+ typedef struct
+ {
+ Node *clause;
+ Cost cost;
+ } QualItem;
+ int nitems = list_length(clauses);
+ QualItem *items;
+ ListCell *lc;
+ int i;
+ List *result;
+
+ /* No need to work hard for 0 or 1 clause */
+ if (nitems <= 1)
+ return clauses;
+
+ /*
+ * Collect the items and costs into an array. This is to avoid repeated
+ * cost_qual_eval work if the inputs aren't RestrictInfos.
+ */
+ items = (QualItem *) palloc(nitems * sizeof(QualItem));
+ i = 0;
+ foreach(lc, clauses)
+ {
+ Node *clause = (Node *) lfirst(lc);
+ QualCost qcost;
+
+ cost_qual_eval_node(&qcost, clause, root);
+ items[i].clause = clause;
+ items[i].cost = qcost.per_tuple;
+ i++;
+ }
+
+ /*
+ * Sort. We don't use qsort() because it's not guaranteed stable for
+ * equal keys. The expected number of entries is small enough that a
+ * simple insertion sort should be good enough.
+ */
+ for (i = 1; i < nitems; i++)
+ {
+ QualItem newitem = items[i];
+ int j;
+
+ /* insert newitem into the already-sorted subarray */
+ for (j = i; j > 0; j--)
+ {
+ if (newitem.cost >= items[j - 1].cost)
+ break;
+ items[j] = items[j - 1];
+ }
+ items[j] = newitem;
+ }
+
+ /* Convert back to a list */
+ result = NIL;
+ for (i = 0; i < nitems; i++)
+ result = lappend(result, items[i].clause);
+
+ return result;
+}
+
+/*
+ * replace_nestloop_params
+ * Replace outer-relation Vars and PlaceHolderVars in the given expression
+ * with nestloop Params
+ *
+ * All Vars and PlaceHolderVars belonging to the relation(s) identified by
+ * root->curOuterRels are replaced by Params, and entries are added to
+ * root->curOuterParams if not already present.
+ */
+static Node *
+replace_nestloop_params_mutator(Node *node, PlannerInfo *root)
+{
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+ Param *param;
+ NestLoopParam *nlp;
+ ListCell *lc;
+
+ /* Upper-level Vars should be long gone at this point */
+ Assert(var->varlevelsup == 0);
+ /* If not to be replaced, we can just return the Var unmodified */
+ if (!bms_is_member(var->varno, root->curOuterRels))
+ return node;
+ /* Create a Param representing the Var */
+ param = assign_nestloop_param_var(root, var);
+ /* Is this param already listed in root->curOuterParams? */
+ foreach(lc, root->curOuterParams)
+ {
+ nlp = (NestLoopParam *) lfirst(lc);
+ if (nlp->paramno == param->paramid)
+ {
+ Assert(equal(var, nlp->paramval));
+ /* Present, so we can just return the Param */
+ return (Node *) param;
+ }
+ }
+ /* No, so add it */
+ nlp = makeNode(NestLoopParam);
+ nlp->paramno = param->paramid;
+ nlp->paramval = var;
+ root->curOuterParams = lappend(root->curOuterParams, nlp);
+ /* And return the replacement Param */
+ return (Node *) param;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+ Param *param;
+ NestLoopParam *nlp;
+ ListCell *lc;
+
+ /* Upper-level PlaceHolderVars should be long gone at this point */
+ Assert(phv->phlevelsup == 0);
+
+ /*
+ * Check whether we need to replace the PHV. We use bms_overlap as a
+ * cheap/quick test to see if the PHV might be evaluated in the outer
+ * rels, and then grab its PlaceHolderInfo to tell for sure.
+ */
+ if (!bms_overlap(phv->phrels, root->curOuterRels) ||
+ !bms_is_subset(find_placeholder_info(root, phv, false)->ph_eval_at,
+ root->curOuterRels))
+ {
+ /*
+ * We can't replace the whole PHV, but we might still need to
+ * replace Vars or PHVs within its expression, in case it ends up
+ * actually getting evaluated here. (It might get evaluated in
+ * this plan node, or some child node; in the latter case we don't
+ * really need to process the expression here, but we haven't got
+ * enough info to tell if that's the case.) Flat-copy the PHV
+ * node and then recurse on its expression.
+ *
+ * Note that after doing this, we might have different
+ * representations of the contents of the same PHV in different
+ * parts of the plan tree. This is OK because equal() will just
+ * match on phid/phlevelsup, so setrefs.c will still recognize an
+ * upper-level reference to a lower-level copy of the same PHV.
+ */
+ PlaceHolderVar *newphv = makeNode(PlaceHolderVar);
+
+ memcpy(newphv, phv, sizeof(PlaceHolderVar));
+ newphv->phexpr = (Expr *)
+ replace_nestloop_params_mutator((Node *) phv->phexpr,
+ root);
+ return (Node *) newphv;
+ }
+ /* Create a Param representing the PlaceHolderVar */
+ param = assign_nestloop_param_placeholdervar(root, phv);
+ /* Is this param already listed in root->curOuterParams? */
+ foreach(lc, root->curOuterParams)
+ {
+ nlp = (NestLoopParam *) lfirst(lc);
+ if (nlp->paramno == param->paramid)
+ {
+ Assert(equal(phv, nlp->paramval));
+ /* Present, so we can just return the Param */
+ return (Node *) param;
+ }
+ }
+ /* No, so add it */
+ nlp = makeNode(NestLoopParam);
+ nlp->paramno = param->paramid;
+ nlp->paramval = (Var *) phv;
+ root->curOuterParams = lappend(root->curOuterParams, nlp);
+ /* And return the replacement Param */
+ return (Node *) param;
+ }
+ return expression_tree_mutator(node,
+ replace_nestloop_params_mutator,
+ (void *) root);
+}
+
+static Node *
+replace_nestloop_params(PlannerInfo *root, Node *expr)
+{
+ /* No setup needed for tree walk, so away we go */
+ return replace_nestloop_params_mutator(expr, root);
+}
+
+/*
+ * fix_scan_expr
+ * Do set_plan_references processing on a scan-level expression
+ *
+ * This consists of incrementing all Vars' varnos by rtoffset,
+ * looking up operator opcode info for OpExpr and related nodes,
+ * and adding OIDs from regclass Const nodes into root->glob->relationOids.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+ int rtoffset;
+ void (*fn_fix_expr_common)(PlannerInfo *root, Node *node);
+} fix_scan_expr_context;
+
+static Node *
+fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context)
+{
+ if (node == NULL)
+ return NULL;
+ if (IsA(node, Var))
+ {
+ Var *var = copyObject(node);
+
+ Assert(var->varlevelsup == 0);
+
+ /*
+ * We should not see any Vars marked INNER_VAR or OUTER_VAR. But an
+ * indexqual expression could contain INDEX_VAR Vars.
+ */
+ Assert(var->varno != INNER_VAR);
+ Assert(var->varno != OUTER_VAR);
+ if (!IS_SPECIAL_VARNO(var->varno))
+ var->varno += context->rtoffset;
+ if (var->varnoold > 0)
+ var->varnoold += context->rtoffset;
+ return (Node *) var;
+ }
+ if (IsA(node, CurrentOfExpr))
+ {
+ CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
+
+ Assert(cexpr->cvarno != INNER_VAR);
+ Assert(cexpr->cvarno != OUTER_VAR);
+ if (!IS_SPECIAL_VARNO(cexpr->cvarno))
+ cexpr->cvarno += context->rtoffset;
+ return (Node *) cexpr;
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ /* At scan level, we should always just evaluate the contained expr */
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ return fix_scan_expr_mutator((Node *) phv->phexpr, context);
+ }
+ context->fn_fix_expr_common(context->root, node);
+ return expression_tree_mutator(node, fix_scan_expr_mutator,
+ (void *) context);
+}
+
+static bool
+fix_scan_expr_walker(Node *node, fix_scan_expr_context *context)
+{
+ if (node == NULL)
+ return false;
+ Assert(!IsA(node, PlaceHolderVar));
+ context->fn_fix_expr_common(context->root, node);
+ return expression_tree_walker(node, fix_scan_expr_walker,
+ (void *) context);
+}
+
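+/*
+ * fix_scan_list
+ *
+ * Entry point for the helpers above. Unlike its counterpart in the
+ * core's setrefs.c, this copy receives fix_expr_common as a callback,
+ * since that function is static in the core; the pointer is the one
+ * handed to the extension through the SetCustomPlanRef interface.
+ */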
+static List *
+fix_scan_list(PlannerInfo *root, List *expr_list, int rtoffset,
+ void (*fn_fix_expr_common)(PlannerInfo *root, Node *node))
+{
+ fix_scan_expr_context context;
+
+ context.root = root;
+ context.rtoffset = rtoffset;
+ context.fn_fix_expr_common = fn_fix_expr_common;
+
+ if (rtoffset != 0 || root->glob->lastPHId != 0)
+ {
+ return (List *)fix_scan_expr_mutator((Node *)expr_list, &context);
+ }
+ else
+ {
+ /*
+ * If rtoffset == 0, we don't need to change any Vars, and if there
+ * are no placeholders anywhere we won't need to remove them. Then
+ * it's OK to just scribble on the input node tree instead of copying
+ * (since the only change, filling in any unset opfuncid fields, is
+ * harmless). This saves just enough cycles to be noticeable on
+ * trivial queries.
+ */
+ (void) fix_scan_expr_walker((Node *)expr_list, &context);
+ return expr_list;
+ }
+}
+
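+/*
+ * tlist_matches_tupdesc
+ *
+ * Checks whether the tlist is a one-to-one, in-order match for the
+ * tuple descriptor; copied from the static helper of the same name in
+ * the core executor (execScan.c), which extensions cannot call.
+ */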
+static bool
+tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno,
+ TupleDesc tupdesc)
+{
+ int numattrs = tupdesc->natts;
+ int attrno;
+ bool hasoid;
+ ListCell *tlist_item = list_head(tlist);
+
+ /* Check the tlist attributes */
+ for (attrno = 1; attrno <= numattrs; attrno++)
+ {
+ Form_pg_attribute att_tup = tupdesc->attrs[attrno - 1];
+ Var *var;
+
+ if (tlist_item == NULL)
+ return false; /* tlist too short */
+ var = (Var *) ((TargetEntry *) lfirst(tlist_item))->expr;
+ if (!var || !IsA(var, Var))
+ return false; /* tlist item not a Var */
+ /* if these Asserts fail, planner messed up */
+ Assert(var->varno == varno);
+ Assert(var->varlevelsup == 0);
+ if (var->varattno != attrno)
+ return false; /* out of order */
+ if (att_tup->attisdropped)
+ return false; /* table contains dropped columns */
+
+ /*
+ * Note: usually the Var's type should match the tupdesc exactly, but
+ * in situations involving unions of columns that have different
+ * typmods, the Var may have come from above the union and hence have
+ * typmod -1. This is a legitimate situation since the Var still
+ * describes the column, just not as exactly as the tupdesc does. We
+ * could change the planner to prevent it, but it'd then insert
+ * projection steps just to convert from specific typmod to typmod -1,
+ * which is pretty silly.
+ */
+ if (var->vartype != att_tup->atttypid ||
+ (var->vartypmod != att_tup->atttypmod &&
+ var->vartypmod != -1))
+ return false; /* type mismatch */
+
+ tlist_item = lnext(tlist_item);
+ }
+
+ if (tlist_item)
+ return false; /* tlist too long */
+
+ /*
+ * If the plan context requires a particular hasoid setting, then that has
+ * to match, too.
+ */
+ if (ExecContextForcesOids(ps, &hasoid) &&
+ hasoid != tupdesc->tdhasoid)
+ return false;
+
+ return true;
+}
+
+/*
+ * Build a target list (ie, a list of TargetEntry) for the Path's output.
+ */
+static List *
+build_path_tlist(PlannerInfo *root, Path *path)
+{
+ RelOptInfo *rel = path->parent;
+ List *tlist = NIL;
+ int resno = 1;
+ ListCell *v;
+
+ foreach(v, rel->reltargetlist)
+ {
+ /* Do we really need to copy here? Not sure */
+ Node *node = (Node *) copyObject(lfirst(v));
+
+ /*
+ * If it's a parameterized path, there might be lateral references in
+ * the tlist, which need to be replaced with Params. There's no need
+ * to remake the TargetEntry nodes, so apply this to each list item
+ * separately.
+ */
+ if (path->param_info)
+ node = replace_nestloop_params(root, node);
+
+ tlist = lappend(tlist, makeTargetEntry((Expr *) node,
+ resno,
+ NULL,
+ false));
+ resno++;
+ }
+ return tlist;
+}
diff --git a/contrib/ctidscan/ctidscan.control b/contrib/ctidscan/ctidscan.control
new file mode 100644
index 0000000..123635e
--- /dev/null
+++ b/contrib/ctidscan/ctidscan.control
@@ -0,0 +1,5 @@
+# ctidscan extension
+comment = 'ctidscan for custom-plan demonstration'
+default_version = '1.0'
+module_pathname = '$libdir/ctidscan'
+relocatable = true
diff --git a/contrib/ctidscan/expected/ctidscan.out b/contrib/ctidscan/expected/ctidscan.out
new file mode 100644
index 0000000..ae9c628
--- /dev/null
+++ b/contrib/ctidscan/expected/ctidscan.out
@@ -0,0 +1,294 @@
+--
+-- Regression Tests for Custom Plan APIs
+--
+CREATE EXTENSION ctidscan;
+-- construction of test data
+SET client_min_messages TO 'warning';
+CREATE SCHEMA regtest_custom_scan;
+SET search_path TO regtest_custom_scan, public;
+CREATE TABLE t1 (
+ a int primary key,
+ b text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+CREATE TABLE t2 (
+ x int primary key,
+ y text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+RESET client_min_messages;
+--
+-- Check plans with the ctidscan custom-plan provider installed.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+ QUERY PLAN
+--------------------------------
+ Index Scan using t1_pkey on t1
+ Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+ QUERY PLAN
+--------------------------------
+ Seq Scan on t1
+ Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+ QUERY PLAN
+------------------------------------
+ Tid Scan on t1
+ TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Custom (ctidscan) on t1
+ Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+ ctid quals: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(3 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+ QUERY PLAN
+--------------------------------
+ Index Scan using t1_pkey on t1
+ Index Cond: (a = 40)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+ QUERY PLAN
+--------------------------------
+ Seq Scan on t1
+ Filter: (b ~~ '%789%'::text)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+ QUERY PLAN
+------------------------------------
+ Tid Scan on t1
+ TID Cond: (ctid = '(2,10)'::tid)
+(2 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+ QUERY PLAN
+----------------------------------------------------------------------
+ Custom (ctidscan) on t1
+ Filter: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+ ctid quals: ((ctid >= '(2,115)'::tid) AND (ctid <= '(3,10)'::tid))
+(3 rows)
+
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+ QUERY PLAN
+--------------------------------------------------
+ Merge Join
+ Merge Cond: (t1.ctid = t2.ctid)
+ -> Sort
+ Sort Key: t1.ctid
+ -> Custom (ctidscan) on t1
+ Filter: (ctid < '(2,10)'::tid)
+ ctid quals: (ctid < '(2,10)'::tid)
+ -> Sort
+ Sort Key: t2.ctid
+ -> Custom (ctidscan) on t2
+ Filter: (ctid > '(1,75)'::tid)
+ ctid quals: (ctid > '(1,75)'::tid)
+(12 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+ ctid | a | b
+---------+-----+----------------------------------
+ (0,1) | 1 | c4ca4238a0b923820dcc509a6f75849b
+ (0,2) | 2 | c81e728d9d4c2f636f067f89cc14862c
+ (0,3) | 3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
+ (0,4) | 4 | a87ff679a2f3e71d9181a67b7542122c
+ (0,5) | 5 | e4da3b7fbbce2345d7772b0674a318d5
+ (0,6) | 6 | 1679091c5a880faf6fb5e6087eb1b2dc
+ (0,7) | 7 | 8f14e45fceea167a5a36dedd4bea2543
+ (0,8) | 8 | c9f0f895fb98ab9159f51fd0297e236d
+ (0,9) | 9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
+ (0,10) | 10 | d3d9446802a44259755d38e6d163e820
+ (0,11) | 11 | 6512bd43d9caa6e02c990b0a82652dca
+ (0,12) | 12 | c20ad4d76fe97759aa27a0c99bff6710
+ (0,13) | 13 | c51ce410c124a10e0db5e4b97fc2af39
+ (0,14) | 14 | aab3238922bcc25a6f606eb525ffdc56
+ (0,15) | 15 | 9bf31c7ff062936a96d3c8bd1f8f2ff3
+ (0,16) | 16 | c74d97b01eae257e44aa9d5bade97baf
+ (0,17) | 17 | 70efdf2ec9b086079795c442636b55fb
+ (0,18) | 18 | 6f4922f45568161a8cdf4ad2299f6d23
+ (0,19) | 19 | 1f0e3dad99908345f7439f8ffabdffc4
+ (0,20) | 20 | 98f13708210194c475687be6106a3b84
+ (0,21) | 21 | 3c59dc048e8850243be8079a5c74d079
+ (0,22) | 22 | b6d767d2f8ed5d21a44b0e5886680cb9
+ (0,23) | 23 | 37693cfc748049e45d87b8c7d8b9aacd
+ (0,24) | 24 | 1ff1de774005f8da13f42943881c655f
+ (0,25) | 25 | 8e296a067a37563370ded05f5a3bf3ec
+ (0,26) | 26 | 4e732ced3463d06de0ca9a15b6153677
+ (0,27) | 27 | 02e74f10e0327ad868d138f2b4fdd6f0
+ (0,28) | 28 | 33e75ff09dd601bbe69f351039152189
+ (0,29) | 29 | 6ea9ab1baa0efb9e19094440c317e21b
+ (0,30) | 30 | 34173cb38f07f89ddbebc2ac9128303f
+ (0,31) | 31 | c16a5320fa475530d9583c34fd356ef5
+ (0,32) | 32 | 6364d3f0f495b6ab9dcf8d3b5c6e0b01
+ (0,33) | 33 | 182be0c5cdcd5072bb1864cdee4d3d6e
+ (0,34) | 34 | e369853df766fa44e1ed0ff613f563bd
+ (0,35) | 35 | 1c383cd30b7c298ab50293adfecb7b18
+ (0,36) | 36 | 19ca14e7ea6328a42e0eb13d585e4c22
+ (0,37) | 37 | a5bfc9e07964f8dddeb95fc584cd965d
+ (0,38) | 38 | a5771bce93e200c36f7cd9dfd0e5deaa
+ (0,39) | 39 | d67d8ab4f4c10bf22aa353e27879133c
+ (0,40) | 40 | d645920e395fedad7bbbed0eca3fe2e0
+ (0,41) | 41 | 3416a75f4cea9109507cacd8e2f2aefc
+ (0,42) | 42 | a1d0c6e83f027327d8461063f4ac58a6
+ (0,43) | 43 | 17e62166fc8586dfa4d1bc0e1742c08b
+ (0,44) | 44 | f7177163c833dff4b38fc8d2872f1ec6
+ (0,45) | 45 | 6c8349cc7260ae62e3b1396831a8398f
+ (0,46) | 46 | d9d4f495e875a2e075a1a4a6e1b9770f
+ (0,47) | 47 | 67c6a1e7ce56d3d6fa748ab6d9af3fd7
+ (0,48) | 48 | 642e92efb79421734881b53e1e1b18b6
+ (0,49) | 49 | f457c545a9ded88f18ecee47145a72c0
+ (0,50) | 50 | c0c7c76d30bd3dcaefc96f40275bdc0a
+ (0,51) | 51 | 2838023a778dfaecdc212708f721b788
+ (0,52) | 52 | 9a1158154dfa42caddbd0694a4e9bdc8
+ (0,53) | 53 | d82c8d1619ad8176d665453cfb2e55f0
+ (0,54) | 54 | a684eceee76fc522773286a895bc8436
+ (0,55) | 55 | b53b3a3d6ab90ce0268229151c9bde11
+ (0,56) | 56 | 9f61408e3afb633e50cdf1b20de6f466
+ (0,57) | 57 | 72b32a1f754ba1c09b3695e0cb6cde7f
+ (0,58) | 58 | 66f041e16a60928b05a7e228a89c3799
+ (0,59) | 59 | 093f65e080a295f8076b1c5722a46aa2
+ (0,60) | 60 | 072b030ba126b2f4b2374f342be9ed44
+ (0,61) | 61 | 7f39f8317fbdb1988ef4c628eba02591
+ (0,62) | 62 | 44f683a84163b3523afe57c2e008bc8c
+ (0,63) | 63 | 03afdbd66e7929b125f8597834fa83a4
+ (0,64) | 64 | ea5d2f1c4608232e07d3aa3d998e5135
+ (0,65) | 65 | fc490ca45c00b1249bbe3554a4fdf6fb
+ (0,66) | 66 | 3295c76acbf4caaed33c36b1b5fc2cb1
+ (0,67) | 67 | 735b90b4568125ed6c3f678819b6e058
+ (0,68) | 68 | a3f390d88e4c41f2747bfa2f1b5f87db
+ (0,69) | 69 | 14bfa6bb14875e45bba028a21ed38046
+ (0,70) | 70 | 7cbbc409ec990f19c78c75bd1e06f215
+ (0,71) | 71 | e2c420d928d4bf8ce0ff2ec19b371514
+ (0,72) | 72 | 32bb90e8976aab5298d5da10fe66f21d
+ (0,73) | 73 | d2ddea18f00665ce8623e36bd4e3c7c5
+ (0,74) | 74 | ad61ab143223efbc24c7d2583be69251
+ (0,75) | 75 | d09bf41544a3365a46c9077ebb5e35c3
+ (0,76) | 76 | fbd7939d674997cdb4692d34de8633c4
+ (0,77) | 77 | 28dd2c7955ce926456240b2ff0100bde
+ (0,78) | 78 | 35f4a8d465e6e1edc05f3d8ab658c551
+ (0,79) | 79 | d1fe173d08e959397adf34b1d77e88d7
+ (0,80) | 80 | f033ab37c30201f73f142449d037028d
+ (0,81) | 81 | 43ec517d68b6edd3015b3edc9a11367b
+ (0,82) | 82 | 9778d5d219c5080b9a6a17bef029331c
+ (0,83) | 83 | fe9fc289c3ff0af142b6d3bead98a923
+ (0,84) | 84 | 68d30a9594728bc39aa24be94b319d21
+ (0,85) | 85 | 3ef815416f775098fe977004015c6193
+ (0,86) | 86 | 93db85ed909c13838ff95ccfa94cebd9
+ (0,87) | 87 | c7e1249ffc03eb9ded908c236bd1996d
+ (0,88) | 88 | 2a38a4a9316c49e5a833517c45d31070
+ (0,89) | 89 | 7647966b7343c29048673252e490f736
+ (0,90) | 90 | 8613985ec49eb8f757ae6439e879bb2a
+ (0,91) | 91 | 54229abfcfa5649e7003b83dd4755294
+ (0,92) | 92 | 92cc227532d17e56e07902b254dfad10
+ (0,93) | 93 | 98dce83da57b0395e163467c9dae521b
+ (0,94) | 94 | f4b9ec30ad9f68f89b29639786cb62ef
+ (0,95) | 95 | 812b4ba287f5ee0bc9d43bbf5bbe87fb
+ (0,96) | 96 | 26657d5ff9020d2abefe558796b99584
+ (0,97) | 97 | e2ef524fbf3d9fe611d5a8e90fefdc9c
+ (0,98) | 98 | ed3d2c21991e3bef5e069713af9fa6ca
+ (0,99) | 99 | ac627ab1ccbdb62ec96e702f07f6425b
+ (0,100) | 100 | f899139df5e1059396431415e770c6dd
+ (0,101) | 101 | 38b3eff8baf56627478ec76a704e9b52
+ (0,102) | 102 | ec8956637a99787bd197eacd77acce5e
+ (0,103) | 103 | 6974ce5ac660610b44d9b9fed0ff9548
+ (0,104) | 104 | c9e1074f5b3f9fc8ea15d152add07294
+ (0,105) | 105 | 65b9eea6e1cc6bb9f0cd2a47751a186f
+ (0,106) | 106 | f0935e4cd5920aa6c7c996a5ee53a70f
+ (0,107) | 107 | a97da629b098b75c294dffdc3e463904
+ (0,108) | 108 | a3c65c2974270fd093ee8a9bf8ae7d0b
+ (0,109) | 109 | 2723d092b63885e0d7c260cc007e8b9d
+ (0,110) | 110 | 5f93f983524def3dca464469d2cf9f3e
+ (0,111) | 111 | 698d51a19d8a121ce581499d7b701668
+ (0,112) | 112 | 7f6ffaa6bb0b408017b62254211691b5
+ (0,113) | 113 | 73278a4a86960eeb576a8fd4c9ec6997
+ (0,114) | 114 | 5fd0b37cd7dbbb00f97ba6ce92bf5add
+ (0,115) | 115 | 2b44928ae11fb9384c4cf38708677c48
+ (0,116) | 116 | c45147dee729311ef5b5c3003946c48f
+ (0,117) | 117 | eb160de1de89d9058fcb0b968dbbbd68
+ (0,118) | 118 | 5ef059938ba799aaa845e1c2e8a762bd
+ (0,119) | 119 | 07e1cd7dca89a1678042477183b7ac3f
+ (0,120) | 120 | da4fb5c6e93e74d3df8527599fa62642
+ (1,1) | 121 | 4c56ff4ce4aaf9573aa5dff913df997a
+ (1,2) | 122 | a0a080f42e6f13b3a2df133f073095dd
+ (1,3) | 123 | 202cb962ac59075b964b07152d234b70
+ (1,4) | 124 | c8ffe9a587b126f152ed3d89a146b445
+ (1,5) | 125 | 3def184ad8f4755ff269862ea77393dd
+ (1,6) | 126 | 069059b7ef840f0c74a814ec9237b6ec
+ (1,7) | 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2
+ (1,8) | 128 | 76dc611d6ebaafc66cc0879c71b5db5c
+ (1,9) | 129 | d1f491a404d6854880943e5c3cd9ca25
+ (1,10) | 130 | 9b8619251a19057cff70779273e95aa6
+ (1,11) | 131 | 1afa34a7f984eeabdbb0a7d494132ee5
+ (1,12) | 132 | 65ded5353c5ee48d0b7d48c591b8f430
+ (1,13) | 133 | 9fc3d7152ba9336a670e36d0ed79bc43
+ (1,14) | 134 | 02522a2b2726fb0a03bb19f2d8d9524d
+ (1,15) | 135 | 7f1de29e6da19d22b51c68001e7e0e54
+ (1,16) | 136 | 42a0e188f5033bc65bf8d78622277c4e
+ (1,17) | 137 | 3988c7f88ebcb58c6ce932b957b6f332
+ (1,18) | 138 | 013d407166ec4fa56eb1e1f8cbe183b9
+ (1,19) | 139 | e00da03b685a0dd18fb6a08af0923de0
+(139 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+ ctid | a | b
+------+---+---
+(0 rows)
+
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+ ctid | a | b
+---------+-----+----------------------------------
+ (2,115) | 355 | 82cec96096d4281b7c95cd7e74623496
+ (2,116) | 356 | 6c524f9d5d7027454a783c841250ba71
+ (2,117) | 357 | fb7b9ffa5462084c5f4e7e85a093e6d7
+ (2,118) | 358 | aa942ab2bfa6ebda4840e7360ce6e7ef
+ (2,119) | 359 | c058f544c737782deacefa532d9add4c
+ (2,120) | 360 | e7b24b112a44fdd9ee93bdf998c6ca0e
+ (3,1) | 361 | 52720e003547c70561bf5e03b95aa99f
+ (3,2) | 362 | c3e878e27f52e2a57ace4d9a76fd9acf
+ (3,3) | 363 | 00411460f7c92d2124a67ea0f4cb5f85
+ (3,4) | 364 | bac9162b47c56fc8a4d2a519803d51b3
+ (3,5) | 365 | 9be40cee5b0eee1462c82c6964087ff9
+ (3,6) | 366 | 5ef698cd9fe650923ea331c15af3b160
+ (3,7) | 367 | 05049e90fa4f5039a8cadc6acbb4b2cc
+ (3,8) | 368 | cf004fdc76fa1a4f25f62e0eb5261ca3
+ (3,9) | 369 | 0c74b7f78409a4022a2c4c5a5ca3ee19
+ (3,10) | 370 | d709f38ef758b5066ef31b18039b8ce5
+(16 rows)
+
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+ ctid | a | b | x | y
+--------+-----+----------------------------------+-----+------------------------------------------------------------------
+ (1,76) | 196 | 084b6fbb10729ed4da8c3d3f5a3ae7c9 | 157 | 6c4b761a28b734fe93831e3fb400ce876c4b761a28b734fe93831e3fb400ce87
+ (1,77) | 197 | 85d8ce590ad8981ca2c8286f79f59954 | 158 | 06409663226af2f3114485aa4e0a23b406409663226af2f3114485aa4e0a23b4
+ (1,78) | 198 | 0e65972dce68dad4d52d063967f0a705 | 159 | 140f6969d5213fd0ece03148e62e461e140f6969d5213fd0ece03148e62e461e
+ (1,79) | 199 | 84d9ee44e457ddef7f2c4f25dc8fa865 | 160 | b73ce398c39f506af761d2277d853a92b73ce398c39f506af761d2277d853a92
+ (1,80) | 200 | 3644a684f98ea8fe223c713b77189a77 | 161 | bd4c9ab730f5513206b999ec0d90d1fbbd4c9ab730f5513206b999ec0d90d1fb
+ (1,81) | 201 | 757b505cfd34c64c85ca5b5690ee5293 | 162 | 82aa4b0af34c2313a562076992e50aa382aa4b0af34c2313a562076992e50aa3
+ (2,1) | 241 | f340f1b1f65b6df5b5e3f94d95b11daf | 163 | 0777d5c17d4066b82ab86dff8a46af6f0777d5c17d4066b82ab86dff8a46af6f
+ (2,2) | 242 | e4a6222cdb5b34375400904f03d8e6a5 | 164 | fa7cdfad1a5aaf8370ebeda47a1ff1c3fa7cdfad1a5aaf8370ebeda47a1ff1c3
+ (2,3) | 243 | cb70ab375662576bd1ac5aaf16b3fca4 | 165 | 9766527f2b5d3e95d4a733fcfb77bd7e9766527f2b5d3e95d4a733fcfb77bd7e
+ (2,4) | 244 | 9188905e74c28e489b44e954ec0b9bca | 166 | 7e7757b1e12abcb736ab9a754ffb617a7e7757b1e12abcb736ab9a754ffb617a
+ (2,5) | 245 | 0266e33d3f546cb5436a10798e657d97 | 167 | 5878a7ab84fb43402106c575658472fa5878a7ab84fb43402106c575658472fa
+ (2,6) | 246 | 38db3aed920cf82ab059bfccbd02be6a | 168 | 006f52e9102a8d3be2fe5614f42ba989006f52e9102a8d3be2fe5614f42ba989
+ (2,7) | 247 | 3cec07e9ba5f5bb252d13f5f431e4bbb | 169 | 3636638817772e42b59d74cff571fbb33636638817772e42b59d74cff571fbb3
+ (2,8) | 248 | 621bf66ddb7c962aa0d22ac97d69b793 | 170 | 149e9677a5989fd342ae44213df68868149e9677a5989fd342ae44213df68868
+ (2,9) | 249 | 077e29b11be80ab57e1a2ecabb7da330 | 171 | a4a042cf4fd6bfb47701cbc8a1653adaa4a042cf4fd6bfb47701cbc8a1653ada
+(15 rows)
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table t1
+drop cascades to table t2
diff --git a/contrib/ctidscan/sql/ctidscan.sql b/contrib/ctidscan/sql/ctidscan.sql
new file mode 100644
index 0000000..9759065
--- /dev/null
+++ b/contrib/ctidscan/sql/ctidscan.sql
@@ -0,0 +1,50 @@
+--
+-- Regression Tests for Custom Plan APIs
+--
+
+CREATE EXTENSION ctidscan;
+
+-- construction of test data
+SET client_min_messages TO 'warning';
+
+CREATE SCHEMA regtest_custom_scan;
+
+SET search_path TO regtest_custom_scan, public;
+
+CREATE TABLE t1 (
+ a int primary key,
+ b text
+);
+INSERT INTO t1 (SELECT s, md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t1;
+
+CREATE TABLE t2 (
+ x int primary key,
+ y text
+);
+INSERT INTO t2 (SELECT s, md5(s::text)||md5(s::text) FROM generate_series(1,400) s);
+VACUUM ANALYZE t2;
+
+RESET client_min_messages;
+--
+-- Check plans with the ctidscan custom-plan provider installed.
+--
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+
+EXPLAIN (costs off) SELECT * FROM t1 WHERE a = 40;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE b like '%789%';
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid = '(2,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+EXPLAIN (costs off) SELECT * FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+SELECT ctid,* FROM t1 WHERE ctid < '(1,20)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid > '(4,0)'::tid;
+SELECT ctid,* FROM t1 WHERE ctid BETWEEN '(2,115)'::tid AND '(3,10)'::tid;
+SELECT t1.ctid,* FROM t1 JOIN t2 ON t1.ctid = t2.ctid WHERE t1.ctid < '(2,10)'::tid AND t2.ctid > '(1,75)'::tid;
+
+
+-- Test cleanup
+DROP SCHEMA regtest_custom_scan CASCADE;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a974bd5..83b02e9 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -39,7 +39,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
pg_ts_config.h pg_ts_config_map.h pg_ts_dict.h \
pg_ts_parser.h pg_ts_template.h pg_extension.h \
pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
- pg_foreign_table.h \
+ pg_foreign_table.h pg_custom_plan.h \
pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
toasting.h indexing.h \
)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index e511669..5dc318e 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -30,6 +30,7 @@
#include "catalog/pg_constraint.h"
#include "catalog/pg_conversion.h"
#include "catalog/pg_conversion_fn.h"
+#include "catalog/pg_custom_plan.h"
#include "catalog/pg_database.h"
#include "catalog/pg_default_acl.h"
#include "catalog/pg_depend.h"
@@ -54,6 +55,7 @@
#include "catalog/pg_type.h"
#include "catalog/pg_user_mapping.h"
#include "commands/comment.h"
+#include "commands/custom_plan.h"
#include "commands/defrem.h"
#include "commands/event_trigger.h"
#include "commands/extension.h"
@@ -154,7 +156,8 @@ static const Oid object_classes[MAX_OCLASS] = {
UserMappingRelationId, /* OCLASS_USER_MAPPING */
DefaultAclRelationId, /* OCLASS_DEFACL */
ExtensionRelationId, /* OCLASS_EXTENSION */
- EventTriggerRelationId /* OCLASS_EVENT_TRIGGER */
+ EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */
+ CustomPlanRelationId, /* OCLASS_CUSTOM_PLAN */
};
@@ -1249,6 +1252,10 @@ doDeletion(const ObjectAddress *object, int flags)
RemoveEventTriggerById(object->objectId);
break;
+ case OCLASS_CUSTOM_PLAN:
+ RemoveCustomPlanById(object->objectId);
+ break;
+
default:
elog(ERROR, "unrecognized object class: %u",
object->classId);
@@ -2316,6 +2323,9 @@ getObjectClass(const ObjectAddress *object)
case EventTriggerRelationId:
return OCLASS_EVENT_TRIGGER;
+
+ case CustomPlanRelationId:
+ return OCLASS_CUSTOM_PLAN;
}
/* shouldn't get here */
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 2b837a9..219fb5f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -30,6 +30,7 @@
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_conversion.h"
+#include "catalog/pg_custom_plan.h"
#include "catalog/pg_database.h"
#include "catalog/pg_extension.h"
#include "catalog/pg_foreign_data_wrapper.h"
@@ -51,6 +52,7 @@
#include "catalog/pg_ts_template.h"
#include "catalog/pg_type.h"
#include "catalog/pg_user_mapping.h"
+#include "commands/custom_plan.h"
#include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/event_trigger.h"
@@ -152,6 +154,18 @@ static const ObjectPropertyType ObjectProperty[] =
true
},
{
+ CustomPlanRelationId,
+ CustomPlanOidIndexId,
+ CUSTOMPLANOID,
+ CUSTOMPLANNAME,
+ Anum_pg_custom_plan_custname,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ InvalidAttrNumber,
+ -1,
+ false,
+ },
+ {
DatabaseRelationId,
DatabaseOidIndexId,
DATABASEOID,
@@ -529,6 +543,7 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
case OBJECT_FDW:
case OBJECT_FOREIGN_SERVER:
case OBJECT_EVENT_TRIGGER:
+ case OBJECT_CUSTOM_PLAN:
address = get_object_address_unqualified(objtype,
objname, missing_ok);
break;
@@ -755,6 +770,9 @@ get_object_address_unqualified(ObjectType objtype,
case OBJECT_EVENT_TRIGGER:
msg = gettext_noop("event trigger name cannot be qualified");
break;
+ case OBJECT_CUSTOM_PLAN:
+ msg = gettext_noop("custom plan name cannot be qualified");
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
msg = NULL; /* placate compiler */
@@ -815,6 +833,11 @@ get_object_address_unqualified(ObjectType objtype,
address.objectId = get_event_trigger_oid(name, missing_ok);
address.objectSubId = 0;
break;
+ case OBJECT_CUSTOM_PLAN:
+ address.classId = CustomPlanRelationId;
+ address.objectId = get_custom_plan_oid(name, missing_ok);
+ address.objectSubId = 0;
+ break;
default:
elog(ERROR, "unrecognized objtype: %d", (int) objtype);
/* placate compiler, which doesn't know elog won't return */
@@ -1295,6 +1318,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
break;
case OBJECT_TSPARSER:
case OBJECT_TSTEMPLATE:
+ case OBJECT_CUSTOM_PLAN:
/* We treat these object types as being owned by superusers */
if (!superuser_arg(roleid))
ereport(ERROR,
@@ -2166,6 +2190,21 @@ getObjectDescription(const ObjectAddress *object)
break;
}
+ case OCLASS_CUSTOM_PLAN:
+ {
+ HeapTuple tup;
+
+ tup = SearchSysCache1(CUSTOMPLANOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for custom-plan %u",
+ object->objectId);
+ appendStringInfo(&buffer, _("custom plan %s"),
+ NameStr(((Form_pg_custom_plan) GETSTRUCT(tup))->custname));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
@@ -2577,6 +2616,10 @@ getObjectTypeDescription(const ObjectAddress *object)
appendStringInfoString(&buffer, "event trigger");
break;
+ case OCLASS_CUSTOM_PLAN:
+ appendStringInfoString(&buffer, "custom plan");
+ break;
+
default:
appendStringInfo(&buffer, "unrecognized %u", object->classId);
break;
@@ -3330,6 +3373,23 @@ getObjectIdentity(const ObjectAddress *object)
break;
}
+ case OCLASS_CUSTOM_PLAN:
+ {
+ HeapTuple tup;
+ Form_pg_custom_plan custForm;
+
+ tup = SearchSysCache1(CUSTOMPLANOID,
+ ObjectIdGetDatum(object->objectId));
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for custom-plan %u",
+ object->objectId);
+ custForm = (Form_pg_custom_plan) GETSTRUCT(tup);
+ appendStringInfoString(&buffer,
+ quote_identifier(NameStr(custForm->custname)));
+ ReleaseSysCache(tup);
+ break;
+ }
+
default:
appendStringInfo(&buffer, "unrecognized object %u %u %d",
object->classId,
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 22f116b..1e8e6f4 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
collationcmds.o constraint.o conversioncmds.o copy.o createas.o \
- dbcommands.o define.o discard.o dropcmds.o \
+ custom_plan.o dbcommands.o define.o discard.o dropcmds.o \
event_trigger.o explain.o extension.o foreigncmds.o functioncmds.o \
indexcmds.o lockcmds.o matview.o operatorcmds.o opclasscmds.o \
portalcmds.o prepare.o proclang.o \
diff --git a/src/backend/commands/custom_plan.c b/src/backend/commands/custom_plan.c
new file mode 100644
index 0000000..a96a0f8
--- /dev/null
+++ b/src/backend/commands/custom_plan.c
@@ -0,0 +1,335 @@
+/*-------------------------------------------------------------------------
+ *
+ * custom_plan.c
+ * creation/manipulation commands for custom-plan providers
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/custom_plan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
+#include "catalog/objectaccess.h"
+#include "catalog/pg_custom_plan.h"
+#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
+#include "commands/custom_plan.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "parser/parse_func.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+/* static variables */
+static List *custom_scan_callchain = NIL;
+static List *custom_join_callchain = NIL;
+static bool callchain_is_ready = false;
+static MemoryContext custom_memcxt = NULL;
+
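+/*
+ * invalidateCustomPlanCallchain
+ *
+ * Syscache invalidation callback: discards the cached callchains so
+ * that they get rebuilt from pg_custom_plan on next use.
+ */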
+static void
+invalidateCustomPlanCallchain(Datum arg, int cacheid, uint32 hashvalue)
+{
+ MemoryContextReset(custom_memcxt);
+ callchain_is_ready = false;
+ custom_scan_callchain = NIL;
+ custom_join_callchain = NIL;
+}
+
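+/*
+ * setupCustomPlanCallchain
+ *
+ * Scans pg_custom_plan and rebuilds the lists of handler-function OIDs
+ * for scan and join providers; the lists live in custom_memcxt so they
+ * survive until the next catalog invalidation.
+ */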
+static void
+setupCustomPlanCallchain(void)
+{
+ Relation rel;
+ SysScanDesc scan;
+ HeapTuple tuple;
+ MemoryContext oldcxt;
+
+ custom_scan_callchain = NIL;
+ custom_join_callchain = NIL;
+
+ rel = heap_open(CustomPlanRelationId, AccessShareLock);
+
+ /* scan the whole pg_custom_plan catalog once */
+ scan = systable_beginscan(rel, InvalidOid, false, NULL, 0, NULL);
+
+ oldcxt = MemoryContextSwitchTo(custom_memcxt);
+ while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+ {
+ Form_pg_custom_plan custForm
+ = (Form_pg_custom_plan) GETSTRUCT(tuple);
+
+ if ((custForm->custclass & CUSTOM_PLAN_CLASS_SCAN) != 0)
+ {
+ custom_scan_callchain = lappend_oid(custom_scan_callchain,
+ custForm->custhandler);
+ }
+ if ((custForm->custclass & CUSTOM_PLAN_CLASS_JOIN) != 0)
+ {
+ custom_join_callchain = lappend_oid(custom_join_callchain,
+ custForm->custhandler);
+ }
+ }
+ MemoryContextSwitchTo(oldcxt);
+ systable_endscan(scan);
+
+ heap_close(rel, AccessShareLock);
+
+ callchain_is_ready = true;
+}
+
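+/*
+ * initCustomPlanCallchain
+ *
+ * One-time initialization: creates the long-lived memory context,
+ * registers the invalidation callback, then builds the initial
+ * callchains.
+ */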
+static void
+initCustomPlanCallchain(void)
+{
+ /* long-lived memory context to keep the callchains for custom plans */
+ custom_memcxt = AllocSetContextCreate(CacheMemoryContext,
+ "custom plan memory context",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /* flush cached callchain on catalog updates */
+ CacheRegisterSyscacheCallback(CUSTOMPLANOID,
+ invalidateCustomPlanCallchain,
+ (Datum) 0);
+ /* also, perform the initial setup */
+ setupCustomPlanCallchain();
+}
+
+/*
+ * CallCustomScanProviders
+ *
+ * Invokes the callchain for relation scans; each custom-plan provider
+ * may add alternative scan paths derived from the CustomPath type.
+ */
+void
+CallCustomScanProviders(PlannerInfo *root,
+ RelOptInfo *baserel,
+ RangeTblEntry *rte)
+{
+ customScanArg sarg;
+ ListCell *cell;
+
+ if (!custom_memcxt)
+ initCustomPlanCallchain();
+ else if (!callchain_is_ready)
+ setupCustomPlanCallchain();
+
+ Assert(callchain_is_ready);
+ sarg.custom_class = CUSTOM_PLAN_CLASS_SCAN;
+ sarg.root = root;
+ sarg.baserel = baserel;
+ sarg.rte = rte;
+
+ foreach (cell, custom_scan_callchain)
+ {
+ (void) OidFunctionCall1(lfirst_oid(cell),
+ PointerGetDatum(&sarg));
+ }
+}
+
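+/*
+ * As a rough sketch (not part of this patch), a handler registered
+ * with CREATE CUSTOM PLAN ... FOR scan HANDLER my_handler would be an
+ * ordinary C function with one INTERNAL argument, invoked through the
+ * callchain above. The names my_handler, my_path_is_feasible and
+ * make_my_custom_path below are placeholders for provider code:
+ *
+ * Datum
+ * my_handler(PG_FUNCTION_ARGS)
+ * {
+ *     customScanArg *arg = (customScanArg *) PG_GETARG_POINTER(0);
+ *
+ *     if (my_path_is_feasible(arg->baserel, arg->rte))
+ *         add_path(arg->baserel, &make_my_custom_path(arg)->path);
+ *     PG_RETURN_VOID();
+ * }
+ */
+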
+/*
+ * CallCustomJoinProviders
+ *
+ * Invokes the callchain for relation joins; each custom-plan provider
+ * may add alternative join paths derived from the CustomPath type.
+ */
+void
+CallCustomJoinProviders(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist,
+ Relids param_source_rels,
+ Relids extra_lateral_rels)
+{
+ customJoinArg jarg;
+ ListCell *cell;
+
+ if (!custom_memcxt)
+ initCustomPlanCallchain();
+ else if (!callchain_is_ready)
+ setupCustomPlanCallchain();
+
+ Assert(callchain_is_ready);
+ jarg.custom_class = CUSTOM_PLAN_CLASS_JOIN;
+ jarg.root = root;
+ jarg.joinrel = joinrel;
+ jarg.outerrel = outerrel;
+ jarg.innerrel = innerrel;
+ jarg.jointype = jointype;
+ jarg.sjinfo = sjinfo;
+ jarg.restrictlist = restrictlist;
+ jarg.param_source_rels = param_source_rels;
+ jarg.extra_lateral_rels = extra_lateral_rels;
+
+ foreach (cell, custom_join_callchain)
+ {
+ (void) OidFunctionCall1(lfirst_oid(cell),
+ PointerGetDatum(&jarg));
+ }
+}
+
+/*
+ * Utility function to look up the OID of a custom-plan provider by name.
+ */
+Oid
+get_custom_plan_oid(const char *custom_name, bool missing_ok)
+{
+ Oid cust_oid;
+
+ cust_oid = GetSysCacheOid1(CUSTOMPLANNAME, CStringGetDatum(custom_name));
+ if (!OidIsValid(cust_oid) && !missing_ok)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("custom-plan provider \"%s\" does not exist",
+ custom_name)));
+ return cust_oid;
+}
+
+/*
+ * Drop a custom-plan provider; reached via doDeletion() in dependency.c.
+ */
+void
+RemoveCustomPlanById(Oid cust_oid)
+{
+ Relation rel;
+ HeapTuple tuple;
+
+ rel = heap_open(CustomPlanRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCache1(CUSTOMPLANOID, ObjectIdGetDatum(cust_oid));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for custom-plan provider %u",
+ cust_oid);
+
+ simple_heap_delete(rel, &tuple->t_self);
+
+ ReleaseSysCache(tuple);
+
+ heap_close(rel, RowExclusiveLock);
+}
+
+/*
+ * Create a custom-plan provider
+ */
+Oid
+DefineCustomPlan(CreateCustomPlanStmt *stmt)
+{
+ Relation rel;
+ Oid cust_oid;
+ Oid cust_handler = InvalidOid;
+ Datum values[Natts_pg_custom_plan];
+ bool isnull[Natts_pg_custom_plan];
+ HeapTuple tuple;
+ ListCell *cell;
+ ObjectAddress myself;
+ ObjectAddress referenced;
+
+ rel = heap_open(CustomPlanRelationId, RowExclusiveLock);
+
+ /* must be superuser */
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to create custom-plan provider \"%s\"",
+ stmt->custom_name),
+ errhint("Must be superuser to create a custom-plan provider.")));
+
+ /* check for a duplicate provider name */
+ cust_oid = get_custom_plan_oid(stmt->custom_name, true);
+ if (OidIsValid(cust_oid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DUPLICATE_OBJECT),
+ errmsg("custom-plan provider \"%s\" already exists",
+ stmt->custom_name)));
+
+ /* validate custom-plan classes */
+ if (stmt->custom_class != CUSTOM_PLAN_CLASS_SCAN &&
+ stmt->custom_class != CUSTOM_PLAN_CLASS_JOIN &&
+ stmt->custom_class != CUSTOM_PLAN_CLASS_ANY)
+ elog(ERROR, "unexpected custom-plan class: %08x", stmt->custom_class);
+
+ /* parse custom-plan options */
+ foreach (cell, stmt->custom_options)
+ {
+ DefElem *defel = lfirst(cell);
+
+ Assert(IsA(defel, DefElem));
+
+ if (strcmp(defel->defname, "handler") == 0)
+ {
+ Oid argtypes[1];
+
+ if (OidIsValid(cust_handler))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+
+ argtypes[0] = INTERNALOID;
+ cust_handler = LookupFuncName((List *)defel->arg,
+ 1, argtypes, false);
+ if (get_func_rettype(cust_handler) != VOIDOID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("function %s must return type \"void\"",
+ NameListToString((List *) defel->arg))));
+ }
+ else
+ elog(ERROR, "unexpected custom-plan option: %s",
+ defel->defname);
+ }
+
+ if (!OidIsValid(cust_handler))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("HANDLER must be provided")));
+
+ /*
+ * Insert tuple into pg_custom_plan system catalog
+ */
+ memset(values, 0, sizeof(values));
+ memset(isnull, 0, sizeof(isnull));
+ values[Anum_pg_custom_plan_custname - 1]
+ = DirectFunctionCall1(namein, CStringGetDatum(stmt->custom_name));
+ values[Anum_pg_custom_plan_custclass - 1]
+ = stmt->custom_class;
+ values[Anum_pg_custom_plan_custhandler - 1]
+ = ObjectIdGetDatum(cust_handler);
+
+ tuple = heap_form_tuple(RelationGetDescr(rel), values, isnull);
+
+ cust_oid = simple_heap_insert(rel, tuple);
+ CatalogUpdateIndexes(rel, tuple);
+
+ heap_freetuple(tuple);
+
+ /* record dependencies */
+ myself.classId = CustomPlanRelationId;
+ myself.objectId = cust_oid;
+ myself.objectSubId = 0;
+
+ referenced.classId = ProcedureRelationId;
+ referenced.objectId = cust_handler;
+ referenced.objectSubId = 0;
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+
+ /* Post creation hook for new custom-plan provider */
+ InvokeObjectPostCreateHook(CustomPlanRelationId, cust_oid, 0);
+
+ heap_close(rel, RowExclusiveLock);
+
+ return cust_oid;
+}
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index e64ad80..104ff17 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -408,6 +408,11 @@ does_not_exist_skipping(ObjectType objtype, List *objname, List *objargs)
args = strVal(linitial(objargs));
}
break;
+ case OBJECT_CUSTOM_PLAN:
+ msg = gettext_noop("custom-plan \"%s\" does not exist, skipping");
+ name = NameListToString(objname);
+ break;
+
default:
elog(ERROR, "unexpected object type (%d)", (int) objtype);
break;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1104cc3..bd1f433 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -721,6 +721,15 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
+ case T_CustomPlan:
+ case T_CustomPlanMarkPos:
+ {
+ CustomPlanState *cps = (CustomPlanState *)planstate;
+
+ cps->methods->ExplainCustomPreScanNode(cps, rels_used,
+ ExplainPreScanNode);
+ }
+ break;
case T_ModifyTable:
/* cf ExplainModifyTarget */
*rels_used = bms_add_member(*rels_used,
@@ -847,6 +856,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *sname; /* node type name for non-text output */
const char *strategy = NULL;
const char *operation = NULL;
+ const char *custom_name = NULL;
int save_indent = es->indent;
bool haschildren;
@@ -935,6 +945,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_ForeignScan:
pname = sname = "Foreign Scan";
break;
+ case T_CustomPlan:
+ sname = "Custom";
+ custom_name = ((CustomPlan *) plan)->methods->CustomName;
+ if (custom_name)
+ pname = psprintf("Custom (%s)", custom_name);
+ else
+ pname = sname;
+ break;
case T_Material:
pname = sname = "Materialize";
break;
@@ -1036,6 +1054,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyText("Parent Relationship", relationship, es);
if (plan_name)
ExplainPropertyText("Subplan Name", plan_name, es);
+ if (custom_name)
+ ExplainPropertyText("Custom", custom_name, es);
}
switch (nodeTag(plan))
@@ -1083,6 +1103,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
ExplainPropertyText("Index Name", indexname, es);
}
break;
+ case T_CustomPlan:
+ {
+ CustomPlanState *cps = (CustomPlanState *)planstate;
+
+ cps->methods->ExplainCustomPlanTargetRel(cps, es);
+ }
+ break;
case T_ModifyTable:
ExplainModifyTarget((ModifyTable *) plan, es);
break;
@@ -1347,6 +1374,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
planstate, es);
show_foreignscan_info((ForeignScanState *) planstate, es);
break;
+ case T_CustomPlan:
+ {
+ CustomPlanState *cps = (CustomPlanState *) planstate;
+
+ show_scan_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 1,
+ planstate, es);
+ cps->methods->ExplainCustomPlan(cps, ancestors, es);
+ }
+ break;
case T_NestLoop:
show_upper_qual(((NestLoop *) plan)->join.joinqual,
"Join Filter", planstate, ancestors, es);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..4dece5a 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,7 +15,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
execProcnode.o execQual.o execScan.o execTuples.o \
execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
- nodeBitmapAnd.o nodeBitmapOr.o \
+ nodeBitmapAnd.o nodeBitmapOr.o nodeCustom.o \
nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeHash.o \
nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 8c01a63..4c0b5fb 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
#include "executor/nodeBitmapIndexscan.h"
#include "executor/nodeBitmapOr.h"
#include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeFunctionscan.h"
#include "executor/nodeGroup.h"
@@ -197,6 +198,10 @@ ExecReScan(PlanState *node)
ExecReScanForeignScan((ForeignScanState *) node);
break;
+ case T_CustomPlanState:
+ ExecReScanCustomPlan((CustomPlanState *) node);
+ break;
+
case T_NestLoopState:
ExecReScanNestLoop((NestLoopState *) node);
break;
@@ -291,6 +296,10 @@ ExecMarkPos(PlanState *node)
ExecValuesMarkPos((ValuesScanState *) node);
break;
+ case T_CustomPlanState:
+ ExecCustomMarkPos((CustomPlanState *) node);
+ break;
+
case T_MaterialState:
ExecMaterialMarkPos((MaterialState *) node);
break;
@@ -348,6 +357,10 @@ ExecRestrPos(PlanState *node)
ExecValuesRestrPos((ValuesScanState *) node);
break;
+ case T_CustomPlanState:
+ ExecCustomRestrPos((CustomPlanState *) node);
+ break;
+
case T_MaterialState:
ExecMaterialRestrPos((MaterialState *) node);
break;
@@ -390,6 +403,7 @@ ExecSupportsMarkRestore(NodeTag plantype)
case T_ValuesScan:
case T_Material:
case T_Sort:
+ case T_CustomPlanMarkPos:
return true;
case T_Result:
@@ -465,6 +479,16 @@ ExecSupportsBackwardScan(Plan *node)
return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan) &&
TargetListSupportsBackwardScan(node->targetlist);
+ case T_CustomPlan:
+ case T_CustomPlanMarkPos:
+ {
+ CustomPlan *cplan = (CustomPlan *) node;
+
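+ /* Ask the provider; if it has no callback, assume unsupported. */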
+ if (cplan->methods->SupportCustomBackwardScan)
+ return cplan->methods->SupportCustomBackwardScan(cplan);
+ }
+ return false;
+
case T_Material:
case T_Sort:
/* these don't evaluate tlist */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c5ecd18..23a245f 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
#include "executor/nodeBitmapIndexscan.h"
#include "executor/nodeBitmapOr.h"
#include "executor/nodeCtescan.h"
+#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeFunctionscan.h"
#include "executor/nodeGroup.h"
@@ -244,6 +245,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_CustomPlan:
+ case T_CustomPlanMarkPos:
+ result = (PlanState *) ExecInitCustomPlan((CustomPlan *) node,
+ estate, eflags);
+ break;
+
/*
* join nodes
*/
@@ -442,6 +449,10 @@ ExecProcNode(PlanState *node)
result = ExecForeignScan((ForeignScanState *) node);
break;
+ case T_CustomPlanState:
+ result = ExecCustomPlan((CustomPlanState *) node);
+ break;
+
/*
* join nodes
*/
@@ -558,6 +569,10 @@ MultiExecProcNode(PlanState *node)
result = MultiExecBitmapOr((BitmapOrState *) node);
break;
+ case T_CustomPlanState:
+ result = MultiExecCustomPlan((CustomPlanState *) node);
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
result = NULL;
@@ -678,6 +693,10 @@ ExecEndNode(PlanState *node)
ExecEndForeignScan((ForeignScanState *) node);
break;
+ case T_CustomPlanState:
+ ExecEndCustomPlan((CustomPlanState *) node);
+ break;
+
/*
* join nodes
*/
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
new file mode 100644
index 0000000..1f4356b
--- /dev/null
+++ b/src/backend/executor/nodeCustom.c
@@ -0,0 +1,100 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.c
+ * Routines to handle execution of custom plan node
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
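+ * IDENTIFICATION
+ * src/backend/executor/nodeCustom.c
+ *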
+ * ------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "executor/nodeCustom.h"
+#include "nodes/execnodes.h"
+#include "nodes/plannodes.h"
+#include "parser/parsetree.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+CustomPlanState *
+ExecInitCustomPlan(CustomPlan *cplan, EState *estate, int eflags)
+{
+ CustomPlanState *cps = cplan->methods->CreateCustomPlanState(cplan);
+
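+ /*
+ * The provider must return an object tagged T_CustomPlanState (it may
+ * be the head of a larger provider-specific struct), because the
+ * executor dispatches on that node tag in ExecProcNode and friends.
+ */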
+ /* fill in the common PlanState fields */
+ cps->ps.plan = &cplan->plan;
+ cps->ps.state = estate;
+
+ /* create expression context for node */
+ ExecAssignExprContext(estate, &cps->ps);
+ cps->ps.ps_TupFromTlist = false;
+
+ /* initialize child expressions */
+ cps->ps.targetlist = (List *)
+ ExecInitExpr((Expr *) cplan->plan.targetlist,
+ (PlanState *) cps);
+ cps->ps.qual = (List *)
+ ExecInitExpr((Expr *) cplan->plan.qual,
+ (PlanState *) cps);
+
+ /* initialization of result tuple slot */
+ ExecInitResultTupleSlot(estate, &cps->ps);
+ ExecAssignResultTypeFromTL(&cps->ps);
+ cps->ps.ps_ProjInfo = NULL;
+
+ /* then, let the extension perform all remaining initialization */
+ cps->methods->BeginCustomPlan(cps, estate, eflags);
+
+ return cps;
+}
+
+TupleTableSlot *
+ExecCustomPlan(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->ExecCustomPlan != NULL);
+ return cpstate->methods->ExecCustomPlan(cpstate);
+}
+
+Node *
+MultiExecCustomPlan(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->MultiExecCustomPlan != NULL);
+ return cpstate->methods->MultiExecCustomPlan(cpstate);
+}
+
+void
+ExecEndCustomPlan(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->EndCustomPlan != NULL);
+ cpstate->methods->EndCustomPlan(cpstate);
+
+ /* Free the exprcontext */
+ ExecFreeExprContext(&cpstate->ps);
+
+ /* Clean out the tuple table */
+ ExecClearTuple(cpstate->ps.ps_ResultTupleSlot);
+}
+
+void
+ExecReScanCustomPlan(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->ReScanCustomPlan != NULL);
+ cpstate->methods->ReScanCustomPlan(cpstate);
+}
+
+void
+ExecCustomMarkPos(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->MarkPosCustomPlan != NULL);
+ cpstate->methods->MarkPosCustomPlan(cpstate);
+}
+
+void
+ExecCustomRestrPos(CustomPlanState *cpstate)
+{
+ Assert(cpstate->methods->RestrPosCustomPlan != NULL);
+ cpstate->methods->RestrPosCustomPlan(cpstate);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 98ad910..4de93c3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -598,6 +598,32 @@ _copyForeignScan(const ForeignScan *from)
}
/*
+ * _copyCustomPlan
+ */
+static CustomPlan *
+_copyCustomPlan(const CustomPlan *from)
+{
+ CustomPlan *newnode =
+ from->methods->CopyCustomPlan(from, CopyPlanFields);
+
+ Assert(IsA(newnode, CustomPlan));
+ return newnode;
+}
+
+/*
+ * _copyCustomPlanMarkPos
+ */
+static CustomPlanMarkPos *
+_copyCustomPlanMarkPos(const CustomPlanMarkPos *from)
+{
+ CustomPlanMarkPos *newnode =
+ from->methods->CopyCustomPlan(from, CopyPlanFields);
+
+ Assert(IsA(newnode, CustomPlanMarkPos));
+ return newnode;
+}
+
+/*
* CopyJoinFields
*
* This function copies the fields of the Join node. It is used by
@@ -3821,6 +3847,18 @@ _copyAlterTSConfigurationStmt(const AlterTSConfigurationStmt *from)
return newnode;
}
+static CreateCustomPlanStmt *
+_copyCreateCustomPlanStmt(const CreateCustomPlanStmt *from)
+{
+ CreateCustomPlanStmt *newnode = makeNode(CreateCustomPlanStmt);
+
+ COPY_STRING_FIELD(custom_name);
+ COPY_SCALAR_FIELD(custom_class);
+ COPY_NODE_FIELD(custom_options);
+
+ return newnode;
+}
+
/* ****************************************************************
* pg_list.h copy functions
* ****************************************************************
@@ -3984,6 +4022,12 @@ copyObject(const void *from)
case T_ForeignScan:
retval = _copyForeignScan(from);
break;
+ case T_CustomPlan:
+ retval = _copyCustomPlan(from);
+ break;
+ case T_CustomPlanMarkPos:
+ retval = _copyCustomPlanMarkPos(from);
+ break;
case T_Join:
retval = _copyJoin(from);
break;
@@ -4530,6 +4574,9 @@ copyObject(const void *from)
case T_AlterTSConfigurationStmt:
retval = _copyAlterTSConfigurationStmt(from);
break;
+ case T_CreateCustomPlanStmt:
+ retval = _copyCreateCustomPlanStmt(from);
+ break;
case T_A_Expr:
retval = _copyAExpr(from);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9901d23..1fdaa64 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1994,6 +1994,17 @@ _equalAlterTSConfigurationStmt(const AlterTSConfigurationStmt *a,
}
static bool
+_equalCreateCustomPlanStmt(const CreateCustomPlanStmt *a,
+ const CreateCustomPlanStmt *b)
+{
+ COMPARE_STRING_FIELD(custom_name);
+ COMPARE_SCALAR_FIELD(custom_class);
+ COMPARE_NODE_FIELD(custom_options);
+
+ return true;
+}
+
+static bool
_equalAExpr(const A_Expr *a, const A_Expr *b)
{
COMPARE_SCALAR_FIELD(kind);
@@ -2998,6 +3009,9 @@ equal(const void *a, const void *b)
case T_AlterTSConfigurationStmt:
retval = _equalAlterTSConfigurationStmt(a, b);
break;
+ case T_CreateCustomPlanStmt:
+ retval = _equalCreateCustomPlanStmt(a, b);
+ break;
case T_A_Expr:
retval = _equalAExpr(a, b);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 10e8139..602e82a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -564,6 +564,26 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
}
static void
+_outCustomPlan(StringInfo str, const CustomPlan *node)
+{
+ WRITE_NODE_TYPE("CUSTOMPLAN");
+ _outPlanInfo(str, (const Plan *) node);
+ appendStringInfo(str, " :methods");
+ _outToken(str, node->methods->CustomName);
+ node->methods->TextOutCustomPlan(str, node);
+}
+
+static void
+_outCustomPlanMarkPos(StringInfo str, const CustomPlanMarkPos *node)
+{
+ WRITE_NODE_TYPE("CUSTOMPLANMARKPOS");
+ _outPlanInfo(str, (const Plan *) node);
+ appendStringInfo(str, " :methods");
+ _outToken(str, node->methods->CustomName);
+ node->methods->TextOutCustomPlan(str, node);
+}
+
+static void
_outJoin(StringInfo str, const Join *node)
{
WRITE_NODE_TYPE("JOIN");
@@ -1581,6 +1601,16 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
}
static void
+_outCustomPath(StringInfo str, const CustomPath *node)
+{
+ WRITE_NODE_TYPE("CUSTOMPATH");
+ _outPathInfo(str, (const Path *) node);
+ appendStringInfo(str, " :methods");
+ _outToken(str, node->methods->CustomName);
+ node->methods->TextOutCustomPath(str, (Node *)node);
+}
+
+static void
_outAppendPath(StringInfo str, const AppendPath *node)
{
WRITE_NODE_TYPE("APPENDPATH");
@@ -2829,6 +2859,12 @@ _outNode(StringInfo str, const void *obj)
case T_ForeignScan:
_outForeignScan(str, obj);
break;
+ case T_CustomPlan:
+ _outCustomPlan(str, obj);
+ break;
+ case T_CustomPlanMarkPos:
+ _outCustomPlanMarkPos(str, obj);
+ break;
case T_Join:
_outJoin(str, obj);
break;
@@ -3037,6 +3073,9 @@ _outNode(StringInfo str, const void *obj)
case T_ForeignPath:
_outForeignPath(str, obj);
break;
+ case T_CustomPath:
+ _outCustomPath(str, obj);
+ break;
case T_AppendPath:
_outAppendPath(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5777cb2..53ff647 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -19,6 +19,7 @@
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "commands/custom_plan.h"
#include "foreign/fdwapi.h"
#include "nodes/nodeFuncs.h"
#ifdef OPTIMIZER_DEBUG
@@ -323,7 +324,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
break;
case RTE_SUBQUERY:
- /* Subquery --- fully handled during set_rel_size */
+ /* Subquery --- path was added during set_rel_size */
break;
case RTE_FUNCTION:
/* RangeFunction */
@@ -334,12 +335,17 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
set_values_pathlist(root, rel, rte);
break;
case RTE_CTE:
- /* CTE reference --- fully handled during set_rel_size */
+ /* CTE reference --- path was added during set_rel_size */
break;
default:
elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
break;
}
+ /* Also, consider paths offered by custom-plan providers */
+ CallCustomScanProviders(root, rel, rte);
+
+ /* Select cheapest path */
+ set_cheapest(rel);
}
#ifdef OPTIMIZER_DEBUG
@@ -388,9 +394,6 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Consider TID scans */
create_tidscan_paths(root, rel);
-
- /* Now find the cheapest of the paths for this rel */
- set_cheapest(rel);
}
/*
@@ -416,9 +419,6 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
/* Call the FDW's GetForeignPaths function to generate path(s) */
rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
-
- /* Select cheapest path */
- set_cheapest(rel);
}
/*
@@ -1247,9 +1247,6 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Generate appropriate path */
add_path(rel, create_subqueryscan_path(root, rel, pathkeys, required_outer));
-
- /* Select cheapest path (pretty easy in this case...) */
- set_cheapest(rel);
}
/*
@@ -1318,9 +1315,6 @@ set_function_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Generate appropriate path */
add_path(rel, create_functionscan_path(root, rel,
pathkeys, required_outer));
-
- /* Select cheapest path (pretty easy in this case...) */
- set_cheapest(rel);
}
/*
@@ -1341,9 +1335,6 @@ set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Generate appropriate path */
add_path(rel, create_valuesscan_path(root, rel, required_outer));
-
- /* Select cheapest path (pretty easy in this case...) */
- set_cheapest(rel);
}
/*
@@ -1410,9 +1401,6 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Generate appropriate path */
add_path(rel, create_ctescan_path(root, rel, required_outer));
-
- /* Select cheapest path (pretty easy in this case...) */
- set_cheapest(rel);
}
/*
@@ -1463,9 +1451,6 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Generate appropriate path */
add_path(rel, create_worktablescan_path(root, rel, required_outer));
-
- /* Select cheapest path (pretty easy in this case...) */
- set_cheapest(rel);
}
/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index a996116..4620fe6 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -259,6 +259,13 @@ add_paths_to_joinrel(PlannerInfo *root,
restrictlist, jointype,
sjinfo, &semifactors,
param_source_rels, extra_lateral_rels);
+
+ /*
+ * 5. Also consider paths offered by custom-plan providers.
+ */
+ CallCustomJoinProviders(root, joinrel, outerrel, innerrel,
+ jointype, sjinfo, restrictlist,
+ param_source_rels, extra_lateral_rels);
}
/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 784805f..259d4bd 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -77,6 +77,8 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa
List *tlist, List *scan_clauses);
static ForeignScan *create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
List *tlist, List *scan_clauses);
+static CustomPlan *create_custom_plan(PlannerInfo *root,
+ CustomPath *best_path);
static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
Plan *outer_plan, Plan *inner_plan);
static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
@@ -261,6 +263,10 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
plan = create_unique_plan(root,
(UniquePath *) best_path);
break;
+ case T_CustomPlan:
+ plan = (Plan *) create_custom_plan(root,
+ (CustomPath *) best_path);
+ break;
default:
elog(ERROR, "unrecognized node type: %d",
(int) best_path->pathtype);
@@ -1072,6 +1078,30 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
return plan;
}
+/*
+ * create_custom_plan
+ * Returns a custom plan for the relation(s) processed by 'best_path'.
+ * Construction of the plan node itself is delegated to the provider's
+ * CreateCustomPlan callback.
+ */
+static CustomPlan *
+create_custom_plan(PlannerInfo *root, CustomPath *best_path)
+{
+ CustomPlan *cplan;
+
+ /*
+ * Create a custom-plan object derived from the CustomPlan type,
+ * according to the supplied CustomPath.
+ */
+ Assert(best_path->methods->CreateCustomPlan != NULL);
+ cplan = best_path->methods->CreateCustomPlan(root, best_path,
+ create_plan_recurse);
+ Assert(IsA(cplan, CustomPlan) || IsA(cplan, CustomPlanMarkPos));
+
+ /* Copy cost data from Path to Plan; no need to make callback do this */
+ copy_path_costsize(&cplan->plan, &best_path->path);
+
+ return cplan;
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 46affe7..5d1ad54 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -94,6 +94,7 @@ static Plan *set_subqueryscan_references(PlannerInfo *root,
SubqueryScan *plan,
int rtoffset);
static bool trivial_subqueryscan(SubqueryScan *plan);
+static void fix_expr_common(PlannerInfo *root, Node *node);
static Node *fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset);
static Node *fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context);
static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
@@ -575,7 +576,18 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->fdw_exprs, rtoffset);
}
break;
-
+ case T_CustomPlan:
+ {
+ CustomPlan *cplan = (CustomPlan *) plan;
+
+ Assert(cplan->methods->SetCustomPlanRef != NULL);
+ cplan->methods->SetCustomPlanRef(root,
+ cplan,
+ rtoffset,
+ set_plan_refs,
+ fix_expr_common);
+ }
+ break;
case T_NestLoop:
case T_MergeJoin:
case T_HashJoin:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index a3f3583..2a1c119 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -46,12 +46,6 @@ typedef struct process_sublinks_context
bool isTopQual;
} process_sublinks_context;
-typedef struct finalize_primnode_context
-{
- PlannerInfo *root;
- Bitmapset *paramids; /* Non-local PARAM_EXEC paramids found */
-} finalize_primnode_context;
-
static Node *build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
List *plan_params,
@@ -2236,6 +2230,21 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
context.paramids = bms_add_members(context.paramids, scan_params);
break;
+ case T_CustomPlan:
+ {
+ CustomPlan *cplan = (CustomPlan *) plan;
+
+ context.paramids
+ = cplan->methods->FinalizeCustomPlan(root,
+ cplan,
+ context.paramids,
+ valid_params,
+ scan_params,
+ finalize_plan,
+ finalize_primnode);
+ }
+ break;
+
case T_ModifyTable:
{
ModifyTable *mtplan = (ModifyTable *) plan;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7b9895d..f45111c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -51,6 +51,7 @@
#include "catalog/index.h"
#include "catalog/namespace.h"
+#include "catalog/pg_custom_plan.h"
#include "catalog/pg_trigger.h"
#include "commands/defrem.h"
#include "commands/trigger.h"
@@ -248,6 +249,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
DropOwnedStmt ReassignOwnedStmt
AlterTSConfigurationStmt AlterTSDictionaryStmt
CreateMatViewStmt RefreshMatViewStmt
+ CreateCustomPlanStmt DropCustomPlanStmt
%type <node> select_no_parens select_with_parens select_clause
simple_select values_clause
@@ -500,6 +502,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <str> opt_existing_window_name
%type <boolean> opt_if_not_exists
+%type <ival> custom_class
+%type <node> custom_option
+%type <list> opt_custom_options custom_options
+
/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
* They must be listed first so that their numeric codes do not depend on
@@ -535,7 +541,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE
CROSS CSV CURRENT_P
CURRENT_CATALOG CURRENT_DATE CURRENT_ROLE CURRENT_SCHEMA
- CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
+ CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CUSTOM CYCLE
DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
@@ -574,7 +580,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
OBJECT_P OF OFF OFFSET OIDS ON ONLY OPERATOR OPTION OPTIONS OR
ORDER ORDINALITY OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER
- PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POSITION
+ PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLAN PLANS POSITION
PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY
PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROGRAM
@@ -585,8 +591,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK
ROW ROWS RULE
- SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE SEQUENCES
- SERIALIZABLE SERVER SESSION SESSION_USER SET SETOF SHARE
+ SAVEPOINT SCAN_P SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE
+ SEQUENCES SERIALIZABLE SERVER SESSION SESSION_USER SET SETOF SHARE
SHOW SIMILAR SIMPLE SMALLINT SNAPSHOT SOME STABLE STANDALONE_P START
STATEMENT STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING
SYMMETRIC SYSID SYSTEM_P
@@ -748,6 +754,7 @@ stmt :
| CreateAssertStmt
| CreateCastStmt
| CreateConversionStmt
+ | CreateCustomPlanStmt
| CreateDomainStmt
| CreateExtensionStmt
| CreateFdwStmt
@@ -778,6 +785,7 @@ stmt :
| DoStmt
| DropAssertStmt
| DropCastStmt
+ | DropCustomPlanStmt
| DropFdwStmt
| DropForeignServerStmt
| DropGroupStmt
@@ -8652,6 +8660,78 @@ AlterTSConfigurationStmt:
}
;
+/****************************************************************************
+ *
+ * QUERY:
+ * CREATE CUSTOM PLAN name FOR <class> options
+ *
+ ****************************************************************************/
+
+CreateCustomPlanStmt:
+ CREATE CUSTOM PLAN name FOR custom_class opt_custom_options
+ {
+ CreateCustomPlanStmt *n = makeNode(CreateCustomPlanStmt);
+ n->custom_name = $4;
+ n->custom_class = $6;
+ n->custom_options = $7;
+ $$ = (Node *) n;
+ }
+ ;
+
+custom_class:
+ SCAN_P { $$ = CUSTOM_PLAN_CLASS_SCAN; }
+ | JOIN { $$ = CUSTOM_PLAN_CLASS_JOIN; }
+ | ANY { $$ = CUSTOM_PLAN_CLASS_ANY; }
+ ;
+
+custom_option:
+ HANDLER handler_name
+ {
+ $$ = makeDefElem("handler", (Node *)$2);
+ }
+ ;
+
+custom_options:
+ custom_option { $$ = list_make1($1); }
+ | custom_options custom_option { $$ = lappend($1, $2); }
+ ;
+
+opt_custom_options:
+ custom_options { $$ = $1; }
+ | /* empty */ { $$ = NIL; }
+ ;
+
+/****************************************************************************
+ *
+ * QUERY:
+ * DROP CUSTOM PLAN name
+ *
+ ****************************************************************************/
+
+DropCustomPlanStmt:
+ DROP CUSTOM PLAN name opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_CUSTOM_PLAN;
+ n->objects = list_make1(list_make1(makeString($4)));
+ n->arguments = NIL;
+ n->missing_ok = false;
+ n->behavior = $5;
+ n->concurrent = false;
+ $$ = (Node *) n;
+ }
+ | DROP CUSTOM PLAN IF_P EXISTS name opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_CUSTOM_PLAN;
+ n->objects = list_make1(list_make1(makeString($6)));
+ n->arguments = NIL;
+ n->missing_ok = true;
+ n->behavior = $7;
+ n->concurrent = false;
+ $$ = (Node *) n;
+ }
+ ;
/*****************************************************************************
*
@@ -12842,6 +12922,7 @@ unreserved_keyword:
| CSV
| CURRENT_P
| CURSOR
+ | CUSTOM
| CYCLE
| DATA_P
| DATABASE
@@ -12954,6 +13035,7 @@ unreserved_keyword:
| PARTITION
| PASSING
| PASSWORD
+ | PLAN
| PLANS
| PRECEDING
| PREPARE
@@ -12989,6 +13071,7 @@ unreserved_keyword:
| ROWS
| RULE
| SAVEPOINT
+ | SCAN_P
| SCHEMA
| SCROLL
| SEARCH
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1846570..3ab01d0 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -202,6 +202,7 @@ check_xact_readonly(Node *parsetree)
case T_AlterTableSpaceOptionsStmt:
case T_AlterTableSpaceMoveStmt:
case T_CreateForeignTableStmt:
+ case T_CreateCustomPlanStmt:
case T_SecLabelStmt:
PreventCommandIfReadOnly(CreateCommandTag(parsetree));
break;
@@ -683,6 +684,10 @@ standard_ProcessUtility(Node *parsetree,
AlterEventTrigger((AlterEventTrigStmt *) parsetree);
break;
+ case T_CreateCustomPlanStmt:
+ DefineCustomPlan((CreateCustomPlanStmt *) parsetree);
+ break;
+
/*
* ******************************** ROLE statements ****
*/
@@ -1940,6 +1945,9 @@ CreateCommandTag(Node *parsetree)
case OBJECT_OPFAMILY:
tag = "DROP OPERATOR FAMILY";
break;
+ case OBJECT_CUSTOM_PLAN:
+ tag = "DROP CUSTOM PLAN";
+ break;
default:
tag = "???";
}
@@ -2207,6 +2215,10 @@ CreateCommandTag(Node *parsetree)
tag = "ALTER EVENT TRIGGER";
break;
+ case T_CreateCustomPlanStmt:
+ tag = "CREATE CUSTOM PLAN";
+ break;
+
case T_CreatePLangStmt:
tag = "CREATE LANGUAGE";
break;
@@ -2830,6 +2842,10 @@ GetCommandLogLevel(Node *parsetree)
lev = LOGSTMT_DDL;
break;
+ case T_CreateCustomPlanStmt:
+ lev = LOGSTMT_DDL;
+ break;
+
/* already-planned queries */
case T_PlannedStmt:
{
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 36d9953..6891109 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5385,6 +5385,22 @@ get_utility_query_def(Query *query, deparse_context *context)
}
}
+/*
+ * GetSpecialCustomVar
+ *
+ * Utility routine to call optional GetSpecialCustomVar method of
+ * CustomPlanState
+ */
+static Node *
+GetSpecialCustomVar(PlanState *ps, Var *varnode)
+{
+ CustomPlanState *cps = (CustomPlanState *) ps;
+
+ Assert(IsA(ps, CustomPlanState));
+ Assert(IS_SPECIAL_VARNO(varnode->varno));
+
+ return (Node *)cps->methods->GetSpecialCustomVar(cps, varnode);
+}
/*
* Display a Var appropriately.
@@ -5416,6 +5432,7 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
deparse_columns *colinfo;
char *refname;
char *attname;
+ Node *expr;
/* Find appropriate nesting depth */
netlevelsup = var->varlevelsup + levelsup;
@@ -5438,6 +5455,21 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
colinfo = deparse_columns_fetch(var->varno, dpns);
attnum = var->varattno;
}
+ else if (IS_SPECIAL_VARNO(var->varno) &&
+ IsA(dpns->planstate, CustomPlanState) &&
+ (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+ {
+ /*
+ * Force parentheses because our caller probably assumed a Var is a
+ * simple expression.
+ */
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, '(');
+ get_rule_expr((Node *) expr, context, true);
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, ')');
+ return NULL;
+ }
else if (var->varno == OUTER_VAR && dpns->outer_tlist)
{
TargetEntry *tle;
@@ -5726,6 +5758,26 @@ get_name_for_var_field(Var *var, int fieldno,
rte = rt_fetch(var->varno, dpns->rtable);
attnum = var->varattno;
}
+ else if (IS_SPECIAL_VARNO(var->varno) &&
+ IsA(dpns->planstate, CustomPlanState) &&
+ (expr = GetSpecialCustomVar(dpns->planstate, var)) != NULL)
+ {
+ StringInfo saved = context->buf;
+ StringInfoData temp;
+
+ initStringInfo(&temp);
+ context->buf = &temp;
+
+ if (!IsA(expr, Var))
+ appendStringInfoChar(context->buf, '(');
+ get_rule_expr((Node *) expr, context, true);
+ if (!IsA(expr, Var))
+ appendStringInfoChar(context->buf, ')');
+
+ context->buf = saved;
+
+ return temp.data;
+ }
else if (var->varno == OUTER_VAR && dpns->outer_tlist)
{
TargetEntry *tle;
diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 79df5b6..be2388e 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -32,6 +32,7 @@
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_conversion.h"
+#include "catalog/pg_custom_plan.h"
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_default_acl.h"
@@ -345,6 +346,28 @@ static const struct cachedesc cacheinfo[] = {
},
8
},
+ {CustomPlanRelationId, /* CUSTOMPLANOID */
+ CustomPlanOidIndexId,
+ 1,
+ {
+ ObjectIdAttributeNumber,
+ 0,
+ 0,
+ 0
+ },
+ 32
+ },
+ {CustomPlanRelationId, /* CUSTOMPLANNAME */
+ CustomPlanNameIndexId,
+ 1,
+ {
+ Anum_pg_custom_plan_custname,
+ 0,
+ 0,
+ 0,
+ },
+ 32
+ },
{DatabaseRelationId, /* DATABASEOID */
DatabaseOidIndexId,
1,
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 8948589..0962df6 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -147,6 +147,7 @@ typedef enum ObjectClass
OCLASS_DEFACL, /* pg_default_acl */
OCLASS_EXTENSION, /* pg_extension */
OCLASS_EVENT_TRIGGER, /* pg_event_trigger */
+ OCLASS_CUSTOM_PLAN, /* pg_custom_plan */
MAX_OCLASS /* MUST BE LAST */
} ObjectClass;
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 0515b75..a5e6cea 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -313,6 +313,12 @@ DECLARE_UNIQUE_INDEX(pg_extension_name_index, 3081, on pg_extension using btree(
DECLARE_UNIQUE_INDEX(pg_range_rngtypid_index, 3542, on pg_range using btree(rngtypid oid_ops));
#define RangeTypidIndexId 3542
+DECLARE_UNIQUE_INDEX(pg_custom_plan_oid_index, 3563, on pg_custom_plan using btree(oid oid_ops));
+#define CustomPlanOidIndexId 3563
+
+DECLARE_UNIQUE_INDEX(pg_custom_plan_name_index, 3564, on pg_custom_plan using btree(custname name_ops));
+#define CustomPlanNameIndexId 3564
+
/* last step of initialization script: build the indexes declared above */
BUILD_INDICES
diff --git a/src/include/catalog/pg_custom_plan.h b/src/include/catalog/pg_custom_plan.h
new file mode 100644
index 0000000..01b52bb
--- /dev/null
+++ b/src/include/catalog/pg_custom_plan.h
@@ -0,0 +1,51 @@
+/* -------------------------------------------------------------------------
+ *
+ * pg_custom_plan.h
+ * definition of the system "custom plan" relation (pg_custom_plan)
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * -------------------------------------------------------------------------
+ */
+#ifndef PG_CUSTOM_PLAN_H
+#define PG_CUSTOM_PLAN_H
+
+#include "catalog/genbki.h"
+
+/* ----------------
+ * pg_custom_plan definition. cpp turns this into
+ * typedef struct FormData_pg_custom_plan
+ * ----------------
+ */
+#define CustomPlanRelationId 3562
+
+CATALOG(pg_custom_plan,3562)
+{
+ NameData custname; /* name of custom-plan provider */
+ int32 custclass; /* mask of CUSTOM_CLASS_*; see below */
+ Oid custhandler; /* OID of handler function to be called */
+} FormData_pg_custom_plan;
+
+/* ----------------
+ * Form_pg_custom_plan corresponds to a pointer to a tuple
+ * with the format of pg_custom_plan relation.
+ * ----------------
+ */
+typedef FormData_pg_custom_plan *Form_pg_custom_plan;
+
+/* ----------------
+ * compiler constants for pg_custom_plan
+ * ----------------
+ */
+#define Natts_pg_custom_plan 3
+#define Anum_pg_custom_plan_custname 1
+#define Anum_pg_custom_plan_custclass 2
+#define Anum_pg_custom_plan_custhandler 3
+
+/* Bits for custclass */
+#define CUSTOM_PLAN_CLASS_SCAN (1 << 0)
+#define CUSTOM_PLAN_CLASS_JOIN (1 << 1)
+#define CUSTOM_PLAN_CLASS_ANY (~0)
+
+#endif /* PG_CUSTOM_PLAN_H */
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index f280af4..63986ab 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -166,10 +166,13 @@ DESCR("less than");
#define TIDLessOperator 2799
DATA(insert OID = 2800 ( ">" PGNSP PGUID b f f 27 27 16 2799 2801 tidgt scalargtsel scalargtjoinsel ));
DESCR("greater than");
+#define TIDGreaterOperator 2800
DATA(insert OID = 2801 ( "<=" PGNSP PGUID b f f 27 27 16 2802 2800 tidle scalarltsel scalarltjoinsel ));
DESCR("less than or equal");
+#define TIDLessEqualOperator 2801
DATA(insert OID = 2802 ( ">=" PGNSP PGUID b f f 27 27 16 2801 2799 tidge scalargtsel scalargtjoinsel ));
DESCR("greater than or equal");
+#define TIDGreaterEqualOperator 2802
DATA(insert OID = 410 ( "=" PGNSP PGUID b t t 20 20 16 410 411 int8eq eqsel eqjoinsel ));
DESCR("equal");
diff --git a/src/include/commands/custom_plan.h b/src/include/commands/custom_plan.h
new file mode 100644
index 0000000..155d710
--- /dev/null
+++ b/src/include/commands/custom_plan.h
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * custom_plan.h
+ * Declarations for custom-plan handling
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/custom_plan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CUSTOM_PLAN_H
+#define CUSTOM_PLAN_H
+#include "nodes/parsenodes.h"
+#include "optimizer/planmain.h"
+
+/* argument of handler function for CUSTOM_PLAN_CLASS_SCAN */
+typedef struct {
+ uint32 custom_class;
+ PlannerInfo *root;
+ RelOptInfo *baserel;
+ RangeTblEntry *rte;
+} customScanArg;
+
+/* argument of handler function for CUSTOM_PLAN_CLASS_JOIN */
+typedef struct {
+ uint32 custom_class;
+ PlannerInfo *root;
+ RelOptInfo *joinrel;
+ RelOptInfo *outerrel;
+ RelOptInfo *innerrel;
+ JoinType jointype;
+ SpecialJoinInfo *sjinfo;
+ List *restrictlist;
+ Relids param_source_rels;
+ Relids extra_lateral_rels;
+} customJoinArg;
+
+extern void CallCustomScanProviders(PlannerInfo *root,
+ RelOptInfo *baserel,
+ RangeTblEntry *rte);
+
+extern void CallCustomJoinProviders(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist,
+ Relids param_source_rels,
+ Relids extra_lateral_rels);
+
+extern Oid get_custom_plan_oid(const char *custom_name, bool missing_ok);
+
+extern void RemoveCustomPlanById(Oid cust_oid);
+extern Oid DefineCustomPlan(CreateCustomPlanStmt *stmt);
+
+#endif /* CUSTOM_PLAN_H */
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
new file mode 100644
index 0000000..f97917c
--- /dev/null
+++ b/src/include/executor/nodeCustom.h
@@ -0,0 +1,30 @@
+/* ------------------------------------------------------------------------
+ *
+ * nodeCustom.h
+ *
+ * prototypes for CustomScan nodes
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * ------------------------------------------------------------------------
+ */
+#ifndef NODECUSTOM_H
+#define NODECUSTOM_H
+#include "nodes/plannodes.h"
+#include "nodes/execnodes.h"
+
+/*
+ * General executor code
+ */
+extern CustomPlanState *ExecInitCustomPlan(CustomPlan *cplan,
+ EState *estate, int eflags);
+extern TupleTableSlot *ExecCustomPlan(CustomPlanState *node);
+extern Node *MultiExecCustomPlan(CustomPlanState *node);
+extern void ExecEndCustomPlan(CustomPlanState *node);
+
+extern void ExecReScanCustomPlan(CustomPlanState *node);
+extern void ExecCustomMarkPos(CustomPlanState *node);
+extern void ExecCustomRestrPos(CustomPlanState *node);
+
+#endif /* NODECUSTOM_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c94e8a..97198b6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1501,6 +1501,47 @@ typedef struct ForeignScanState
void *fdw_state; /* foreign-data wrapper can keep state here */
} ForeignScanState;
+/* ----------------
+ * CustomPlanState information
+ *
+ * CustomPlan nodes are used to execute custom code within the executor.
+ * ----------------
+ */
+struct CustomPlanStateMethods;
+struct ExplainState; /* avoid including explain.h here */
+
+typedef struct CustomPlanState
+{
+ PlanState ps;
+ const struct CustomPlanStateMethods *methods;
+} CustomPlanState;
+
+typedef struct CustomPlanStateMethods
+{
+ const char *CustomName;
+ void (*BeginCustomPlan)(CustomPlanState *node,
+ EState *estate,
+ int eflags);
+ TupleTableSlot *(*ExecCustomPlan)(CustomPlanState *node);
+ Node *(*MultiExecCustomPlan)(CustomPlanState *node);
+ void (*EndCustomPlan)(CustomPlanState *node);
+ void (*ReScanCustomPlan)(CustomPlanState *node);
+ void (*MarkPosCustomPlan)(CustomPlanState *node);
+ void (*RestrPosCustomPlan)(CustomPlanState *node);
+ /* explain support */
+ void (*ExplainCustomPlanTargetRel)(CustomPlanState *node,
+ struct ExplainState *es);
+ void (*ExplainCustomPlan)(CustomPlanState *node,
+ List *ancestors,
+ struct ExplainState *es);
+ void (*ExplainCustomPreScanNode)(CustomPlanState *node,
+ Bitmapset **rels_used,
+ void (*fn_ExplainPreScanNode)(
+ PlanState *planstate,
+ Bitmapset **rels_used));
+ Node *(*GetSpecialCustomVar)(CustomPlanState *node, Var *varnode);
+} CustomPlanStateMethods;
+
/* ----------------------------------------------------------------
* Join State Information
* ----------------------------------------------------------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5b8df59..c1c381d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -62,6 +62,8 @@ typedef enum NodeTag
T_CteScan,
T_WorkTableScan,
T_ForeignScan,
+ T_CustomPlan,
+ T_CustomPlanMarkPos,
T_Join,
T_NestLoop,
T_MergeJoin,
@@ -107,6 +109,7 @@ typedef enum NodeTag
T_CteScanState,
T_WorkTableScanState,
T_ForeignScanState,
+ T_CustomPlanState,
T_JoinState,
T_NestLoopState,
T_MergeJoinState,
@@ -224,6 +227,7 @@ typedef enum NodeTag
T_HashPath,
T_TidPath,
T_ForeignPath,
+ T_CustomPath,
T_AppendPath,
T_MergeAppendPath,
T_ResultPath,
@@ -365,6 +369,7 @@ typedef enum NodeTag
T_RefreshMatViewStmt,
T_ReplicaIdentityStmt,
T_AlterSystemStmt,
+ T_CreateCustomPlanStmt,
/*
* TAGS FOR PARSE TREE NODES (parsenodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 18d4991..29b63ab 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1195,6 +1195,7 @@ typedef enum ObjectType
OBJECT_CONSTRAINT,
OBJECT_COLLATION,
OBJECT_CONVERSION,
+ OBJECT_CUSTOM_PLAN,
OBJECT_DATABASE,
OBJECT_DOMAIN,
OBJECT_EVENT_TRIGGER,
@@ -2036,6 +2037,18 @@ typedef struct AlterOpFamilyStmt
} AlterOpFamilyStmt;
/* ----------------------
+ * Create Custom Plan Statement
+ * ----------------------
+ */
+typedef struct CreateCustomPlanStmt
+{
+ NodeTag type;
+ char *custom_name;
+ int custom_class;
+ List *custom_options;
+} CreateCustomPlanStmt;
+
+/* ----------------------
* Drop Table|Sequence|View|Index|Type|Domain|Conversion|Schema Statement
* ----------------------
*/
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 38c039c..2c8b079 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -15,8 +15,10 @@
#define PLANNODES_H
#include "access/sdir.h"
+#include "lib/stringinfo.h"
#include "nodes/bitmapset.h"
#include "nodes/primnodes.h"
+#include "nodes/relation.h"
/* ----------------------------------------------------------------
@@ -479,6 +481,55 @@ typedef struct ForeignScan
bool fsSystemCol; /* true if any "system column" is needed */
} ForeignScan;
+/* ----------------
+ * CustomPlan node
+ * ----------------
+ */
+struct CustomPlanMethods;
+struct CustomPlanState; /* avoid including nodes/execnodes.h here */
+struct finalize_primnode_context; /* avoid including optimizer/subselect.h */
+
+typedef struct CustomPlan
+{
+ Plan plan;
+ const struct CustomPlanMethods *methods;
+} CustomPlan;
+
+/* almost the same as CustomPlan, but supports MarkPos/RestrPos */
+typedef CustomPlan CustomPlanMarkPos;
+
+typedef struct CustomPlanMethods
+{
+ const char *CustomName;
+ void (*SetCustomPlanRef)(PlannerInfo *root,
+ CustomPlan *custom_plan,
+ int rtoffset,
+ Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+ Plan *plan,
+ int rtoffset),
+ void (*fn_fix_expr_common)(PlannerInfo *root,
+ Node *node));
+ bool (*SupportCustomBackwardScan)(CustomPlan *custom_plan);
+ Bitmapset *(*FinalizeCustomPlan)(PlannerInfo *root,
+ CustomPlan *custom_plan,
+ Bitmapset *paramids,
+ Bitmapset *valid_params,
+ Bitmapset *scan_params,
+ Bitmapset *(*fn_finalize_plan)(
+ PlannerInfo *root,
+ Plan *plan,
+ Bitmapset *valid_params,
+ Bitmapset *scan_params),
+ Bitmapset *(*fn_finalize_primnode)(
+ Node *node,
+ struct finalize_primnode_context *context));
+ struct CustomPlanState *(*CreateCustomPlanState)(CustomPlan *custom_plan);
+ void (*TextOutCustomPlan)(StringInfo str,
+ const CustomPlan *node);
+ CustomPlan *(*CopyCustomPlan)(const CustomPlan *from,
+ void (*fn_CopyPlanFields)(const Plan *from,
+ Plan *newnode));
+} CustomPlanMethods;
/*
* ==========
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c607b36..96ef3a2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "storage/block.h"
@@ -878,6 +879,37 @@ typedef struct ForeignPath
} ForeignPath;
/*
+ * CustomPath represents a scan (or join) using custom logic
+ *
+ * An extension (acting as a custom-plan provider) can add an alternative
+ * path using its own type derived from CustomPath, and can store private
+ * data in the extra fields of that object. A set of common methods is
+ * represented as function pointers in the CustomPathMethods structure;
+ * the extension has to set them up correctly.
+ */
+struct CustomPathMethods;
+struct Plan; /* avoid including plannodes.h here */
+struct CustomPlan; /* avoid including plannodes.h here */
+
+typedef struct CustomPath
+{
+ Path path;
+ const struct CustomPathMethods *methods;
+} CustomPath;
+
+typedef struct CustomPathMethods
+{
+ const char *CustomName;
+ struct CustomPlan *
+ (*CreateCustomPlan)(PlannerInfo *root,
+ CustomPath *best_path,
+ struct Plan *(*fn_create_plan)(
+ PlannerInfo *root,
+ Path *best_path));
+ void (*TextOutCustomPath)(StringInfo str, CustomPath *node);
+} CustomPathMethods;
+
+/*
* AppendPath represents an Append plan, ie, successive execution of
* several member plans.
*
diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h
index 5607e98..a8ad7c5 100644
--- a/src/include/optimizer/subselect.h
+++ b/src/include/optimizer/subselect.h
@@ -15,6 +15,12 @@
#include "nodes/plannodes.h"
#include "nodes/relation.h"
+typedef struct finalize_primnode_context
+{
+ PlannerInfo *root;
+ Bitmapset *paramids; /* Non-local PARAM_EXEC paramids found */
+} finalize_primnode_context;
+
extern void SS_process_ctes(PlannerInfo *root);
extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root,
SubLink *sublink,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 61fae22..a769d04 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -107,6 +107,7 @@ PG_KEYWORD("current_time", CURRENT_TIME, RESERVED_KEYWORD)
PG_KEYWORD("current_timestamp", CURRENT_TIMESTAMP, RESERVED_KEYWORD)
PG_KEYWORD("current_user", CURRENT_USER, RESERVED_KEYWORD)
PG_KEYWORD("cursor", CURSOR, UNRESERVED_KEYWORD)
+PG_KEYWORD("custom", CUSTOM, UNRESERVED_KEYWORD)
PG_KEYWORD("cycle", CYCLE, UNRESERVED_KEYWORD)
PG_KEYWORD("data", DATA_P, UNRESERVED_KEYWORD)
PG_KEYWORD("database", DATABASE, UNRESERVED_KEYWORD)
@@ -283,6 +284,7 @@ PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD)
PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD)
PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD)
PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD)
+PG_KEYWORD("plan", PLAN, UNRESERVED_KEYWORD)
PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD)
PG_KEYWORD("position", POSITION, COL_NAME_KEYWORD)
PG_KEYWORD("preceding", PRECEDING, UNRESERVED_KEYWORD)
@@ -326,6 +328,7 @@ PG_KEYWORD("row", ROW, COL_NAME_KEYWORD)
PG_KEYWORD("rows", ROWS, UNRESERVED_KEYWORD)
PG_KEYWORD("rule", RULE, UNRESERVED_KEYWORD)
PG_KEYWORD("savepoint", SAVEPOINT, UNRESERVED_KEYWORD)
+PG_KEYWORD("scan", SCAN_P, UNRESERVED_KEYWORD)
PG_KEYWORD("schema", SCHEMA, UNRESERVED_KEYWORD)
PG_KEYWORD("scroll", SCROLL, UNRESERVED_KEYWORD)
PG_KEYWORD("search", SEARCH, UNRESERVED_KEYWORD)
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 33b6dca..abc55b2 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -563,6 +563,8 @@ extern Datum language_handler_in(PG_FUNCTION_ARGS);
extern Datum language_handler_out(PG_FUNCTION_ARGS);
extern Datum fdw_handler_in(PG_FUNCTION_ARGS);
extern Datum fdw_handler_out(PG_FUNCTION_ARGS);
+extern Datum custom_plan_handler_in(PG_FUNCTION_ARGS);
+extern Datum custom_plan_handler_out(PG_FUNCTION_ARGS);
extern Datum internal_in(PG_FUNCTION_ARGS);
extern Datum internal_out(PG_FUNCTION_ARGS);
extern Datum opaque_in(PG_FUNCTION_ARGS);
diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index f97229f..7272eec 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -52,6 +52,8 @@ enum SysCacheIdentifier
CONNAMENSP,
CONSTROID,
CONVOID,
+ CUSTOMPLANOID,
+ CUSTOMPLANNAME,
DATABASEOID,
DEFACLROLENSPOBJ,
ENUMOID,
diff --git a/src/test/regress/expected/sanity_check.out b/src/test/regress/expected/sanity_check.out
index 111d24c..e3cad45 100644
--- a/src/test/regress/expected/sanity_check.out
+++ b/src/test/regress/expected/sanity_check.out
@@ -97,6 +97,7 @@ pg_class|t
pg_collation|t
pg_constraint|t
pg_conversion|t
+pg_custom_plan|t
pg_database|t
pg_db_role_setting|t
pg_default_acl|t
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Prior to the development cycle towards v9.5, I'd like to reopen
the discussion of custom-plan interface. Even though we had lots
of discussion during the last three commit-fests, several issues
are still under discussion. So, I'd like to clarify direction of
the implementation, prior to the first commit-fest.
(1) DDL support and system catalog
Simon suggested that DDL command should be supported to track custom-
plan providers being installed, and to avoid nonsense hook calls
if it is an obvious case that custom-plan provider can help. It also
makes sense to give a chance to load extensions once installed.
(In the previous design, I assumed modules are loaded by LOAD command
or *_preload_libraries parameters).
I tried to implement the following syntax:
CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;
Thank you for exploring that thought and leading the way on this
research. I've been thinking about this also.
What I think we need is a declarative form that expresses the linkage
between base table(s) and related data structures that can be used
to optimize a query, while still providing accurate results.
In other DBMS, we have concepts such as a JoinIndex or a MatView which
allow some kind of lookaside behaviour. Just for clarity, a concrete
example is Oracle's Materialized Views which can be set using ENABLE
QUERY REWRITE so that the MatView can be used as an alternative path
for a query. We do already have this concept in PostgreSQL, where an
index can be used to perform an IndexOnlyScan rather than accessing
the heap itself.
We have considerable evidence that the idea of alternate data
structures results in performance gains.
* KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
* /messages/by-id/52C59858.9090500@garret.ru
* http://citusdata.github.io/cstore_fdw/
* University of Manchester - exploring GPUs as part of the AXLE project
* Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE project
* Some other authors have also cited gains using GPU technology in databases
So I would like to have a mechanism that provides a *generic*
Lookaside for a table or foreign table.
Tom and Kevin have previously expressed that MatViews would represent
a planning problem, in the general case. One way to solve that
planning issue is to link structures directly together, in the same
way that an index and a table are linked. We can then process the
lookaside in the same way we handle a partial index - check
prerequisites and if usable, calculate a cost for the alternate path.
We need not add planning time other than to the tables that might
benefit from that.
Roughly, I'm thinking of this...
CREATE LOOKASIDE ON foo
TO foo_mat_view;
and also this...
CREATE LOOKASIDE ON foo
TO foo_as_a_foreign_table /* e.g. PGStrom */
This would allow the planner to consider alternate plans for foo_mv
during set_plain_rel_pathlist() similarly to the way it considers
index paths, in one of the common cases that the mat view covers just
one table.
This concept is similar to ENABLE QUERY REWRITE in Oracle, but this
thought goes much further, to include any generic user-defined data
structure or foreign table.
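To make the set_plain_rel_pathlist() analogy concrete, here is a rough
sketch of where such a hook could sit. consider_lookaside_paths() is
hypothetical - it stands in for whatever catalog lookup and path
generation the proposal would need - while the surrounding calls mirror
the existing allpaths.c logic:

static void
set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
{
    /* Consider sequential scan (existing logic) */
    add_path(rel, create_seqscan_path(root, rel, NULL));

    /* Consider index scans (existing logic) */
    create_index_paths(root, rel);

    /* Consider TID scans (existing logic) */
    create_tidscan_paths(root, rel);

    /*
     * Hypothetical: for each lookaside declared on this relation, check
     * viability and add a costed alternate path, like a partial index.
     */
    consider_lookaside_paths(root, rel, rte);
}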
Do we need this? For MVs, we *might* be able to deduce that the MV is
rewritable for "foo", but that is not deducible for Foreign Tables, by
current definition, so I prefer the explicit definition of objects
that are linked - since doing this for indexes is already familiar to
people.
Having an explicit linkage between data structures allows us to
enhance an existing application by transparently adding new
structures, just as we already do with indexes. Specifically, we would
allow more than one lookaside structure on any one table.
Forget the exact name, that's not important. But I think the
requirements here are...
* Explicit definition that we are attaching an alternate path onto a
table (conceptually similar to adding an index)
* Ability to check that the alternate path is viable (similar to the
way we validate use of partial indexes prior to usage) - checks on
columns (SELECT), rows (WHERE), aggregations (GROUP); see the sketch
after this list
* Ability to consider access cost for both normal table and alternate
path (like an index) - this allows the alternate path to *not* be
chosen when we are performing some operation that is sub-optimal (for
whatever reason).
* There may be some need to define operator classes that are
implemented via the alternate path
which works for single tables, but a later requirement would then be
* allows the join of one or more tables to be replaced with a single lookaside
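On the second point, the partial-index machinery already suggests the
shape of the viability check; a sketch, where lookaside->predicate and
build_lookaside_path() are hypothetical and only predicate_implied_by()
is the real routine used for partial indexes:

/* viable only if the lookaside's predicate is provably true for this scan */
if (lookaside->predicate == NIL ||
    predicate_implied_by(lookaside->predicate, rel->baserestrictinfo))
{
    /* then cost it like any other path and let add_path() decide */
    add_path(rel, build_lookaside_path(root, rel, lookaside));
}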
Hopefully, we won't need a "Custom Plan" at all, just the ability to
lookaside when useful.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Prior to the development cycle towards v9.5, I'd like to reopen the
discussion of custom-plan interface. Even though we had lots of
discussion during the last three commit-fests, several issues are
still under discussion. So, I'd like to clarify direction of the
implementation, prior to the first commit-fest.
(1) DDL support and system catalog
Simon suggested that DDL command should be supported to track custom-
plan providers being installed, and to avoid nonsense hook calls if it
is an obvious case that custom-plan provider can help. It also makes
sense to give a chance to load extensions once installed.
(In the previous design, I assumed modules are loaded by LOAD command
or *_preload_libraries parameters).
I tried to implement the following syntax:
CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;
Thank you for exploring that thought and leading the way on this research.
I've been thinking about this also.
What I think we need is a declarative form that expresses the linkage between
base table(s) and related data structures that can be used to optimize
a query, while still providing accurate results.
In other DBMS, we have concepts such as a JoinIndex or a MatView which allow
some kind of lookaside behaviour. Just for clarity, a concrete example is
Oracle's Materialized Views which can be set using ENABLE QUERY REWRITE
so that the MatView can be used as an alternative path for a query. We do
already have this concept in PostgreSQL, where an index can be used to
perform an IndexOnlyScan rather than accessing the heap itself.
We have considerable evidence that the idea of alternate data structures
results in performance gains.
* KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
* /messages/by-id/52C59858.9090500@garret.ru
* http://citusdata.github.io/cstore_fdw/
* University of Manchester - exploring GPUs as part of the AXLE project
* Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE
project
* Some other authors have also cited gains using GPU technology in databases
So I would like to have a mechanism that provides a *generic* Lookaside
for a table or foreign table.
Tom and Kevin have previously expressed that MatViews would represent a
planning problem, in the general case. One way to solve that planning issue
is to link structures directly together, in the same way that an index and
a table are linked. We can then process the lookaside in the same way we
handle a partial index - check prerequisites and if usable, calculate a
cost for the alternate path.
We need not add planning time other than to the tables that might benefit
from that.
Roughly, I'm thinking of this...
CREATE LOOKASIDE ON foo
TO foo_mat_view;
and also this...
CREATE LOOKASIDE ON foo
TO foo_as_a_foreign_table /* e.g. PGStrom */
This would allow the planner to consider alternate plans for foo_mv during
set_plain_rel_pathlist() similarly to the way it considers index paths,
in one of the common cases that the mat view covers just one table.
This concept is similar to ENABLE QUERY REWRITE in Oracle, but this thought
goes much further, to include any generic user-defined data structure or
foreign table.
Let me clarify. This mechanism allows adding alternative scan/join paths,
including built-in ones, not only custom enhanced plan/exec nodes, doesn't it?
Probably, it is a variation of the above proposition if we install a handler
function that proposes built-in path nodes in response to the request for scan/join.
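For instance, under the customScanArg interface in the patch above, a
handler that proposes a built-in path rather than a custom one might
look roughly like this (the function name is made up; only add_path()
and create_seqscan_path() are existing routines):

#include "postgres.h"
#include "fmgr.h"
#include "commands/custom_plan.h"
#include "optimizer/pathnode.h"

PG_FUNCTION_INFO_V1(builtin_path_handler);

Datum
builtin_path_handler(PG_FUNCTION_ARGS)
{
    customScanArg *arg = (customScanArg *) PG_GETARG_POINTER(0);

    /* propose a plain built-in path instead of a custom one */
    if (arg->custom_class == CUSTOM_PLAN_CLASS_SCAN)
        add_path(arg->baserel,
                 create_seqscan_path(arg->root, arg->baserel, NULL));

    PG_RETURN_VOID();
}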
Do we need this? For MVs, we *might* be able to deduce that the MV is
rewritable for "foo", but that is not deducible for Foreign Tables, by
current definition, so I prefer the explicit definition of objects
that are linked - since doing this for indexes is already familiar to
people.
Having an explicit linkage between data structures allows us to enhance
an existing application by transparently adding new structures, just as
we already do with indexes. Specifically, we would allow more than one
lookaside structure on any one table.
Not only alternative data structures; an alternative method to scan/join
the same data structure is also important, isn't it?
Forget the exact name, that's not important. But I think the requirements
here are...
* Explicit definition that we are attaching an alternate path onto a
table (conceptually similar to adding an index)
I think the syntax allows "tables", not only a particular table.
It will inform the core planner that this lookaside/customplan (the name is
not important; anyway, this feature...) can provide an alternative path
towards the set of relations being considered. So, it allows us to reduce
the number of function calls at the planning stage.
* Ability to check that the alternate path is viable (similar to the way
we validate use of partial indexes prior to usage)
Checks on columns (SELECT), rows (WHERE), aggregations (GROUP)
I don't deny it... but do you expect this feature in the initial version?
* Ability to consider access cost for both normal table and alternate path
(like an index) - this allows the alternate path to *not* be chosen when
we are performing some operation that is sub-optimal (for whatever reason).
It is the usual job of the existing planner, isn't it?
* There may be some need to define operator classes that are implemented
via the alternate path
which works for single tables, but a later requirement would then be
* allows the join of one or more tables to be replaced with a single lookaside
It's a higher priority for me, and I guess it is the same for MatView usage.
Hopefully, we won't need a "Custom Plan" at all, just the ability to
lookaside when useful.
Probably, lookaside is a special case of the scenarios that custom-plan can
provide. I also think it is an attractive use case if we can redirect
a particular complicated join to a MatView reference. So, it makes sense
to bundle a handler function that replaces a join with a MatView reference.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Let me clarify. This mechanism allows adding alternative scan/join paths,
including built-in ones, not only custom enhanced plan/exec nodes, doesn't it?
Probably, it is a variation of the above proposition if we install a handler
function that proposes built-in path nodes in response to the request for scan/join.
Yes, I am looking for a way to give you the full extent of your
requirements, within the Postgres framework. I have time and funding
to assist you in achieving this in a general way that all may make use
of.
Not only alternative data structures; an alternative method to scan/join
the same data structure is also important, isn't it?
Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW
Or put another way, if we add Lookaside then we can just plug in the
pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
-----Original Message-----
From: Simon Riggs [mailto:simon@2ndQuadrant.com]
Sent: Wednesday, May 07, 2014 5:02 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; Robert Haas; Andres Freund; PgHacker; Stephen Frost; Shigeru
Hanada; Jim Mlodgenski; Peter Eisentraut; Kohei KaiGai
Subject: Re: [v9.5] Custom Plan API

On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Let me clarify. This mechanism allows adding alternative scan/join
paths, including built-in ones, not only custom enhanced plan/exec nodes,
doesn't it?
Probably, it is a variation of the above proposition if we install a
handler function that proposes built-in path nodes in response to the
request for scan/join.
Yes, I am looking for a way to give you the full extent of your requirements,
within the Postgres framework. I have time and funding to assist you in
achieving this in a general way that all may make use of.
Not only alternative data structures; an alternative method to scan/join
the same data structure is also important, isn't it?
Agreed. My proposal is that if the planner allows the lookaside to an FDW
then we pass the query for full execution on the FDW. That means that the
scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW
Or put another way, if we add Lookaside then we can just plug in the pgstrom
FDW directly and we're done. And everybody else's FDW will work as well,
so Citus etc. will not need to recode.
Hmm. That sounds to me like you intend to make FDW perform as a central
facility to host pluggable plan/exec stuff. Even though we have several
things to be clarified, I also think it's a direction worth investigating.
Let me list up the things to be clarified / developed, in random order.
* Join replacement by FDW; We still don't have consensus about join replacement
by FDW. Probably, it will be designed for remote-join implementation primarily,
however, the things to do are similar. We may need to revisit Hanada-san's
proposition from the past.
* Lookaside for ANY relations; I want the planner to try GPU-scan for any
relations once installed, to reduce the user's administration cost.
It needs lookaside to allow specifying a particular foreign-server, not a
foreign-table, then create a ForeignScan node that is not associated with a
particular foreign-table.
* ForeignScan node that is not associated with a particular foreign-table.
Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
existing FDW implementation needs to be improved. These nodes scan a
materialized relation (generated on the fly), however, existing FDW code
assumes a ForeignScan node is always associated with a particular
foreign-table. We need to eliminate this restriction.
* FDW method for MultiExec. In case we can stack multiple ForeignScan
nodes, it's helpful to support exchanging scanned tuples in their own
data format; see the sketch after this list. Let's assume two ForeignScan
nodes are stacked. One performs like Sort, another performs like Scan. If
they internally handle a column-oriented data format, TupleTableSlot is not
the best way for data exchange.
* Lookaside on INSERT/UPDATE/DELETE. Probably, it can be implemented
using the writable FDW feature. Not a big issue, but don't forget it...
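As a sketch of the MultiExec point - all names here are hypothetical,
only MultiExecProcNode() is the existing executor entrypoint - two
stacked nodes could hand over a whole chunk in a private column-oriented
format at once:

static Node *
my_sort_multi_exec(CustomPlanState *node)
{
    PlanState *child = outerPlanState(node);

    if (IsA(child, CustomPlanState))
    {
        /* both providers agree on MyColumnChunk, a private bulk format */
        MyColumnChunk *chunk = (MyColumnChunk *) MultiExecProcNode(child);

        my_sort_chunk(chunk);       /* e.g. sort the chunk in place */
        return (Node *) chunk;
    }
    return NULL;                    /* row-at-a-time fallback not shown */
}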
How about your opinion?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 7 May 2014 10:06, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Let me list up the things to be clarified / developed randomly.
* Join replacement by FDW; We still don't have consensus about join replacement
by FDW. Probably, it will be designed for remote-join implementation primarily,
however, the things to do are similar. We may need to revisit Hanada-san's
proposition from the past.
Agreed. We need to push down joins into FDWs and we need to push down
aggregates also, so they can be passed to FDWs. I'm planning to look
at aggregate push down.
* Lookaside for ANY relations; I want the planner to try GPU-scan for any
relations once installed, to reduce the user's administration cost.
It needs lookaside to allow specifying a particular foreign-server, not a
foreign-table, then create a ForeignScan node that is not associated with a
particular foreign-table.
IMHO we would not want to add indexes to every column, on every table,
nor would we wish to use lookaside for all tables. It is a good thing
to be able to add optimizations for individual tables. GPUs are not
good for everything; it is good to be able to leverage their
strengths, yet avoid their weaknesses.
If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.
* ForeignScan node that is not associated with a particular foreign-table.
Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
existing FDW implementation needs to be improved. These nodes scan a
materialized relation (generated on the fly), however, existing FDW code
assumes a ForeignScan node is always associated with a particular
foreign-table. We need to eliminate this restriction.
I don't think we need to do that, given the above.
* FDW method for MultiExec. In case we can stack multiple ForeignScan
nodes, it's helpful to support exchanging scanned tuples in their own
data format. Let's assume two ForeignScan nodes are stacked. One performs
like Sort, another performs like Scan. If they internally handle a column-
oriented data format, TupleTableSlot is not the best way for data exchange.
I agree TupleTableSlot may not be the best way for bulk data movement. We
probably need to look at buffering/bulk movement between executor
nodes in general, which would be of benefit for the FDW case also.
This would be a problem even for Custom Scans as originally presented
also, so I don't see much change there.
* Lookaside on INSERT/UPDATE/DELETE. Probably, it can be implemented
using the writable FDW feature. Not a big issue, but don't forget it...
Yes, possible.
I hope these ideas make sense. This is early days and there may be
other ideas and much detail yet to come.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW
How about we get that working for FDWs to begin with and then we can
come back to this idea..? We're pretty far from join-pushdown or
aggregate-pushdown to FDWs, last I checked, and having those would be a
massive win for everyone using FDWs.
Thanks,
Stephen
* Simon Riggs (simon@2ndQuadrant.com) wrote:
IMHO we would not want to add indexes to every column, on every table,
nor would we wish to use lookaside for all tables. It is a good thing
to be able to add optimizations for individual tables. GPUs are not
good for everything; it is good to be able to leverage their
strengths, yet avoid their weaknesses.
It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.
If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.
This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.
I agree TupleTableSlot may not be the best way for bulk data movement. We
probably need to look at buffering/bulk movement between executor
nodes in general, which would be of benefit for the FDW case also.
This would be a problem even for Custom Scans as originally presented
also, so I don't see much change there.
Being able to do bulk movement would be useful, but (as I proposed
months ago) being able to do asynchronous returns would be extremely
useful also, when you consider FDWs and Append() - the main point there
being that you want to keep the FDWs busy and working in parallel.
Thanks,
Stephen
On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
IMHO we would not want to add indexes to every column, on every table,
nor would we wish to use lookaside for all tables. It is a good thing
to be able to add optimizations for individual tables. GPUs are not
good for everything; it is good to be able to leverage their
strengths, yet avoid their weaknesses.It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.
Of course. I'm not suggesting otherwise.
If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.
This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.
I'm proposing that we use a declarative approach, just like we do when
we say CREATE INDEX.
The idea is that we only consider a lookaside when a lookaside has
been declared. Same as when we add an index, the optimizer considers
whether to use that index. What we don't want to happen is that the
optimizer considers a GIN plan, even when a GIN index is not
available.
I'll explain it more at the developer meeting. It probably sounds a
bit weird at first.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.
Of course. I'm not suggesting otherwise.
If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.
This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.
I'm proposing that we use a declarative approach, just like we do when
we say CREATE INDEX.
There are quite a few trade-offs when it comes to indexes though. I'm
trying to figure out when you wouldn't want to use a GPU, if it's
available to you and the cost model says it's faster? To me, that's
kind of like saying you want a declarative approach for when to use a
HashJoin.
The idea is that we only consider a lookaside when a lookaside has
been declared. Same as when we add an index, the optimizer considers
whether to use that index. What we don't want to happen is that the
optimizer considers a GIN plan, even when a GIN index is not
available.
Yes, I understood your proposal- I just don't agree with it. ;)
For MatViews and/or Indexes, there are trade-offs to be had as it
relates to disk space, insert speed, etc.
Thanks,
Stephen
Let me list up the things to be clarified / developed, in random order.
* Join replacement by FDW; We still don't have consensus about join
replacement by FDW. Probably, it will be designed for remote-join
implementation primarily, however, the things to do are similar. We may
need to revisit Hanada-san's proposition from the past.
Agreed. We need to push down joins into FDWs and we need to push down
aggregates also, so they can be passed to FDWs. I'm planning to look at
aggregate push down.
Probably, it's a helpful feature.
* Lookaside for ANY relations; I want the planner to try GPU-scan for any
relations once installed, to reduce the user's administration cost.
It needs lookaside to allow specifying a particular foreign-server, not a
foreign-table, then create a ForeignScan node that is not associated with a
particular foreign-table.
IMHO we would not want to add indexes to every column, on every table, nor
would we wish to use lookaside for all tables. It is a good thing to be
able to add optimizations for individual tables. GPUs are not good for
everything; it is good to be able to leverage their strengths, yet avoid
their weaknesses.
If you do want that, you can write an Event Trigger that automatically adds
a lookaside for any table.
It may be a solution if we try to replace a scan on a relation with a
ForeignScan; in other words, a case where we can describe a 1:1 relationship
between a table and a foreign-table being alternatively scanned.
Is it possible to fit a case where a ForeignScan replaces a built-in Join plan?
I don't think it is a realistic assumption to set up lookaside configurations
for all the possible combinations of joins beforehand.
I have an idea; if lookaside accepts a function, foreign-server or some other
subjective entity as an alternative path, it will be able to create paths
on the fly, not only preconfigured foreign-tables.
This idea will take two forms of DDL commands, as:
CREATE LOOKASIDE <name> ON <target relation>
TO <alternative table/matview/foreign table/...>;
CREATE LOOKASIDE <name> ON <target relation>
EXECUTE <path generator function>;
The things to do internally are the same. The TO-form invokes a built-in
routine, instead of a user-defined function, to add alternative scan/join
paths according to the supplied table/matview/foreign table and so on.
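A sketch of what an EXECUTE-form path generator could look like, reusing
the customJoinArg layout from the patch above; MyJoinPath and
my_path_methods are hypothetical provider-side definitions:

typedef struct {
    CustomPath  cpath;
    /* provider-private fields would follow here */
} MyJoinPath;

PG_FUNCTION_INFO_V1(my_path_generator);

Datum
my_path_generator(PG_FUNCTION_ARGS)
{
    customJoinArg *arg = (customJoinArg *) PG_GETARG_POINTER(0);
    MyJoinPath    *jpath = palloc0(sizeof(MyJoinPath));

    jpath->cpath.path.type = T_CustomPath;
    jpath->cpath.path.pathtype = T_CustomPlan;
    jpath->cpath.path.parent = arg->joinrel;
    jpath->cpath.methods = &my_path_methods;   /* CreateCustomPlan etc. */
    /* the provider estimates its own startup/total cost here */

    add_path(arg->joinrel, &jpath->cpath.path);
    PG_RETURN_VOID();
}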
* ForeignScan node that is not associated with a particular foreign-table.
Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
existing FDW implementation needs to be improved. These nodes scan a
materialized relation (generated on the fly), however, existing FDW code
assumes a ForeignScan node is always associated with a particular
foreign-table. We need to eliminate this restriction.
I don't think we need to do that, given the above.
It makes a problem if ForeignScan is chosen as an alternative path to a Join.
The target-list of a Join node is determined according to the query form
on the fly, so we cannot expect a particular TupleDesc to be returned
in advance. Once we try to apply ForeignScan instead of a Join node, it
has to have a TupleDesc that depends on the set of joined relations.
I think it is a more straightforward approach to allow a ForeignScan that
is not associated with a particular (cataloged) relation.
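For what it's worth, the executor already knows how to build a descriptor
from a targetlist instead of a catalog entry, which is roughly what such
a node would do at initialization:

/* derive the output descriptor from the plan's targetlist,
 * not from any cataloged relation */
TupleDesc tupdesc = ExecTypeFromTL(planstate->plan->targetlist, false);
ExecAssignResultType(planstate, tupdesc);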
* FDW method for MultiExec. In case we can stack multiple ForeignScan
nodes, it's helpful to support exchanging scanned tuples in their own
data format. Let's assume two ForeignScan nodes are stacked. One performs
like Sort, another performs like Scan. If they internally handle a column-
oriented data format, TupleTableSlot is not the best way for data exchange.
I agree TupleTableSlot may not be the best way for bulk data movement. We
probably need to look at buffering/bulk movement between executor nodes
in general, which would be of benefit for the FDW case also.
This would be a problem even for Custom Scans as originally presented also,
so I don't see much change there.
Yes. It is the reason why my Custom Scan proposition supports the MultiExec method.
* Lookaside on INSERT/UPDATE/DELETE. Probably, it can be implemented
using the writable FDW feature. Not a big issue, but don't forget it...
Yes, possible.
I hope these ideas make sense. This is early days and there may be other
ideas and much detail yet to come.
I'd like to agree with the general direction. My biggest concern towards FDW
is transparency for applications. If lookaside allows redirecting a reference
to a scan/join on regular relations to a ForeignScan (as an alternative
method to execute), there is no strong reason to stick with custom-plan.
However, the existing ForeignScan node does not support working without
a particular foreign table. It may become a restriction if we try to
replace a Join node with a ForeignScan, and that is my worry.
(Even if it may be solved during the Join-replacement-by-FDW work.)
One other point I noticed.
* SubPlan support; if an extension supports its own special logic to join
relations, but doesn't want to support various methods to scan relations, it
is natural to leverage the built-in scan logic (like SeqScan, ...).
I want ForeignScan to support having SubPlans if the FDW driver has the
capability. I believe it can be implemented according to the existing manner,
but we need to expose several static functions to handle plan-trees recursively.
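The patch above already hints at the shape of this: CreateCustomPlan
receives create_plan_recurse as a callback, so a provider that implements
only the join logic can delegate the underlying scans to the core planner.
A sketch, where MyJoinPath, MyJoinPlan and my_plan_methods are hypothetical:

static CustomPlan *
my_create_custom_plan(PlannerInfo *root, CustomPath *best_path,
                      Plan *(*fn_create_plan)(PlannerInfo *root,
                                              Path *best_path))
{
    MyJoinPath *jpath = (MyJoinPath *) best_path;
    MyJoinPlan *cjoin = palloc0(sizeof(MyJoinPlan));

    NodeSetTag(cjoin, T_CustomPlan);
    cjoin->cplan.methods = &my_plan_methods;

    /* delegate the underlying scans to the core planner's recursion */
    cjoin->cplan.plan.lefttree = fn_create_plan(root, jpath->outerpath);
    cjoin->cplan.plan.righttree = fn_create_plan(root, jpath->innerpath);

    return &cjoin->cplan;
}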
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
* ForeignScan node that is not associated with a particular foreign-table.
Once we try to apply ForeignScan node instead of Sort or Aggregate, the existing
FDW implementation needs to be improved. These nodes scan on a materialized
relation (generated on the fly); however, existing FDW code assumes a
ForeignScan node is always associated with a particular foreign-table.
We need to eliminate this restriction.

I don't think we need to do that, given the above.

It becomes a problem if ForeignScan is chosen as an alternative path for a Join.
The target-list of a Join node is determined according to the query form
on the fly, so we cannot expect a particular TupleDesc to be returned
beforehand. Once we try to apply ForeignScan instead of a Join node, it
has to have its TupleDesc depend on the set of joined relations.
I think it is a more straightforward approach to allow a ForeignScan that
is not associated with a particular (cataloged) relation.
From your description, my understanding is that you would like to
stream data from 2 standard tables to the GPU, then perform a join on
the GPU itself.
I have been told that is not likely to be useful because of the data
transfer overheads.
Or did I misunderstand, and this is intended to get around the
current lack of join pushdown into FDWs?
Can you be specific about the actual architecture you wish for, so we
can understand how to generalise that into an API?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 May 2014 18:39, Stephen Frost <sfrost@snowman.net> wrote:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.

This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when
we say CREATE INDEX.

There are quite a few trade-offs when it comes to indexes though. I'm
trying to figure out when you wouldn't want to use a GPU, if it's
available to you and the cost model says it's faster? To me, that's
kind of like saying you want a declarative approach for when to use a
HashJoin.
I'm proposing something that is like an index, not like a plan node.
The reason that proposal is being made is that we need to consider
data structure, data location and processing details.
* In the case of Mat Views, if there is no Mat View, then we can't use
it - we can't replace that with just any mat view instead
* GPUs and other special processing units have finite data transfer
rates, so other people have proposed that they retain data on the
GPU/SPU - so we want to do a lookaside only for situations where the
data is already prepared to handle a lookaside.
* The other cases I cited of in-memory data structures are all
pre-arranged items with structures suited to processing particular
types of query
Given that I count 4-5 beneficial use cases for this index-like
lookaside, it seems worth investing time in.
It appears that Kaigai wishes something else in addition to this
concept, so there may be some confusion from that. I'm sure it will
take a while to really understand all the ideas and possibilities.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon,
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
From your description, my understanding is that you would like to
stream data from 2 standard tables to the GPU, then perform a join on
the GPU itself.

I have been told that is not likely to be useful because of the data
transfer overheads.
That was my original understanding and, I believe, the case at one
point, however...
Or did I misunderstand, and that this is intended to get around the
current lack of join pushdown into FDWs?
I believe the issue with the transfer speeds to the GPU have been either
eliminated or at least reduced to the point where it's practical now.
This is all based on prior discussions with KaiGai- I've not done any
testing myself. In any case, this is exactly what they're looking to
do, as I understand it, and to do the same with aggregates that work
well on GPUs.
Can you be specific about the actual architecture you wish for, so we
can understand how to generalise that into an API?
It's something that *could* be done with FDWs, once they have the
ability to have join push-down and aggregate push-down, but I (and, as I
understand it, Tom) feel isn't really the right answer for this because
the actual *data* is completely under PG in this scenario. It's just
in-memory processing that's being done on the GPU and in the GPU's
memory.
KaiGai has speculated about other possibilities (eg: having the GPU's
memory also used as some kind of multi-query cache, which would reduce
the transfer costs, but at a level of complexity regarding that cache
that I'm not sure it'd be sensible to try and do and, in any case, could
be done later and might make sense independently, if we could make it
work for, say, a memcached environment too; I'm thinking it would be
transaction-specific, but even that would be pretty tricky unless we
held locks across every row...).
Thanks,
Stephen
Simon,
* Simon Riggs (simon@2ndQuadrant.com) wrote:
I'm proposing something that is like an index, not like a plan node.
The reason that proposal is being made is that we need to consider
data structure, data location and processing details.

* In the case of Mat Views, if there is no Mat View, then we can't use
it - we can't replace that with just any mat view instead
I agree with you about MatViews. There are clear trade-offs there,
similar to those with indexes.
* GPUs and other special processing units have finite data transfer
rates, so other people have proposed that they retain data on the
GPU/SPU - so we want to do a lookaside only for situations where the
data is already prepared to handle a lookaside.
I've heard this and I'm utterly unconvinced that it could be made to
work at all- and it's certainly moving the bar of usefulness quite far
away, making the whole thing much less practical. If we can't cost for
this transfer rate and make use of GPUs for medium-to-large size queries
which are only transient, then perhaps shoving all GPU work out across
an FDW is actually the right solution, and make that like some kind of
MatView as you're proposing- but I don't see how you're going to manage
updates and invalidation of that data in a sane way for a multi-user PG
system.
* The other cases I cited of in-memory data structures are all
pre-arranged items with structures suited to processing particular
types of query
If it's transient in-memory work, I'd like to see our generalized
optimizer consider them all instead of pushing that job on the user to
decide when the optimizer should consider certain methods.
Given that I count 4-5 beneficial use cases for this index-like
lookaside, it seems worth investing time in.
I'm all for making use of MatViews and GPUs, but there's more than one
way to get there and look-asides feels like pushing the decision,
unnecessarily, on to the user.
Thanks,
Stephen
2014-05-07 18:06 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Let me list up the things to be clarified / developed randomly.
* Join replacement by FDW; We still don't have consensus about join replacement
by FDW. Probably, it will be designed primarily for remote-join implementation;
however, the things to do are similar. We may need to revisit Hanada-san's
proposition from the past.
I can't recall the details offhand, but the reason I gave up was about
introducing a ForeignJoinPath node, IIRC. I'll revisit the discussion
and my proposal.
--
Shigeru HANADA
* ForeignScan node that is not associated with a particular foreign-table.
Once we try to apply ForeignScan node instead of Sort or Aggregate, the existing
FDW implementation needs to be improved. These nodes scan on a materialized
relation (generated on the fly); however, existing FDW code assumes a
ForeignScan node is always associated with a particular foreign-table.
We need to eliminate this restriction.

I don't think we need to do that, given the above.

It becomes a problem if ForeignScan is chosen as an alternative path for a Join.
The target-list of a Join node is determined according to the query
form on the fly, so we cannot expect a particular TupleDesc to be
returned beforehand. Once we try to apply ForeignScan instead of a Join
node, it has to have its TupleDesc depend on the set of joined relations.
I think it is a more straightforward approach to allow a ForeignScan that
is not associated with a particular (cataloged) relation.

From your description, my understanding is that you would like to stream
data from 2 standard tables to the GPU, then perform a join on the GPU itself.

I have been told that is not likely to be useful because of the data transfer
overheads.
Here are two solutions. One is what I'm currently working on: in case the
number of rows in the left and right tables is not balanced well, we can
keep a hash table in the GPU DRAM, then transfer the data stream
chunk-by-chunk from the other side. Kernel execution and data transfer can
run asynchronously, so it allows us to hide the data transfer cost as long
as we have a sufficient number of chunks, like processor pipelining.
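As a rough sketch of that chunk-wise pipelining (an illustration only, not
PG-Strom code: launch_probe_kernel() is a hypothetical wrapper around the
actual join kernel, and host_chunks are assumed to be pinned with
cudaHostAlloc so the copies really are asynchronous):

#include <cuda_runtime.h>

#define NSTREAMS 4

/* hypothetical: probes one outer chunk against the hash table
 * already resident in GPU DRAM, emitting join results */
extern void launch_probe_kernel(void *dev_chunk, size_t len,
                                cudaStream_t stream);

void
probe_outer_chunks(void *host_chunks[], size_t lens[], int nchunks,
                   void *dev_bufs[])
{
    cudaStream_t streams[NSTREAMS];
    int     i;

    for (i = 0; i < NSTREAMS; i++)
        cudaStreamCreate(&streams[i]);

    for (i = 0; i < nchunks; i++)
    {
        int     s = i % NSTREAMS;

        /* the copy of chunk i overlaps kernels running on the other
         * streams; same-stream ordering keeps buffer reuse safe */
        cudaMemcpyAsync(dev_bufs[s], host_chunks[i], lens[i],
                        cudaMemcpyHostToDevice, streams[s]);
        launch_probe_kernel(dev_bufs[s], lens[i], streams[s]);
    }

    for (i = 0; i < NSTREAMS; i++)
    {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
}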
The other solution is an "integrated" GPU that eliminates the necessity of
data transfer, like Intel's Haswell, AMD's Kaveri or Nvidia's Tegra K1; all
the major vendors are moving in the same direction.
Or did I misunderstand, and this is intended to get around the current
lack of join pushdown into FDWs?
The logic above is obviously executed on the extension side, so it needs
the ForeignScan node to perform like a Join node: one that reads two input
relation streams and outputs one joined relation stream.
It is quite similar to the expected FDW join-pushdown design. It will consume
two (remote) relations and generate one output stream; it looks like a scan
on a particular relation (but with no catalog definition here).
Probably, it shall be visible to the local backend as follows
(this is a result of the previous prototype based on the custom-plan API):
postgres=# EXPLAIN VERBOSE SELECT count(*) FROM
pgbench1_branches b JOIN pgbench1_accounts a ON a.bid = b.bid WHERE aid < 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=101.60..101.61 rows=1 width=0)
Output: count(*)
-> Custom Scan (postgres-fdw) (cost=100.00..101.43 rows=71 width=0)
Remote SQL: SELECT NULL FROM (public.pgbench_branches r1 JOIN public.pgbench_accounts r2 ON ((r1.bid = r2.bid))) WHERE ((r2.aid < 100))
(4 rows)
The place of the "Custom Scan" node will be taken by ForeignScan, once join
pushdown is supported. At that time, what relation should be scanned by this
ForeignScan? That is why I proposed a ForeignScan node without a particular
relation.
Can you be specific about the actual architecture you wish for, so we can
understand how to generalise that into an API?
If we push the role of the CustomPlan node into ForeignScan, I want to use
this node to acquire control during query planning/execution.
As I did in the custom-plan patch, first of all, I want extensions to have
a chance to add an alternative path for a particular scan/join.
If an extension can take over the execution, it will generate a ForeignPath
(or CustomPath) node then call add_path(). In the usual manner, the planner
decides whether the alternative path is cheaper than the other candidates.
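For the scan side, that flow can already be sketched against the existing
set_rel_pathlist_hook; the hook and add_path() are real, but
my_make_gpu_path() is a hypothetical constructor for whatever Path subtype
the custom-plan API ends up defining. Note there is no equivalent hook for
joins today, which is much of what this thread is about.

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;

/* hypothetical: builds a GPU-scan Path for this relation, or NULL */
extern Path *my_make_gpu_path(PlannerInfo *root, RelOptInfo *rel);

static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                    Index rti, RangeTblEntry *rte)
{
    Path   *alt;

    if (prev_set_rel_pathlist_hook)
        prev_set_rel_pathlist_hook(root, rel, rti, rte);

    /* offer an alternative path only for plain relation scans */
    if (rte->rtekind != RTE_RELATION)
        return;

    alt = my_make_gpu_path(root, rel);
    if (alt != NULL)
        add_path(rel, alt);     /* kept only if it wins on cost */
}

void
_PG_init(void)
{
    prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
    set_rel_pathlist_hook = my_set_rel_pathlist;
}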
In the case where it replaces a relation scan by ForeignScan, it is almost
the same as what the existing API does, except that the underlying relation
is a regular one, not a foreign table.
In the case where it replaces join relations by ForeignScan, it will be
almost the same as the expected ForeignScan with join push-down. Unlike a
usual table scan, it does not have an actual relation definition in the
catalog, and its result tuple-slot is determined on the fly.
One thing different from the remote-join case is that this ForeignScan node
may have sub-plans locally, if the FDW driver (e.g. GPU execution) has a
capability for the Join only, but not for the relation scan portion.
So, unlike its naming, I want ForeignScan to be able to have sub-plans if
the FDW driver supports the capability.

Does that make it clear? Or does it make you more confused?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
From your description, my understanding is that you would like to stream
data from 2 standard tables to the GPU, then perform a join on the GPU itself.

I have been told that is not likely to be useful because of the data transfer
overheads.

Here are two solutions. One is what I'm currently working on: in case the
number of rows in the left and right tables is not balanced well, we can
keep a hash table in the GPU DRAM, then transfer the data stream
chunk-by-chunk from the other side. Kernel execution and data transfer can
run asynchronously, so it allows us to hide the data transfer cost as long
as we have a sufficient number of chunks, like processor pipelining.
Makes sense to me, thanks for explaining.
The hardware-enhanced hash join sounds like a great idea.
My understanding is we would need
* a custom cost-model
* a custom execution node
The main question seems to be whether doing that would be allowable,
because it's certainly doable.
I'm still looking for a way to avoid adding planning time for all
queries though.
The other solution is an "integrated" GPU that eliminates the necessity of
data transfer, like Intel's Haswell, AMD's Kaveri or Nvidia's Tegra K1; all
the major vendors are moving in the same direction.
Sounds useful, but very non-specific, as yet.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote:
Given that I count 4-5 beneficial use cases for this index-like
lookaside, it seems worth investing time in.

I'm all for making use of MatViews and GPUs, but there's more than one
way to get there and look-asides feels like pushing the decision,
unnecessarily, on to the user.
I'm not sure I understand where most of your comments come from, so
it's clear we're not talking about the same things yet.
We have multiple use cases where an alternate data structure could be
used to speed up queries.
My goal is to use the alternate data structure(s)
1) if the data structure contains matching data for the current query
2) only when the user has explicitly stated it would be correct to do
so, and they wish it
3) transparently to the application, rather than forcing them to recode
4) after fully considering cost-based optimization, which we can only
do if it is transparent
all of which is how mat views work in other DBMS. My additional requirement is
5) allow this to work with data structures outside the normal
heap/index/block structures, since we have multiple already working
examples of such things and many users wish to leverage those in their
applications
which I now understand is different from the main thrust of Kaigai's
proposal, so I will restate this later on another thread.
The requirement is similar to the idea of running
CREATE MATERIALIZED VIEW foo
BUILD DEFERRED
REFRESH COMPLETE
ON DEMAND
ENABLE QUERY REWRITE
ON PREBUILT TABLE
but expands on that to encompass any external data structure.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW

Or put another way, if we add Lookaside then we can just plug in the
pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.
As Stephen notes downthread, Tom has already expressed opposition to
this idea on other threads, and I tend to agree with him, at least to
some degree. I think the drive to use foreign data wrappers for
PGStrom, CitusDB, and other things that aren't really foreign data
wrappers as originally conceived is a result of the fact that we've
got only one interface in this area that looks remotely like something
pluggable; and so everyone's trying to fit things into the constraints
of that interface whether it's actually a good fit or not.
Unfortunately, I think what CitusDB really wants is pluggable storage,
and what PGStrom really wants is custom paths, and I don't think
either of those things is the same as what FDWs provide.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote:
I'm all for making use of MatViews and GPUs, but there's more than one
way to get there and look-asides feels like pushing the decision,
unnecessarily, on to the user.

I'm not sure I understand where most of your comments come from, so
it's clear we're not talking about the same things yet.

We have multiple use cases where an alternate data structure could be
used to speed up queries.
I don't view on-GPU memory as being an alternate *permanent* data store.
Perhaps that's the disconnect that we have here, as it was my
understanding that we're talking about using GPUs to make queries run
faster where the data comes from regular tables.
My goal is to use the alternate data structure(s)
Pluggable storage is certainly interesting, but I view that as
independent of the CustomPlan-related work.
which I now understand is different from the main thrust of Kaigai's
proposal, so I will restate this later on another thread.
Sounds good.
Thanks,
Stephen
* Robert Haas (robertmhaas@gmail.com) wrote:
As Stephen notes downthread, Tom has already expressed opposition to
this idea on other threads, and I tend to agree with him, at least to
some degree. I think the drive to use foreign data wrappers for
PGStrom, CitusDB, and other things that aren't really foreign data
wrappers as originally conceived is a result of the fact that we've
got only one interface in this area that looks remotely like something
pluggable; and so everyone's trying to fit things into the constraints
of that interface whether it's actually a good fit or not.
Agreed.
Unfortunately, I think what CitusDB really wants is pluggable storage,
and what PGStrom really wants is custom paths, and I don't think
either of those things is the same as what FDWs provide.
I'm not entirely sure that PGStrom even really "wants" custom paths.. I
believe the goal there is to be able to use GPUs to do work for us and
custom paths/pluggable plan/execution are seen as the way to do that and
not depend on libraries which are under GPL, LGPL or other licenses which
we'd object to depending on from core.
Personally, I'd love to just see CUDA or whatever support in core as a
configure option and be able to detect at start-up when the right
libraries and hardware are available and enable the join types which
could make use of that gear.
I don't like that we're doing all of this because of licenses or
whatever and would still hope to figure out a way to address those
issues but I haven't had time to go research it myself and evidently
KaiGai and others see the issues there as insurmountable, so they're
trying to work around it by creating a pluggable interface where an
extension could provide those join types.
Thanks,
Stephen
On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:
We have multiple use cases where an alternate data structure could be
used to speed up queries.

I don't view on-GPU memory as being an alternate *permanent* data store.
As I've said, others have expressed an interest in placing specific
data on specific external resources that we would like to use to speed
up queries. That might be termed a "cache" of various kinds or it
might simply be an allocation of that resource to a specific
purpose.
If we forget GPUs, that leaves multiple use cases that do fit the description.
Perhaps that's the disconnect that we have here, as it was my
understanding that we're talking about using GPUs to make queries run
faster where the data comes from regular tables.
I'm trying to consider a group of use cases, so we get a generic API
that is useful to many people, not just to one use case. I had
understood the argument to be there must be multiple potential users
of an API before we allow it.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
In the case where it replaces join relations by ForeignScan, it will be
almost the same as the expected ForeignScan with join push-down. Unlike a
usual table scan, it does not have an actual relation definition in the
catalog, and its result tuple-slot is determined on the fly.
One thing different from the remote-join case is that this ForeignScan node
may have sub-plans locally, if the FDW driver (e.g. GPU execution) has a
capability for the Join only, but not for the relation scan portion.
So, unlike its naming, I want ForeignScan to be able to have sub-plans if
the FDW driver supports the capability.
From here, it looks exactly like pushing a join into an FDW. If we had
that, we wouldn't need Custom Scan at all.
I may be mistaken and there is a critical difference. Local sub-plans
don't sound like a big difference.
Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:
I don't view on-GPU memory as being an alternate *permanent* data store.
As I've said, others have expressed an interest in placing specific
data on specific external resources that we would like to use to speed
up queries. That might be termed a "cache" of various kinds or it
might be simply be an allocation of that resource to a specific
purpose.
I don't think some generalized structure that addresses the goals of
FDWs, CustomPaths, MatViews and query caching is going to be workable
and I'm definitely against having to specify at a per-relation level
when I want certain join types to be considered.
Perhaps that's the disconnect that we have here, as it was my
understanding that we're talking about using GPUs to make queries run
faster where the data comes from regular tables.

I'm trying to consider a group of use cases, so we get a generic API
that is useful to many people, not just to one use case. I had
understood the argument to be there must be multiple potential users
of an API before we allow it.
The API you've outlined requires users to specify on a per-relation
basis what join types are valid. As for CustomPlans, there's
certainly potential for many use-cases there beyond just GPUs. What I'm
unsure about is if any others would actually need to be implemented
externally as the GPU-related work seems to need or if we would just
implement those other join types in core.
Thanks,
Stephen
On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW

Or put another way, if we add Lookaside then we can just plug in the
pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.

As Stephen notes downthread, Tom has already expressed opposition to this
idea on other threads, and I tend to agree with him, at least to some degree.
I think the drive to use foreign data wrappers for PGStrom, CitusDB, and
other things that aren't really foreign data wrappers as originally
conceived is a result of the fact that we've got only one interface in this
area that looks remotely like something pluggable; and so everyone's trying
to fit things into the constraints of that interface whether it's actually
a good fit or not.
Unfortunately, I think what CitusDB really wants is pluggable storage, and
what PGStrom really wants is custom paths, and I don't think either of those
things is the same as what FDWs provide.
Yes, what PGStrom really needs is custom paths; something that allows an
extension to replace a part of the built-in nodes according to the
extension's characteristics.
The discussion upthread clarified that FDW needs to be enhanced to support
the functionality that PGStrom wants to provide; however, some of it also
needs a redefinition of FDW, indeed.
Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Simon Riggs (simon@2ndQuadrant.com) wrote:
From here, it looks exactly like pushing a join into an FDW. If we had
that, we wouldn't need Custom Scan at all.
I may be mistaken and there is a critical difference. Local sub-plans
don't sound like a big difference.
Erm. I'm not sure that you're really thinking through what you're
suggesting.
Allow me to re-state your suggestion here:
An FDW is loaded which provides hook for join push-down (whatever those
end up being).
A query is run which joins *local* table A to *local* table B. Standard
heaps, standard indexes, all local to this PG instance.
The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.
Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?
Uh, isn't that what we're talking about? The issue is that there's a
bunch of internal functions that such a plugin would need to either have
access to or re-implement, but we'd rather not expose those internal
functions to the whole world because they're, uh, internal helper
routines, essentially, which could disappear in another release.
The point is that there isn't a good API for this today and what's being
proposed isn't a good API, it's just bolted-on to the existing system by
exposing what are rightfully internal routines.
Thanks,
Stephen
On 8 May 2014 14:32, Stephen Frost <sfrost@snowman.net> wrote:
The API you've outlined requires users to specify on a per-relation
basis what join types are valid.
No, it doesn't. I've not said or implied that at any point.
If you keep telling me what I mean, rather than asking, we won't get anywhere.
I think that's as far as we'll get on email.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon,
Perhaps you've changed your proposal wrt LOOKASIDEs and I've missed it
somewhere in the thread, but this is what I was referring to with my
concerns regarding per-relation definition of 'LOOKASIDES':
* Simon Riggs (simon@2ndQuadrant.com) wrote:
Roughly, I'm thinking of this...
CREATE LOOKASIDE ON foo
TO foo_mat_view;

and also this...
CREATE LOOKASIDE ON foo
TO foo_as_a_foreign_table /* e.g. PGStrom */
where I took 'foo' to mean 'a relation'.
Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
vein, though there I agree that we need it per-relation as there are
other trade-offs to consider (storage costs of the matview, cost to
maintain the matview, etc, similar to indexes).
The PGStrom proposal, aiui, is to add a new join type which supports
using a GPU to answer a query where all the data is in regular PG
tables. I'd like that to "just work" when a GPU is available (perhaps
modulo having to install some extension), for any join which is costed
to be cheaper/faster when done that way.
Thanks,
Stephen
On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:
Allow me to re-state your suggestion here:
An FDW is loaded which provides hook for join push-down (whatever those
end up being).

A query is run which joins *local* table A to *local* table B. Standard
heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.
Yes that is correct; thank you for confirming your understanding with me.
That also supports custom join of local to non-local table, or custom
join of two non-local tables.
If we can use interfaces that already exist with efficiency, why
invent a new one?
Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?

Uh, isn't that what we're talking about?
No. I meant writing this as an extension rather than a patch on core.
The issue is that there's a
bunch of internal functions that such a plugin would need to either have
access to or re-implement, but we'd rather not expose those internal
functions to the whole world because they're, uh, internal helper
routines, essentially, which could disappear in another release.

The point is that there isn't a good API for this today and what's being
proposed isn't a good API, it's just bolted-on to the existing system by
exposing what are rightfully internal routines.
I think the main point is that people don't want to ask for our
permission before they do what they want to do.
We either help people use Postgres, or they go elsewhere.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 14:49, Stephen Frost <sfrost@snowman.net> wrote:
Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
vein, though there I agree that we need it per-relation as there are
other trade-offs to consider (storage costs of the matview, cost to
maintain the matview, etc, similar to indexes).

The PGStrom proposal, aiui, is to add a new join type which supports
using a GPU to answer a query where all the data is in regular PG
tables. I'd like that to "just work" when a GPU is available (perhaps
modulo having to install some extension), for any join which is costed
to be cheaper/faster when done that way.
All correct and agreed. As I explained earlier, let's cover the join
requirement here and we can discuss lookasides to data structures at
PgCon.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:
Allow me to re-state your suggestion here:
An FDW is loaded which provides hook for join push-down (whatever those
end up being).

A query is run which joins *local* table A to *local* table B. Standard
heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.

Yes that is correct; thank you for confirming your understanding with me.
I guess for my part, that doesn't look like an FDW any more.
That also supports custom join of local to non-local table, or custom
join of two non-local tables.
Well, we already support these, technically, but the FDW
doesn't actually implement the join, it's done in core.
If we can use interfaces that already exist with efficiency, why
invent a new one?
Perhaps once we have a proposal for FDW join push-down this will make
sense, but I'm not seeing it right now.
Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?

Uh, isn't that what we're talking about?

No. I meant writing this as an extension rather than a patch on core.
KaiGai's patches have been some changes to core and then an extension
which uses those changes. The changes to core include exposing internal
functions for extensions to use, which will undoubtedly end up being a
sore spot and fragile.
Thanks,
Stephen
On 8 May 2014 15:25, Stephen Frost <sfrost@snowman.net> wrote:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:
Allow me to re-state your suggestion here:
An FDW is loaded which provides hook for join push-down (whatever those
end up being).

A query is run which joins *local* table A to *local* table B. Standard
heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.

Yes that is correct; thank you for confirming your understanding with me.
I guess for my part, that doesn't look like an FDW any more.
If it works, it works. If it doesn't, we can act otherwise.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
(1) DDL support and system catalog
Simon suggested that DDL command should be supported to track custom-
plan providers being installed, and to avoid nonsense hook calls
if it is an obvious case that custom-plan provider can help. It also
makes sense to give a chance to load extensions once installed.
(In the previous design, I assumed modules are loaded by LOAD command
or *_preload_libraries parameters).
I've tried hard to bend my mind to this and it's beginning to sink in.
We've already got pg_am for indexes, and soon to have pg_seqam for sequences.
It would seem normal and natural to have
* pg_joinam catalog table for "join methods" with a join method API
Which would include some way of defining which operators/datatypes we
consider this for, so if PostGIS people come up with some fancy GIS
join thing, we don't invoke it every time even when it's inapplicable.
I would prefer it if PostgreSQL also had some way to control when the
joinam was called, possibly with some kind of table_size_threshold on
the AM tuple, which could be set to >=0 to control when this was even
considered.
* pg_scanam catalog table for "scan methods" with a scan method API
Again, a list of operators that can be used with it, like indexes and
operator classes
By analogy to existing mechanisms, we would want
* A USERSET mechanism to allow users to turn it off for testing or
otherwise, at user, database level
We would also want
* A startup call that allows us to confirm it is available and working
correctly, possibly with some self-test for hardware, performance
confirmation/derivation of planning parameters
* Some kind of trace mode that would allow people to confirm the
outcome of calls
* Some interface to the stats system so we could track the frequency
of usage of each join/scan type. This would be done within Postgres,
tracking the calls by name, rather than trusting the plugin to do it
for us
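To make the shape of that concrete, a row in such a catalog might carry
roughly the following; every column here is invented for illustration, by
analogy with FormData_pg_am, and no pg_joinam exists.

/*
 * Purely illustrative guess at a pg_joinam row.  None of these
 * columns exist anywhere in PostgreSQL.
 */
typedef struct FormData_pg_joinam
{
    NameData    jamname;           /* name of the join method */
    regproc     jamstartup;        /* self-test / hardware check */
    regproc     jamcostestimate;   /* costing function for the planner */
    regproc     jamcreateplan;     /* builds the executable plan node */
    int32       jamsizethreshold;  /* consider only above this size */
    bool        jamenabled;        /* USERSET-style kill switch */
    /* operator applicability would live in something like pg_amop */
} FormData_pg_joinam;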
I tried to implement the following syntax:
CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;
Not sure if we need that yet
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
It would seem normal and natural to have
* pg_joinam catalog table for "join methods" with a join method API
Which would include some way of defining which operators/datatypes we
consider this for, so if PostGIS people come up with some fancy GIS
join thing, we don't invoke it every time even when it's inapplicable.
I would prefer it if PostgreSQL also had some way to control when the
joinam was called, possibly with some kind of table_size_threshold on
the AM tuple, which could be set to >=0 to control when this was even
considered.
It seems useful to think about how we would redefine our existing join
methods using such a structure. While thinking about that, it seems
like we would worry more about what the operators provide rather than
the specific operators themselves (ala hashing / HashJoin) and I'm not
sure we really care about the data types directly- just about the
operations which we can do on them..
I can see a case for sticking data types into this if we feel that we
have to constrain the path possibilities for some reason, but I'd rather
try and deal with any issues around "it doesn't make sense to do X
because we'll know it'll be really expensive" through the cost model
instead of with a table that defines what's allowed or not allowed.
There may be cases where we get the costing wrong and it's valuable
to be able to tweak cost values on a per-connection basis or for
individual queries.
I don't mean to imply that a 'pg_joinam' table is a bad idea, just that
I'd think of it being defined in terms of what capabilities it requires
of operators and a way for costing to be calculated for it, plus the
actual functions which it provides to implement the join itself (to
include some way to get output suitable for explain, etc..).
* pg_scanam catalog table for "scan methods" with a scan method API
Again, a list of operators that can be used with it, like indexes and
operator classes
Ditto for this- but there's lots of other things this makes me wonder
about because it's essentially trying to define a pluggable storage
layer, which is great, but also requires some way to deal with all of
things we use our storage system for: caching / shared buffers,
locking, visibility, WAL, unique identifier / ctid (for use in indexes,
etc)...
By analogy to existing mechanisms, we would want
* A USERSET mechanism to allow users to turn it off for testing or
otherwise, at user, database level
If we re-implement our existing components through this ("eat our own
dogfood" as it were), I'm not sure that we'd be able to have a way to
turn it on/off.. I realize we wouldn't have to, but then it seems like
we'd have two very different code paths and likely a different level of
support / capability afforded to "external" storage systems and then I
wonder if we're not back to just FDWs again..
We would also want
* A startup call that allows us to confirm it is available and working
correctly, possibly with some self-test for hardware, performance
confirmation/derivation of planning parameters
Yeah, we'd need this for anything that supports a GPU, regardless of how
we implement it, I'd think.
* Some kind of trace mode that would allow people to confirm the
outcome of calls
Seems like this would be useful independently of the rest..
* Some interface to the stats system so we could track the frequency
of usage of each join/scan type. This would be done within Postgres,
tracking the calls by name, rather than trusting the plugin to do it
for us
This is definitely something I want for core already...
Thanks,
Stephen
On Thu, May 8, 2014 at 3:10 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
It would seem normal and natural to have
* pg_joinam catalog table for "join methods" with a join method API
Which would include some way of defining which operators/datatypes we
consider this for, so if PostGIS people come up with some fancy GIS
join thing, we don't invoke it every time even when its inapplicable.
I would prefer it if PostgreSQL also had some way to control when the
joinam was called, possibly with some kind of table_size_threshold on
the AM tuple, which could be set to >=0 to control when this was even
considered.

It seems useful to think about how we would redefine our existing join
methods using such a structure. While thinking about that, it seems
like we would worry more about what the operators provide rather than
the specific operators themselves (ala hashing / HashJoin) and I'm not
sure we really care about the data types directly- just about the
operations which we can do on them..
I'm pretty skeptical about this whole line of inquiry. We've only got
three kinds of joins, and each one of them has quite a bit of bespoke
logic, and all of this code is pretty performance-sensitive on large
join nests. If there's a way to make this work for KaiGai's use case
at all, I suspect something really lightweight like a hook, which
should have negligible impact on other workloads, is a better fit than
something involving system catalog access. But I might be wrong.
I also think that there are really two separate problems here: getting
the executor to call a custom scan node when it shows up in the plan
tree; and figuring out how to get it into the plan tree in the first
place. I'm not sure we've properly separated those problems, and I'm
not sure into which category the issues that sunk KaiGai's 9.4 patch
fell. Most of this discussion seems like it's about the latter
problem, but we need to solve both. For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted. Then, we could later work on adding more
ways to inject them in more places.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted. Then, we could later work on adding more
ways to inject them in more places.
We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that
first.
At the moment the Custom join hook is called every time we attempt to
cost a join, with no restriction.
I would like to highly restrict this, so that we only consider a
CustomJoin node when we have previously said one might be usable and
the user has requested this (e.g. enable_foojoin = on)
We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true.
So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.
Given that everything else in Postgres is agnostic and configurable,
I'm looking to do the same here.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 20:10, Stephen Frost <sfrost@snowman.net> wrote:
* A USERSET mechanism to allow users to turn it off for testing or
otherwise, at user, database level.

If we re-implement our existing components through this ("eat our own
dogfood" as it were), I'm not sure that we'd be able to have a way to
turn it on/off.. I realize we wouldn't have to, but then it seems like
we'd have two very different code paths and likely a different level of
support / capability afforded to "external" storage systems and then I
wonder if we're not back to just FDWs again..
We have SET enable_hashjoin = on | off
I would like a way to do the equivalent of SET enable_mycustomjoin =
off so that when it starts behaving weirdly in production, I can turn
it off so we can prove that is not the cause, or keep it turned off if
it's a problem. I don't want to have to call a hook and let the hook
decide whether it can be turned off or not.
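The switch itself is cheap to provide with the existing custom-GUC
machinery; a minimal sketch, assuming a hypothetical "mycustomjoin" plugin:

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* the plugin consults this before offering any custom join path */
static bool enable_mycustomjoin = true;

void
_PG_init(void)
{
    DefineCustomBoolVariable("enable_mycustomjoin",
                             "Enables the mycustomjoin join method.",
                             NULL,
                             &enable_mycustomjoin,
                             true,          /* boot value */
                             PGC_USERSET,   /* settable per session */
                             0,
                             NULL, NULL, NULL);
}

Of course, that only helps if core, not the plugin, honors the switch.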
Postgres should be in control of the plugin, not give control to the
plugin every time and hope it gives us control back.
(I'm trying to take the "FDW isn't the right way" line of thinking to
its logical conclusions, so we can decide).
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
I'm pretty skeptical about this whole line of inquiry. We've only got
three kinds of joins, and each one of them has quite a bit of bespoke
logic, and all of this code is pretty performance-sensitive on large
join nests. If there's a way to make this work for KaiGai's use case
at all, I suspect something really lightweight like a hook, which
should have negligible impact on other workloads, is a better fit than
something involving system catalog access. But I might be wrong.
We do a great deal of catalog consultation already during planning,
so I think a few more wouldn't be a problem, especially if the planner
is smart enough to touch the catalogs just once (per query?) and cache
the results. However, your point about lots of bespoke logic is dead
on, and I'm afraid it's damn near a fatal objection. As just one example,
if we did not have merge joins then an awful lot of what the planner does
with path keys simply wouldn't exist, or at least would look a lot
different than it does. Without that infrastructure, I can't imagine
that a plugin approach would be able to plan mergejoins anywhere near as
effectively. Maybe there's a way around this issue, but it sure won't
just be a pg_am-like API.
I also think that there are really two separate problems here: getting
the executor to call a custom scan node when it shows up in the plan
tree; and figuring out how to get it into the plan tree in the first
place. I'm not sure we've properly separated those problems, and I'm
not sure into which category the issues that sunk KaiGai's 9.4 patch
fell.
I thought that the executor side of his patch wasn't in bad shape. The
real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing up
low-level details of the constructed plan tree. It seems like in
principle it might be possible to make that logic cleanly extensible,
but it'll likely take a major rewrite. The patch tried to skate by with
just exposing a bunch of internal functions, which I don't think is a
maintainable approach, either for the core or for the extensions using it.
regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes:
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted. Then, we could later work on adding more
ways to inject them in more places.
We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that
first.
OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and speculation
that's not supported by a lot of evidence. If you think this isn't
prototyping, I wonder what you think *is* prototyping.
It seems likely to me that our existing development process is not
terribly well suited to developing a good solution in this area.
We need to be able to try some things and throw away what doesn't
work; but the project's mindset is not conducive to throwing features
away once they've appeared in a shipped release. And the other side
of the coin is that trying these things is not inexpensive: you have
to write some pretty serious code before you have much of a feel for
whether a planner hook API is actually any good. So by the time
you've built something of the complexity of, say, contrib/postgres_fdw,
you don't really want to throw that away in the next major release.
And that's at the bottom end of the scale of the amount of work that'd
be needed to do anything with the sorts of interfaces we're discussing.
So I'm not real sure how we move forward. Maybe something to brainstorm
about in Ottawa.
regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes:
We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true.
So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.
For what that's worth, I'm not sure that either the oprcanmerge or
oprcanhash columns really pull their weight. We could dispense with both
at the cost of doing some wasted lookups in pg_amop. (Perhaps we should
replace them with a single "oprisequality" column, which would amount to
a hint that it's worth looking for hash or merge properties, or for other
equality-ish properties in future.)
So I think something comparable to an operator class is indeed a better
approach than adding more columns to pg_operator. Other than the
connection to pg_am, you could pretty nearly just use the operator class
infrastructure as-is for a lot of operator-property things like this.
regards, tom lane
On 8 May 2014 21:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I'm not real sure how we move forward. Maybe something to brainstorm
about in Ottawa.
I'm just about to go away for a week, so that's probably the best
place to leave (me out of) the discussion until Ottawa.
I've requested some evidence from my contacts that this hardware route is
worthwhile, so we'll see what we get. Presumably Kaigai already has
something to share as well.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?
I haven't really paid any attention to PGStrom. Perhaps it's just that
I missed it, but I would find it useful if you could direct me towards
a benchmark or something like that, that demonstrates a representative
scenario in which the facilities that PGStrom offers are compelling
compared to traditional strategies already implemented in Postgres and
other systems.
If I wanted to make joins faster, personally, I would look at
opportunities to optimize our existing hash joins to take better
advantage of modern CPU characteristics. A lot of the research
suggests that it may be useful to implement techniques that take
better advantage of available memory bandwidth through techniques like
prefetching and partitioning, perhaps even (counter-intuitively) at
the expense of compute bandwidth. It's possible that it just needs to
be explained to me, but, with respect, intuitively I have a hard time
imagining that offloading joins to the GPU will help much in the
general case. Every paper on joins from the last decade talks a lot
about memory bandwidth and memory latency. Are you concerned with some
specific case that I may have missed? In what scenario might a
cost-based optimizer reasonably prefer a custom join node implemented
by PgStrom, over any of the existing join node types? It's entirely
possible that I simply missed relevant discussions here.
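For reference, the prefetching idea mentioned above looks roughly like this
in a hash-probe loop. This is a generic illustration using the GCC/Clang
__builtin_prefetch intrinsic; it is not PostgreSQL code, and every name in
it is invented.

#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 8

typedef struct HashBucket HashBucket;

extern HashBucket *buckets;
extern size_t bucket_mask;      /* nbuckets - 1, power-of-two table */
extern void probe_bucket(HashBucket *b, uint32_t hashval);

void
probe_all(const uint32_t *hashes, size_t n)
{
    size_t  i;

    for (i = 0; i < n; i++)
    {
        /* start pulling a future bucket into cache now, so the memory
         * latency overlaps with the work done on the current probe */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&buckets[hashes[i + PREFETCH_DISTANCE] & bucket_mask]);

        probe_bucket(&buckets[hashes[i] & bucket_mask], hashes[i]);
    }
}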
--
Peter Geoghegan
On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I thought that the executor side of his patch wasn't in bad shape. The
real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing up
low-level details of the constructed plan tree. It seems like in
principle it might be possible to make that logic cleanly extensible,
but it'll likely take a major rewrite. The patch tried to skate by with
just exposing a bunch of internal functions, which I don't think is a
maintainable approach, either for the core or for the extensions using it.
Well, I consider that somewhat good news, because I think it would be
rather nice if we could get by with solving one problem at a time, and
if the executor part is close to being well-solved, excellent.
My ignorance is probably showing here, but I guess I don't understand
why it's so hard to deal with the planner side of things. My
perhaps-naive impression is that a Seq Scan node, or even an Index
Scan node, is not all that complicated. If we just want to inject
some more things that behave a lot like those into various baserels, I
guess I don't understand why that's especially hard.
Now I do understand that part of what KaiGai wants to do here is
inject custom scan paths as additional paths for *joinrels*. And I
can see why that would be somewhat more complicated. But I also don't
see why that's got to be part of the initial commit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I also think that there are really two separate problems here: getting
the executor to call a custom scan node when it shows up in the plan
tree; and figuring out how to get it into the plan tree in the first
place. I'm not sure we've properly separated those problems, and I'm
not sure into which category the issues that sunk KaiGai's 9.4 patch
fell.
I thought that the executor side of his patch wasn't in bad shape. The
real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing up
low-level details of the constructed plan tree. It seems like in principle
it might be possible to make that logic cleanly extensible, but it'll likely
take a major rewrite. The patch tried to skate by with just exposing a
bunch of internal functions, which I don't think is a maintainable approach,
either for the core or for the extensions using it.
(I'm now trying to catch up on last night's discussion...)
I initially intended to allow extensions to add their custom paths entirely
at their own discretion, because the core backend cannot have expectations
about the behavior of a custom plan.
However, a custom path that replaces built-in paths must of course behave
compatibly in spite of its different implementation.
So, I'm inclined toward the direction that the custom-plan provider informs
the core backend what it can do, and the planner gives extensions more
practical information with which to construct a custom path node.
Let me investigate how to handle join replacement by custom paths at the
planner stage.
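As a first concrete cut at that direction, below is a minimal sketch of what
such a planner-side entrypoint could look like. The gpuscan_* names and the
qualifier check are placeholders of mine for illustration only; add_path()
and the PlannerInfo / RelOptInfo / RangeTblEntry structures are the existing
core pieces.

#include "postgres.h"
#include "nodes/relation.h"        /* PlannerInfo, RelOptInfo */
#include "optimizer/pathnode.h"    /* add_path() */

/* Hypothetical provider-side helpers, not core functions. */
extern bool  gpuscan_quals_are_supported(List *quals);
extern Path *create_gpuscan_path(PlannerInfo *root, RelOptInfo *baserel);

/*
 * Hypothetical entrypoint, called while the planner builds paths for
 * a base relation.  The provider first checks that it can actually
 * execute the qualifiers, then offers an alternative path; the
 * optimizer keeps that path only if it wins on estimated cost.
 */
static void
gpuscan_add_scan_path(PlannerInfo *root,
                      RelOptInfo *baserel,
                      RangeTblEntry *rte)
{
    Path   *cpath;

    /* Bail out unless every qualifier can run as device code. */
    if (!gpuscan_quals_are_supported(baserel->baserestrictinfo))
        return;

    /* Build a custom path carrying the provider's own cost estimate
     * and whatever state the executor callbacks will need later. */
    cpath = create_gpuscan_path(root, baserel);

    add_path(baserel, cpath);
}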
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I thought that the executor side of his patch wasn't in bad shape.
The real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing
up low-level details of the constructed plan tree. It seems like in
principle it might be possible to make that logic cleanly extensible,
but it'll likely take a major rewrite. The patch tried to skate by
with just exposing a bunch of internal functions, which I don't think
is a maintainable approach, either for the core or for the extensions using it.
Well, I consider that somewhat good news, because I think it would be rather
nice if we could get by with solving one problem at a time, and if the executor
part is close to being well-solved, excellent.
My ignorance is probably showing here, but I guess I don't understand why
it's so hard to deal with the planner side of things. My perhaps-naive
impression is that a Seq Scan node, or even an Index Scan node, is not all
that complicated. If we just want to inject some more things that behave
a lot like those into various baserels, I guess I don't understand why that's
especially hard.
Now I do understand that part of what KaiGai wants to do here is inject
custom scan paths as additional paths for *joinrels*. And I can see why
that would be somewhat more complicated. But I also don't see why that's
got to be part of the initial commit.
I'd also like to take this approach. Even though we eventually need a graceful
approach to join replacement by custom paths, it makes sense to get a minimum
functionality set in place first.
Then, we can focus on how to design the planner integration for joins.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?
I haven't really paid any attention to PGStrom. Perhaps it's just that I
missed it, but I would find it useful if you could direct me towards a
benchmark or something like that, that demonstrates a representative
scenario in which the facilities that PGStrom offers are compelling compared
to traditional strategies already implemented in Postgres and other
systems.
The GPU-side implementation of hash join is still under development.
The only available use case right now is an alternative scan path that
replaces a full table scan, in cases where a table contains a massive number
of records and the qualifiers are sufficiently complicated.
The EXPLAIN output below shows a sequential scan on a table that contains
80M records (all of them in memory; no disk access during execution).
Nvidia's GT640 shows an advantage over a single-threaded Core i5 4570S, at
least.
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
-> Seq Scan on t1 (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
Rows Removed by Filter: 77486186
Planning time: 0.066 ms
Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
-> Custom (GpuScan) on t1 (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
Host References:
Device References: x, y
Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
Total time to load: 0.231 ms
Avg time in send-mq: 0.027 ms
Max time to build kernel: 1.064 ms
Avg time of DMA send: 3.050 ms
Total time of DMA send: 933.318 ms
Avg time of kernel exec: 5.117 ms
Total time of kernel exec: 1565.799 ms
Avg time of DMA recv: 0.086 ms
Total time of DMA recv: 26.289 ms
Avg time in recv-mq: 0.011 ms
Planning time: 0.094 ms
Total runtime: 1784.793 ms
(17 rows)
If I wanted to make joins faster, personally, I would look at opportunities
to optimize our existing hash joins to take better advantage of modern CPU
characteristics. A lot of the research suggests that it may be useful to
implement techniques that take better advantage of available memory
bandwidth through techniques like prefetching and partitioning, perhaps
even (counter-intuitively) at the expense of compute bandwidth. It's
possible that it just needs to be explained to me, but, with respect,
intuitively I have a hard time imagining that offloading joins to the GPU
will help much in the general case. Every paper on joins from the last decade
talks a lot about memory bandwidth and memory latency. Are you concerned
with some specific case that I may have missed? In what scenario might a
cost-based optimizer reasonably prefer a custom join node implemented by
PgStrom, over any of the existing join node types? It's entirely possible
that I simply missed relevant discussions here.
If our purpose were to consume 100% of the GPU device's capacity, memory
bandwidth would be troublesome. But I'm not interested in GPU benchmarking.
What I want to do is accelerate complicated query processing beyond what
existing RDBMSs offer, in a way that is cheap and transparent to existing
applications.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted. Then, we could later work on adding more
ways to inject them in more places.
We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that first.
At the moment the Custom join hook is called every time we attempt to
cost a join, with no restriction.
I would like to highly restrict this, so that we only consider a
CustomJoin node when we have previously said one might be usable and
the user has requested this (e.g. enable_foojoin = on)
This is part of what I disagree with; I'd rather not require users to
know and understand when they want to do a HashJoin vs. a MergeJoin vs.
a CustomJoinTypeX.
We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true.
I wouldn't consider those generally "user-facing" options, and the
enable_X counterparts are intended for debugging and not to be used in
production environments. To the point you make in the other thread- I'm
fine w/ having similar cost-based enable_X options, but I think we're
doing our users a disservice if we require that they populate or update
a table. Perhaps an extension needs to do that on installation, but
that would need to enable everything to avoid the user having to mess
around with the table.
So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.
It does seem like there should be some work done in this area, as Tom
mentioned, to avoid needing to have more columns to track how equality
can be done. I do wonder just how we'd deal with this when it comes to
GPUs as, presumably, the code to implement the equality for various
types would have to be written in CUDA-or-whatever.
Given that everything else in Postgres is agnostic and configurable,
I'm looking to do the same here.
It's certainly a neat idea, but I do have concerns (which appear to be
shared by others) about just how practical it'll be, how much rework it'd
take, and whether it'd really be used in the end...
Thanks,
Stephen
So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.
It does seem like there should be some work done in this area, as Tom mentioned,
to avoid needing to have more columns to track how equality can be done.
I do wonder just how we'd deal with this when it comes to GPUs as, presumably,
the code to implement the equality for various types would have to be written
in CUDA-or-whatever.
GPUs have workloads they like and workloads they dislike. It is a reasonable
idea to list the operators (or something else) that run to advantage on a
custom path.
For example, numeric calculation on fixed-length variables has a great
advantage on a GPU, but locale-aware text matching is not a workload suitable
for GPUs.
It may be a good hint for the planner when picking up candidate paths to
consider; a sketch of what such a check might look like is below.
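For what it's worth, the planner already gates merge and hash joins per
operator through op_mergejoinable() and op_hashjoinable() in lsyscache.c.
A custom-path analogue could start as a provider-maintained whitelist of
operator OIDs; the sketch below is only my assumption of how provider-side
code might express that, and nothing of it exists in core.

/*
 * Hypothetical provider-side whitelist of operators that have device
 * implementations.  It would be filled once at module load, e.g. by
 * resolving operator names to OIDs: float arithmetic and comparisons
 * would be listed, locale-aware text operators deliberately not.
 */
static Oid      gpu_supported_ops[256];
static int      n_gpu_supported_ops = 0;

/*
 * Answer "may a custom path be considered for this operator?",
 * playing the same role op_mergejoinable() and op_hashjoinable()
 * play for the built-in join types.
 */
static bool
gpuscan_operator_is_supported(Oid opno)
{
    int         i;

    for (i = 0; i < n_gpu_supported_ops; i++)
    {
        if (gpu_supported_ops[i] == opno)
            return true;
    }
    return false;
}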
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Peter Geoghegan (pg@heroku.com) wrote:
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?
I haven't really paid any attention to PGStrom. Perhaps it's just that
I missed it, but I would find it useful if you could direct me towards
a benchmark or something like that, that demonstrates a representative
scenario in which the facilities that PGStrom offers are compelling
compared to traditional strategies already implemented in Postgres and
other systems.
I agree that some concrete evidence would be really nice. I
more-or-less took KaiGai's word on it, but having actual benchmarks
would certainly be better.
If I wanted to make joins faster, personally, I would look at
opportunities to optimize our existing hash joins to take better
advantage of modern CPU characteristics.
Yeah, I'm pretty confident we're leaving a fair bit on the table right
there based on my previous investigation into this area. There were
cases which easily showed a 3x improvement, as I recall (the trade-off
being increased memory usage for a larger, sparser hash table). Sadly,
there were also cases which ended up being worse, and it seemed to be
very sensitive to the size of the hash table which ends up being built
and the size of the on-CPU cache.
A lot of the research
suggests that it may be useful to implement techniques that take
better advantage of available memory bandwidth through techniques like
prefetching and partitioning, perhaps even (counter-intuitively) at
the expense of compute bandwidth.
While I agree with this, one of the big things about GPUs is that they
operate in a highly parallel fashion and across a different CPU/Memory
architecture than what we're used to (for starters, everything is much
"closer"). In a traditional memory system, there's a lot of back and
forth to memory, but a single memory dump over to the GPU's memory where
everything is processed in a highly parallel way and then shipped back
wholesale to main memory is at least conceivably faster.
Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves... In a way, the PGStrom approach gets to
"cheat" today, since it can parallelize the work where core can't, and
that ends up not being an entirely fair comparison.
Thanks,
Stephen
* Robert Haas (robertmhaas@gmail.com) wrote:
Well, I consider that somewhat good news, because I think it would be
rather nice if we could get by with solving one problem at a time, and
if the executor part is close to being well-solved, excellent.
Sadly, I'm afraid the news really isn't all that good in the end..
My ignorance is probably showing here, but I guess I don't understand
why it's so hard to deal with the planner side of things. My
perhaps-naive impression is that a Seq Scan node, or even an Index
Scan node, is not all that complicated. If we just want to inject
some more things that behave a lot like those into various baserels, I
guess I don't understand why that's especially hard.
That's not what is being asked for here though...
Now I do understand that part of what KaiGai wants to do here is
inject custom scan paths as additional paths for *joinrels*. And I
can see why that would be somewhat more complicated. But I also don't
see why that's got to be part of the initial commit.
I'd say it's more than "part" of what the goal is here; it's more or
less what everything boils down to. Oh, plus being able to replace
aggregates with a GPU-based operation instead, but that's no trivially
done thing either, really (if it is, let's get it done for FDWs
already...).
Thanks,
Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
I initially intended to allow extensions to add their custom paths entirely
at their own discretion, because the core backend cannot have expectations
about the behavior of a custom plan.
However, a custom path that replaces built-in paths must of course behave
compatibly in spite of its different implementation.
I didn't ask this before but it's been on my mind for a while: how will
this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality for
those, but they're not giving us CUDA code to run to perform that
equality...
Thanks,
Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
I initially intended to allow extensions to add their custom paths entirely
at their own discretion, because the core backend cannot have expectations
about the behavior of a custom plan.
However, a custom path that replaces built-in paths must of course behave
compatibly in spite of its different implementation.
I didn't ask this before but it's been on my mind for a while: how will
this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality for
those, but they're not giving us CUDA code to run to perform that equality...
If a custom-plan provider supports user-defined data types such as PostGIS's,
it will be able to pick up those data types as well, in addition to built-in
ones. It fully depends on the coverage of the extension.
If a data type is not supported, it is simply not show-time for the GPU.
In my case, if PG-Strom also has compatible code for them that is runnable
on OpenCL, it will say "yes, I can handle this data type".
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
GPUs have workloads they like and workloads they dislike. It is a reasonable
idea to list the operators (or something else) that run to advantage on a
custom path.
For example, numeric calculation on fixed-length variables has a great
advantage on a GPU, but locale-aware text matching is not a workload suitable
for GPUs.
Right- but this points out exactly what I was trying to bring up.
Locale-aware text matching requires running libc-provided code, which
isn't going to happen on the GPU (unless we re-implement it...).
Aren't we going to have the same problem with the 'numeric' type? Our
existing functions won't be usable on the GPU and we'd have to
re-implement them and then make darn sure that they produce the same
results...
We'll also have to worry about any cases where we have a libc function
and a CUDA function and convince ourselves that there's no difference
between the two... Not sure exactly how we'd build this kind of
knowledge into the system through a catalog (I tend to doubt that'd
work, in fact) and trying to make it work from an extension in a way
that it would work with *other* extensions strikes me as highly
unlikely. Perhaps the extension could provide the core types and the
other extensions could provide their own bits to hook into the right
places, but that sure seems fragile.
Thanks,
Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
I didn't ask this before but it's been on my mind for a while: how will
this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality for
those, but they're not giving us CUDA code to run to perform that equality...
If a custom-plan provider supports user-defined data types such as PostGIS's,
it will be able to pick up those data types as well, in addition to built-in
ones. It fully depends on the coverage of the extension.
If a data type is not supported, it is simply not show-time for the GPU.
So the extension will need to be aware of all custom data types and then
installed *after* all other extensions are installed? That doesn't
strike me as workable...
Thanks,
Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
I didn't ask this before but it's been on my mind for a while: how
will this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality
for those, but they're not giving us CUDA code to run to perform that equality...
If a custom-plan provider supports user-defined data types such as
PostGIS's, it will be able to pick up those data types as well, in addition
to built-in ones. It fully depends on the coverage of the extension.
If a data type is not supported, it is simply not show-time for the GPU.
So the extension will need to be aware of all custom data types and then
installed *after* all other extensions are installed? That doesn't strike
me as workable...
I'm not certain why you think an extension will need to support all
the data types.
Even if it works only for a particular set of data types, it makes sense
as long as it covers the data types users are actually using.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, May 8, 2014 at 7:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves... In a way, the PGStrom approach gets to
"cheat" today, since it can parallelize the work where core can't, and
that ends up not being an entirely fair comparison.
I was thinking of SIMD, along similar lines. We might be able to cheat
our way out of having to solve some of the difficult problems of
parallelism that way. For example, if you can build a SIMD-friendly
bitonic mergesort, and combine that with poor man's normalized keys,
that could make merge joins on text faster. That's pure speculation,
but it seems like an interesting possibility.
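To make the shape of that idea visible, here is a scalar sketch of the
bitonic merge step; this is illustrative C of my own, with no intrinsics.
The attraction for SIMD is that the compare-exchange pattern is fixed and
data-independent: every pass touches pairs at a constant stride, so the
inner loop maps naturally onto vector compare/min/max instructions.

/* Compare-exchange: ensure a[i] <= a[j]. */
static void
cmp_swap(int *a, int i, int j)
{
    if (a[i] > a[j])
    {
        int     tmp = a[i];

        a[i] = a[j];
        a[j] = tmp;
    }
}

/*
 * Merge a bitonic sequence of length n (a power of two) into
 * ascending order.  To merge two sorted runs, reverse the second run
 * first so that the concatenation is bitonic, then call this over
 * the whole array.
 */
static void
bitonic_merge(int *a, int n)
{
    int     stride, base, i;

    for (stride = n / 2; stride > 0; stride /= 2)
        for (base = 0; base < n; base += 2 * stride)
            for (i = base; i < base + stride; i++)
                cmp_swap(a, i, i + stride);
}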
--
Peter Geoghegan
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
So the extension will need to be aware of all custom data types and then
installed *after* all other extensions are installed? That doesn't strike
me as workable...
I'm not certain why you think an extension will need to support all
the data types.
Mostly because we have a very nice extension system which quite a few
different extensions make use of, and it'd be pretty darn unfortunate if
none of them could take advantage of GPUs because we decided that the
right way to support GPUs was through an extension.
This is an argument which might be familiar to some, as it was part of the
reason that json and jsonb were added to core, imv...
Even if it works only for a particular set of data types, it makes sense
as long as it covers the data types users are actually using.
I know quite a few users of PostGIS, ip4r, and hstore...
Thanks,
Stephen
On Thu, May 8, 2014 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
Well, I consider that somewhat good news, because I think it would be
rather nice if we could get by with solving one problem at a time, and
if the executor part is close to being well-solved, excellent.
Sadly, I'm afraid the news really isn't all that good in the end...
My ignorance is probably showing here, but I guess I don't understand
why it's so hard to deal with the planner side of things. My
perhaps-naive impression is that a Seq Scan node, or even an Index
Scan node, is not all that complicated. If we just want to inject
some more things that behave a lot like those into various baserels, I
guess I don't understand why that's especially hard.
That's not what is being asked for here though...
I am not sure what your point is here. Here's mine: if we can strip
this down to the executor support plus the most minimal planner
support possible, we might be able to get *something* committed. Then
we can extend it in subsequent commits.
You seem to be saying there's no value in getting anything committed
unless it handles the scan-substituting-for-join case. I don't agree.
Incremental commits are good, whether they get you all the way to
where you want to be or not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
I am not sure what your point is here. Here's mine: if we can strip
this down to the executor support plus the most minimal planner
support possible, we might be able to get *something* committed. Then
we can extend it in subsequent commits.
I guess my point is that I see this more-or-less being solved already by
FDWs, but that doesn't address the case when it's a local table, so
perhaps there is something useful out of a commit that allows
replacement of a SeqScan node (which presumably would also be costed
differently).
You seem to be saying there's no value in getting anything committed
unless it handles the scan-substituting-for-join case. I don't agree.
Incremental commits are good, whether they get you all the way to
where you want to be or not.
To be honest, I think this is really the first proposal to replace
specific Nodes, rather than provide a way for a generic Node to exist
(which could also replace joins). While I do think it's an interesting
idea, and if we could push filters down to this new Node it might even
be worthwhile, I'm not sure that it actually moves us down the path to
supporting Nodes which replace joins.
Still, I'm not against it.
Thanks,
Stephen
On 9 May 2014 02:40, Stephen Frost <sfrost@snowman.net> wrote:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted. Then, we could later work on adding more
ways to inject them in more places.
We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that first.
At the moment the Custom join hook is called every time we attempt to
cost a join, with no restriction.
I would like to highly restrict this, so that we only consider a
CustomJoin node when we have previously said one might be usable and
the user has requested this (e.g. enable_foojoin = on)
This is part of what I disagree with; I'd rather not require users to
know and understand when they want to do a HashJoin vs. a MergeJoin vs.
a CustomJoinTypeX.
Again, I have *not* said users should know that.
We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true.
I wouldn't consider those generally "user-facing" options, and the
enable_X counterparts are intended for debugging and not to be used in
production environments. To the point you make in the other thread- I'm
fine w/ having similar cost-based enable_X options, but I think we're
doing our users a disservice if we require that they populate or update
a table. Perhaps an extension needs to do that on installation, but
that would need to enable everything to avoid the user having to mess
around with the table.
Again, I did *not* say those should be user facing options, nor that
they be set at table-level.
What I have said is that authors of CustomJoins or CustomScans should
declare in advance via system catalogs which operators their new code
works with so that Postgres knows when it is appropriate to call them.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
What I have said is that authors of CustomJoins or CustomScans should
declare in advance via system catalogs which operators their new code
works with so that Postgres knows when it is appropriate to call them.
I guess I just took that as given, since the discussion has been about
GPUs and there will have to be new operators, given that there will be
different code (CUDA-or-whatever GPU-language code).
Thanks,
Stephen
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that
first.
OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and speculation
that's not supported by a lot of evidence. If you think this isn't
prototyping, I wonder what you think *is* prototyping.
My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I
have been told will be open source.
So there are at least two groups looking at this as a serious option
for Postgres (not including the above paper's authors).
That isn't *now*, but it is at least a time scale that fits with
acting on this in the next release, if we can separate out the various
ideas and agree we wish to proceed.
I'll submerge again...
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we
know works, AFAIK. If that point is not clear, then we need to
discuss that first.
OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and speculation
that's not supported by a lot of evidence. If you think this isn't
prototyping, I wonder what you think *is* prototyping.
My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I have
been told will be open source.
So there are at least two groups looking at this as a serious option for
Postgres (not including the above paper's authors).
That isn't *now*, but it is at least a time scale that fits with acting
on this in the next release, if we can separate out the various ideas and
agree we wish to proceed.
I'll submerge again...
Through the discussion last week, our minimum consensus is:
1. A deregulated enhancement of FDWs is not the way to go.
2. A custom path that can replace a built-in scan makes sense as a first step
   towards future enhancement. Its planner integration is simple enough to do.
3. A custom path that can replace a built-in join needs investigation into how
   to integrate with the existing planner structure, to avoid (3a) reinventing
   the whole of join handling on the extension side, and (3b) unnecessary
   extension calls in cases that are obviously unsupported.
So, I'd like to start on the (2) portion for the upcoming 1st commit-fest of
the v9.5 development cycle; a sketch of one possible executor-side contract
for that portion is below. We will also be able to discuss the (3) portion
concurrently, probably towards the 2nd commit-fest.
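To make the (2) portion concrete, one plausible shape for the executor-side
contract is a table of callbacks attached to the custom plan state, invoked
at the usual node lifecycle points. The struct and member names below are
only a sketch of mine of what such an interface might look like, not a
settled API.

typedef struct CustomExecMethods
{
    const char *name;               /* label shown by EXPLAIN */

    /* Called from ExecInitNode(): allocate device buffers, build or
     * look up kernels, and so on. */
    void            (*BeginCustomScan) (struct CustomScanState *node,
                                        EState *estate, int eflags);

    /* Called from ExecProcNode(): return the next tuple in a slot,
     * or an empty slot at end of scan. */
    TupleTableSlot *(*ExecCustomScan) (struct CustomScanState *node);

    /* Called from ExecEndNode(): release device resources. */
    void            (*EndCustomScan) (struct CustomScanState *node);

    /* Called on rescan, e.g. when the node sits on the inner side
     * of a nestloop. */
    void            (*ReScanCustomScan) (struct CustomScanState *node);
} CustomExecMethods;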
Unfortunately, I cannot participate in PGCon/Ottawa this year. Please share
any face-to-face discussion here.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>